The other day, I was wading through my unreasonably active email and decided to look at a subfolder I hadn't checked in maybe a year. It's populated with inbound email matching certain parameters I generally don't care much about -- not spam, but system messages sent back to postmaster from a busy mail server. I suppose I should have been more conscientious about checking this folder, but generally speaking, most of what lands in it is spurious.
Lo and behold, I discovered more than 20,000 emails, the vast majority of which were returns from a cronjob that someone else had implemented years ago. The cronjob was now failing: the report it generated couldn't be delivered because the recipient domain no longer existed, so every mailer error bounced back to me, the postmaster.
The fact of the matter is that any infrastructure of reasonable size and age contains plenty of code like this: scripts originally written to fulfill a need, then abandoned or forgotten. These zombie scripts continue to run, performing tasks that now make little or no sense, and in some cases they become extremely problematic when circumstances change. Though they're simple to fix (just ax the cronjob), they can be difficult to uncover.
Obviously, the best way to control these things is full and careful documentation, but even when complete documentation exists, that doesn't mean anyone checks it when infrastructure changes are made. A cronjob that synchronizes data from one server to another might well be documented somewhere, but when the source server is replaced, that script may keep running from cron for all eternity, trying in vain to connect to a mothballed server and do its job. In most cases, nobody will ever notice unless and until the script becomes a problem.
There are many cases where zombies can become big problems. One I discovered during a particularly interesting troubleshooting session was a cronjob that caused a server to assign a secondary IP address to an interface, conduct business with that source IP, then remove the secondary. This was ostensibly done for security: the firewall mapped traffic through to the secondary address, so that address existed only while the script was at work. When the job was done, removing the secondary took a valid firewall target IP off the network, preventing traffic coming through the firewall from reaching an actual server.
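The original script isn't reproduced here, but the pattern it followed can be sketched in a few lines of shell, assuming a Linux host with the iproute2 `ip` utility. The interface name and address are placeholders, and because manipulating addresses requires root, this sketch defaults to a dry run that merely prints the commands:

```shell
#!/bin/sh
# Sketch of the secondary-IP cronjob described above. IFACE and SECONDARY
# are placeholder values; DRY_RUN defaults to on because adding and
# removing addresses requires root.
IFACE="${IFACE:-eth0}"
SECONDARY="${SECONDARY:-192.0.2.50/24}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command instead of running it when DRY_RUN is non-empty.
maybe() { if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi; }

maybe ip addr add "$SECONDARY" dev "$IFACE"   # attach the secondary address
# ... conduct business sourced from the secondary IP here ...
maybe ip addr del "$SECONDARY" dev "$IFACE"   # then take it away again
```

Run nightly from cron, that final `del` is exactly the step that keeps yanking the address off the network long after anyone remembers why.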
Naturally, even if that cronjob had been documented, nobody would have pored through the documentation to make sure the secondary IP this server used for 30 minutes a day was still in play. Thus, after the framework requiring this component was dismantled, the script had no more purpose, but it continued to assign that IP address every night. Of course, the address was later assigned to a production server, causing intermittent outages that couldn't easily be explained -- at least, not until we wrote a small script to capture the MAC address holding that IP throughout the day. Then we were able to identify the server that was magically assuming the IP address at certain intervals.
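Our actual watcher isn't reproduced in this column, but a minimal sketch of the idea -- poke the disputed IP, then log whichever MAC address the ARP table reports for it -- could look like this, assuming a Linux host on the same subnet with `ping` and `arp` available (the IP, log path, and cron schedule below are all placeholders):

```shell
#!/bin/sh
# Hypothetical MAC watcher: record which MAC address answers for a disputed
# IP so the ghost machine can be identified. Run it from cron every few
# minutes, e.g.:
#   */5 * * * * /usr/local/bin/mac-watch.sh 192.0.2.50 /var/tmp/mac-watch.log

# Extract the MAC for an IP from `arp -an` style output read on stdin.
mac_of() {
    awk -v ip="($1)" '$2 == ip { print $4 }'
}

# Refresh the ARP entry for the IP, then append a timestamped MAC to the log.
watch_once() {
    ip="$1"; log="$2"
    ping -c 1 -W 1 "$ip" >/dev/null 2>&1
    mac=$(arp -an | mac_of "$ip")
    [ -n "$mac" ] && echo "$(date '+%F %T') $ip $mac" >> "$log"
}

# Only act when invoked with an IP argument.
if [ $# -ge 1 ]; then
    watch_once "$1" "${2:-/var/tmp/mac-watch.log}"
fi
```

Once the log shows a second MAC claiming the address at regular intervals, matching that MAC to a switch port or interface points straight at the offending machine.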