While the reservation system was down, Delta was not able to sell tickets, even on flights that were about to depart with empty seats. When an unavailable computer system is the only repository of operational data, very little can be accomplished manually, and the enterprise (along with its revenue) grinds to a halt.
- Architectural weaknesses can cause additional problems during an outage. Most large carriers still operate their reservation systems based on Transaction Processing Facility (TPF), a specialized IBM operating system. Originally created in the 1960s and still supported by IBM, TPF can handle tens of thousands of transactions per second. Although it is highly reliable, it takes time to master and is not well understood beyond the airlines, some hotels and a few credit card processors.
Delta’s passenger service system, Deltamatic, handles ticketing, reservations, standby lists and other passenger-centric functions. The 52-year-old system is closely integrated with TPF. According to Bob Edwards, United Airlines’ former CIO, many airlines continue to operate TPF and other older systems, but have developed modern user interfaces to make things easier for agents. This allows the older systems to continue to assign seats, change reservations and perform other functions for passengers without forcing agents to memorize hundreds of two-to-four-letter codes.
Unfortunately, when the Delta reservation system went down, TPF and Deltamatic did not synchronize properly with the newer user interface. Employees were forced to use Deltamatic directly for several hours until resynchronization was complete.
- Funding must be allocated for DRP and BCP. Unfortunately, recovery plans are expensive, and they create no new services that either increase revenue or cut costs. This often makes it difficult to get adequate funding, especially during times of financial hardship. According to Scott Nason, American Airlines’ former CIO, during the years when the airlines were close to bankruptcy, most carriers demanded that any investments have short paybacks. Understandable, but unfortunate. Frankly, executive management in most enterprises expects these plans to be in place, but (similar to insurance) doesn’t want to pay for them until they are needed. By then, it is too late for them to be effective.
- Recovery plans should continue to evolve. As business and IT systems become ever more complex, it becomes increasingly difficult to conceive of every possible problem and to test all potential scenarios thoroughly. However, without careful planning and testing, timely recovery from an outage is virtually impossible. Since both business and IT environments are constantly changing, the enterprise’s recovery plans must be funded, updated, and tested on a regular basis.
Low-probability, high-impact events can happen to any enterprise. And they do. So don’t fall prey to the platitude, “It can’t happen here.” The more prepared you are, the less your business will suffer.