Delta Air Lines Outage Highlights Importance of Mission Critical Design
- Aug 10, 2015
A reported “power surge” knocked out the Delta Air Lines reservation and scheduling system, resulting in the cancellation of nearly 2,000 flights and massive problems for travelers. It will take several more days of travel disruption to get planes and crews where they belong and return the system to normal. According to Delta, the power surge knocked out a transformer, and the backup systems then failed. This recalls the NSA Data Center fiasco.
The biggest fallacy in the Data Center industry is that redundancy equals protection. Nothing could be further from the truth. Operations often make massive investments in redundant power and cooling systems that still leave them exposed.
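To make the point concrete, here is a minimal availability sketch. It assumes two redundant power paths that fail independently and that both feed through one shared transfer switch. The availability figures are hypothetical, chosen only for illustration; they are not taken from Delta or from any real facility.

```python
# Hypothetical availability figures, for illustration only.
POWER_PATH = 0.999        # one utility-plus-generator path
TRANSFER_SWITCH = 0.9995  # a single switch shared by both paths
HOURS_PER_YEAR = 8760


def series(*availabilities):
    """Every component must work, so availabilities multiply."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities):
    """Any one component is enough (assumes independent failures)."""
    all_fail = 1.0
    for a in availabilities:
        all_fail *= 1.0 - a
    return 1.0 - all_fail


def downtime_hours(availability):
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Two redundant paths on their own look excellent on paper.
redundant_paths = parallel(POWER_PATH, POWER_PATH)

# The same paths behind one shared switch inherit its weakness.
with_shared_switch = series(redundant_paths, TRANSFER_SWITCH)

if __name__ == "__main__":
    print(f"redundant paths only: {downtime_hours(redundant_paths):.4f} h/yr")
    print(f"with shared switch:   {downtime_hours(with_shared_switch):.4f} h/yr")
```

Under these assumed numbers, the shared switch accounts for almost all of the expected downtime. Doubling the equipment upstream of it does very little, which is why configuration matters as much as the equipment count.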
“Mission Critical” design is a specialty. Redundant equipment is only a part of it. What makes a “redundant” design truly “Mission Critical” is how that equipment is configured, installed and tested. The stakes are very different for an IT facility than for an office building. If the backup fails in a conventional building electrical system, it’s an inconvenience until power is restored. Lights come back on and PCs generally come back to life, at worst with the loss of work that wasn’t saved. An air conditioning failure is uncomfortable, but once cooling is restored the building cools down again.
Not so in a Data Center. When backups fail there, not only is business disrupted until the cause is corrected, but it can take days to restore servers and networks to full operation. As the Delta outage demonstrates, the ripple effect can be even more devastating.
True “Mission Critical” design takes time, thorough analysis and experience. No one set of eyes can catch all the flaws. Specialized expertise and “peer review” are an absolute necessity. SM&W published an article on Avoiding Single Point of Failure Design Flaws. Every example is from a real project in which our analysis caught the flaws before they found their way into construction and caused a disaster. As Delta and many others have learned the hard way, the costs of doing it right the first time pale compared with the business losses from an outage.