I love it when somebody else has a disaster. It serves as a great example of how not to do things, and it also debunks the myth that big providers are always better than smaller regional providers.
On July 1st, 2009, there was an electrical fire at a datacenter in Seattle, WA, operated by Fisher Communications. The fire took out the incoming power feeds, and the generators were not allowed to run because they were too close to the fire. The end result: the entire facility was dark for quite some time. This facility was also home to Authorize.Net and its primary credit card processing portal. As a result, Authorize.Net transactions were significantly hampered for almost 3 days.
Why did Authorize.Net’s DR fail?
Simple. People. Authorize.Net has a full DR site in San Jose, CA, but it took 3 days to fail over and get back online. Why? According to their official announcement, they blamed a coincidence of many factors, which they called the perfect storm. The problems happened over the July 4th holiday week and weekend, so most of their engineers were away, and the few who were on call were not skilled enough to handle the event. Clearly, their DR plan existed on paper only. You don't get an exemption or a pass because it was a holiday.
Let's not let Fisher off the hook…
I am amazed when I hear about electrical fires in datacenters. For starters, with a properly designed FM200 or Halon system, the fire should be extinguished within minutes. The fact that the fire department had to come out and help put out the fire only underscores that the facility's own fire suppression systems were not up to the job. It's ironic, too: just last week I wrote an article about why it's important to be in a datacenter that follows 80% electrical rating limits. If you recall, the key problem with going above 80% usage is an increased risk of… you guessed it… electrical fires. I would not be surprised if Fisher was pushing 90% load through their core load panels. And the lack of transparency is a joke. A week after the fire, there is still no press release posted on their website. The only reason I know about it is that we use Authorize.Net for credit card processing, and we received their official outage summaries. I'd hate to be a customer at a Fisher datacenter trying to get a credit for the outage.
Big business mentality, and the economy…
It's no secret that I am very anti big business. I hate it when people assume they get better service and redundancy from a large telecommunications firm than from a smaller regional firm. In many environments, the smaller firms have better performance track records and better quality control. It's simple: it's easier to manage a small environment than a big one. Big companies are also more directly affected by economic issues. Some large telecoms run in the red for years and never turn a profit. This stress causes everyone to pass the buck, cut back on skilled staff, cut back on preventive maintenance contracts, or turn a blind eye and let the blame fall elsewhere.
At Quonix, a small group of dedicated professionals maintains operations. Nobody can slack off and pass the buck. It doesn't hurt that we're profitable, too!