Archive for July, 2009

The secret to datacenter cooling…

Thursday, July 16th, 2009

At the Quonix Data Center in Philadelphia, PA, people are always shocked by how cool our facility is: a stable 72 degrees, year round. How do we do it? What's the secret?

Well, it's simple. Every watt of power load produces about 3.4 BTU per hour of heat, and that heat load sets the sensible AC tonnage you need. Then add a volumetric factor to account for the cubic volume of the space; I use approximately 1 ton of AC per 6,000 cubic feet. So a 1,500 sq ft data center with 12-foot ceilings (18,000 cubic feet) needs about 3 tons of AC for the size of the space alone, before any power load.

A data center is sized on its incoming power rating. So if your facility is built with 400 amp, 480 V, 3-phase service, your maximum load at 80% of capacity will require the following air conditioning capacity:

480 V * 400 A * 1.73 * 80% ≈ 266 kW, or 266,000 watts

266,000 W * 3.4 BTU/hr per W = 904,400 BTU/hr

904,400 BTU/hr / 12,000 BTU/hr per ton ≈ 75 tons of AC
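For anyone who wants to rerun these numbers for their own service, here is a small Python sketch of the same arithmetic. It assumes unity power factor (as the calculation above implicitly does) and uses the rule-of-thumb constants from this post: 3.4 BTU/hr per watt, 12,000 BTU/hr per ton, and 1 ton per 6,000 cubic feet. The function names are just for illustration.

```python
# Rule-of-thumb constants from the post (rounded; 1 W is about 3.412 BTU/hr).
BTU_PER_HR_PER_WATT = 3.4
BTU_PER_HR_PER_TON = 12_000
CUBIC_FEET_PER_TON = 6_000  # volumetric allowance for the room itself


def usable_three_phase_watts(volts: float, amps: float, derate: float = 0.80) -> float:
    """Usable 3-phase power at the 80% limit, assuming unity power factor."""
    return volts * amps * 1.73 * derate  # 1.73 approximates sqrt(3)


def power_load_tons(watts: float) -> float:
    """Tons of sensible cooling needed to reject the heat from a power load."""
    return watts * BTU_PER_HR_PER_WATT / BTU_PER_HR_PER_TON


def volume_tons(square_feet: float, ceiling_feet: float) -> float:
    """Extra tonnage for the space itself, at 1 ton per 6,000 cubic feet."""
    return square_feet * ceiling_feet / CUBIC_FEET_PER_TON


watts = usable_three_phase_watts(480, 400)
print(f"Power load: {watts:,.0f} W")
print(f"Cooling for power load: {power_load_tons(watts):.1f} tons")
print(f"Cooling for a 1,500 sq ft room, 12 ft ceiling: {volume_tons(1500, 12):.1f} tons")
print(f"Combined: {power_load_tons(watts) + volume_tons(1500, 12):.1f} tons")
```

Per the rule of thumb above, the volumetric tonnage is simply added on top of the power-load tonnage.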

If it's so simple, why do so many datacenters get it wrong?

Well, mostly because the AC is not built for full utilization. A lot of customers may only use 50% of their power feed. Over time, an operator gets used to this level of underutilization and scales back the AC infrastructure to save money.

The other possible cause is improper delivery of cooling. Remember, air conditioning is not about adding cold air to an environment; rather, it's about removing heat from the air. Heat rejection is a better way to think of it. To optimize heat rejection, you want your CRAC system to pull in the warmest air possible. This is why at Quonix we use a raised floor plenum with downflow cooling. As a result, return air entering our CRAC units is normally 78 degrees, with supply air exiting at 62 degrees. That's a 16 degree drop across the evaporator coil, which is very efficient, and it is why raised floor downflow configurations work so well. Data centers that don't have a raised floor plenum end up implementing open floor cooling.
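The benefit of a wide temperature drop can be illustrated with the common HVAC rule of thumb for sensible heat in air, Q ≈ 1.08 × CFM × ΔT (BTU/hr, standard air). That formula is not from this post, and the 10,000 CFM airflow below is a made-up figure, but the sketch shows that at a fixed airflow, a 16 degree drop removes twice the heat of an 8 degree drop.

```python
# Sensible heat rule of thumb for standard air: BTU/hr per CFM per degree F.
SENSIBLE_HEAT_FACTOR = 1.08
BTU_PER_HR_PER_TON = 12_000


def sensible_heat_btu_hr(cfm: float, return_f: float, supply_f: float) -> float:
    """Sensible heat removed by air crossing the coil, in BTU/hr."""
    return SENSIBLE_HEAT_FACTOR * cfm * (return_f - supply_f)


airflow_cfm = 10_000  # hypothetical CRAC unit airflow
for return_f in (78, 70):  # warm return air vs. diluted return air
    btu = sensible_heat_btu_hr(airflow_cfm, return_f, supply_f=62)
    print(f"Return {return_f}F -> {btu:,.0f} BTU/hr ({btu / BTU_PER_HR_PER_TON:.1f} tons)")
```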

What are the problems with open floor cooling?

Open floor cooling is a term generally applied to data centers that don't have any duct work, or have only minimal supply duct work. Instead, large CRAC units are placed on the floor around the perimeter of the data center, with the supply plenum on top and the return grilles on the bottom. As a result, there is a high rate of air recirculation: some of the cooled air is pulled straight back into the unit. This yields a smaller temperature drop across the coil and prevents you from getting the full cooling capacity out of your AC unit.
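To see how recirculation eats into the temperature drop, here is a deliberately simplified mixing model: return air is treated as a blend of equipment exhaust at 78 degrees and short-circuited supply air at 62 degrees, with the supply temperature held fixed. The recirculation fractions are hypothetical, and real airflow is far messier, so treat this only as an illustration of the trend.

```python
def coil_delta_t(hot_air_f: float, supply_f: float, recirc_fraction: float) -> float:
    """Temperature drop across the coil when part of the return is recirculated supply air.

    Simplified steady-state model: the return stream is a mix of equipment
    exhaust at hot_air_f and recirculated supply air at supply_f.
    """
    if not 0.0 <= recirc_fraction <= 1.0:
        raise ValueError("recirc_fraction must be between 0 and 1")
    return_f = (1 - recirc_fraction) * hot_air_f + recirc_fraction * supply_f
    return return_f - supply_f


for fraction in (0.0, 0.25, 0.5):  # hypothetical recirculation levels
    print(f"{fraction:.0%} recirculation -> {coil_delta_t(78, 62, fraction):.1f}F drop")
```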

The ironic thing is that a raised floor installation, though an added upfront cost, will save you money over time. In addition to eliminating the need for expensive duct work, it makes power and network cabling much easier and more affordable. The dot-com fallout of years past left a large supply of high-quality used floor tiles, and most raised floor installations can be done for around $8 to $10 per square foot. I highly recommend Access Computer Floors of New Jersey.

Authorize.net Outage and Fisher Plaza Fire – Lessons Learned

Wednesday, July 8th, 2009

I love it when somebody else has a disaster. It serves as a great example of how not to do things, and furthermore, it debunks the myth that big providers are always better than smaller regional providers.

On July 1st, 2009, there was an electrical fire at a datacenter in Seattle, WA, operated by Fisher Communications. The fire took out the incoming power feeds, and the generators were not allowed to run because they were too close to the fire. The end result: the entire facility was dark for quite some time. This facility was also home to Authorize.Net and its primary credit card processing portal, so Authorize.Net transactions were significantly hampered for almost 3 days.

Why did Authorize.Net’s DR fail?

Simple. People. Authorize.Net has a full DR site in San Jose, CA, but it took 3 days to migrate and get back online. Why? According to their official announcement, they blamed it on a coincidence of many factors, what they called “the perfect storm.” The problems happened over the July 4th holiday week and weekend, so most of their engineers were away, and the few who were on call were not skilled enough to handle the event. Clearly, their DR plan existed on paper only. You don't get an exemption or a pass because it was a holiday.

Let's not let Fisher off the hook…

I am amazed when I hear about electrical fires in datacenters. For starters, if you have a properly designed FM200 or Halon system, the fire should be put out in minutes. The fact that the fire department had to come out and help extinguish the fire only underscores problems with the facility's own fire suppression systems. It's ironic, too: just last week I wrote an article about why it's important to be in a datacenter that follows 80% electrical rating limits. If you recall, the key problem with going above 80% usage is an increased risk of… you guessed it… electrical fires. I would not be surprised if Fisher was pushing 90% load through their core load panels. And the lack of transparency is a joke. A week after the fire, there is still no press release posted on their website. The only reason I know about it is that we use Authorize.Net for credit card processing and received official outage summaries. I’d hate to be a customer at a Fisher datacenter trying to get a credit for the outage.

Big business mentality, and the economy…

It's no secret that I am very anti-big business. I hate it when people assume they get better service and redundancy from a large telecommunications firm than from a smaller regional firm. In many environments, the smaller firms have better performance track records and better quality control. It's simple: a small environment is easier to manage than a big one. Big companies are also more directly affected by economic issues. Some large telecoms run in the red for years and never turn a profit. This stress causes everyone to pass the buck, cut back on quality staffing, cut preventive maintenance contracts, or turn a blind eye and let the blame fall elsewhere.

At Quonix, a small group of dedicated professionals maintain operations. Nobody can slack off and pass the buck. It doesn’t hurt that we’re profitable too!

Why is Spam getting worse?

Tuesday, July 7th, 2009

I am not a doctor, but spam is like diabetes: it's a progressive illness. The longer you have it, the worse it gets. This may sound anecdotal, but it's true. At Quonix, we have been filtering email for people since 2000. Some of our customers have great results and get very few junk emails in their inbox, yet others still get 20-30 junk emails after filtering.

Why is there such a large variance in spam filtering performance?

The amount of time your business has been on the internet directly affects the total amount of junk email targeted at you. We have customers that have been using the same email addresses for over 10 years, and some customers that just came online last year. Those 10-year customers will always have a large amount of spam slipping through the filter. Even if we block 99%, the age of their “progressive” illness means they receive far more email before filtering. We have some users that get 1,000 spam emails a day; even with 99% effectiveness, they will still get 10 junk emails in their inbox.
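The arithmetic behind that last point is just daily spam volume times the share that gets past the filter. In this small Python sketch, the 100-per-day figure for a newer address is a hypothetical comparison point; the 1,000-per-day figure and the 99% block rate come from the example above.

```python
def junk_reaching_inbox(daily_spam: int, block_rate: float) -> float:
    """Expected junk messages per day that slip past a filter with the given block rate."""
    return daily_spam * (1 - block_rate)


# 100/day is a hypothetical newer address; 1,000/day matches the long-lived address above.
for daily_spam in (100, 1_000):
    leaked = junk_reaching_inbox(daily_spam, 0.99)
    print(f"{daily_spam:,} spam/day at 99% blocked -> about {leaked:.0f} in the inbox")
```

The filter's effectiveness is identical in both cases; only the incoming volume differs, which is why changing the address helps more than changing the filter.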

Alas, there is a cure!

It sounds obvious, but most fixes are. Change your email address every few years and never publish it. It's not that bad, trust me. For customers with chronic spam issues, we recommend they reissue new email addresses to all users. We leave the old addresses active, but have them return an auto-response stating that the address has changed and that the sender needs to contact us at the number below to obtain the new address. After a few months, the old addresses are deactivated. The new address scheme is now “clean” and will remain clean for a few years.
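The retired-address auto-response is normally set up in your mail server's own auto-reply feature, but the decision logic is simple enough to sketch with Python's standard `email` package. The address list, function name, and placeholder phone line below are hypothetical, and a production responder should also avoid replying to bounces and mailing lists to prevent mail loops.

```python
from email.message import EmailMessage
from email.utils import parseaddr
from typing import Optional

# Hypothetical retired addresses; in practice this list would come from your mail system.
RETIRED_ADDRESSES = {"sales@example.com", "info@example.com"}


def build_retired_address_reply(original: EmailMessage) -> Optional[EmailMessage]:
    """Build an auto-response for mail sent to a retired address, or return None.

    Assumes a single recipient in the To header.
    """
    _, recipient = parseaddr(str(original["To"] or ""))
    _, sender = parseaddr(str(original["From"] or ""))
    if recipient.lower() not in RETIRED_ADDRESSES or not sender:
        return None  # Address is still current, or there is nobody to reply to.

    reply = EmailMessage()
    reply["From"] = recipient
    reply["To"] = sender
    reply["Subject"] = "Re: " + str(original["Subject"] or "")
    reply.set_content(
        "This address has been changed.\n"
        "Please contact us at the number below to obtain the new address.\n"
        "<support phone number>\n"  # Placeholder: substitute your real contact number.
    )
    return reply
```

Mail to any address not in the retired set passes through untouched, so the same check can sit in front of every mailbox during the transition period.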

Switching to another filtering product will not solve the problem; it will only waste time and money. Bite the bullet and change your addresses! It will be well worth the initial headache.

Why are breaker ratings so important? Should I care if my datacenter does not follow them?

Wednesday, July 1st, 2009

Those who use datacenter services for their IT operations often know a great deal about operating systems, databases, and server hardware, but unfortunately, many IT professionals are not familiar with safe power practices and the consequences of not following them.

The most common area of confusion is the 80% breaker rating limit. Some datacenters will only allow you to run continuous-duty equipment up to 80% of the breaker's amp rating. For example, on a 20 amp, 120 V breaker feed, you're limited to 16 amps of continuous power draw. These datacenters are following national fire and electrical code standards, and they are operating a safe environment.
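The 80% rule reduces to a one-line calculation. This Python sketch assumes a simple single-phase 120 V circuit (watts = volts × amps, unity power factor) and uses illustrative function names:

```python
def max_continuous_amps(breaker_amps: float, limit: float = 0.80) -> float:
    """Maximum continuous draw allowed on a breaker under the 80% rule."""
    return breaker_amps * limit


def is_within_continuous_limit(draw_amps: float, breaker_amps: float) -> bool:
    """True if a continuous load stays at or below 80% of the breaker rating."""
    return draw_amps <= max_continuous_amps(breaker_amps)


for breaker in (20, 30):
    amps = max_continuous_amps(breaker)
    # Watts assume 120 V and unity power factor.
    print(f"{breaker} A breaker at 120 V: {amps:.0f} A continuous ({amps * 120:,.0f} W)")
```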

The sad thing is that many datacenters do not enforce the 80% rule and instead allow their users to run all the way up to 100% of the breaker rating in continuous draw. In addition to running an unsafe environment, these datacenters propagate misconceptions about how to run a power environment properly. When you allow someone to do something incorrect, that person assumes it is correct because you let them do it. Time and time again we have seen new customers come into our facility and say, “my previous provider allowed us to do…”. We then have to break down that myth and explain that what they were doing was unsafe and a code violation.

Why is it dangerous to exceed 80% breaker rating?

Let's start with the obvious. First, you're more likely to trip your breaker. Second, you're more likely to shorten the lifetime of the breaker, causing it to fail or false trip. More importantly, you run the risk of an electrical fire. NEC standards for wire sizing are based on 80% limits as well, so if you run 100% of the breaker rating through those wires 24×7, the wires will overheat. Over time, the excess heat can degrade the insulation jacket until it fails, leaving bare wire that could trigger an electrical fire.
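Part of the reason the extra 20% matters so much is that resistive heating in a conductor scales with the square of the current (P = I²R). This sketch compares heating at 100% versus 80% of the breaker rating on the same wire. It holds resistance constant, which is a simplification, since copper's resistance actually rises as it heats up.

```python
def relative_conductor_heating(load_fraction: float, reference_fraction: float = 0.80) -> float:
    """Ratio of I^2 R heating at load_fraction vs. reference_fraction of the breaker rating.

    Same wire, so resistance cancels out; assumes resistance does not change with temperature.
    """
    return (load_fraction / reference_fraction) ** 2


print(f"100% vs 80% load: {relative_conductor_heating(1.0):.2f}x the heat in the wire")
```

In other words, going from 80% to 100% of the rating is a 25% increase in current but roughly a 56% increase in heat dissipated in the conductor.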

Additional concerns involve the slow degradation of your power system as a whole, including your UPS, inverters, and transformers. Electrical equipment is generally designed by the manufacturer on the assumption that it will not run above 80% of its maximum rating for more than 3 hours. If you exceed that limit, you may void the warranty, and you are running the gear in a way it was not designed for.

Why would a datacenter violate the 80% power rule?

There are many reasons. One is money: by allowing customers to pull a few extra amps, a provider can win a bit more business, since its offering will appear cheaper. Another reason is that they lack the ability to monitor power usage. A more serious reason, though, is a lack of communication between sales, customer service, and operations. The operations people may know it's wrong, but the sales people don't, and they allow it to occur. It isn't caught because of the communication breakdown.
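For providers that lack power monitoring, even a basic check of metered readings would flag the problem. The sketch below scans a hypothetical log of (timestamp, amps) samples for runs above 80% of the breaker rating lasting 3 hours or more, the same 3-hour mark mentioned above (and the duration the NEC uses to define a continuous load). The data format and function are assumptions, not any particular metering product's API.

```python
from datetime import datetime, timedelta


def sustained_overloads(readings, breaker_amps, limit=0.80, window=timedelta(hours=3)):
    """Find spans where the draw stayed above limit * breaker_amps for at least `window`.

    readings: list of (datetime, amps) tuples, sorted by time and sampled at a
    regular interval (a hypothetical format; adapt it to your metering system).
    Returns a list of (first_over, last_over) timestamp pairs.
    """
    threshold = breaker_amps * limit
    spans = []
    start = last = None
    for ts, amps in readings:
        if amps > threshold:
            if start is None:
                start = ts  # Beginning of a new over-limit run.
            last = ts
        else:
            if start is not None and last - start >= window:
                spans.append((start, last))
            start = None  # Run broken by a reading at or below the limit.
    if start is not None and last - start >= window:
        spans.append((start, last))  # Run still in progress at the end of the log.
    return spans


# Example: 15-minute samples on a 20 A circuit, drawing 17.5 A for three and a half hours.
base = datetime(2009, 7, 1)
readings = [
    (base + timedelta(minutes=15 * i), 17.5 if 4 <= i <= 18 else 12.0)
    for i in range(24)
]
for first, end in sustained_overloads(readings, breaker_amps=20):
    print(f"Over 80% from {first:%H:%M} to {end:%H:%M}")
```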

What does this all mean?

Simple. Stay away from datacenters that don't enforce 80% breaker limits when looking for a colocation provider. If they don't enforce them, they run an unsafe environment. By itself, that may not seem like a big deal, but such an environment usually fails to meet other criteria of a quality provider as well. In some ways, the handling of power infrastructure is a litmus test for quality datacenter operations as a whole. If the operations group of a datacenter provider has a very good handle on power, more often than not, they have a good handle on everything else!