Archive for the ‘Main’ Category

Why did my Data Center UPS Fail?

Friday, October 7th, 2011

I hear this all the time. Most people move out of a datacenter because something bad happened, and it’s usually a major power failure that causes the most trouble. In this article, I am going to outline and analyze a power failure event that occurred at an unnamed facility. This is a true story.

About 2 years ago I fielded a call from someone who lost power at their current data center provider. In addition to being down, they also had some equipment failures (power supplies and some RAM went bad in a few systems). Their provider told them that nothing was wrong with the UPS; rather, it was an issue with the utility caused by a brownout. As soon as I heard this, I told the person that this explanation was completely bogus.

Let’s recap the cardinal rules of a good UPS:

1. An online UPS setup should always provide clean line power regardless of supply.

2. If an online UPS fails, an auto-sync transformer bridges line power and utility within 1 Hz and no power is lost; only backup capability is lost.

And let’s recap what you need to do in order to make sure the above rules always apply:

1. Check your batteries every 3 months.

2. Replace a battery as soon as its internal resistance rises by 10%.

3. Replace a battery as soon as it’s 4 years old, even if its internal resistance is still within spec.

4. Provide suitable cooling to the UPS.

5. CHECK THE BATTERIES.
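
The replacement rules above can be sketched as a simple check. This is only a minimal illustration, not the tooling any battery vendor uses: the `BatteryReading` fields are hypothetical, and it assumes the 10% rise is measured against a baseline reading taken when the battery was new.

```python
from dataclasses import dataclass


@dataclass
class BatteryReading:
    label: str
    baseline_resistance_mohm: float  # assumed: internal resistance when new
    current_resistance_mohm: float   # latest measured internal resistance
    age_years: float


def needs_replacement(b: BatteryReading,
                      max_rise: float = 0.10,
                      max_age_years: float = 4.0) -> bool:
    """Apply the two replacement rules: 10% resistance rise, or 4 years old."""
    rise = (b.current_resistance_mohm - b.baseline_resistance_mohm) \
        / b.baseline_resistance_mohm
    return rise >= max_rise or b.age_years >= max_age_years


# Hypothetical readings from one quarterly check.
readings = [
    BatteryReading("string-1/cell-01", 5.0, 5.2, 1.0),  # fine
    BatteryReading("string-1/cell-02", 5.0, 5.6, 1.0),  # resistance up 12%
    BatteryReading("string-1/cell-03", 5.0, 5.0, 4.5),  # too old
]
for r in readings:
    print(r.label, "REPLACE" if needs_replacement(r) else "ok")
```

Running a check like this every 3 months is what catches the single bad battery before it fouls the rest of the array.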

I can’t stress enough how important batteries are. The entire UPS is built around the concept of having working batteries. Almost every line-affecting outage of a UPS is due to a battery problem. At Quonix, we use Liebert Series 300 UPS systems that have had inverter boards fail, induction coils burn out, and input filters short out, and we NEVER lost output line power. That’s why the Lieberts cost so much: they are designed to handle failures, but that requires good batteries.

Getting back to the story about the brownout. Any UPS that experiences a brownout or any kind of dirty power would immediately engage batteries in order to provide clean power while it activates the GENSET cut-over. This requires the UPS to run on batteries for 5-7 seconds. If the batteries can’t hold, the UPS will drop offline into bypass mode and auto-sync to utility line power. Once a UPS goes into bypass and syncs to utility power, it no longer provides power protection or line conditioning. So all the dirty power goes straight through. If power was lost, GENSET power now comes straight through. And when utility power returns, the GENSET cuts out, causing another small blip. This is why the server power supplies and RAM went bad. The dirty, and possibly surging, power came right through the UPS into the rack cabinet.
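
The sequence described above boils down to one comparison: can the batteries carry the load for as long as the generator cut-over takes? Here is a deliberately simplified sketch of that logic. The function name and return strings are made up for illustration, and the 7-second default is just the upper end of the 5-7 second window mentioned above; a real UPS controller is far more involved.

```python
def ups_output(utility_clean: bool,
               battery_holdup_s: float,
               cutover_time_s: float = 7.0) -> str:
    """Return what the rack sees during a utility event (simplified)."""
    if utility_clean:
        # Normal operation: the online UPS conditions the line.
        return "conditioned"
    if battery_holdup_s >= cutover_time_s:
        # Batteries bridge the gap until the generator is carrying the load.
        return "conditioned"
    # Batteries gave out first: the UPS drops to bypass and whatever is on
    # the input (dirty utility or raw generator power) passes straight through.
    return "bypass"


print(ups_output(utility_clean=False, battery_holdup_s=30.0))  # conditioned
print(ups_output(utility_clean=False, battery_holdup_s=2.0))   # bypass
```

The second case is the one in this story: the brownout was real, but it only reached the servers because the batteries could not hold.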

Many providers don’t properly maintain their batteries. They just assume the batteries will last 4-5 years. Not the case. I’ve seen brand new battery cabinets have 1 battery go bad after as little as 1 year. Sometimes it’s just a random manufacturer defect. And in many cases, all it takes is 1 bad battery to foul the entire array.

Want to be sure your provider is on top of things? Easy: just ask for a copy of their UPS and battery preventative maintenance contract. If they have one, and they should, it should be easy to fax or email you a copy. You can even request a battery report. At Quonix, the vendor we use for our battery maintenance sends us a detailed graphical report with the health of each battery – voltage, impedance, internal resistance, temperature, and age.

Repairing Tate Access Floor Tiles

Thursday, October 6th, 2011

How to repair floor tiles?

For this article I am referring to the newer style of Tate access floor tiles. The newer style has a single piece of laminate that runs from edge to edge. The older style tiles had the laminate end about a quarter inch short of the edge, with the remaining space filled with a black edging strip that frequently snapped off.

The new style is great, but over time the laminate will start to pull away, especially in data centers with low humidity. It’s simple to repair.

The laminate is held in place by contact glue, similar to a kitchen countertop. Contact glue can be loosened and re-hardened with heat.

To reattach your Tate laminate, get a standard clothes iron – the kind with a non-stick bottom. Set the iron temperature to medium and turn off the steam. Obviously, do this repair work outside the datacenter. Place the iron on the tile’s laminate surface and slowly move it around. The laminate surface needs to be heated for at least 2 minutes. Once it is properly heated, use a surface roller to apply even pressure over the top of the laminate and press it down hard onto the tile’s underlying metal frame. Continue to use the roller until the surface has cooled down. At this point your laminate will be 100% re-attached.

Why does my datacenter feel warm?

Monday, November 8th, 2010

As soon as November approaches I start hearing this question more and more. Someone will come in from the outside and walk into the datacenter and immediately say, “It feels warm in here, is something wrong?”

The answer is NO, nothing is wrong.

When you drive home from work, get out of your car and walk 10 yards to your front door (in 30 degree F weather), the first thing that probably goes through your head as soon as you get inside is… “Ahh… It’s nice and warm inside”. And that is with your house thermostat set to 70 degrees F.

A good datacenter will hold a solid 72 degrees F and 35% relative humidity year round. So yes, in the winter months, when you come in from the outside cold (30-40 degrees F) and walk into a 72 degree F room that happens to be a datacenter, you should feel warm. But warm is a relative sensation for our bodies. When it’s 80 degrees F outside and you walk into the same datacenter, the first thing you say is… “Ahh… It’s nice and cold in here”.

The datacenter is always 72 degrees F, but in the winter that feels comfortably warm to our bodies after we were just exposed to 35 degree F outside temperatures. So relax, it’s not “warm” in the datacenter.

To ping or not to ping… that is the question!

Friday, May 21st, 2010

The other day I had a customer call and complain about high ping latency between our router and his server. I asked, “What are you pinging?” “The default gateway,” he replied. Well, there’s your problem. Ping one of our servers, and it will look fine. The customer did not understand, and simply wouldn’t accept my answer that seeing spikes in ping latency on the Ethernet handoff between his server and my router is normal.

Unfortunately, many people use ping to diagnose problems, but they don’t understand exactly how to interpret the results. First, not all latency is bad. Some devices are slow to respond because there is an issue causing a problem. But sometimes, a device is slow to respond because it doesn’t feel like responding right away. Huh? It’s called priority queuing. When you ping one server from another server, that ping is treated as high priority by the receiving server. The recipient server responds as fast as it can, just as it would for any other request. But when you ping a router, the router could not care less about that ping. Routers are designed to treat pings as the lowest priority request; a router will get around to it after it finishes the other, more important stuff it’s doing. Two routers right next to each other might show 3ms latency, with intermittent spikes to 20ms – perfectly normal.

Interpreting ping data is a balance of latency and packet loss. The two routers might show latency spikes, but upon closer inspection, there is ZERO packet loss, even after 10,000 pings. Conversely, you could have two routers with stable low latency between them but 3 or 4 percent packet loss. So you have to look at all aspects of the ping result set and the overall environment.
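
To make the point concrete, here is a rough sketch of summarizing a ping run by both measures instead of eyeballing the spikes. The function names, the input format (one round-trip time in ms per ping sent, `None` for a lost reply), and the zero-loss threshold are all assumptions for the example, not a standard.

```python
from statistics import median


def summarize_pings(rtts_ms):
    """rtts_ms: one entry per ping sent; RTT in ms, or None if no reply."""
    sent = len(rtts_ms)
    if sent == 0:
        raise ValueError("no pings to summarize")
    replies = [r for r in rtts_ms if r is not None]
    return {
        "sent": sent,
        "loss_pct": 100.0 * (sent - len(replies)) / sent,
        "median_ms": median(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
    }


def looks_healthy(summary, max_loss_pct=0.0):
    # Judge the link on packet loss, not on occasional latency spikes.
    return summary["loss_pct"] <= max_loss_pct


# Router-to-router link: spiky latency, but every ping answered.
spiky = summarize_pings([3.0, 3.0, 20.0, 3.0])
# Stable low latency, but 1 reply in 25 is missing (4% loss).
lossy = summarize_pings([3.0] * 24 + [None])

print(spiky, looks_healthy(spiky))
print(lossy, looks_healthy(lossy))
```

The spiky link passes and the quiet-looking one does not, which is the opposite of what most people conclude from watching the latency column alone.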

New Cooling Technologies for Data Center – Green or Not?

Thursday, May 20th, 2010

Once again, vendors are ramping up with new and advanced data center cooling technologies; in fact, I have received many calls just in the last 2 months. There is a common thread: they claim up to an 80% reduction in energy costs. Wow, that’s a big savings. Is there a catch? Sort of… Some of these technologies can reduce electrical consumption, so it’s not entirely a false statement, but there is a big catch – and it’s not a “Green” technology by any means. I will explain.

Typical data center providers use Liebert cooling, basic DX (direct expansion). You have a floor unit that contains a compressor and evaporator coil, and heat is rejected to an air-cooled or glycol-cooled condenser. These units use approximately 1.5 kW per ton of cooling. So a 40-ton data center installation will have about 60 kW of electrical usage just for the Liebert AC units. Can we get that down to, say, 10 kW for 40 tons? YES. Here’s how…

Evaporative cooling has been around for years. In fact, most large buildings use evaporative cooling instead of air-cooled dry coolers because an evaporative cooling tower takes up much less space. These cooling towers are fairly simple: big fans, lots of airflow, a really big heat-transfer coil (with glycol circulating) and a water source that sprays liquid onto the coil for it to evaporate off. Even in the summer, a 100-ton evaporative cooling tower can easily reduce circulating glycol temperatures from 100 degrees F at the inlet to 60 degrees F at the outlet.

The low-energy cooling technologies being advertised are basically a non-DX, non-compressor solution. They tend to be rack based. So right next to the rack cabinet is a coil with glycol circulating from the evaporative cooling tower. The side cabinet sucks hot air from the rear of the cabinet, cools it across the coil, and supplies the cool air back to the front of the cabinet. The coil is about 60 degrees F or so, and with enough airflow, that will cool an average size rack. The heat rejected goes to the roof tower and is dissipated through evaporation.

So if this works, and uses less energy, why doesn’t everybody do it? Simple. One piece of information has been left out. Evaporative cooling towers use a HUGE amount of water to perform this kind of cooling. Instead of electricity and Freon in a closed DX circuit, they use water and physics, but water is a resource and it’s not cheap. A 100-ton tower at max capacity (which is where it would be to get the glycol outlet down to 60 degrees F) will use about 5,000 gallons of water a day. Not only is that a huge waste of water, but you are only shifting cost. Yes, your electric bill will be lower, but your water bill will be insane, somewhere around $2000/month.
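
For anyone who wants to check the arithmetic, here is a small sketch using only the figures quoted in this article. The 30-day month is an assumption, the function names are made up, and note that the electrical numbers are for a 40-ton DX installation while the water numbers are for a 100-ton tower, so the two are not a like-for-like comparison.

```python
DX_KW_PER_TON = 1.5              # from the article
TOWER_GALLONS_PER_DAY = 5000     # 100-ton tower at max capacity
WATER_BILL_PER_MONTH = 2000.0    # dollars, from the article


def dx_load_kw(tons):
    """Electrical load of DX cooling for a given tonnage."""
    return tons * DX_KW_PER_TON


def monthly_kwh_saved(before_kw, after_kw, days=30):
    """Energy saved per month if the load runs around the clock."""
    return (before_kw - after_kw) * 24 * days


def monthly_water_gallons(days=30):
    return TOWER_GALLONS_PER_DAY * days


def implied_water_price_per_gallon(days=30):
    return WATER_BILL_PER_MONTH / monthly_water_gallons(days)


print(dx_load_kw(40))                               # 60.0 kW
print(monthly_kwh_saved(60, 10))                    # 36000 kWh
print(monthly_water_gallons())                      # 150000 gallons
print(round(implied_water_price_per_gallon(), 4))   # 0.0133 dollars/gallon
```

Plug in your own local electric and water rates to see which way the trade goes for your site.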

It’s common sense: if there were a better cooling solution, we’d have it. Data center providers are already using the most efficient system, since cost is already a major concern. The fact is, cooling is already as efficient as it can be. These modified systems may work for some people; for example, if you have a huge underground source of well water that is “unlimited”, this may work for you. But most datacenters don’t have access to unlimited, free, clean, non-brackish water.

The Myth about Mid-West Datacenters

Friday, March 19th, 2010

Some articles have been written recently about the best location for building and running a datacenter. These reports always pick mid-western states as the ideal locations due to cost. South Dakota or Kansas is a great place to build a cheap datacenter if cost is the number one concern. Labor is cheap, material costs are low, and electricity prices are low. But these reports always leave out something that is very important. PEOPLE.

Datacenter operations will always be centered on locations with population density: the East Coast corridor, Texas, California, and so on. The surrounding population will support the service. Who needs colocation or datacenter services in South Dakota? The only people who can benefit from this are those who do not need to touch their equipment, or Fortune 500 firms who can afford to fly out their technicians to a remote site. What people don’t realize is that most operations that use significant colocation resources (10U and up) need to touch their equipment on a regular basis. They can’t ship it off 1000 miles into the mid-west.

Furthermore, the reduced electricity cost (the most significant operational cost of a datacenter) is only temporary. In a few years electricity prices will start to even out. It’s sort of an anomaly that in Nebraska you can get electricity at $0.03 per kWh – that won’t last long. Mid-west locations also do not have the immense diversified telecom and fiber infrastructure that is present in major cities. Besides, content users are located in the major cities – content providers and users should be close to each other.

LED Lighting is a “COOL” idea for Datacenters

Sunday, December 6th, 2009

We all know what CFL bulbs are. But few people know about how LED lighting can bring additional power savings to a datacenter.

For starters, LED lighting is slightly more efficient than CFL bulbs. The hidden value is heat output reduction. LED lights have almost no heat output. In a 5,000 sq ft facility using traditional T8 35W bulbs vs LED, this can add up to a difference of 2000 BTUs per hour.
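
To put that heat figure in more familiar units, here is a quick conversion sketch. It uses the standard factors of roughly 3.412 BTU/hr per watt and 12,000 BTU/hr per ton of cooling; the function names are just for illustration.

```python
BTU_PER_HR_PER_WATT = 3.412    # 1 W of heat is about 3.412 BTU/hr
BTU_PER_HR_PER_TON = 12000     # 1 ton of cooling is 12,000 BTU/hr


def btu_hr_to_watts(btu_hr):
    return btu_hr / BTU_PER_HR_PER_WATT


def btu_hr_to_tons(btu_hr):
    return btu_hr / BTU_PER_HR_PER_TON


heat = 2000  # BTU/hr difference quoted above
print(round(btu_hr_to_watts(heat)))    # 586 (watts of heat)
print(round(btu_hr_to_tons(heat), 3))  # 0.167 (tons of cooling)
```

That is heat the cooling system no longer has to remove, on top of whatever the lights themselves save.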

The only limitation of LED is that it is still cost prohibitive. For example, a T8 fluorescent bulb is about $6. An LED T8 bulb is about $90. If you run your lighting 50% of the time, then it will take about 2 years to recover the cost from gains in power saving.

Ecologically speaking, LEDs are also superior since they do not contain any mercury. At Quonix, we are planning to convert to LED lighting systems within the next year for our datacenter.
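
If you want to run the numbers for your own facility, the sketch below works backwards from the figures above to show what a 2-year payback implies per bulb. It takes the quoted prices, duty cycle and payback period at face value; what makes up the savings (lighting power, reduced cooling load, and so on) is not broken out here, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760


def required_annual_savings(led_price, fluorescent_price, payback_years):
    """Savings per bulb per year needed to recover the extra purchase cost."""
    return (led_price - fluorescent_price) / payback_years


def operating_hours_per_year(duty_cycle):
    return HOURS_PER_YEAR * duty_cycle


savings = required_annual_savings(90.0, 6.0, 2.0)
hours = operating_hours_per_year(0.5)

print(savings)                    # 42.0 dollars per bulb per year
print(hours)                      # 4380.0 operating hours per year
print(round(savings / hours, 4))  # 0.0096 dollars per operating hour
```

Compare that per-hour figure against your own electric rate and cooling cost to see whether the payback period holds for you.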

Quonix Networks Colocation Services

Sunday, December 6th, 2009

A year in review…

Now that 2009 is nearing its end, it’s time to review where things are and where we are going.

We started the year with an expansion of fractional cabinet services by adding Quarter Cabinet (9U) Secure Colocation. The first quarter cabinet rack sold out in 30 days. Given the immediate interest, we will focus heavily on the quarter cabinet products, especially since VMware environments are getting smaller and smaller. We have full rack customers that can now migrate into a quarter rack running just a few servers.

VPS was also a new service addition for 2009. Our high-end guaranteed non-oversubscribed VPS platforms are doing very well, and they are a nice complement to our existing web and email hosting segment.

We also saw increasing growth in our T1 and T3 DIA business. The biggest roadblock has been getting the word out to businesses in Philadelphia that Quonix Networks has the most competitively priced DIA circuits, not to mention the best IP network.

The icing on the cake for 2009 was finalizing plans on our new Harrisburg, PA datacenter. This facility is scheduled to open by the end of 1st quarter 2010. The Central PA location is ideal for Philadelphia-based customers looking for a secondary DR location that is 100 miles away, but still convenient enough for a daily commute.

2009 was a great year, but we expect 2010 to be even better. Some of our plans for 2010 include lit fiber optic services in Metro Philadelphia and Point-to-Point Metro Wi-Fi.

Why did my colocation provider go out of business?

Saturday, November 21st, 2009

This is actually an interesting question that someone asked me the other day. Colocation services, just like any telecom service, are residual in nature. This means the provider has a pretty steady stream of revenue coming in, barring any major disaster.

So how does your run-of-the-mill colocation provider go under? Simple. Lines of credit. Lines of credit can single-handedly destroy a business. Why? Because people use them like loans. A loan has a fixed term. A line of credit does not. In fact, a line of credit can be called at any time, and once called you usually have 90 days to make full payment or the bank will seize assets.

During the boom years of our wonderful financial system, businesses were getting insanely high lines of credit. For example, $200,000 with an interest-only payment of 3 percent or even less. Some colocation providers used these lines of credit to expand. Expansion costs included staff, new equipment, advertising, and other things. When the financial crisis hit, many of these lines of credit were called, instantly putting those companies into the red – forcing bankruptcy and liquidation of assets.

There has been no decline in the number of people who require datacenter services in our current recession. The providers that have gone under mostly did so because of poor financial management.

What to look for when choosing a colocation provider…

Wednesday, November 18th, 2009

Here is a short list of things to do and to look for when choosing your next colocation provider.

  1. Make sure they alone own and operate the facility. No resellers.
  2. Call them at 2:00am and see who answers the phone.
  3. Make sure the facility is accessible 24×7.
  4. Make sure they have at least two onsite Generators.
  5. Make sure they have at least two UPS units.
  6. Make sure they have been in business for at least 3 years.
  7. Request a reference from a customer who has been with them for more than 3 years.
  8. Ask to see copies of their preventative maintenance contracts on power and cooling systems.
  9. Make sure they don’t impose excessively long contract terms.
  10. Insist on an immediate tour.

So the above sounds obvious for the most part. #1 is the most important. Never, ever, take service from a reseller – especially for colocation. If you have a problem, a reseller can’t do anything about it. And white-label datacenter providers are growing in numbers. If you do a search for colocation providers, more than half are just resellers of someone else’s space. If you’re sick, you go to a doctor – not someone pretending to be a doctor!

References are also extremely important, and overlooked these days. It’s old-fashioned business common sense to get a reference when starting a new relationship. You demand references when hiring a painter, so why not when selecting a colocation provider? And make sure it’s a reference from a customer that has been at the facility for some time. If a provider can’t produce a reference that is 3 years old, it means they’re a new business or have high turnover – two things you want to avoid.

And lastly, audit their support. It’s easy. Call them at 2:00am and pretend to have a problem with a server – even though you don’t have a server there. See how they respond, if they even answer the phone at all.