The modern world is sewn together with threads of electronic information, all of which lead back to surprisingly few data centres and information hubs. If one of these fails, communications, TV, defence, commercial and financial systems can go down for days, and information, reputation and profits can be lost.
Recent high-profile data centre failures at BT and Delta Air Lines have brought the issue of reliability to the fore. Many blue-chip banks and telecoms providers have suffered data centre power cuts in the past, so no one can claim immunity from such problems.
Loss of power is an all-too-common cause of failure. Vital facilities have uninterruptible power supply (UPS) systems that should cut in instantly, providing power until backup generators or alternative supplies can be brought online. However, a UPS only lasts for a limited time and may not cut in at all, and this is where the real problems occur. We only really know that backup systems work when the main supply fails, by which time it may be too late. The only way to achieve near peace of mind is to test them comprehensively, rigorously and often.
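To make the bridging requirement concrete, here is a minimal sketch that checks whether a UPS can carry the load until a generator has started and taken over. It is an illustration, not a sizing method: the class name BackupChain and every figure in it (five minutes of battery autonomy, a 15-second generator start, a 5-second transfer) are hypothetical assumptions, not values from any real facility.

```python
from dataclasses import dataclass


@dataclass
class BackupChain:
    """Hypothetical model of a UPS bridging the gap to a standby generator."""
    ups_autonomy_s: float  # assumed battery runtime at the actual IT load, in seconds
    gen_start_s: float     # assumed time for the generator to start and stabilise
    transfer_s: float      # assumed time for the transfer switch to move the load

    def margin_s(self) -> float:
        """UPS runtime left once the generator carries the load (negative = outage)."""
        return self.ups_autonomy_s - (self.gen_start_s + self.transfer_s)

    def bridges_gap(self) -> bool:
        """True only if the UPS outlasts generator start-up plus transfer."""
        return self.margin_s() > 0


# Illustrative figures only: 5 min of battery, 15 s start, 5 s transfer.
healthy = BackupChain(ups_autonomy_s=300.0, gen_start_s=15.0, transfer_s=5.0)
print(healthy.margin_s(), healthy.bridges_gap())  # 280.0 True

# A generator that never starts is modelled as an infinite start time:
# however large the UPS battery, the load is eventually dropped.
failed = BackupChain(ups_autonomy_s=300.0, gen_start_s=float("inf"), transfer_s=5.0)
print(failed.bridges_gap())  # False
```

The arithmetic is trivial; the hard part is that the start and transfer times only mean something if they come from real tests of the installed equipment rather than from data sheets.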
Unfortunately, there is no such thing as a standard system. Data centres are usually custom-built, with numerous complex interfaces, which can give rise to cascading failures that are extremely difficult to predict. They also tend to be built in small steps; the days of fitting out a complete data centre with double-digit megawatts of infrastructure and waiting 10 years to see whether it fills up are long gone. Operators now defer the cost of anything not needed today and build out in phases. If you are adding a few hundred kilowatts of IT load to a much larger facility, few operators would want to pay for full integrated system testing (IST) of the whole facility. As a result, equipment can go live with critical interfaces untested.
Few buildings are fitted out exactly the way the designer intended, or with the same equipment that was available for the original build. The greater the difference, and the more complex the addition, the more likely it is that there will be an unforeseen problem. There is also more scope for human error.
The scale of the extension work should inform the level of testing. If you can describe the scope of works clearly and completely on a single sheet of A4 paper, you might not need to do a black building test (simulating a complete loss of mains power to the building). When an extension consists of five or six power distribution units, fed from four UPS strings with previously tested capacity, there is a strong argument that a full IST might not be necessary.
Bespoke testing of each facility is difficult and time-consuming. IST should be carried out at least once a month, though it rarely is, and when it is not, every time you turn off the power there is a greater chance that it will not come back on.
The problem is magnified when one company is appointed to design a facility, a separate contractor to build it and another to fit it out. It is further complicated when the facility has more than one end user. Every contributor, from the equipment manufacturer to the installer, must get their part right. Wherever you are in the chain, it falls to you to ensure that everyone upstream has done their bit. As an end user, insist on proof that testing is done regularly, rigorously, consistently and, of course, successfully.
There is no one-size-fits-all answer. Always keep meticulous records and test new equipment straight away, but do not assume that ‘normally reliable’ equipment will still be reliable the next time you need it. It is also occasionally worth testing standby equipment beyond what company policy or the insurer requires: there are many examples of emergency generators that performed perfectly in their mandatory 15-minute test but broke down after half an hour.
Remember, safety nets are only any use if they work.
- Thank you to Andy Harrison of Arup for his help in researching this article
Further reading:
- CIBSE commissioning codes (Codes A, B, C, W, R)
- ASHRAE commissioning guidelines (Guidelines 0, 1.1 and 1.5) and ASHRAE commissioning Standard 202
- Those new to mission-critical commissioning should refer to the widely referenced five-level approach to structuring the commissioning process
Roberto Mallozzi is the managing director at air conditioning supplier Gree UK