Operating a data center requires more than good planning: it also calls for effective management and regular process audits.
Early this year, a data center located, of all places, in a desert suffered major damage when a roof drainage channel inside an internal wall became blocked, leaving the IT space knee-deep in water.
Although the center was correctly equipped with subfloor sensors that raised an early warning, the management personnel failed to respond quickly enough because they had no instructions on what action to take. Specifically, the documentation did not say whom to contact in the event of water ingress.
As this example clearly shows, the costly, specialist planning of secure and reliable data centers with redundant power and climate systems is not enough on its own. It is essential that all operating procedures be fully developed by the time the facility is handed over to the operator. Only then can the required availability actually be achieved through effective, practical measures.
Facility components such as cooling equipment and power supplies are usually operated via general building management systems (BMS). These integrated control systems and their associated alarm consoles are often deployed locally only: each location is managed separately, with no enterprise-wide control. As a result, it is usually not possible to monitor data centers remotely or to build a hierarchical system that incorporates all locations. The “umbrella” systems developed to address this problem have made very little impact on the market thus far, mainly on account of their technical limitations and cost. Conventional, local building management systems are also poorly suited both to creating and maintaining processes that depend on continuously updated key information, such as contact persons, and to monitoring and controlling IT infrastructure components. Instead, the best available option for running data center facilities is the category of dedicated software solutions known as data center infrastructure management (DCIM) tools.
DCIM tools are a key requirement for reliable and efficient management of data center resources. By linking building engineering systems with the IT infrastructure, they provide valid, up-to-date information that enables users to make well-informed decisions.
The documentation and planning of relevant building systems and IT infrastructure using DCIM tools is not an end in itself. Rather, it is the basic prerequisite for reliable data center operation. The primary aim is proactive fault prevention rather than reactive fault correction. Complete and accurate documentation of the entire power path enables data center planners to avoid accidentally overloading fuses, power supplies, generators, etc. Knowing the location and specifications of all energy-consuming devices in a particular zone allows the heat output to be evaluated accurately and the cooling requirements to be calculated reliably. Potential hotspots are identified and avoided before they can become a problem.
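The capacity and cooling checks described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular DCIM product works: the Device fields, the 80 percent headroom factor, and all sample values are assumptions. The heat estimate relies on the simplification that practically all electrical power drawn by IT equipment ends up as heat.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical inventory record; field names are illustrative."""
    name: str
    zone: str       # room zone used for the cooling calculation
    circuit: str    # fused circuit that feeds the device
    power_w: float  # documented power draw in watts


def circuit_loads(devices):
    """Sum the documented power draw per circuit."""
    loads = {}
    for d in devices:
        loads[d.circuit] = loads.get(d.circuit, 0.0) + d.power_w
    return loads


def overloaded_circuits(devices, limits_w, headroom=0.8):
    """Circuits whose load exceeds headroom * rated limit (assumed rule)."""
    loads = circuit_loads(devices)
    return sorted(c for c, w in loads.items() if w > headroom * limits_w[c])


def zone_heat_w(devices):
    """Heat output per zone, assuming power drawn is released as heat."""
    heat = {}
    for d in devices:
        heat[d.zone] = heat.get(d.zone, 0.0) + d.power_w
    return heat


# Made-up sample data for illustration only.
DEVICES = [
    Device("srv-1", zone="A", circuit="C1", power_w=1200.0),
    Device("srv-2", zone="A", circuit="C1", power_w=1500.0),
    Device("sw-1", zone="B", circuit="C2", power_w=300.0),
]
LIMITS_W = {"C1": 3000.0, "C2": 2000.0}

print(overloaded_circuits(DEVICES, LIMITS_W))
print(zone_heat_w(DEVICES))
```

With the sample data, circuit C1 carries 2,700 W against an assumed planning limit of 2,400 W and is flagged, while zone A has to dissipate 2,700 W of heat. The value of such a check depends entirely on how complete and current the underlying documentation is.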
Other benefits of using software tools include greater process efficiency and a reduced staff workload, because the specifications issued by the relevant departments can be adhered to at every stage of the planning process. The results are secure operation within predefined limits and avoidance of critical load states or unbalanced phase loads, even when changes are planned and implemented by less experienced staff.
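One way to picture such predefined limits is as a rule check that runs before a planned change is approved. The sketch below is hypothetical: the function names, the 20 percent imbalance limit, and the breaker rating are assumptions. Imbalance is taken as the largest deviation from the mean phase load divided by that mean, which is only one of several conventions in use.

```python
def phase_imbalance(phase_loads_a):
    """Largest deviation from the mean phase load, relative to the mean."""
    values = list(phase_loads_a.values())
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    return max(abs(v - mean) for v in values) / mean


def check_change(phase_loads_a, phase, added_a, breaker_a, max_imbalance=0.2):
    """Return the rule violations of a planned change (empty list = OK)."""
    planned = dict(phase_loads_a)  # copy so the live data is not modified
    planned[phase] += added_a
    problems = []
    if planned[phase] > breaker_a:
        problems.append(f"{phase} exceeds breaker rating")
    if phase_imbalance(planned) > max_imbalance:
        problems.append("phase imbalance above limit")
    return problems


# Made-up phase currents in amperes for one three-phase feed.
CURRENT_LOADS = {"L1": 10.0, "L2": 12.0, "L3": 11.0}

print(check_change(CURRENT_LOADS, "L1", added_a=8.0, breaker_a=16.0))
print(check_change(CURRENT_LOADS, "L1", added_a=1.0, breaker_a=16.0))
```

Adding 8 A to L1 violates both rules in this example, while adding 1 A passes. Because the rules are encoded once by the responsible department, a less experienced planner gets the same answer as an expert.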
The latest DCIM tools offer a number of fault-prevention functions, such as maintenance management, preventive maintenance, and predictive analytics. The ability to analyze dependencies and “what-if” scenarios is important for identifying vulnerabilities. In worst-case scenarios, DCIM offers impact analysis options that help contain the fault while quickly identifying alternative routes or replacement systems. It is also possible to orchestrate the redirection of traffic to other, unaffected systems. And during post-fault recovery, DCIM provides restart procedures that help minimize overall downtime.
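A "what-if" impact analysis of this kind can be thought of as a walk over a dependency graph. The following sketch is an assumption-laden toy model rather than a description of a real tool: the FEEDS topology and the component names are invented, and a component is treated as down only once all of its suppliers are down, which is how redundant feeds are modeled here.

```python
# Hypothetical supply graph: each key feeds the components in its list.
FEEDS = {
    "ups-a": ["pdu-a1", "pdu-a2"],
    "ups-b": ["pdu-b1"],
    "pdu-a1": ["rack-1"],
    "pdu-b1": ["rack-1"],  # rack-1 is fed redundantly from both UPS paths
    "pdu-a2": ["rack-2"],
    "rack-1": ["srv-1"],
    "rack-2": ["srv-2"],
}


def impacted(feeds, failed_component):
    """Components that lose every supplier if failed_component goes down."""
    # Invert the graph: component -> set of its direct suppliers.
    suppliers = {}
    for src, targets in feeds.items():
        for dst in targets:
            suppliers.setdefault(dst, set()).add(src)

    down = {failed_component}
    changed = True
    while changed:  # repeat until no further component drops out
        changed = False
        for node, srcs in suppliers.items():
            if node not in down and srcs <= down:
                down.add(node)
                changed = True
    return down - {failed_component}


print(sorted(impacted(FEEDS, "ups-a")))
print(sorted(impacted(FEEDS, "pdu-b1")))
```

In this toy topology, losing ups-a takes down pdu-a1, pdu-a2, rack-2, and srv-2, whereas rack-1 and srv-1 survive on the second feed. Losing pdu-b1 alone affects nothing downstream, but it does leave rack-1 without redundancy, which is exactly the kind of vulnerability such an analysis is meant to expose.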
DCIM tools not only ensure that data centers run smoothly; they also help them run efficiently. The desire for efficient operation can often be at odds with the need for system stability. By offering a complete overview of all dependencies and resource utilization, DCIM tools enable users to identify systems that are not running efficiently or that have more redundancy than they need.
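As a rough illustration of how utilization data can point to inefficient systems, the sketch below flags every system whose average utilization falls below a threshold. The 10 percent threshold, the sample readings, and the function name are assumptions chosen for the example. A flagged system is a candidate for review, not proof of waste.

```python
def underutilized(utilization, threshold=0.10):
    """Names of systems whose mean utilization is below the threshold."""
    return sorted(
        name
        for name, samples in utilization.items()
        if samples and sum(samples) / len(samples) < threshold
    )


# Made-up utilization samples (fraction of capacity) per system.
UTILIZATION = {
    "srv-1": [0.05, 0.02, 0.08],
    "srv-2": [0.60, 0.70],
    "srv-3": [0.09, 0.12],
    "srv-4": [],  # no measurements yet, so it is not judged
}

print(underutilized(UTILIZATION))
```

Only srv-1 is flagged here. Whether such a system can be consolidated or switched off still has to be weighed against the stability and redundancy requirements discussed above.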
Summary: Good planning combined with deployment of DCIM tools and appropriate processes is a proven way of achieving both efficient operation and the highest possible levels of system availability. All processes must be audited regularly and all changes reflected in the overall concept. Here too, DCIM solutions are the tool of choice.