A typical warm day at the South Pole is 20 degrees below zero, and the irony is that the data centers run by the IceCube Neutrino Observatory can still overheat. A normal day at any data center involves troubleshooting, racking and stacking, and with such enormous data inflows the work becomes tedious for employees, who sometimes fail to respond in real time. The technicians aren't to blame either, because a typical UPS is reactive: it either functions flawlessly or burns out altogether. Machine learning models, on the other hand, are proactive and can forecast failures before they happen.
Data centers are modern-day engineering marvels. There is no standard template to follow, so data center managers build customized devices and come up with original solutions to unforeseen problems.
Data Centers: Think Big
Anyone who has worked on a personal computer has probably struggled with a malfunctioning fan or other cooling issues, and a typical household desktop handles only a modest amount of data. Now imagine thousands of these machines working in parallel. Think about the heat generated every minute and the unwanted power fluctuations. Though cloud storage abstracts away the physical hardware through network virtualization, the scale at which these storekeepers operate is still colossal.
Data abundance leads to data accumulation, and data centers work round the clock to manage enormous volumes of incoming data as well as the previously stored data.
Data Centers Generate Data
Data centers also generate data: server data, power outage reports for particular systems, and a lot more. Cloud TPUs are designed to run heavy machine learning models. Now engineers are harnessing the same kind of hardware to recognize patterns and predict outages. Experts observe that the future of data storage is software-defined. Disaggregation and server simulation are already in use, and individual off-the-shelf devices are emulating multiple servers through virtualization.
Having said that, these virtual instances originate from a hypervisor running on physical hardware somewhere, so no matter how much virtualization we adopt, hardware systems need maintenance and cooling systems need to get smarter.
Smarter Data Centers
Creating smarter data centers becomes increasingly important as more companies adopt a hybrid environment that includes the cloud, colocation facilities, and in-house data centers, and that will increasingly include edge sites, Jennifer Cooke, research director of IDC's Cloud to Edge Data center Trends service, told a leading online publication that covers data centers.
Outside air temperature, the data center's power load, and the air pressure at the back of the servers, where the hot air comes out, are a few of the factors considered when designing a cooling system. So, where do machine learning models fit in?
Machine Learning To The Rescue
A typical rack may consume 10 kW, or it may spike to 15 kW. ML models can predict such spikes an hour in advance, giving operators much-needed breathing space to detect and resolve problems before they turn into catastrophic outages.
For example, Google's TPU 3.0 is so power-hungry that cooling it with air is not viable. So, the engineers have retrofitted infrastructure to accommodate direct-to-chip liquid cooling.
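To make the idea of predicting a spike an hour ahead concrete, here is a minimal sketch. It is not the method any particular vendor uses: it swaps in Holt's linear exponential smoothing as a simple stand-in for a trained ML model. The 10-minute sampling interval, the smoothing parameters, and the function names `holt_forecast` and `spike_expected` are assumptions made for illustration.

```python
from typing import Sequence


def holt_forecast(series: Sequence[float], steps_ahead: int,
                  alpha: float = 0.5, beta: float = 0.3) -> float:
    """Forecast the value `steps_ahead` samples past the end of `series`
    using Holt's linear (double) exponential smoothing."""
    if len(series) < 2:
        raise ValueError("need at least two readings")
    level = series[0]               # smoothed estimate of the current value
    trend = series[1] - series[0]   # smoothed estimate of change per sample
    for value in series[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + steps_ahead * trend


def spike_expected(rack_kw: Sequence[float], limit_kw: float,
                   steps_ahead: int = 6) -> bool:
    """Return True if the forecast exceeds `limit_kw`.

    Assumption: readings arrive every 10 minutes, so 6 steps is one hour.
    """
    return holt_forecast(rack_kw, steps_ahead) > limit_kw


# Hypothetical rack power readings in kW, one every 10 minutes.
rising = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
steady = [10.0] * 6

print(spike_expected(rising, limit_kw=15.0))  # load is climbing toward 15 kW
print(spike_expected(steady, limit_kw=15.0))  # load is flat at 10 kW
```

A production system would learn from many more signals than a single power series, but the shape is the same: extrapolate recent behavior and raise a flag while there is still time to act.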
Google started deploying machine learning software in its data centers, alongside processors it designed in-house to improve its deep learning capabilities. Its machine learning algorithms automatically adjust cooling plant settings continuously, in real time, reducing annual power consumption. Improving efficiency and analyzing risk form the core of any data center management job. Companies with in-house data science expertise pursue their own machine learning initiatives, while others are turning to vendors who have built custom software for the same purpose.
Apart from customizing coolant circulation, these ML models can also:
- Analyse servers and detect anomalies, such as ghost servers running applications no longer in use.
- Help a company that is consolidating data centers and migrating applications and data to a central facility determine how the move affects capacity at that facility.
- Bolster cybersecurity.
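As a rough illustration of the first item in the list, the sketch below flags possible ghost servers. It uses a plain threshold rule rather than a trained model, and the `ServerStats` record, the function name, and the CPU and network limits are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerStats:
    name: str
    cpu_percent: List[float]  # sampled CPU utilization over the window
    net_kbps: List[float]     # sampled network throughput over the window


def find_ghost_servers(servers: List[ServerStats],
                       cpu_limit: float = 2.0,
                       net_limit: float = 5.0) -> List[str]:
    """Return names of servers whose CPU and network activity never
    rose above the given limits during the observation window."""
    ghosts = []
    for server in servers:
        if not server.cpu_percent or not server.net_kbps:
            continue  # no samples, so nothing can be concluded
        if (max(server.cpu_percent) < cpu_limit
                and max(server.net_kbps) < net_limit):
            ghosts.append(server.name)
    return ghosts


# Hypothetical inventory: one busy server and one that sits idle.
inventory = [
    ServerStats("web-01", [35.0, 60.2, 48.9], [900.0, 1200.0, 1100.0]),
    ServerStats("legacy-07", [0.4, 0.9, 0.3], [0.0, 1.2, 0.5]),
]

print(find_ghost_servers(inventory))
```

A server flagged this way is only a candidate for decommissioning. Someone still has to confirm that nothing depends on it.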
When algorithms detect anomalies that signal an impending failure, the system alerts customers so they can troubleshoot before the equipment goes down. Incident analysis then helps determine the root cause faster.
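One simple way to picture this kind of anomaly detection is a rolling z-score: each new sensor reading is compared with the mean and spread of the readings just before it. The sketch below is an illustrative stand-in, not any vendor's algorithm. The window size, threshold, `min_sigma` floor, and sample temperatures are assumptions.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_anomalies(readings: Sequence[float], window: int = 5,
                   threshold: float = 3.0,
                   min_sigma: float = 0.1) -> List[int]:
    """Return indexes of readings that deviate from the mean of the
    preceding `window` readings by more than `threshold` standard
    deviations."""
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        # Floor the spread so a perfectly flat history cannot cause
        # a division by zero.
        sigma = max(stdev(history), min_sigma)
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical exhaust temperatures in degrees Celsius.
temps = [40.0, 40.5, 39.8, 40.2, 40.1, 40.3, 47.0, 40.2]

for index in find_anomalies(temps):
    print(f"alert: reading {index} ({temps[index]} C) looks abnormal")
```

In practice the alert would feed a ticketing or paging system so that technicians can inspect the equipment before it fails.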
Autonomous Data Centers
The economics of data centers are crucial for any data vendor. Optimizing the power usage by employing state-of-the-art cooling systems is a challenge every professional in this industry faces.
Machine learning is expected to optimize every facet of future data center operations, including planning and design, managing IT workloads, ensuring uptime, and controlling costs. IDC predicts that, by 2022, 50 percent of IT assets in data centers will be able to run autonomously because of embedded AI functionality.
Companies are now offering solutions that use machine learning models. These models sift through internal reports on the storage infrastructure and help engineers design storage space, optimize the rate of cooling, predict the next spike, and eliminate other infrastructural redundancies.
Schneider Electric, Maya Heat Transfer Technologies (HTT), and Nlyte Software are among the companies that offer solutions to these problems, including tools capable of forecasting a failure.