There is no doubt in my mind that the jobs of IT and data center managers are getting tougher. I’ve witnessed it firsthand. Most are being challenged to maximize system availability and uptime while maintaining increasingly complex hybrid IT environments that include Edge computing infrastructure. Maintaining this balance would be nearly impossible if it weren’t for a secret weapon that managers can now access: data. Specifically, performance data from IT physical infrastructure systems.
When interpreted correctly, actionable insights from this data help savvy managers mitigate data center and Edge computing operations risk. But interpreting the data correctly is easier said than done. Today, much of the data surrounding system performance is gathered from sensors residing within various pieces of power and cooling equipment, and to the typical IT support person, that raw data is useless.
It’s not formatted or “clean” and typically stems from many different devices and multiple vendors. It is like trying to interpret multiple languages without any translation help. IT support teams need this data translated, formatted, and presented so that they can use it to make important operational decisions.
Over the years, on-site Data Center Infrastructure Management (DCIM) systems were put into place to address this issue in traditional on-premises data centers by monitoring the behavior of system infrastructure (e.g., power and cooling). However, most of these original DCIM tools were designed to be used by experts only. As a result, the average user implemented only a small fraction of the available features because of steep learning curves and system complexity.
These types of systems are also impractical for today’s Edge computing environments where access to on-site IT expertise to monitor the system is either rare or non-existent.
As is often the case, time moves on and technology moves with it. The recent emergence of cloud-based DCIM software technologies has helped to directly address these “translation” issues.
By gathering raw IoT data, centralizing it, and using AI algorithms to identify patterns, these tools generate actionable information that gives operators clear visibility into IT and infrastructure asset behavior, along with recommended corrective actions.
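To make the "translation" step concrete, here is a minimal sketch of how readings from different vendors might be normalized into one common record before they are centralized. The vendor names, field names (`tempF`, `temp_c`, and so on), and the `Reading` record are all hypothetical and chosen only for illustration; real equipment and real DCIM products will use their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Common record that every vendor payload is translated into (hypothetical schema)."""
    site: str
    device: str
    metric: str
    value: float


def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32.0) * 5.0 / 9.0


def normalize_vendor_a(raw: dict) -> Reading:
    # Assumption: "vendor A" reports temperature in Fahrenheit under "tempF".
    celsius = round(fahrenheit_to_celsius(raw["tempF"]), 2)
    return Reading(raw["site"], raw["id"], "temperature_c", celsius)


def normalize_vendor_b(raw: dict) -> Reading:
    # Assumption: "vendor B" reports Celsius as a string under "temp_c".
    return Reading(raw["location"], raw["serial"], "temperature_c", float(raw["temp_c"]))


# One translator per known vendor format.
NORMALIZERS = {
    "vendor_a": normalize_vendor_a,
    "vendor_b": normalize_vendor_b,
}


def centralize(payloads):
    """Translate (vendor, raw_payload) pairs into a single list of Reading records.

    Payloads from unknown vendors are skipped rather than guessed at.
    """
    readings = []
    for vendor, raw in payloads:
        normalizer = NORMALIZERS.get(vendor)
        if normalizer is None:
            continue
        readings.append(normalizer(raw))
    return readings


if __name__ == "__main__":
    sample = [
        ("vendor_a", {"site": "edge-1", "id": "ups-1", "tempF": 77.0}),
        ("vendor_b", {"location": "dc-1", "serial": "crac-9", "temp_c": "24.5"}),
    ]
    for reading in centralize(sample):
        print(reading)
```

Once every reading shares the same units and field names, comparing equipment across brands and sites becomes a matter of querying one data set rather than interpreting many.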
How cloud-based DCIM delivers real-world results
- Improved data gathering – Operators can now ditch manually updated spreadsheets because cloud-based DCIM software automatically gathers machine data and produces easy-to-interpret reports on how equipment is behaving. The collected machine data can represent a wide variety of critical data points, including the temperatures of the critical infrastructure, the quantity of power being consumed, and the level of humidity in the room where servers are housed. Critical infrastructure sensor values are automatically collected on a regular basis and placed into a centralized data lake in the cloud, where they are pooled with data collected from thousands of other sites.
- Correlation of both internal and external data – Once in the data lake, asset behavior is compared across many brands of equipment and across multiple sites. In addition, all the actions taken in response to alarms are tracked using data on equipment behavior before and after an incident. This output provides a clear record of actions and their consequences, positive and negative. Correlating the pooled data in this way provides a deeper understanding of the root causes of problems and can generate reports that advise operators on how to tackle particular problems.
- Predictive maintenance – Cloud-based DCIM tools are also opening the door, in the near future, to predictive maintenance for both central data centers and Edge computing sites. In a predictive maintenance approach, cloud-based tools analyze equipment behavior, discern patterns, and issue probability-of-failure reports. The forecasting ability of these tools would become more precise over time as larger and larger quantities of data are gathered and analyzed. Under such a scenario, end users would only need to schedule a maintenance person when a component actually needs replacement, before it causes unanticipated downtime.
- Actionable analytical outputs – Once the AI algorithms within the cloud-based DCIM tool identify the critical patterns of equipment behavior, output reports are generated for stakeholders. Consider, for example, a report on system alarms. In a typical environment, alarms are generated all the time and only a small percentage of them are actually relevant. The rest are nuisance alarms, and sorting through them consumes valuable systems administration time. Cloud-based DCIM tools bundle only those alarms that are actually significant based on the historical data. As a result, operators can quickly find the root cause of a problem without having to painstakingly sort through hundreds of thousands of logs a day.
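The alarm example above can be illustrated with a small sketch. It is not how any particular DCIM product works: it swaps the AI algorithms for a simple frequency heuristic, keeping only alarm types that historically required operator action often enough. The alarm names, the `min_rate` threshold, and the data shapes are assumptions made for illustration.

```python
from collections import defaultdict


def actionable_rates(history):
    """Return, per alarm type, the fraction of past alarms that required action.

    history: iterable of (alarm_type, required_action) pairs, where
    required_action is True if an operator actually had to intervene.
    """
    totals = defaultdict(int)
    acted = defaultdict(int)
    for alarm_type, required_action in history:
        totals[alarm_type] += 1
        if required_action:
            acted[alarm_type] += 1
    return {alarm_type: acted[alarm_type] / totals[alarm_type] for alarm_type in totals}


def significant_alarms(incoming, history, min_rate=0.2):
    """Keep only incoming alarms whose type was historically actionable.

    min_rate is a hypothetical threshold. Alarm types with no history are
    kept, since there is no evidence yet that they are nuisance alarms.
    """
    rates = actionable_rates(history)
    return [alarm for alarm in incoming if rates.get(alarm["type"], 1.0) >= min_rate]


if __name__ == "__main__":
    # Hypothetical history: door alarms rarely matter, battery alarms usually do.
    history = (
        [("door_open", False)] * 9
        + [("door_open", True)]
        + [("battery_low", True)] * 3
        + [("battery_low", False)]
    )
    incoming = [{"type": "door_open"}, {"type": "battery_low"}, {"type": "fan_fail"}]
    for alarm in significant_alarms(incoming, history):
        print(alarm["type"])
```

The key design choice is that the filter learns from what operators actually did in the past, which is why tracking actions taken in response to alarms matters as much as collecting the alarms themselves.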
Data center managers and those responsible for managing Edge computing sites are seeking solutions that can help them navigate complexity, optimize performance, and attain the peace of mind that leads to a better night’s sleep. Such solutions are available today.
Cloud-based DCIM tools efficiently “translate” data into easy-to-understand, actionable insights that cut through the noise, saving time and freeing up staff to work on other value-add activities.