In a previous post (http://dcim.tumblr.com/post/152857978208/risky-business) we hinted that a secure data center could be the safest place to be in case of a zombie attack. That may have been a bit of wishful thinking on our part. Many data centers are actually already plagued with zombie servers exhibiting the energy-sucking behaviour of a vampire.
A 2015 study from the Anthesis Group (http://anthesisgroup.com/wp-content/uploads/2015/06/Case-Study_DataSupports30PercentComatoseEstimate-FINAL_06032015.pdf) claims that up to 30% of servers are basically “comatose”. The study counts as comatose those servers that rarely exceed 6% CPU usage on average over the course of a year.
While these servers provide little or no useful work, they still consume a fair amount of power. A report from the Natural Resources Defense Council (https://www.nrdc.org/sites/default/files/Saving-Energy-Server-Rooms-FS.pdf) estimates that a typical server runs on average at 5 to 15 percent of its maximum capability, while still consuming 60 to 90 percent of its peak power.
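To put that in perspective, here is a back-of-the-envelope sketch in Python. The 400 W peak rating is a made-up figure for illustration, not a number from either report; only the 60 to 90 percent range comes from the estimate above.

```python
HOURS_PER_YEAR = 24 * 365  # ignoring leap years


def annual_energy_kwh(peak_watts, fraction_of_peak):
    """Energy used in a year by a server drawing a constant share of its peak power."""
    return peak_watts * fraction_of_peak * HOURS_PER_YEAR / 1000


# Hypothetical server with a 400 W peak draw (an assumption, not a measured value).
PEAK_WATTS = 400

low = annual_energy_kwh(PEAK_WATTS, 0.60)   # lower bound of the quoted range
high = annual_energy_kwh(PEAK_WATTS, 0.90)  # upper bound of the quoted range
print(f"A mostly idle server could use roughly {low:.0f} to {high:.0f} kWh per year")
```

Under those assumptions, a single server doing next to nothing would still burn somewhere between about 2,100 and 3,150 kWh a year.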
Those findings are, of course, one of the drivers towards virtualisation and consolidation. Such strategies are definitely more efficient in many ways, certainly energy-wise.
Why, then, does the Anthesis study above estimate that there are over 3 million zombie servers in the U.S. alone, cumulatively drawing almost 1.5 gigawatts?
Probably because, in many cases, the move towards virtualisation has been gradual: maybe services were migrated one at a time, and nobody thought of turning the old server off once all of them had been moved. Possibly a new software platform was deployed and users slowly migrated until nobody was using the old software, leaving the server that runs it just waiting there, like an old man hoping for a visit from his busy children living too far away… so sad!
Alas, unused servers do not just waste space and power. They also tie up power outlets and network ports, which are then unavailable for other equipment.
A fun and exciting option would be to turn the suspected zombie server off and then wait for the phone to ring. But who are we kidding? We, the IT people, are not renowned for our wit and sense of humour, so we will lean towards a less mischievous approach. Something like… a plan.
The first step is, undeniably, to identify the idle servers. Brute force could be used here. A list, maybe a spreadsheet, enumerating all the servers probably exists already. From that, we can strike off the obviously operational ones and then connect to each remaining one to verify its state. Time-consuming at its finest.
An easier method would require deploying power monitoring tools. We could measure the servers’ workload over time. After a while, we would have a pretty clear indication of which ones are not pulling their weight. Applied to all the servers, it would have the added advantage of creating a baseline, against which servers that go idle in the future would stand out.
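As a rough illustration of that idea, the sketch below flags servers whose average CPU usage stays under a threshold. It assumes the monitoring tool can export utilisation readings per server; the function name, the server names and the readings are all invented for the example, and the threshold simply reuses the 6% figure quoted earlier in this post.

```python
from statistics import mean

# Assumed threshold, in percent, taken from the 6% figure mentioned earlier.
IDLE_CPU_THRESHOLD = 6.0


def find_idle_candidates(samples, threshold=IDLE_CPU_THRESHOLD):
    """Return the names of servers whose average CPU usage is below the threshold.

    samples maps a server name to a list of CPU utilisation readings (percent).
    Servers without any readings are skipped rather than flagged.
    """
    idle = []
    for server, values in samples.items():
        if values and mean(values) < threshold:
            idle.append(server)
    return sorted(idle)


# Made-up readings, for illustration only.
readings = {
    "web-01": [35.0, 42.5, 38.0],
    "legacy-app": [1.0, 2.5, 0.5],
    "batch-02": [5.0, 80.0, 3.0],
}
print(find_idle_candidates(readings))
```

Note that "batch-02" is not flagged even though it is quiet most of the time: its occasional bursts pull the average up. That is a reminder that a flagged server is only a candidate, and that somebody still has to confirm it is unused before pulling the plug.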
This approach, however, is a reactive one. It would be better to catch such servers as part of the decommissioning process itself.
For that, we need a workflow that brings together everyone involved.
This is where a DCIM platform can become an important instrument to accomplish these objectives. When fully implemented, it provides a real-time view of the assets, their location and ownership information, and where they are connected.
Up-to-date documentation is then available at the time when a server, or any other asset, is decommissioned. You will also be informed of all the related components that will be impacted by the change.
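To make this a little more concrete, here is a toy model of the kind of record such a platform keeps. None of the names below belong to a real DCIM product; the Asset class, its fields and the sample data are assumptions made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical inventory record: what the asset is, who owns it, where it plugs in."""
    name: str
    owner: str
    location: str
    connections: list = field(default_factory=list)


def decommission_impact(inventory, server_name):
    """Summarise who to notify and which connections are freed if the server is removed."""
    asset = inventory[server_name]
    return {
        "owner": asset.owner,
        "location": asset.location,
        "freed": list(asset.connections),
    }


# Invented example data.
inventory = {
    "legacy-app": Asset(
        name="legacy-app",
        owner="Finance",
        location="Row 3, Rack 12, U20",
        connections=["PDU-A outlet 7", "PDU-B outlet 7", "switch-12 port 24"],
    ),
}
impact = decommission_impact(inventory, "legacy-app")
print(impact["owner"], impact["freed"])
```

The point is simply that, when ownership and connections are recorded in one place, retiring a server also tells you which outlets and ports come back into circulation.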
As such, it makes sense to let go of wooden spikes, garlic necklaces or silver bullets and invest instead in a DCIM deployment project. That way, your team will be able to identify and track all zombie servers in real time.