The HUGE cost of IT Datacenter downtime and how to cure it!
1. View your solution from the point of view of “What if it never failed in the first place?”
Cloud architectures have the tail wagging the dog: companies are now rewriting their software to account for failures in the cloud. Not only do you bear the cost of downtime and the additional software development, but service failures can also cost you your reputation. Why pay to run your software on multiple cloud instances at the same time, when you can run on infrastructure that was designed so you never fail in the first place?
2. Engineer and test your infrastructure to be a failover environment.
A 2N (failover) architecture means dual power supplies, redundant hard drives, dual PDUs, dual UPSs, dual generators, dual networks, dual NICs…if it’s possible to put two of something into a system, we have it there. This isn’t theoretical. It’s not an add-on or an option. It’s a basic tenet of our design philosophy. Over a third of the cost of our systems is in redundancy. Why? Because these parts will eventually break, but component failure doesn’t need to lead to system failure. The industry standard is a 1N architecture with a promise to fix it quickly when it fails (which it will). That really means, “I’ll replace your hard drive with a brand new one,” but how many hours or days are you going to spend getting your code, your configuration, and your data back onto that system?
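To put a rough number on that design choice, here is a back-of-the-envelope sketch. The 99% per-component availability figure, and the assumptions that failures are independent and failover is instant, are illustrative only, not measurements of any vendor’s gear:

```python
# Back-of-the-envelope availability math for 1N vs. 2N designs.
# The 99% per-component figure below is purely illustrative.

HOURS_PER_YEAR = 8760

def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of N independent copies where any one copy keeps the system up."""
    return 1 - (1 - component_availability) ** copies

single = 0.99                          # one power supply, NIC, PDU, etc.
dual = parallel_availability(single, 2)

print(f"1N availability: {single:.4%} -> ~{(1 - single) * HOURS_PER_YEAR:.1f} hours of downtime/year")
print(f"2N availability: {dual:.4%} -> ~{(1 - dual) * HOURS_PER_YEAR:.1f} hours of downtime/year")
```

Under those illustrative assumptions, a single 99%-available component implies roughly 88 hours of downtime a year, while a duplicated pair implies under an hour.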
3. Extensively test the network.
One of the advantages of a 2N network is redundancy, but that redundancy does no good if the redundant components don’t work properly, so we exercise every failover path before and after the network goes live. That way, when something does fail, the redundant hardware takes over in less than one second, meaning downtime is essentially zero.
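As an illustration of what a failover test can look like (a simplified stand-in, not our actual tooling, and service.example.com is a placeholder target), the sketch below probes an endpoint behind the redundant path every 200 ms while you deliberately fail a link, then reports the longest gap between successful probes:

```python
import socket
import time

# Placeholder target; point this at a service reached through the redundant network path.
TARGET_HOST = "service.example.com"
TARGET_PORT = 443
PROBE_INTERVAL = 0.2   # seconds between probes
TEST_DURATION = 120    # run long enough to pull a link or power cord mid-test

def probe(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

last_success = time.monotonic()
worst_gap = 0.0
deadline = time.monotonic() + TEST_DURATION

while time.monotonic() < deadline:
    if probe(TARGET_HOST, TARGET_PORT):
        # Record the longest stretch between two successful probes.
        worst_gap = max(worst_gap, time.monotonic() - last_success)
        last_success = time.monotonic()
    time.sleep(PROBE_INTERVAL)

print(f"Longest observed outage during failover: {worst_gap:.2f} seconds")
```

If the redundant path takes over as intended, the longest observed outage should stay under the one-second window described above.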
4. Use only top-name equipment, such as Dell, NetApp, Cisco, etc.
If your company only cares about buying the cheapest hardware, you’re going to have high failure rates, which mean costly downtime. By purchasing hardware from the vendors with the best reputations, such as Dell, NetApp, and Cisco, we ensure quality products are delivered every single time. We can’t stop a motherboard from failing, but we can absolutely stop a motherboard failure from interrupting the experience of your customers’ users.
5. Burn in your infrastructure for 72 hours before putting it into production.
This is a big one! Even if you buy from the highest-quality vendors like NetApp or Dell, there’s always the possibility of something going wrong. Testing our servers before they are released to you ensures you get the highest-quality hardware, tested to the fullest extent possible. To accomplish this we run a CPU-, memory-, and disk-intensive synthetic workload that stresses these components by simulating 100% utilization of the system. This not only detects faulty components, but has the added benefit of “heat cycling” the system so that manufacturing defects (such as weak solder joints) are caught before the machine is placed into a production environment. This comprehensive interrogation of the system allows us to discover any issues with the equipment…before your customers do! See Greentec Systems for all of your IT hardware needs and Peak Hosting for cloud and managed services.
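Purely as an illustration of the idea (this is not Peak Hosting’s actual burn-in suite), a minimal synthetic workload along these lines might look like the sketch below: one worker per core runs a tight CPU loop, walks a memory buffer, and continuously writes and re-reads a scratch file for a configurable duration.

```python
import multiprocessing
import os
import tempfile
import time

BURN_IN_SECONDS = 10          # set to 72 * 3600 for a full 72-hour burn-in
MEMORY_BYTES = 64 * 1024**2   # per-worker buffer to exercise (illustrative size)
DISK_CHUNK = 32 * 1024**2     # size of the scratch file written each pass

def stress_worker(seconds: float) -> None:
    """Keep one core busy, walk a memory buffer, and hammer a scratch file."""
    buf = bytearray(MEMORY_BYTES)
    chunk = os.urandom(DISK_CHUNK)
    deadline = time.monotonic() + seconds
    with tempfile.NamedTemporaryFile() as scratch:
        x = 0
        while time.monotonic() < deadline:
            # CPU: tight integer loop keeps the core at full utilization.
            for _ in range(100_000):
                x = (x * 31 + 7) % 1_000_003
            # Memory: touch every page so the DIMMs stay exercised.
            for i in range(0, len(buf), 4096):
                buf[i] = (buf[i] + 1) & 0xFF
            # Disk: write a chunk and read it back.
            scratch.seek(0)
            scratch.write(chunk)
            scratch.flush()
            scratch.seek(0)
            scratch.read()

if __name__ == "__main__":
    workers = [
        multiprocessing.Process(target=stress_worker, args=(BURN_IN_SECONDS,))
        for _ in range(os.cpu_count() or 1)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("Burn-in pass complete; review system logs and sensors for errors.")
```

The point of a loop like this is the sustained heat and I/O over the full burn-in window, which is what shakes out marginal components before they reach production.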
By Peak Hosting’s Erin Stadick, Product Engineering Manager.