The Art of Failure Planning
He that will not apply new remedies must expect new evils; for time is the greatest innovator.
Why Plan A Works Better for Small Problems

Plan A shines at keeping minor problems from growing into disasters. Take RAID, for example. By having a set of redundant hard drives, you prevent the predictable failure of a single hard drive from causing an entire system to fail.

Plan A relies upon predictable outcomes. If you have a backup system in place that automatically takes over in case the primary system fails, that is a predictable outcome. Calling Dell support in case the primary system fails is not a predictable outcome.

Most big organizations invest solely in Plan A solutions. They have inline redundant systems and huge knowledge bases of information, and they pay millions of dollars for support contracts. When it works, Plan A is invisible. It quietly and efficiently keeps things running.

Ironically, most disaster recovery plans are entirely Plan A, probably because they are made by the same big organizations that love Plan A solutions so much. They literally attempt to plan for every eventuality. This is a mistake because Plan A has one major drawback: it either works or it doesn’t. And when it doesn’t, you’ve got a real disaster on your hands that somebody has to fix.

Why Plan B Works Better for Big Problems

Where Plan A fails, Plan B excels. Plan B analyzes the problem and then develops a simple and flexible plan to fix it. Plan B requires common sense and action.

The most important choice to make for your Plan B is personnel. The best Plan B starts with a team of talented and experienced people you can count on, who work well under pressure and are great at troubleshooting.

To be effective, you have to give your Plan B team some room to work. Give them the authority to make decisions and the resources to support their efforts. Accept that the solution will be imperfect. Your team must be allowed to make mistakes.

This isn’t to say that Plan B should be completely ad hoc. It is still a plan, after all. Involve technology when appropriate.
For instance, we use backup software from Ultrabac that makes a daily image of our servers. In the event a server fails, we can restore the image on a different server, even if the hardware is not the same as the original. Restoring a server is a manual process, and any data written since the last backup will be lost. Still, it’s a pretty good solution given the alternative.

Most small organizations rely solely on Plan B. They have few or no backup systems, or even backups for that matter. In case of failures they endure downtime, hope for miracles and sometimes get them.

The Art of Failure Planning

If you are a big organization, you have to face the reality that a true disaster recovery plan needs to look more like Plan B than Plan A. Instead of spending a lot of money trying to plan for every possible bad thing that could happen, put together a qualified Plan B team. Then have that team tell you what resources they would need to respond to an emergency.

The lessons of Hurricane Katrina demonstrate clearly that trying to force Plan A to work will only create a bigger problem. Had the government gone with Plan B to start with, much of the suffering would have been avoided.

If you’re a small organization, you can do with more Plan A. Invest in practical backup solutions and redundant hardware for critical systems. There are proven solutions for many common failures. Take advantage of them and stop being a victim.

Every organization should have failure plans that include both a Plan A and a Plan B. The art of failure planning is to understand the limitations of your plans and thereby make better decisions about how to respond to failures.

About the Author: Glen Kendell is a network architect and owner of Release to Production. He publishes a monthly newsletter called In-Production: Achieving True High Availability.