My Server Crashed! Someone Call 9-1-1!
Author: Dave LeClair/Stratus Technologies
Copyright: 9-1-1 Magazine, Feature Content
“This program contains true stories of rescues. All of the 9-1-1 calls you will hear are real. Whenever possible, the actual people involved have helped us reconstruct the events as they happened.”
Does that sound familiar? For years TV viewers watched William Shatner introduce Rescue 9-1-1, the show that chronicled just how critical the emergency system was (and is) in our daily lives. More than 15 years after the final broadcast, the U.S. has more than 6,000 Public Safety Answering Points (PSAPs) handling more than 240 million 9-1-1 calls each year. Now we are moving on to Next Generation 9-1-1: mobile access and mobile location, dynamic mapping, and a host of other new technology implementations.
Increasingly, we hear about another E9-1-1 reality. Just as with the rash of Google, Amazon and similar cloud computing outages, we hear and read more reports of CAD/E9-1-1 service outages. Take the residents of Cameron County, Texas, for instance, who weren’t able to reach dispatchers during an outage last month. It took nearly two hours to begin rerouting calls to a nearby police station, and four hours to get the system up and running again.
Technologies, applications and the systems they run on are becoming more complex and more highly integrated, with more sophisticated features than ever before. This is straining PSAPs’ ability to keep up and manage these systems.
When evaluating new Computer Aided Dispatch systems and related applications, the natural tendency is to focus on the software, the user experience it affords call takers and dispatchers, and how efficient and productive it enables them to be. After all, this is where all the sizzle is. This is what the vendors sell and what departments buy. It doesn’t matter how good the software is, however, if it can’t be accessed.
Server downtime slows emergency-response times, impacts the computer’s ability to capture and disseminate vital information, jeopardizes the safety of first responders when location history or fire inspection data is not available, harms public perception and reputation of your department, and even opens the department up to potential lawsuits.
The more highly integrated, sophisticated and functional the software is, the more reliable the underlying hardware infrastructure it runs on needs to be. The ability of server hardware and server configurations to deliver high availability differs dramatically - minutes of downtime versus a week or more over the course of a year.
Any discussion of server availability will eventually get around to talking about “the nines,” or the percentage of uptime that can be expected from a server environment. For mission-critical applications such as public safety, the goal should be 100 percent availability. Realistically, 99.999 percent, often called continuous availability, is considered the gold standard. Unfortunately, not everyone aims for that gold standard; many seem satisfied with delivering less.
Conventional off-the-shelf x86 servers provide an uptime level of 99 percent. While a score this high would be commendable in some situations, it just doesn’t cut it for this industry. Ninety-nine percent uptime is equivalent to an average yearly downtime of 87 hours and 40 minutes. This means that of the 240 million calls placed each year, about 2.4 million will be affected. Stepping up to 99.5 percent is relatively easy, as it only requires what’s called a “hardened” server with redundant power supplies, fans, and a RAID storage array, coupled with good system administration practices. However, this still averages out to nearly 44 hours of downtime and 1.2 million affected calls per year. Achieving 99.95 percent uptime usually requires failover cluster technology, specialized IT skills, frequent testing and a hefty budget, and even then you are looking at more than four hours of downtime and 120,000 affected calls.
Living with 99.99 percent uptime, or 52 minutes of annual downtime, is quite acceptable for many departments, especially those with lower to moderate call volumes. When Shatner hosted Rescue 9-1-1, the “five-nines” gold standard was reserved for only the largest public safety agencies in major cities, primarily due to cost. But as with all computer technologies, prices and complexity have dropped dramatically while power and performance have soared. Almost any department now has the means to achieve uptime reliability of this caliber.
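The arithmetic behind “the nines” is simple proportion: the fraction of the year a system is down, applied to the hours in a year and to annual call volume. A minimal sketch (the function name is illustrative; the 240 million figure is the annual U.S. call volume cited above, and the model assumes outages affect calls in proportion to downtime):

```python
# Translate an uptime percentage into expected annual downtime
# and the approximate number of 9-1-1 calls affected.

HOURS_PER_YEAR = 365.25 * 24        # ~8,766 hours
ANNUAL_CALLS = 240_000_000          # U.S. 9-1-1 calls per year (cited above)

def downtime_profile(uptime_pct):
    """Return (hours of downtime per year, calls affected per year)."""
    down_fraction = 1 - uptime_pct / 100
    return down_fraction * HOURS_PER_YEAR, down_fraction * ANNUAL_CALLS

for nines in (99.0, 99.5, 99.95, 99.99, 99.999):
    hours, calls = downtime_profile(nines)
    print(f"{nines:7.3f}% uptime -> {hours:8.2f} h/yr down, "
          f"~{calls:,.0f} calls affected")
```

Running this reproduces the figures discussed here: 99 percent works out to roughly 87.7 hours and 2.4 million calls, while five nines leaves only about five minutes of downtime a year.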
So how do you do that? Here are four tips to get you moving in the right direction:
- Get Proactive: Once an outage has occurred, it’s already too late. Detecting and preventing failures before they happen is entirely possible and obviously preferable to recovering from failure. Recovering includes not simply getting computers back up, but also getting the PSAP running smoothly again, entering data collected during the outage, and explaining to officials what happened.
- Virtualization is better but not the best: Virtualization enables much more efficient use of computer resources and simplifies system administration. However, even high-availability virtualization solutions fall short of 99.999 percent availability. Applications running on virtual machines are still running on a physical server or servers. The more VMs and applications per server, the greater the pain downtime inflicts.
- Not all downtime is a surprise: Planned maintenance such as upgrades and software patching is inevitable. When making the investment in uptime technology, you need to weigh this in as well. In the end, all downtime, whether planned or not, can be disruptive. There are solutions that allow you to avoid downtime during maintenance, and they’re definitely worth looking at.
- Services should be a consideration: When making an investment into any technology, look carefully into everything you’re getting, not just what comes in the box, but what comes with the vendor. Are you getting always-on system monitoring and analysis? Does your system handle errors transparently? Are there experts on the back-end that monitor the system 24/7? These are all things that will help lower your chances of downtime, and considerations that need to be a part of the discussion.
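The “get proactive” idea in the first tip starts with something as basic as continuously probing whether critical systems are reachable, so a problem is caught before the first unanswered call. A minimal sketch of such a health check, assuming hypothetical CAD server hostnames and ports (a real PSAP would rely on vendor-supplied monitoring services rather than a hand-rolled script):

```python
import socket

# Hypothetical endpoints for a primary and standby CAD server.
SERVERS = [("cad-primary.example.gov", 5060),
           ("cad-standby.example.gov", 5060)]

def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def poll_once(servers, probe=is_reachable):
    """Probe every server once; return the list that failed the check."""
    return [(h, p) for h, p in servers if not probe(h, p)]

if __name__ == "__main__":
    # In practice this would run on a schedule and page an on-call admin.
    for host, port in poll_once(SERVERS):
        print(f"ALERT: {host}:{port} unreachable -- begin failover checks")
```

The `probe` parameter lets the check logic be exercised without a live network, which is also how such a monitor would be tested before deployment.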
Vendor’s Corner is a guest column about product and vendor issues and solutions. Dave LeClair joined Stratus in 2011 as Director of Product Management and Product Marketing. He is responsible for the strategy, delivery and success of global products and service offerings including Stratus ftServer® and Avance®. He has over 20 years of experience developing platforms, devices, software and services in the computing and communications industries with roles in strategy, product management, engineering and business development.
LEARN MORE: Download “Protecting PSAP Applications from Downtime,” a free eBook from Stratus Technologies.
How does Stratus help? For over thirty years, public safety agencies around the world have come to rely on Stratus Technologies to assure the continuous availability of their mission-critical applications. Stratus solutions are designed to prevent downtime and data loss. Advanced, patented uptime assurance technologies, combined with proactive availability management and monitoring services, provide the highest levels of availability for mission-critical applications. The Stratus ftServer family of systems is engineered with fault-tolerant technology, fail-safe software, and self-monitoring capabilities that prevent unplanned downtime to assure 99.999 percent uptime or higher. Stratus Avance software addresses the needs of smaller PSAPs with limited budgets and support resources. It provides a cost-effective solution that assures the availability of mission-critical applications. Avance software transforms a pair of ordinary servers into a high-availability platform that practically manages itself and delivers greater than 99.99 percent availability.
For more information, see http://www.stratus.com