Mission Critical Service Must-Haves
To run a mission critical service, the network and system architecture must be designed so that there is no single point of failure. That’s easier said than done, especially if you’re not the most tech-savvy person. Here’s a basic checklist, starting with the obvious:
- Redundant servers
- Redundant network devices (switches, firewalls, routers)
- Virtual machines: useful technology that abstracts your servers from the underlying hardware, which also makes it quick to switch to another machine if the main server fails
- Database clustering/mirroring/replication technology: a second instance of your database with up-to-date information, in case the main database goes down
- Regular backups made without interrupting services: this allows data to be saved on a fixed schedule, such as every night, to guard against data loss
- Ability to store backups securely in a different location from the data being backed up
- Use of network links from multiple service providers: if your Internet Service Provider (ISP) goes down, this is how you make sure you are still connected
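The idea behind most of the items above is the same: if the main one fails, switch to the spare. Here is a minimal Python sketch of that switch. The function name, the hostnames and the `status` table are hypothetical and only there for illustration; a real setup would replace the lambda with an actual health check.

```python
from typing import Callable, Optional, Sequence


def first_healthy(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint that passes its health check, or None."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A check that crashes counts the same as a failed check.
            continue
    return None


# Hypothetical example: the primary is down, the backup is up.
status = {"primary.example.com": False, "backup.example.com": True}
active = first_healthy(list(status), lambda name: status[name])
print(active)  # backup.example.com
```

The order of the list is the order of preference, so the backup is only used when the primary fails its check. If nothing is healthy, the function returns `None`, which is the case redundancy is meant to make rare.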
Now, onto the less obvious. These are the items you should double check!
- Different power circuits for redundant systems: if you have two servers, primary and backup, you don’t want to plug them into the same circuit, or else a single circuit breaker tripping would wipe out both
- A generator and an Uninterruptible Power Supply (UPS): the UPS keeps your systems running the moment the main power source goes out, and the generator takes over if the outage lasts
- Redundant air conditioning systems: keeps your servers cool even if one unit fails
- Storage that can recover from a disk failure as well as a failure of the storage controller: at the disk level, this means data is stored on multiple disks; at the controller level, it means a second controller can take over
- At least one backup data center with functionally equivalent systems: this protects against the failure of an entire data center
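To see why a shared power circuit undoes much of the benefit of a second server, it helps to put rough numbers on it. The sketch below uses made-up availability figures and assumes components fail independently of each other, which real systems only approximate. The function names are hypothetical.

```python
from typing import Iterable


def all_required(availabilities: Iterable[float]) -> float:
    """Availability of a chain where every component must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def any_sufficient(availabilities: Iterable[float]) -> float:
    """Availability of a redundant group where one working member is enough.

    Assumes the members fail independently of each other.
    """
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


# Illustrative figures only (assumptions, not measurements).
server = 0.99     # one server is up 99% of the time
circuit = 0.999   # one power circuit is up 99.9% of the time

# Both servers on the same circuit: the circuit is a single point of failure.
shared = all_required([circuit, any_sufficient([server, server])])

# Each server on its own circuit: no single component takes both down.
separate = any_sufficient([all_required([circuit, server])] * 2)

print(f"shared circuit:    {shared:.6f}")
print(f"separate circuits: {separate:.6f}")
```

With these assumed figures, the shared-circuit setup comes out at roughly 99.89% and the separate-circuit setup at roughly 99.99%. On a shared circuit, the pair of servers can never be more available than the circuit itself.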
Of course, ongoing maintenance is just as important as the initial set-up.
- A comprehensive monitoring system capable of checking each critical system and alerting the appropriate people when there’s a problem: this system is typically custom developed so that each check is tailored to your organization’s critical application
- Regular software patches and updates: you can thank your system administrators for this one!
- Configuration changes to critical systems scheduled outside peak hours
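The monitoring item is the easiest one to picture in code. Below is a minimal Python sketch of "check each critical system and alert the appropriate people". The `Check` class, the `run_checks` function and the contact names are all hypothetical, and the probes are stand-ins for real checks such as a database query or a web request.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    name: str                    # what is being checked
    probe: Callable[[], bool]    # returns True when the system is healthy
    contact: str                 # who should hear about a failure


def run_checks(checks: List[Check], alert: Callable[[str, str], None]) -> List[str]:
    """Run every check once, alert the contact for each failure, return failed names."""
    failed = []
    for check in checks:
        try:
            healthy = check.probe()
        except Exception:
            # A probe that crashes is reported as a failure, not ignored.
            healthy = False
        if not healthy:
            alert(check.contact, f"{check.name} check failed")
            failed.append(check.name)
    return failed


# Hypothetical example: alerts are collected in a list instead of being sent.
sent = []
checks = [
    Check("database", lambda: True, "dba-oncall"),
    Check("web server", lambda: False, "ops-oncall"),
]
failed = run_checks(checks, lambda contact, message: sent.append((contact, message)))
print(failed)  # ['web server']
```

Each check carries its own contact, so a database problem can go to one team and a web server problem to another. In practice something like this would run on a schedule, and the `alert` function would send an email, text message or page.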
Running a mission critical service is not easy, and it is not a one-off task. There are many components that must be taken into account over the lifetime of the application. Sound intimidating? It is. These “basic” guidelines actually add up to months and years of hard work by expert IT teams.
That’s why many companies use a cloud service provider that has already built a fault-tolerant system and manages the entire service. By outsourcing this task, you let experts take care of the monitoring and maintenance for you, which helps maximize uptime.