Stratus Technologies, The Availability Experts, Provides High-Availability Architecture for Mission-Critical Healthcare Applications
There’s little doubt that system availability is more important now than ever. Hospitals are rolling out a host of mission-critical clinical applications. That means downtime directly impacts patient care and caregivers. Those systems usually run on Windows or Linux servers. What can an IT professional do to make sure that technical glitches don’t increase patient risk or clinician dissatisfaction? Alan Gilbert is the healthcare practice leader for Stratus Technologies, which specializes in high-availability architecture for healthcare and other technology-dependent industries.
Give me some background on the company.
Stratus Technologies is the availability expert. That’s a mantle we’ve earned over 28 years of supporting mission-critical and business-critical applications for customers worldwide. When you make an 800 call, swipe a credit card, or trade stocks, there’s an excellent chance you are touching a Stratus product.
Sage Software, formerly WebMD Practice Services/Emdeon, has had our servers for many years. In 2001, we introduced the industry’s first fault-tolerant server for Windows. Later, we added Linux. Now we have applied our experience in continuous availability to establish a solution services group. The group considers availability issues throughout a customer’s IT infrastructure and their business processes end-to-end.
So Stratus works in multiple industry segments?
We sell mostly through the channel. We find the best software solution provider and value-added reseller to do business with in each vertical market. That way, we provide the customer with a complete solution. We do this in banking, financial services, manufacturing systems, telecom, public safety, and a number of other markets where there is a compelling need for unfailing application uptime, including healthcare.
We can bring to bear every bit of expertise supporting critical operations in many industries to work with the best software solution providers serving the healthcare industry. Stratus is one of the very few companies capable of delivering this capability, period. That means we find companies in multiple different industries that have a need for continuous availability.
So why Stratus for healthcare?
This is a perfect market for a continuous availability hardware platform, especially with so many functions moving to digital form and the need for unquestioned data integrity. Name the application: electronic medical records systems, radiology, order entry, pharmacy medication management. When everything is digital, there is no falling back to other systems. If a service is unavailable, the ripple effects throughout a hospital will have serious consequences in patient safety and operational efficiency.
If you believe the Institute of Medicine, and I do, as many as 98,000 patients die each year from medical errors, and 1.5 million avoidable medical errors happen. With our partners, we can seriously help address these and related issues.
Tell me how fault-tolerant servers are different from those usually deployed.
A fault-tolerant architecture is designed to prevent server failure from occurring. Competitors’ solutions are designed to recover as quickly as possible after the failure has already happened. The difference in uptime availability during the course of a year can be measured in hours. On average, Stratus servers will experience less than five minutes of downtime a year.
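To put the uptime figures above in perspective, the following minimal sketch converts an availability percentage into expected downtime per year. The function name and the three availability levels are illustrative assumptions, not Stratus specifications; the arithmetic simply assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return expected minutes of downtime per year for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


# Illustrative availability levels (assumed for comparison only).
for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    minutes = downtime_minutes_per_year(pct)
    print(f"{label} ({pct}%): {minutes:.1f} minutes/year ({minutes / 60:.2f} hours)")
```

Under these assumptions, 99.9% availability allows nearly nine hours of downtime a year, while 99.999% allows a little over five minutes, which is the scale of difference described above.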
And your customers are seeing those results in healthcare?
Yes, in healthcare and throughout our installed base, which includes thousands of machines. Partners such as JJWild and MEDITECH sell their HIS solutions to a hospital. The most critical business processes go on Stratus and the everyday applications run on other servers from the likes of Dell or HP.
When Sage sells a practice management software package for electronic medical records to a large clinic or doctors’ practice, they often recommend a Stratus server. Why? Because as good as these applications are – and they are among the clear industry leaders – there are big problems if the server they run on fails and the application goes down.
Let’s say you’re working with Sage. Will Sage bring a prospect to you for a creative solution?
Sage, Agfa, JJWild and MEDITECH all have our systems in their price book. They package a total solution for their end customer. They are also the first responder if there’s a problem. They know when there is a problem to address because our servers automatically alert them through the Stratus service architecture.
The customer often doesn’t even know the server has a problem. With our servers, the application continues to run and business goes on even when there is a component failure. We design them to monitor their own performance, isolate troublesome components from the rest of the server, and call home to the service center to report exactly what is wrong.
If a new part is needed, the server itself will place the order for next-day delivery. The customer can hot-swap the new component with the failed one without interrupting processing. About 95% of all server issues are fixed remotely over the network without ever setting foot in the customer’s computer room. The Stratus customer service network spans the globe.
What does high availability require on the customer’s end?
That’s the beauty of it. Essentially, you pull it out of the box, load your Windows or Linux application as-is, and connect the server to the service network. Some customers describe ftServer as the best employee they have. It takes care of itself, works 24/7, and never takes a break, goes on vacation, or asks for a raise.
Does the application vendor need to make any specific changes?
Any standard Windows or Linux application runs on an ftServer without software modification or failover scripting, because there is no failover. When you look at an ftServer, you see one cabinet. Inside is the equivalent of two physical servers, each one a standard 2U (3.5”) high and connected to a backplane.
Don’t mistake this for a cluster. It’s not. These two units operate in complete lockstep, doing the same thing at the same time. In fact, they run together so precisely that only one copy of the operating system and one copy of the application are needed. The software “sees” only one server, as if it were running on any industry-standard server.
That’s why if a processor or disk or fan or whatever quits, its companion part is already on the job. The server takes over by isolating the broken part, restarting it if possible, and if not, reporting back to the service center that something needs attending to. Compare that to the work that goes into building a high-availability cluster, writing failover scripts, testing it and retesting it every time a change is made to the configuration, and the heavy dose of staff time that goes into managing it on an ongoing basis.
Why would I want to buy from Stratus as opposed to going through Dell, HP, or IBM?
None of those companies offer a five-nines server. A high-availability cluster, yes, but not a continuous availability system. With one exception — you can actually buy a Stratus server from Dell.
Dell partnered with Stratus in the E911 market because they didn’t have a system with the uptime reliability essential for applications like this. To enhance their product portfolio, Dell is now a Stratus reseller in the public safety marketplace. Over time, that has expanded to include server sales into healthcare.
So you don’t compete with many companies.
Right. In the big scheme of things, critical applications represent a relatively small percentage of all applications in a company, hospital, or government agency. Other companies simply did not invest the time and money to develop this specialized technology.
HP does have a fault-tolerant system, but this solution is typically focused on the HP-UX and IBM AIX space, whereas Stratus is focused on the Wintel space. A clustered solution is as close as other vendors can get to providing higher levels of availability.
You touched on this, but explain more about the diagnostics and problem-alerting capabilities built into the system.
ftServers can diagnose down to the component level. We know whether it’s the NIC card or the memory or the I/O. It’s not like when your “check engine” light goes on, which is kind of a vague thing. We’re able to get down to the component level to diagnose whether it’s something that has stopped one of the two halves of the server.
Also, since we’re monitoring those servers 24 hours a day, 365 days a year, we’re able to diagnose what we’re calling “intermittent errors.” These errors will bring down a mere mortal server because it can’t sort out whether an error is minor, major, or catastrophic. Those are errors that are going to be a problem someday.
If they happen often enough and they hit certain levels, we’ll “shoot” the half of the server that’s experiencing the problem, while the other half continues to run. If you see a server do things that might be problematic in the future, you may be able to solve those problems before they become catastrophic. It’s sort of preventative care versus episodic care in the healthcare world.
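The threshold idea described above can be sketched as a sliding-window error counter. This is only an illustration under stated assumptions: the class name, the component label, and the threshold and window values are hypothetical, and the actual ftServer logic is not described in this interview.

```python
from collections import deque


class IntermittentErrorMonitor:
    """Counts errors per component inside a sliding time window (hypothetical model)."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = {}  # component name -> deque of error timestamps

    def record_error(self, component: str, timestamp: float) -> bool:
        """Record one error; return True if the component should be isolated."""
        events = self._events.setdefault(component, deque())
        events.append(timestamp)
        # Drop errors that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.threshold


# Assumed policy: isolate after 3 errors within 60 seconds.
monitor = IntermittentErrorMonitor(threshold=3, window_seconds=60.0)
decisions = [
    monitor.record_error("half-A/memory", t)
    for t in (0.0, 10.0, 100.0, 110.0, 120.0)
]
print(decisions)
```

In this run, the two early errors age out before a third arrives, so nothing happens; only the later burst of three errors within the window triggers isolation. The other half of the server would keep running in the scenario described above.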
Explain the degree of sophistication in the monitoring. For example, does it have its own first resolution problem reporting; does it automatically know that you need to order this particular board and actually start the ordering process, or does it take manual intervention to read all the signs?
All the above. We have multiple ways to have our systems talk back to our call centers around the world to diagnose.
This is at the core of what makes us different. We don’t tell a customer that their premium service contract means we will respond within four hours. Who can wait four hours to get an important application back online? We reliably distinguish software issues from hardware issues and can see problem resolutions. We are able to do component level diagnostics which other vendors just do not do over the phone.
For customers that require it, like banks, military, or healthcare, we can program in secure access levels to get into organizations in order to communicate with our servers without compromising system or information security. And we have various and redundant communication avenues.
How is server virtualization affecting healthcare?
Every CIO I’ve talked to in the last couple of months is doing some kind of virtualization. It’s pervasive. They all have tremendous cost pressures and they see virtualization as giving them the ability to lower their overall hardware footprint, the overall power and cooling, and possibly the number of people managing servers.
We’ve talked to a small children’s hospital with a couple of hundred beds. They have more servers than beds. The time of one server for every application is behind us. But here is the rub. If you have five servers, each running one departmental application, and one of those servers goes down, you’ve only affected one application and one department. But virtualized, those five applications are loaded on one physical server. The consequence of server downtime is more dire. Stratus ftServer offers the same uptime advantages to virtual machines that it does for a single application.
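The consolidation risk in the five-server example can be shown with a small sketch that lists which applications go down when one physical host fails. The application and host names are made up for illustration.

```python
def apps_affected(placement: dict, failed_host: str) -> list:
    """Return the applications that go down when `failed_host` fails."""
    return sorted(app for app, host in placement.items() if host == failed_host)


# Hypothetical departmental applications.
apps = ["lab", "pharmacy", "radiology", "order-entry", "billing"]

# One physical server per departmental application.
dedicated = {app: f"server-{i}" for i, app in enumerate(apps, start=1)}
# All five applications virtualized onto a single physical host.
consolidated = {app: "vm-host-1" for app in apps}

print(apps_affected(dedicated, "server-1"))
print(apps_affected(consolidated, "vm-host-1"))
```

With dedicated servers, a single failure takes out one application; after consolidation, the same single failure takes out all five at once.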
Virtualization provides hospitals with another big benefit. It makes disaster recovery more attainable. If you have less hardware and require less attention from IT staff to manage them, then that frees up both financial and human resources to design a DR plan. Implementing a DR strategy with virtualization is more affordable and manageable than attaining the same ability to recover by traditional methods.
Are you offering any type of disaster recovery services?
More than that. We have a solution services group that can assist with everything from availability assessment of an IT organization to designing and implementing a complete availability solution. We can even manage that solution for the hospital.
Our services are built around a framework we call CALM: Continuous Availability Lifecycle Management. Using the CALM methodology, we help IT staff evaluate existing systems, including a disaster recovery solution; establish end-to-end performance metrics; and monitor and manage systems so they achieve those metrics.
In the DR space specifically, we partner with a company called Double-Take, which is in the business of moving data from a primary server offsite and putting it somewhere else. Double-Take is moving saved data to a back-up site every minute or every second. The client defines the parameters. Then, if there is a catastrophic event and your data center is out of business, the secondary site has all the data.
How do you project ROI?
That’s a question that many IT departments don’t consider carefully enough. They only look at the hardware and service price and give little consideration to the other cumulative costs over 18 to 36 months and what the total cost of ownership will be.
Let’s break that down. First, we are talking about critical applications you cannot afford to be without. What is the cost of downtime? $10,000 an hour? $50,000 an hour? Now the difference between a server that will average five minutes of downtime a year versus one that will average one, five, or 10 hours of downtime becomes very meaningful.
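The comparison above is simple arithmetic, and a short sketch makes the gap concrete. It uses the hourly cost figures and downtime scenarios quoted in the answer; the function name is an assumption, and real downtime costs vary by organization.

```python
def annual_downtime_cost(downtime_hours: float, cost_per_hour: float) -> float:
    """Return the yearly cost of downtime for a given hourly cost."""
    return downtime_hours * cost_per_hour


FIVE_MINUTES_IN_HOURS = 5 / 60

# Hourly costs and downtime scenarios taken from the figures quoted above.
for cost_per_hour in (10_000, 50_000):
    for hours in (FIVE_MINUTES_IN_HOURS, 1, 5, 10):
        cost = annual_downtime_cost(hours, cost_per_hour)
        print(f"${cost_per_hour:,}/hour, {hours:.2f} hours down: ${cost:,.0f} per year")
```

At $50,000 an hour, ten hours of downtime costs $500,000 a year, compared with a few thousand dollars for five minutes.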
We’ve also talked about simplicity. You plug in Stratus servers and they just work. You don’t have to build a cluster, have multiple software licenses, modify applications, write custom scripts, or repeatedly test the system to be sure it will do what it is supposed to when something goes wrong.
We talked about service. What is the value of being able to fix a problem remotely in just a few minutes versus waiting four or more hours for the repair technician to show up? The bottom line is that keeping critical applications running is what Stratus does. No other company dedicates itself to that mission.
Any other areas you’d like to touch on?
Storing data is a huge and growing concern. In addition to EMC products, Stratus offers its own mid-range data storage device called ftScalable Storage. Like our servers, our storage array is built to provide continuous availability. It can also call home to report issues. It also features green, non-battery backup and high-efficiency cache mirroring.
Obviously we can’t only work with the storage products we offer because customers may have IBM’s Shark or a variety of storage devices. We have to be able to work with those. We have a process called the FT-ready program with a list of supported devices. Over time we’ve battle tested many other devices.
What’s interesting is that we’ve actually made different vendors’ storage devices better because of the punishing testing regimens we use. We can uncover bugs in hardware and software that normal testing methods would never reveal. The entire solution – hardware, application, OS, other devices – is more reliable and trustworthy as a result.
Do you have anything new coming up?
We’ll be at HIMSS with exciting announcements. In the meantime, I will tell you that every CIO we’re talking to either has thought about virtualization or is currently involved in a virtualization project. People are virtualizing their test servers and tier three or four applications, where it doesn’t really matter if they go down. But for certain, they’re going to be quickly moving up to their tier one and two applications. That’s where we want to be. We’re partnering with VMware and its biggest system integrators to work with them in healthcare.
I just want to stress that Stratus Technologies is a different animal. We’re not a me-too company. We have solutions that other people don’t have. Essentially, all the major players have the same commodity servers and are arguing over price. They all do funny things on price to be able to be the lowest possible. But no one has what we have. We pride ourselves on being the Availability Experts.
Fast Facts
Product
High availability server technology
Company
Stratus Technologies
U.S. Operations
111 Powdermill Road
Maynard, MA 01754-3409
1.800.STRATUS
www.stratus.com
Notable Customers
Orthopedic Center of St. Louis, Hachioji Gastroenterology Hospital, The Ruwaard van Putten Hospital
The Bottom Line
- Fault-tolerant servers from Stratus are essential when caregivers rely on clinical applications that can’t go down without potentially disastrous consequences.
- System uptime = CIO security blanket.
- High availability is a relatively new concept to healthcare, but not to Stratus, which has been delivering critical information system technologies for more than 25 years.