Amazon’s cloud service, which hosts several well-known companies such as Quora and Foursquare, suffered an outage lasting more than 24 hours on April 21-22 at its Northern Virginia data center. Amazon spreads its cloud infrastructure across the country, but the clients served from this data center experienced downtime, meaning their users couldn’t log into those companies’ websites. This raises the question of whether one of the strongest selling points of an external cloud service such as Amazon’s, massive redundancy, is in fact not as strong as advertised.
Well, the issue is more complex than it seems. For one thing, cloud providers such as Amazon offer several tiers of service, and the tiers that provide bulletproof redundancy cost much more than the bare-bones ones. Startups use cloud services so they don’t have to invest massive amounts of money in infrastructure and the staff to support it. While it’s understandable that a startup usually opts for a cheaper cloud solution, companies must understand that if they can’t afford lengthy outages, they must pay top dollar for a bulletproof service level with multiple layers of redundancy.
In view of this well-publicized outage, some are already questioning the future of cloud computing. The fact, though, is that the alternative, building and maintaining your own infrastructure, is not only expensive but also susceptible to the same types of outages as the one Amazon suffered. For many small and medium-sized companies, a cloud solution offers a lot more bang for the buck, both in the money spent on building and maintaining the IT infrastructure and in deployment time.
Companies can continue with their current and future cloud outsourcing plans, but they must place more emphasis on disaster recovery and contingency planning. They should focus on drawing up stronger service level agreements (SLAs) with their cloud providers, including provisions under which the provider compensates them for failing to meet the SLA. Although Amazon is in the news right now because of this massive outage, other prominent cloud providers such as Rackspace have also experienced major outages not too long ago.
Although everyone is looking at Amazon right now, it’s the companies using the Amazon cloud service that are really responsible for the ultimate effect of the outage on their business. These companies simply didn’t have a real disaster recovery strategy: they maintained no alternative location from which they could have resumed their services when the primary services failed due to Amazon’s outage. It’s noteworthy that Amazon commits itself to 99.5% uptime in its SLAs, so it is still in compliance with them. If a company needs close to 100% uptime, it must establish a true disaster recovery solution by implementing a redundant data architecture that protects it against downtime such as what occurred here. Fortunately, setting up a true disaster recovery solution in the cloud is just as easy as setting up the primary service; you just have to pony up additional money.
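To put the 99.5% figure in perspective, here is a rough back-of-the-envelope sketch of how much downtime such a commitment permits, and how much a second, redundant location could help. It rests on two assumptions that do not come from Amazon's SLA text: that uptime is measured over a 365-day year, and that the two locations fail independently of each other (real sites often share failure causes, so treat the second number as optimistic). The function names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: uptime is measured over a 365-day year


def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted per period at a given uptime percentage."""
    return period_hours * (100 - uptime_percent) / 100


def combined_availability(uptime_percent, locations):
    """Availability (in percent) of several redundant locations.

    Assumption: locations fail independently, so the service is down only
    when every location is down at the same time.
    """
    unavailability = (100 - uptime_percent) / 100
    return 100 * (1 - unavailability ** locations)


if __name__ == "__main__":
    single = allowed_downtime_hours(99.5)
    print(f"99.5% uptime allows about {single:.1f} hours of downtime per year")

    paired = combined_availability(99.5, locations=2)
    print(f"Two independent 99.5% locations: about {paired:.4f}% availability")
    print(f"That allows about {allowed_downtime_hours(paired):.2f} hours per year")
```

Under those assumptions, a 99.5% commitment leaves room for roughly 43.8 hours of downtime a year, which is why an outage of a little over 24 hours can still fall within the agreement. A second independent location shrinks the expected downtime to well under an hour a year, which is what the extra money buys.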
One thought on “The Amazon cloud outage and the future of Cloud Computing”
I read the story from Amazon attributing the outage to human error during a network capacity upgrade; it’s just another example that no technology can truly be capable of 100% uptime. With that said, I am sure Amazon will put safeguards in place to prevent that particular occurrence from happening again, thus strengthening the network. As with any technology combined with human interaction, there is always the possibility of something going wrong. I read a subsequent story that Amazon is making good on the outage, to some degree, with a client credit. I imagine the larger companies that depend on cloud technology will take a strong look at contingency plans for disaster recovery in the future. You can never be too safe.