Wednesday, 8 June 2011

99.9% uptime?

Two days ago I witnessed a customer's production site go out of service for 12 hours or longer. What made it worse was that we couldn't help technically, because everything is in the cloud. So if it ever happens to you, here is a list of questions you may want to ask your cloud provider. It is also a list of things you may want to think about before you sign up for the cloud.

* What caused the issue? If something was changed, what was it?
* What testing is in place to ensure that access to the site is not compromised by the change?
* If testing is in place, why was the problem not detected?
* What is being done to prevent this from happening again?
* Is there any monitoring in place to check uptime?
* How will I be compensated if the SLA is breached?
* Is there anything we can do to mitigate the problem?
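
To put the SLA question in numbers, here is a rough sketch of what a 99.9% uptime promise allows compared with a 12-hour outage. The 99.9% figure comes from the title of this post; the measurement periods (a 365-day year and a 30-day month) and the function names are my own assumptions, and a real SLA may define the period and what counts as downtime differently.

```python
# Rough downtime-budget arithmetic for an uptime target.
# Assumptions (not from any real SLA): a 365-day year and a 30-day month.

HOURS_PER_YEAR = 365 * 24   # 8760, ignoring leap years
HOURS_PER_MONTH = 30 * 24   # 720

def allowed_downtime_hours(uptime_percent, period_hours):
    """Hours of downtime permitted in a period at the given uptime target."""
    return period_hours * (1 - uptime_percent / 100)

def achieved_uptime_percent(outage_hours, period_hours):
    """Uptime percentage for a period that contained the given outage."""
    return 100 * (1 - outage_hours / period_hours)

outage_hours = 12  # the outage described above, taken at its lower bound

yearly_budget = allowed_downtime_hours(99.9, HOURS_PER_YEAR)
monthly_budget = allowed_downtime_hours(99.9, HOURS_PER_MONTH)
yearly_uptime = achieved_uptime_percent(outage_hours, HOURS_PER_YEAR)
monthly_uptime = achieved_uptime_percent(outage_hours, HOURS_PER_MONTH)

print(f"99.9% allows {yearly_budget:.2f} hours of downtime per year")
print(f"99.9% allows {monthly_budget * 60:.0f} minutes of downtime per month")
print(f"A {outage_hours}-hour outage gives {yearly_uptime:.2f}% uptime for the year")
print(f"A {outage_hours}-hour outage gives {monthly_uptime:.2f}% uptime for the month")
```

Under these assumptions, a single 12-hour outage uses up more than the whole yearly budget for 99.9%, even if nothing else goes wrong for the rest of the year.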

I would also like to refer you to Jeff Atwood's blog post about the Chaos Monkey, which covers the same topic.
