Friday, April 29, 2011

Post Mortem: What Happened During the Amazon Outage

Thursday morning, 4/21/2011, at 5:26AM CDT, we intended to very briefly take our site down for a routine maintenance task. Had all gone as planned, the site would have been down for only a few seconds while we applied a patch to the front page of the site. Unfortunately, not everything went as planned.

We use the Amazon AWS cloud hosting service because it is very stable and keeps things running much better than any other hosting company we've worked with before. Amazon also makes multiple copies of every bit of data we save, which is normally a very good thing (it would take many, many hard drives failing simultaneously for any student data to be lost). See more about our choice in this post.

That morning, at 2:47AM CDT, Amazon performed their own routine maintenance on their AWS cloud hosting service. They've posted their own post mortem, but it all boils down to a couple of key points:

  • During the maintenance, someone made a typo, pointing some of their servers to the wrong backup location. This caused the copying of the data to fail, so each time something changed on a website, that website and the drives it was on locked up. It also caused the system that lets us restart our server to fail, leaving our site stuck offline during our few-second switch.
  • Since the servers weren't able to reach the drives they expected to reach while copying, they reported that something was wrong with those drives. If something is wrong with a drive, Amazon automatically takes that drive offline until they can inspect it and see what is wrong. Many, many drives were reported as broken, so all of those drives went offline. Amazon didn't have enough reserve capacity to handle this drive outage, causing their servers to run out of space.
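
To make the capacity problem in the second point concrete, here is a toy sketch in Python. It is not Amazon's actual system: the function name `remirror` and all of the sizes are made up for illustration. It only shows why flagging many drives at once can use up the reserve space before every drive gets a fresh copy.

```python
def remirror(flagged_drive_sizes_gb, reserve_gb):
    """Try to re-copy each flagged drive into the remaining reserve space.

    Returns two lists of drive sizes: those that got a new copy,
    and those left stuck because the reserve ran out.
    """
    recovered, stuck = [], []
    for size in flagged_drive_sizes_gb:
        if size <= reserve_gb:
            reserve_gb -= size      # this copy uses up part of the reserve
            recovered.append(size)
        else:
            stuck.append(size)      # no room left, so the drive stays offline
    return recovered, stuck


# Made-up numbers: four 100 GB drives flagged, but only 250 GB in reserve.
recovered, stuck = remirror([100, 100, 100, 100], reserve_gb=250)
print(len(recovered), "drives re-copied,", len(stuck), "drives stuck")
```

With a larger reserve, every flagged drive in this toy example would get a new copy, which is the idea behind Amazon adding more reserve capacity.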

We were able to get to our data, copy it to another server, and relaunch, but it took a long time for us to do so. Since we keep track of what students enter every time they answer, we have a very large amount of data, and just downloading it and uploading it takes several hours in each direction. We're working to make that less of an issue (see below).

Amazon is still working to make things more bulletproof, but they've already done a lot to prevent these problems from happening in the future.

  • They're implementing more automation and other safeguards to keep a typo like this from occurring in the first place.
  • They've already added a lot of reserve capacity, and are adding more. My guess is hard drive salespeople in Virginia, where the servers are located, made a lot of money that day selling high-capacity drives to Amazon.

We are also taking our own steps to avoid similar issues in the future. By the Fall semester, we hope to mirror our data to multiple servers in both the US and Canada, so that, if another Amazon outage occurs, we can quickly move our site to one of these other locations with minimal downtime.
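
The idea of moving to another location can be sketched in a few lines of Python. This is only an illustration, not our actual setup: the site names, the `pick_site` function, and the health check are all hypothetical.

```python
def pick_site(sites, is_healthy):
    """Return the first site, in order of preference, that passes the health check.

    Returns None if no site is healthy.
    """
    for site in sites:
        if is_healthy(site):
            return site
    return None


# Hypothetical mirror locations, listed in order of preference.
SITES = ["us-primary", "us-mirror", "canada-mirror"]

# Pretend the primary location is down, as it was during the outage.
down = {"us-primary"}
active = pick_site(SITES, lambda site: site not in down)
print("Serving from:", active)
```

The sketch only works if each mirror already holds a current copy of the data, which is why the plan is to keep the copies up to date ahead of time rather than spend hours copying after an outage starts.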

We have heard of other homework systems being down for multiple days at a time, and we consider that absolutely unacceptable. It took Amazon going down to take us out for those 13 hours, and we don't plan to let that happen again.

