Amazon Web Services (AWS) recently suffered a major outage that knocked a number of popular web services offline for several hours. It turns out that it was caused by ‘human error’: an AWS engineer carrying out routine maintenance mistyped a command, which took a large number of the servers behind the S3 service (Simple Storage Service) offline. Recovery was then delayed because those systems had not been fully restarted in a number of years, so they took longer than expected to reboot and come back online.
Amazon use their S3 service as the storage foundation for a range of related Amazon services as well as other, unrelated services. When the storage became unavailable and stayed that way, every one of those services was affected, and that is what caused the widespread issues.
Being in the technology business, we recognise that all technology will fail at some point, sometimes through nothing more than human error. We can sympathise with the position that Amazon found themselves in as they worked to recover from the failure and restore their operations.
Now that a few days have passed and we have had time to look at the issues, our key recommendation is to make sure that your cloud storage has suitable redundancy for your needs. It is vital that you plan for failure and have the systems and processes in place to recover in a time-frame that suits your business. Generally, the quicker you need to recover, the more expensive the solution, with complete redundancy being the optimal but most expensive option. In that case, your users would not even be aware that there had been an issue, big or small.
Many organisations that use AWS were not impacted because they replicated their storage with other providers. That protected them from the outage and allowed relatively normal business operations to continue.
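To illustrate the idea of replicated storage, here is a minimal Python sketch, not a production implementation. It assumes each storage provider is wrapped in a simple function for writing and a simple function for reading; the names `replicate_write`, `read_with_failover` and `AllStoresFailed` are hypothetical and are not part of any AWS or third-party library.

```python
from typing import Callable, Sequence


class AllStoresFailed(Exception):
    """Raised when none of the configured stores could serve a request."""


def replicate_write(key: str, data: bytes,
                    writers: Sequence[Callable[[str, bytes], None]]) -> int:
    """Write the same object to every store; return how many writes succeeded."""
    succeeded = 0
    for write in writers:
        try:
            write(key, data)
            succeeded += 1
        except Exception:
            # One provider being down must not stop the other copies being written.
            continue
    if succeeded == 0:
        raise AllStoresFailed(f"could not write {key!r} to any store")
    return succeeded


def read_with_failover(key: str,
                       readers: Sequence[Callable[[str], bytes]]) -> bytes:
    """Try each store in order and return the first successful read."""
    errors = []
    for read in readers:
        try:
            return read(key)
        except Exception as exc:
            # Remember the failure and fall through to the next provider.
            errors.append(exc)
    raise AllStoresFailed(f"no store could return {key!r}: {errors}")
```

In practice each function in `writers` and `readers` would wrap a real provider's client. The ordering of `readers` decides which provider is treated as the primary, and the number of providers you keep in step reflects the trade-off between cost and recovery time described above.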
Ensure that you have redundancy built into your storage solution. Call us on 0333 123 0360 or contact us online to discover how we can protect you from outages.