For those who don’t know, an Amazon S3 bucket is a storage container in Simple Storage Service (S3), which is offered by Amazon Web Services (AWS) – the most popular cloud platform in the world. S3 buckets are used by a number of high-profile service providers, including Netflix, Tumblr, and Reddit. They enable people to store large amounts of data at a relatively low cost, are designed for “99.99% availability”, and are generally easy to manage. However, poorly configured S3 buckets have been the cause of a large number of data breaches.
According to Sky News, two online recruitment firms have exposed more than 200,000 CVs to “anyone who knew the location of the bucket”. An S3 bucket containing details about 31,000 GoDaddy servers was exposed to the public, as was a bucket containing plain text passwords and secret keys for Pocket iNet employees. And these examples are just the tip of the iceberg.
Sure, if a company plans to store sensitive data in the cloud, we could argue that the onus is on it to read the documentation and closely examine the security settings before proceeding. However, many have also argued that Amazon has made it far too easy for users to misconfigure buckets and thus expose sensitive data to the public. A common misunderstanding relates to the “Any authenticated AWS users” grantee, which literally means any authenticated AWS user – that is, anyone with an AWS account. Yet many incorrectly assume that it applies only to users authorized on the account that owns the bucket. Even with an Access Control List (ACL) in place, a misconfigured bucket can still leave data exposed to all AWS users, and thus potentially to everyone.
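To make the misunderstanding concrete, here is a minimal sketch (not AWS’s own tooling) of checking a bucket ACL for grants to the two predefined global groups, AllUsers and AuthenticatedUsers. The ACL dict mirrors the shape returned by boto3’s `s3.get_bucket_acl(Bucket=...)`; the sample ACL and owner ID below are made up for illustration.

```python
# URIs of the two predefined S3 groups whose grants expose a bucket beyond
# the owning account: AllUsers (everyone) and AuthenticatedUsers (anyone
# with an AWS account -- the grantee discussed above).
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"

def risky_grants(acl):
    """Return (group URI, permission) pairs granted to the global groups."""
    risky = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in (
            ALL_USERS,
            AUTHENTICATED_USERS,
        ):
            risky.append((grantee["URI"], grant["Permission"]))
    return risky

# Hand-made sample; in practice this dict would come from
# boto3.client("s3").get_bucket_acl(Bucket="example-bucket").
sample_acl = {
    "Grants": [
        # Normal grant: the bucket owner (ID is a placeholder).
        {"Grantee": {"Type": "CanonicalUser", "ID": "abc123"},
         "Permission": "FULL_CONTROL"},
        # The dangerous one: READ for every authenticated AWS user.
        {"Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
         "Permission": "READ"},
    ]
}

for uri, permission in risky_grants(sample_acl):
    print(f"risky grant: {permission} for {uri}")
```

Any pair this prints for the AuthenticatedUsers URI means the data is readable (or worse) by every AWS account holder, not just users of your own account.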
Given the popularity of Amazon Web Services, we can safely assume that they have done all they can to address this problem, right? Well, not really. They have made some minor improvements, such as better flagging, monitoring, and Identity and Access Management (IAM). However, it remains far too easy to open a bucket to the public, and S3’s IAM policies have been criticized for being too complicated, even for experienced users. Why doesn’t Amazon simply lock every bucket down? Because doing so could break existing applications for tens of thousands of customers. As it stands, we must deal with what we’ve got.
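One of the mitigations AWS does offer is the bucket-level “Block Public Access” feature, which has to be switched on explicitly. A minimal sketch of enabling all four of its settings, assuming the reader is using boto3 (the bucket name is hypothetical, and the actual AWS call is commented out so the snippet runs offline):

```python
# The four Block Public Access settings; enabling all of them overrides
# public ACLs and public bucket policies for the bucket.
BLOCK_ALL_PUBLIC_ACCESS = {
    "BlockPublicAcls": True,        # reject requests that add public ACLs
    "IgnorePublicAcls": True,       # ignore any existing public ACLs
    "BlockPublicPolicy": True,      # reject new public bucket policies
    "RestrictPublicBuckets": True,  # restrict access under a public policy
}

# Applying it to a (hypothetical) bucket would look like:
#   import boto3
#   boto3.client("s3").put_public_access_block(
#       Bucket="example-bucket",
#       PublicAccessBlockConfiguration=BLOCK_ALL_PUBLIC_ACCESS,
#   )
print("all settings enabled:", all(BLOCK_ALL_PUBLIC_ACCESS.values()))
```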
In addition to carefully reviewing your S3 bucket’s security settings, which includes ensuring that write access is disabled for the “Any authenticated AWS users” group, it is crucial to be able to detect, alert on, and respond to anomalous activity involving your buckets. This includes monitoring changes to security settings and Access Control Lists, as well as monitoring access to sensitive files and folders. Various third-party solutions provide real-time auditing of S3 buckets.
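The monitoring step can be sketched as a simple filter over CloudTrail management events, alerting on the S3 calls that change a bucket’s ACL, policy, or encryption settings. The event names are the real CloudTrail names for those operations; the sample event list and usernames are made up, and in practice the events would come from a CloudTrail delivery pipeline rather than a hard-coded list.

```python
# CloudTrail event names for S3 operations that alter bucket security
# settings -- the changes the article says should trigger an alert.
SECURITY_EVENTS = {
    "PutBucketAcl",
    "PutBucketPolicy",
    "DeleteBucketPolicy",
    "PutBucketEncryption",
}

def security_changes(events):
    """Return only the events that modified bucket security settings."""
    return [e for e in events if e.get("EventName") in SECURITY_EVENTS]

# Hand-made sample batch of events (usernames are placeholders).
sample_events = [
    {"EventName": "GetObject", "Username": "alice"},
    {"EventName": "PutBucketAcl", "Username": "bob"},
]

for event in security_changes(sample_events):
    print(f"ALERT: {event['Username']} called {event['EventName']}")
```

A real deployment would feed this from CloudTrail continuously and route the alerts to an on-call channel, which is essentially what the third-party auditing tools mentioned above do.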
You will also need to ensure that all sensitive data is encrypted. AWS provides native server-side encryption, which automatically encrypts data as it is stored in the bucket and decrypts it when it is downloaded (assuming the user has access to the decryption key). However, the user will need to enable default encryption via an S3 SDK, the REST API, or the Command Line Interface (CLI), which requires some technical knowledge. Of course, the problem with server-side encryption is that Amazon holds the decryption key, which could be a concern for those storing highly sensitive data. In that scenario, the user would be better off applying client-side encryption as well.
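Enabling default encryption via the SDK amounts to passing a small configuration to S3’s `put_bucket_encryption` operation. A sketch of the SSE-S3 (AES-256) variant follows; the bucket name is hypothetical, and the boto3 call is commented out so the snippet stays self-contained.

```python
# Default-encryption configuration for SSE-S3: every new object is
# encrypted with AES-256 using keys that Amazon manages (hence the
# caveat above about Amazon holding the decryption key).
SSE_S3_DEFAULT = {
    "Rules": [
        {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
    ]
}

# Applying it to a (hypothetical) bucket would look like:
#   import boto3
#   boto3.client("s3").put_bucket_encryption(
#       Bucket="example-bucket",
#       ServerSideEncryptionConfiguration=SSE_S3_DEFAULT,
#   )
algorithm = SSE_S3_DEFAULT["Rules"][0]["ApplyServerSideEncryptionByDefault"]["SSEAlgorithm"]
print("default algorithm:", algorithm)
```

Swapping `"AES256"` for `"aws:kms"` (plus a key ID) would use KMS-managed keys instead; fully keeping keys out of Amazon’s hands still requires encrypting client-side before upload.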