Five common mistakes on S3
In general, the cloud object store Amazon S3 is straightforward to use, but mistakes are easily made. The service itself has proven to be secure (“security of the cloud”), reliable and performant. However, misconfiguration by the owner of the S3 buckets can give a totally different experience (“security in the cloud”). In this blog, the five most common mistakes are addressed as a learning experience.
1. Leaky S3 buckets
Never configure your S3 buckets to allow direct public access. The best-known breaches for applications running on AWS were caused by misconfigured S3 buckets (known cases: Verizon, Securitas, Reindeer, etc.). Terabytes of confidential data were exposed by accident, just because of a single misconfigured setting. This has caused serious damage, even though it is easy to avoid. The same applies to wrongly configured resource policies and ACLs, e.g. when using wildcards to grant access.
Do: enable the public access block setting at the account level, preventing any bucket from becoming publicly exposed. If you do need to expose data hosted in S3 publicly: configure the S3 bucket as the origin for a CloudFront distribution, configure an origin access identity and grant access to the bucket only to your CloudFront distribution. This way you have full control over security controls such as TLS configuration, logging, DDoS protection and WAF, and it also gives you superb performance. And last but not least, use Cloud Security Posture Management to monitor the configuration state of your AWS environment.
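As an illustration, the account-level block public access setting can be applied with a few lines of boto3. This is only a minimal sketch; the account ID is a placeholder you would replace with your own.

```python
import boto3

# Apply the S3 Block Public Access configuration for the whole account.
s3control = boto3.client("s3control")

s3control.put_public_access_block(
    AccountId="123456789012",  # placeholder: your AWS account ID
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```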
2. Not using encryption
As a best practice, all data should be encrypted at rest and in transit. This principle also applies to data stored in S3. There is no excuse not to encrypt your data, since S3 provides transparent encryption. Unencrypted data is vulnerable to interception and unauthorized modification.
Do: protect your data using server-side encryption. Preferably use Server-Side Encryption with AWS KMS keys (SSE-KMS) with your own customer managed KMS keys. This gives you full control over the encryption key (resource policy, key rotation, etc.). To reduce costs and the number of calls to the KMS service, consider enabling the S3 Bucket Key. For more info, read the blog Reduce AWS Costs with S3 Bucket Keys by coworker Vikas Bange.
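A minimal boto3 sketch of setting default SSE-KMS encryption with the Bucket Key enabled might look like this; the bucket name and KMS key ARN are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Set default bucket encryption to SSE-KMS with a customer managed key,
# with the S3 Bucket Key enabled to reduce the number of KMS requests.
s3.put_bucket_encryption(
    Bucket="my-example-bucket",  # placeholder bucket name
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    # placeholder: ARN of your customer managed KMS key
                    "KMSMasterKeyID": "arn:aws:kms:eu-west-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab",
                },
                "BucketKeyEnabled": True,
            }
        ]
    },
)
```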
3. High costs and poor performance
Poorly designed access patterns can lead to high costs and low performance. One of the most significant cost drivers for S3 is retrieving data. For example, when synchronizing data to S3 it is best practice to use object metadata to determine whether objects have to be updated, instead of downloading the entire object to compare source and destination.
Do: use metadata when synchronizing data, use prefixes, implement S3 gateway endpoints, avoid needlessly overwriting existing data in S3, configure the right storage classes, and use the Region closest to the consumers. More can be found in the S3 Optimizing performance guidelines.
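To make the metadata idea concrete, here is a rough sketch of deciding whether a local file needs to be re-uploaded based on a HEAD request only, without downloading the object body. Bucket, key and path are placeholders, and a real sync tool would also need to handle multipart ETags, pagination and retries.

```python
import os
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def needs_upload(bucket: str, key: str, local_path: str) -> bool:
    """Compare local file metadata with the S3 object's HEAD response."""
    try:
        head = s3.head_object(Bucket=bucket, Key=key)
    except ClientError as err:
        if err.response["Error"]["Code"] == "404":
            return True  # object does not exist yet
        raise
    # Cheap comparison on size and modification time; no GET of the object body.
    local_size = os.path.getsize(local_path)
    local_mtime = os.path.getmtime(local_path)
    return (head["ContentLength"] != local_size
            or head["LastModified"].timestamp() < local_mtime)

# Example usage with placeholder values:
# if needs_upload("my-example-bucket", "reports/2023.csv", "/tmp/2023.csv"):
#     s3.upload_file("/tmp/2023.csv", "my-example-bucket", "reports/2023.csv")
```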
4. Using bucket replication as backup
Replication should not be used for disaster recovery. Replication will not protect against accidental deletion, file corruption, ransomware, etc. All actions on the source bucket are replicated to the destination bucket, which makes it impossible to recover after an incident.
Do: use AWS Backup to back up non-reproducible data. In return you get typical disaster recovery capabilities such as point-in-time recovery, backup plans and data protection with encryption.
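As a rough sketch, an on-demand AWS Backup job for a bucket can be started with boto3 as shown below. The bucket name, vault name and role ARN are placeholders; in practice you would typically define backup plans with selections instead of on-demand jobs, and note that S3 buckets are assumed to need versioning enabled before AWS Backup can protect them.

```python
import boto3

s3 = boto3.client("s3")
backup = boto3.client("backup")

bucket = "my-example-bucket"  # placeholder bucket name

# Versioning is assumed to be a prerequisite for protecting S3 with AWS Backup.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Start an on-demand backup job for the bucket.
backup.start_backup_job(
    BackupVaultName="Default",                                # placeholder vault name
    ResourceArn=f"arn:aws:s3:::{bucket}",
    IamRoleArn="arn:aws:iam::123456789012:role/BackupRole",   # placeholder role ARN
)
```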
5. Not knowing what data is stored in S3
When your organization grows, it becomes more difficult to know the classification of the objects stored in S3. Teams can easily make mistakes, for example by uploading sensitive data to a bucket that has not been configured for that data classification. At large scale it is difficult and time-consuming to know exactly what is stored where.
Do: use Amazon Macie to discover sensitive data at scale. As a big bonus, Amazon Macie also reports on your security posture (misconfigurations).
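A minimal sketch of enabling Macie and starting a one-time sensitive data discovery job on a single bucket could look like this; the account ID, bucket name and job name are placeholders, and Macie has to be enabled per Region.

```python
import uuid
import boto3

macie = boto3.client("macie2")

# Enable Macie in the current Region (only needed once per account/Region).
macie.enable_macie()

# Start a one-time classification job against a single bucket.
macie.create_classification_job(
    clientToken=str(uuid.uuid4()),
    jobType="ONE_TIME",
    name="discover-sensitive-data",            # placeholder job name
    s3JobDefinition={
        "bucketDefinitions": [
            {
                "accountId": "123456789012",        # placeholder account ID
                "buckets": ["my-example-bucket"],   # placeholder bucket name
            }
        ]
    },
)
```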
The points above are just examples of common mistakes and are not intended as a complete guideline for proper use of Amazon S3. The intent of this blog is simply to make you aware of the most common pitfalls on S3.