title
Notes on S3

Notes on S3

S3 Is a global and public service. It's a part of public AWS network
S3 Buckets are region specific, they live in a region.
Bucket's resiliance is a region resiliance.
Bucket name should be globally unique. A name once occupied, can not be reused in any region in any account.
Bucket soft limit: 100 buckets in account upon request it can be increased to 1000
An object size limit is 5TB.
The largest object that can be uploaded in a single PUT is 5 GB. (So for 5TB object we need multiple PUT requests to send 5GB Chunks)
Files are stored as flats, no folders are there.
Folders in UI represented as prefixes. There are no file names. They are keys and values.
The key /jedi/anakin.jpg will represent in UI such a way that Jedi is a folder and anakin.jpg is file.
No limit of number of objects in bucket.

S3 Bucket Policies

Only one bucket policy per bucket, one bucket can have multiple statements.
Combination of IAM Policy + bucket policy applies for authenticated IAM principal, while only bucket policy applies for anonymous access.

Object versioing.

Versioning are disabled by default, can be changed to enabled.
Once enabled can't be changed back to disabled, but can be changed to suspended.
Suspended versioning can be changed back to enabled.

S3 Performance tips

Use multipart upload instead of post request.
Multiplart requires 100 mb of data minimum.

Transfer Accelaration

Use S3 Transfer Accelaration. Which is disabled by default.
Transfer Accelaration uses nearby edge location to transfer the data first and than this Edge location transfers the data to S3 bucket using AWS Infrastructure.
Bucket name must be DNS compatible and can not contain periods.
When Transfer Accelaration is enabled, specific endpoint provided by S3 needs to be used.
Transfer Accelaration state can be changed between Enabled and Suspended. Just like Object versioning, Transfer Accelaration once enabled can not be disabled.

S3 Encryption.

General notes about encryption

In S3 buckets aren't encrypted, only encryption done is on object level.
When a bucket default encryption is mentioned. And Object requires another encryption method. Object encryption method will be prefered.
The communication from client and S3 is 3-way, Client, S3 and S3-Storage.
- S3 is sitting in middle between client and S3-Storage.
- Client sending data directly to S3 which is always encryption in transit.
- S3 storing data to S3-Storage can be encryption at rest.
Two type of encryption. Client side and server side.
Client side encryption: Client encrypts the data from their side before sending object to S3, and S3 sends it to S3-Storage As it is, S3 does not apply encryption to their side when client applies encryption.
Server side encryption: Client sends object to S3 in Plaintext. S3 encrypts it and stores it in S3-Storage using Encryption at rest.

Server side S3 Encryption

Three methods.
- SSE-C (Server Side Encryption with Customer managed key)
- SSE-S3 (Server Side Encryption with S3 Managed key)
- SSE-KMS (Server Side Encryption with CMKs stored in KMS) (CMKs and KMS are mentioned in KMS Document)

SSE-C

Customer manages the key, while S3 manages the process.
Customer sends the key and plaintext to S3.
S3 encrypts the data, attaches the hash of the key and discards key before sending it to S3-Storage.
While decrypting, client sends the key, S3 gets encrypted data and hash. The hash is compared to key for verification. Once verified S3 decrypts the data and discards the key and sends decrypted data back to client.

SSE-S3

S3 manages the master key
Each new object will have it's own key.
Object is encrypted/decrypted by it's own key and key is encrypted/decrypted by master key.
Both are stored side by side.
Role separation is not possible in this encryption.
AES-256 is used as algorithm.
Header: x-amz-server-side-encryption, Value: AES-256.

SSE-KMS

S3 Creates/uses CMK in KMS
This CMK generates DEK (Data Encryption Key) and sends plain text and encrypted key to S3.
S3 uses plain text key to encrypt the data and stores encrypted data + keys to S3-Storage.
When decryption CMK decrypts DEK and decrypted DEK decrypts object itself.
Allows role separation and rotation control.
Offers Envelope Encryption.
Header: x-amz-server-side-encryption, Value: aws:kms.

S3 Object Storage Classes

Six classes

S3 Standard (Default)
S3 Standard IA
S3 One Zone IA
S3 Glacier
S3 Glacier Deep Archive
S3 Intelligent Tiering

IA Stands for Infrequent Access.

S3 Standard.

Default
Replication in 3+ AZs atleast.
Low latency, high throughput.
No minimum, delay or penalty.
Should be default and should have strong reasons to use anything else.
99.99% availability

S3 Standard IA

Less frequent but Rapid Access
Cheaper base storage compared to S3.
128 Kb minimum charge per object. (Object < 128 Kb it still charged as it's 128 Kb)
30 days minimum duration charge per object. (Object stored for one day will be charged for 30 days)
per GB data retrieval fees.
Long term storage, Backups or data store for disaster recovery files.
99.9% availability.

S3 One Zone IA

Same tradeoffs as S3 Standard IA
More cheaper than S3 Standard IA
Data stored only in Single AZ.
Good for secondary copies, easily re-creatable less frequent access data.
99.5% availability.

S3 Glacier

Designed for archival data.
40 Kb minimum charge per object. (Object < 40 Kb it still charged as it's 40 Kb)
90 days minimum duration charge per object. (Object stored for one day will be charged for 90 days)
Expedited retrieval: 1-5 mins
Standard retrieval: 3-5 hours
Bulk retrieval: 5-12 hours
Provisioned Capacity: Ensures retrieval capacity for expedited retrieval is available when needed.
- Each Unit of capacity provides atleast 3 expedited retrieval can be performed every 5 mins and provides upto 150MB/s throughput.

S3 Glacier Deep archive

Almost same as Glacier, it's tape drive replacements for literal long term backups.
180 days minimum duration charge per object. (Object stored for one day will be charged for 180 days)
Expedited retrieval: Not available
Standard retrieval: upto 12 hours
Bulk retrieval: upto 48 hours

Intelligent Tiering

Cost $0.0025 per 1000 objects.
Frequent and Infrequent Access classes.
S3 monitors usage and if an object is not used for 30 days, it's moved to infrequent access class.
Once object in infrequent access is accessed it's moved to frequent access class.

S3 Lifecycle

Set of rules that consist some actions to be performed on bucket or group of objects.
Transition action: Action changes class
Expiration action: Action expires (deletes) the objects.
Transition action is a waterfall model. You can move object from frequent access class to less frequent access class using lifecycle rules but not in the reverse direction

S3 Replication

Cross Region: Replication is done from bucket of one reagin (e.g. ap-southeast-2) to bucket of another region (us-east-1)
Same region: Replication is done between buckets of same regions (This is recent feature)
Replication is not retro-active: Object created before replication is enabled, will not be replicated to new bucket.
Both Source & Destination buckets needs to have versioning on.
One way replication from source to destination.
On replication, encryption using SSE-C (or any other encryption methods where AWS does not have access to key) will not be available.
For other replications, extra key is required.
Owner of the source bucket needs permissions to object being replicated.
No system event, Glacier, Glacier Deep Archive objects can be replicated.
Deletion is not replicated.

Presigned url

User can create url for any object, even if user doesn't have access to said object.
- Though all policies can apply, so even if user can generate the url of the object whose access is not with user, but object can't be accessed because of roles.
When using the url, permissions applies from the identity which generated the url.
When using URL, current permission is being applied. Not the permissions at the time of url being generated.
Generate Presigned URL with IAM principals, not roles. Remember: Roles have temporary credentials, which needs to be renewed, and Presigned URLs can't renew them.

S3 Select And Glacier Select

In S3 or Glacier we interact with full object by default.
So a when we use or download the object, bandwith of full size of the object is used even though we need only parts of it. (E.g. for a 5 gb data dump, we download full 5 gb first and then use part of this data even though the part we need is < 100 mb).
S3/Glacier select lets us use SQL like statements.
And only result of that SQL like statement is returned to user.
Only selected formats.
- CSV, JSON (Along with BZIP2 Compression) or Apache Parquet

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

s3.md

s3.md

Notes on S3

S3 Bucket Policies

Object versioing.

S3 Performance tips

Transfer Accelaration

S3 Encryption.

Server side S3 Encryption

SSE-C

SSE-S3

SSE-KMS

S3 Object Storage Classes

S3 Standard.

S3 Standard IA

S3 One Zone IA

S3 Glacier

S3 Glacier Deep archive

Intelligent Tiering

S3 Lifecycle

S3 Replication

Presigned url

S3 Select And Glacier Select

Files

s3.md

Latest commit

History

s3.md

File metadata and controls

Notes on S3

S3 Bucket Policies

Object versioing.

S3 Performance tips

Transfer Accelaration

S3 Encryption.

Server side S3 Encryption

SSE-C

SSE-S3

SSE-KMS

S3 Object Storage Classes

S3 Standard.

S3 Standard IA

S3 One Zone IA

S3 Glacier

S3 Glacier Deep archive

Intelligent Tiering

S3 Lifecycle

S3 Replication

Presigned url

S3 Select And Glacier Select