
feat: implement mock for S3Tables service #8470

Open · wants to merge 57 commits into master
Conversation

@felixscherz (Contributor) commented Jan 5, 2025

Hi, this is regarding #8422.

I used scripts/scaffold.py to set up the s3tables service. I had to modify some URL patterns for the S3 service, as they overlap with s3tables.

Supported methods:

  • create_table_bucket
  • get_table_bucket
  • list_table_buckets
  • delete_table_bucket
  • create_namespace
  • delete_namespace
  • list_namespaces
  • create_table
  • get_table
  • list_tables
  • delete_table
  • update_metadata_location
  • get_metadata_location
  • rename_table
  • delete_table_bucket_policy
  • delete_table_policy
  • get_table_bucket_maintenance_configuration
  • get_table_bucket_policy
  • get_table_maintenance_configuration
  • get_table_maintenance_job_status
  • get_table_policy
  • put_table_bucket_maintenance_configuration
  • put_table_bucket_policy
  • put_table_maintenance_configuration
  • put_table_policy

I'm currently working on supporting the new type of S3 bucket within the existing S3 backend, so that data can be written to and read from S3 tables.

codecov bot commented Jan 5, 2025

Codecov Report

Attention: Patch coverage is 92.66504% with 30 lines in your changes missing coverage. Please review.

Project coverage is 92.63%. Comparing base (cfaa898) to head (c7888a1).
Report is 14 commits behind head on master.

Files with missing lines      Patch %   Missing
moto/s3tables/models.py       90.54%    19 ⚠️
moto/s3tables/exceptions.py   89.47%    6 ⚠️
moto/s3/models.py             76.19%    5 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8470      +/-   ##
==========================================
- Coverage   92.63%   92.63%   -0.01%     
==========================================
  Files        1224     1229       +5     
  Lines      105922   106394     +472     
==========================================
+ Hits        98120    98553     +433     
- Misses       7802     7841      +39     
Flag Coverage Δ
servertests 27.91% <69.68%> (+0.16%) ⬆️
unittests 92.60% <92.66%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

@felixscherz felixscherz marked this pull request as ready for review January 18, 2025 17:34
@bblommers (Collaborator) left a comment


Hi @felixscherz, thank you for the PR! Please see my comments. There are quite a few, but I think they're all minor - let me know if you have any questions.

resp = client.get_table_metadata_location(
tableBucketARN=arn, namespace="bar", name="baz"
)
assert "metadataLocation" in resp
Collaborator:

Can we use `assert resp["metadataLocation"] == "s3://abc"` here? Or is the exact value different from what we provided?

Contributor Author:

It should be exactly that. Your comment did prompt me to check the actual AWS implementation again, however. They validate the metadata location to make sure it is part of the storage bucket assigned to the table, so I am going to add this check and then update the test as well. :)

)
token = resp["versionToken"]

with pytest.raises(client.exceptions.ConflictException):
Collaborator:

It would be useful to also match on the error, to verify we're raising the correct error here.
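Matching on the error body, not just the exception type, is the pattern being suggested here. A minimal stdlib-only sketch of the idea (the exception class, function, and messages below are hypothetical stand-ins, not moto's actual code): botocore-style errors carry a structured `response` dict, so a test can assert on the error code and message.

```python
import re

# Illustrative sketch: a botocore-style exception exposes a structured
# response, so tests can check the error code/message, not just the type.
class ConflictException(Exception):
    def __init__(self, message: str):
        super().__init__(message)
        self.response = {"Error": {"Code": "ConflictException", "Message": message}}

def update_table_metadata_location(version_token: str, current_token: str) -> None:
    # Reject updates made with a stale version token, as the real service does.
    if version_token != current_token:
        raise ConflictException("Provided token does not match the table version token")

try:
    update_table_metadata_location("stale-token", "current-token")
except ConflictException as err:
    code = err.response["Error"]["Code"]
    message = err.response["Error"]["Message"]

assert code == "ConflictException"
assert re.search("does not match the table version token", message)
```

In a pytest-based suite the same check is usually written with the `match=` argument, e.g. `with pytest.raises(ClientError, match="does not match the table version token"):`.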

s3 = boto3.client("s3", region_name="us-east-2")

bucket_name = resp["warehouseLocation"].replace("s3://", "")
with pytest.raises(s3.exceptions.ClientError):
Collaborator:

Same here for error matching


bucket_name = resp["warehouseLocation"].replace("s3://", "")
s3.put_object(Bucket=bucket_name, Key="test", Body=b"{}")
with pytest.raises(s3.exceptions.ClientError):
Collaborator:

Same here for error matching

@@ -1876,13 +1884,25 @@ def create_bucket(self, bucket_name: str, region_name: str) -> FakeBucket:

return new_bucket

def create_table_storage_bucket(self, region_name: str) -> FakeTableStorageBucket:
# every s3 table is assigned a unique s3 bucket with a random name
bucket_name = f"{str(uuid4())}--table-s3"
Collaborator:

We should use `random.uuid4` here, to ensure the names are deterministic (see the docs).

from base64 import b64decode, b64encode
from hashlib import md5
from typing import Dict, List, Literal, Optional, Tuple, Union
from uuid import uuid4
Collaborator:

Same here - we should be using `from moto.moto_api._internal import mock_random as random; random.uuid4()`

Contributor Author:

I changed both occurrences to use the moto API.
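The point of a seedable RNG for mocked resource names can be sketched with the stdlib alone. This is an illustration of the deterministic-naming idea, not moto's actual `mock_random` implementation: a version-4 UUID built from a seeded generator is reproducible across runs, while the module-level `uuid.uuid4()` is not.

```python
import random
import uuid

# Illustrative sketch: derive a version-4 UUID from a seeded RNG so that
# generated bucket names are reproducible (the role moto's mock_random plays).
def deterministic_uuid4(rng: random.Random) -> uuid.UUID:
    # Build the UUID from 128 seeded random bits.
    return uuid.UUID(int=rng.getrandbits(128), version=4)

rng = random.Random(42)  # fixed seed -> reproducible names
bucket_name = f"{deterministic_uuid4(rng)}--table-s3"

# The same seed always yields the same bucket name.
assert bucket_name == f"{deterministic_uuid4(random.Random(42))}--table-s3"
```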


return new_table_bucket

def list_table_buckets(
Collaborator:

Can we use the `@paginate` decorator here? That should simplify the implementation.

See the docs: https://docs.getmoto.org/en/latest/docs/contributing/development_tips/utilities.html#paginator

Contributor Author:

Yes, that seems a lot cleaner! I changed all the list endpoints accordingly.
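The shape of a paginate-style decorator can be sketched in plain Python. This toy version is only an illustration of the pattern being discussed: the backend method returns the full collection, and the decorator slices it and hands back an opaque continuation token. moto's real decorator is configured through a per-method pagination model rather than the hardcoded parameters used here.

```python
import base64
from functools import wraps
from typing import Callable, List, Optional, Tuple

# Illustrative sketch of a @paginate-style decorator: the wrapped method
# returns everything, and the decorator handles slicing and tokens.
def paginate(limit_default: int) -> Callable:
    def decorator(fn: Callable[..., List[str]]) -> Callable:
        @wraps(fn)
        def wrapper(*args, continuation_token: Optional[str] = None,
                    max_items: Optional[int] = None) -> Tuple[List[str], Optional[str]]:
            items = fn(*args)
            # The token is just a base64-encoded offset into the full list.
            start = int(base64.b64decode(continuation_token)) if continuation_token else 0
            limit = max_items or limit_default
            page = items[start:start + limit]
            next_start = start + limit
            next_token = (base64.b64encode(str(next_start).encode()).decode()
                          if next_start < len(items) else None)
            return page, next_token
        return wrapper
    return decorator

@paginate(limit_default=2)
def list_table_buckets() -> List[str]:
    # The undecorated method just returns everything it knows about.
    return ["bucket-a", "bucket-b", "bucket-c"]

page1, token = list_table_buckets()
page2, token2 = list_table_buckets(continuation_token=token)
```

With this split, none of the `list_*` backend methods need to roll their own token handling.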

bucket.namespaces[ns.name] = ns
return ns

def list_namespaces(
Collaborator:

Same here - ideally we paginate using the decorator, instead of rolling our own logic

@@ -2644,6 +2677,8 @@ def list_objects(

MOTO_S3_DEFAULT_MAX_KEYS=5
"""
if isinstance(bucket, FakeTableStorageBucket):
Collaborator:

Does AWS throw the same error here?

It seems like a useful feature to let users look at the actual data that was stored, but this is the first time that I'm looking at this service, so maybe I'm missing something obvious 🙂

Contributor Author:

AWS does throw the same error. I think it has to do with the way they want users to use the service, i.e. not rely on list operations. They also disable delete_object, which raises the same exception. I think it's important to replicate this behaviour, since AWS not supporting it can throw a lot of people off. :)
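The guard being discussed can be sketched in isolation. The class and error names below are illustrative stand-ins, not moto's exact definitions: operations that AWS disables on the internal table-storage buckets raise an error instead of exposing the data.

```python
# Illustrative sketch of guarding disabled operations on table storage buckets.
class MethodNotAllowed(Exception):
    """Mirrors the error S3 returns for blocked operations on these buckets."""

class FakeBucket:
    def __init__(self, name: str):
        self.name = name
        self.keys: dict = {}

class FakeTableStorageBucket(FakeBucket):
    """The hidden bucket backing an S3 table; direct listing is blocked."""

def list_objects(bucket: FakeBucket) -> list:
    # AWS rejects ListObjects (and DeleteObject) on table storage buckets,
    # so the mock replicates that instead of exposing the stored data.
    if isinstance(bucket, FakeTableStorageBucket):
        raise MethodNotAllowed("The specified method is not allowed against this resource.")
    return list(bucket.keys)

regular = FakeBucket("my-bucket")
assert list_objects(regular) == []

try:
    list_objects(FakeTableStorageBucket("0000-uuid--table-s3"))
    raised = False
except MethodNotAllowed:
    raised = True
assert raised
```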

key.dispose()
for part in bucket.multiparts.values():
part.dispose()
s3_backends.bucket_accounts.pop(bucket_name, None)
Collaborator:

This doesn't seem necessary - we're never adding the bucket to bucket_accounts

Contributor Author:

The bucket is added to bucket_accounts as part of the FakeBucket.__init__ method; that's why we're removing it here. It does need to be part of bucket_accounts so that we can look up the account the bucket belongs to, I think.

@felixscherz (Contributor Author):

Thank you for the review! I'll start addressing your comments. :)

Successfully merging this pull request may close these issues.

Feat: Support for S3 Tables API