feat: implement mock for S3Tables service #8470
base: master
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #8470 +/- ##
==========================================
- Coverage 92.63% 92.63% -0.01%
==========================================
Files 1224 1229 +5
Lines 105922 106394 +472
==========================================
+ Hits 98120 98553 +433
- Misses 7802 7841 +39
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
16f75a4 to 6752429
Hi @felixscherz, thank you for the PR! Please see my comments. There are quite a few, but I think they're all minor - let me know if you have any questions.
resp = client.get_table_metadata_location(
    tableBucketARN=arn, namespace="bar", name="baz"
)
assert "metadataLocation" in resp
Can we use assert metadataLocation == "s3://abc" here? Or is the exact value different from what we provided?
It should be exactly that. However, your comment had me check the actual AWS implementation again. They validate the metadata location to make sure it is part of the storage bucket assigned to the table, so I am going to add this check and then update the test as well :)
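A rough sketch of the kind of check described here, with a hypothetical helper name (the PR's actual naming and exception type may differ):

def validate_metadata_location(metadata_location: str, warehouse_location: str) -> None:
    # AWS only accepts metadata locations inside the table's own storage
    # (warehouse) location, so the mock should reject anything else
    if not metadata_location.startswith(warehouse_location):
        # the real backend would raise a service-specific error here
        raise ValueError(f"{metadata_location} is not within {warehouse_location}")

With that check in place, the test can assert the exact value that was written, e.g. assert resp["metadataLocation"] == metadata_location, instead of only checking that the key exists.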
tests/test_s3tables/test_s3tables.py (Outdated)
)
token = resp["versionToken"]

with pytest.raises(client.exceptions.ConflictException):
Would be useful to also match on the error, to verify we're throwing the correct error here
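For example, pytest.raises can match on the error message as well as the exception class; a sketch, assuming the conflict comes from passing a stale versionToken to update_table_metadata_location (the match pattern is illustrative, not the mock's exact message):

with pytest.raises(
    client.exceptions.ConflictException,
    match="version token",  # illustrative pattern; use the message the mock actually raises
):
    client.update_table_metadata_location(
        tableBucketARN=arn,
        namespace="bar",
        name="baz",
        versionToken="stale-token",
        metadataLocation=metadata_location,  # location used earlier in the test
    )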
tests/test_s3tables/test_s3tables.py (Outdated)
s3 = boto3.client("s3", region_name="us-east-2")

bucket_name = resp["warehouseLocation"].replace("s3://", "")
with pytest.raises(s3.exceptions.ClientError):
Same here for error matching
tests/test_s3tables/test_s3tables.py (Outdated)
bucket_name = resp["warehouseLocation"].replace("s3://", "")
s3.put_object(Bucket=bucket_name, Key="test", Body=b"{}")
with pytest.raises(s3.exceptions.ClientError):
Same here for error matching
moto/s3/models.py (Outdated)
@@ -1876,13 +1884,25 @@ def create_bucket(self, bucket_name: str, region_name: str) -> FakeBucket:

        return new_bucket

    def create_table_storage_bucket(self, region_name: str) -> FakeTableStorageBucket:
        # every s3 table is assigned a unique s3 bucket with a random name
        bucket_name = f"{str(uuid4())}--table-s3"
We should use random.uuid4 here, to ensure the names are deterministic (see the docs).
moto/s3tables/models.py (Outdated)
from base64 import b64decode, b64encode
from hashlib import md5
from typing import Dict, List, Literal, Optional, Tuple, Union
from uuid import uuid4
Same here - we should be using from moto.moto_api._internal import mock_random as random; random.uuid4()
I changed both occurrences to use the moto API
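For reference, the change boils down to swapping the import so the generated bucket names respect moto's seeded RNG:

# before
from uuid import uuid4
bucket_name = f"{str(uuid4())}--table-s3"

# after: mock_random honours moto's seeding, so the names are reproducible in tests
from moto.moto_api._internal import mock_random as random
bucket_name = f"{random.uuid4()}--table-s3"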
        return new_table_bucket

    def list_table_buckets(
Can we use the @paginate-decorator here? That should simplify the implementation. See the docs: https://docs.getmoto.org/en/latest/docs/contributing/development_tips/utilities.html#paginator
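A rough sketch of that pattern applied to list_table_buckets, following the moto docs (the pagination-model values and backend attribute names here are assumptions, not the PR's actual code):

from moto.utilities.paginator import paginate

PAGINATION_MODEL = {
    "list_table_buckets": {
        "input_token": "continuation_token",
        "limit_key": "max_buckets",
        "limit_default": 100,
        "unique_attribute": "arn",
    },
}

class S3TablesBackend:  # simplified; the real backend derives from moto's BaseBackend
    @paginate(pagination_model=PAGINATION_MODEL)
    def list_table_buckets(self, prefix=None):
        # return the full result set; the decorator slices it and produces
        # the continuation token for the response
        return [
            bucket
            for bucket in self.table_buckets.values()
            if prefix is None or bucket.name.startswith(prefix)
        ]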
Yes, that seems a lot cleaner! I changed all the list endpoints accordingly.
        bucket.namespaces[ns.name] = ns
        return ns

    def list_namespaces(
Same here - ideally we paginate using the decorator, instead of rolling our own logic
@@ -2644,6 +2677,8 @@ def list_objects(

            MOTO_S3_DEFAULT_MAX_KEYS=5
        """
        if isinstance(bucket, FakeTableStorageBucket):
Does AWS throw the same error here?
It seems like a useful feature to let users look at the actual data that was stored, but this is the first time that I'm looking at this service, so maybe I'm missing something obvious 🙂
AWS does throw the same error. I think it has to do with the way they want users to use the service, i.e. not rely on list operations. They also disable delete_object, which raises the same exception. I think it's important to replicate this behaviour, since AWS not supporting it can throw a lot of people off :)
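Given that, the tests above can cover both operations against the table storage bucket; a sketch (the exact error code/message is not asserted here, since it should mirror whatever AWS returns):

# both plain-S3 list and delete calls against the table storage bucket are rejected
with pytest.raises(s3.exceptions.ClientError):
    s3.list_objects_v2(Bucket=bucket_name)

with pytest.raises(s3.exceptions.ClientError):
    s3.delete_object(Bucket=bucket_name, Key="test")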
            key.dispose()
        for part in bucket.multiparts.values():
            part.dispose()
        s3_backends.bucket_accounts.pop(bucket_name, None)
This doesn't seem necessary - we're never adding the bucket to bucket_accounts
The bucket is added to bucket_accounts as part of the FakeBucket.__init__ method; that's why we're removing it here. It does need to be part of bucket_accounts to be able to look up the actual account the bucket is in, I think.
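A simplified model of that relationship (not moto's actual code, just an illustration of why the pop() above is needed):

bucket_accounts = {}  # bucket name -> account id; lives on s3_backends in moto

class FakeBucket:
    def __init__(self, name: str, account_id: str, region_name: str):
        self.name = name
        self.account_id = account_id
        self.region_name = region_name
        # registered on creation so the owning account can be looked up by bucket name
        bucket_accounts[name] = account_id

def delete_table_storage_bucket(bucket_name: str) -> None:
    # mirrors the pop() in the diff: deregister the bucket when it is deleted
    bucket_accounts.pop(bucket_name, None)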
Thank you for the review! I'll start addressing your comments :)
Hi, this is in regard to #8422.
I used scripts/scaffold.py to set up the s3tables service. I had to modify some URL patterns for the S3 service, as they overlap with s3tables.

Supported methods:
create_table_bucket
get_table_bucket
list_table_buckets
delete_table_bucket
create_namespace
delete_namespace
list_namespaces
create_table
get_table
list_tables
delete_table
update_metadata_location
get_metadata_location
rename_table
delete_table_bucket_policy
delete_table_policy
get_table_bucket_maintenance_configuration
get_table_bucket_policy
get_table_maintenance_configuration
get_table_maintenance_job_status
get_table_policy
put_table_bucket_maintenance_configuration
put_table_bucket_policy
put_table_maintenance_configuration
put_table_policy
Currently working on supporting the new type of S3 buckets within the existing S3 backend so that data can be written to/read from S3 tables.
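For anyone wanting to try the mock out, a rough usage sketch (request/response shapes follow my reading of the AWS S3Tables API and may differ slightly from the final implementation):

import boto3
from moto import mock_aws

@mock_aws
def test_s3tables_roundtrip():
    client = boto3.client("s3tables", region_name="us-east-2")

    # create a table bucket, a namespace, and a table inside it
    bucket = client.create_table_bucket(name="my-table-bucket")
    arn = bucket["arn"]
    client.create_namespace(tableBucketARN=arn, namespace=["bar"])
    client.create_table(tableBucketARN=arn, namespace="bar", name="baz", format="ICEBERG")

    # the new table should show up when listing tables in the bucket
    tables = client.list_tables(tableBucketARN=arn)
    assert [t["name"] for t in tables["tables"]] == ["baz"]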