Moto uses a single instance of the ...Response class for all requests for some services #6745

dfangl · 2023-08-31T09:18:11Z

Motivation

In LocalStack, we began seeing some strange behavior, especially with s3 buckets, which were not created in the region or accounts the request targets. This leads to test failures, with rather confusing logs.

For example, a bucket in account 123456789012 in region us-east-1 is created twice. Since the create bucket operation is idempotent in us-east-1, it should prove without an error. However, this fails, if some other s3 operation runs at the same time under a different account.

This test can reproduce this issue (it is a test against LocalStack, but the culprit here is indeed moto):

    def test_parallel_bucket_creation(self, aws_client_factory):
        num_threads = 10
        create_barrier = threading.Barrier(num_threads)
        errored = False

        def _create_bucket(runner: int):
            nonlocal errored
            bucket_name = f"bucket-{short_uid()}"
            s3_client = aws_client_factory(
                region_name="us-east-1", aws_access_key_id=f"{runner:012d}"
            ).s3
            create_barrier.wait()
            try:
                s3_client.create_bucket(Bucket=bucket_name)
                s3_client.create_bucket(Bucket=bucket_name)
            except Exception:
                LOG.exception("Create bucket failed")
                errored = True

        thread_list = []
        for i in range(1, num_threads + 1):
            thread = threading.Thread(target=_create_bucket, args=[i])
            thread.start()
            thread_list.append(thread)

        for thread in thread_list:
            thread.join()

        assert not errored

The problem

After some investigation of this issue, which (before above test) was occurring only occasionally, we found the culprit.
For some services (list can be found at the end), moto uses a single ...Response instance for all requests. So, concurrent requests will affect each others state on the setupClass method which is used over all response classes. There are unaffected services, which use the dispatch method, but a rather large part, including S3, is affected.

In S3, for example, these are the problematic lines:

moto/moto/s3/urls.py

Lines 13 to 23 in d3f5e3d

    
           url_paths = { 
        
               # subdomain bucket 
        
               "{0}/$": S3ResponseInstance.bucket_response, 
        
               # subdomain key of path-based bucket 
        
               "{0}/(?P<key_or_bucket_name>[^/]+)$": S3ResponseInstance.ambiguous_response, 
        
               "{0}/(?P<key_or_bucket_name>[^/]+)/$": S3ResponseInstance.ambiguous_response, 
        
               # path-based bucket + key 
        
               "{0}/(?P<bucket_name_path>[^/]+)/(?P<key_name>.+)": S3ResponseInstance.key_response, 
        
               # subdomain bucket + key with empty first part of path 
        
               "{0}/(?P<key_name>/.*)$": S3ResponseInstance.key_response, 
        
           }

They use a single instance, for all requests, which leads to the mentioned behavior.

Impact

The impact is of course only noticeable in parallel scenarios. In our case, we found it out only in terraform tests, which perform a lot of S3 operations for validation, while our lambda provider wanted to create a bucket to store an archive in a different account.
Also, the impact is mostly region/account specific, since this is the state most often used from the ...Response classes. However, other state (like headers) are affected too, but the impact is less noticeable.

Please note that we did not test the behavior against moto itself, but from the import path of urls.py, it seems like moto would be affected equally as LocalStack, if we did not miss something.

Proposed fixes

We have a fix in the pipeline against our moto fork (Fix usage of a single object instances for multiple concurrent requests localstack/moto#70) which I am happy to open here as well. However, we wanted to discuss first if this is the route forward. It basically uses the unbound version of a method as a parameter to a function, which calls this method on a new instance then. It was the least intrusive solution we found quickly, and the type checker should catch wrong usages of the dispatch_method class method. We tested this solution with above test, and it works.
We could choose a version were we do not pass the whole method, but just a string with the method name. Would work the same, but without type checking. However, this would be a tad less "hacky".
A refactoring of the found services to use dispatch. This would be more impactful, and we would basically parse the request url twice, which we might want to avoid.

If you have any more ideas for solution, we would be happy to work on them to get something merged here as well, and remove our downstream fix.

Affected Services

These services:

amp
apigateway
apigatewayv2
appconfig
appsync
awslambda
cloudfront
cognitoidp
databrew
ebs
elastictranscoder
glacier
greengrass
guardduty
instance_metadata
iot
managedblockchain
mq
pinpoint
quicksight
route53
s3
s3control
eventbridgescheduler

use a single instance for all the requests. Unaffected services use the dispatch method. There are some services which use the classmethod dispatch on an instance, but since that does not make a difference (even though it can be confusing), we do not necessarily need to change it.

The text was updated successfully, but these errors were encountered:

bblommers · 2023-08-31T13:08:52Z

Hey @dfangl, thanks for raising this! I had realized the potential problem, and have also seen intermittent failures - I just hadn't connected the dots.

In the long term, I think/hope we can refactor/improve our dispatch method to deal with (most of) these cases. But the method_dispatch-fix sounds like a good and quick solution, so if you want to submit a PR here as well, that would be very welcome.

bblommers added the enhancement label Aug 31, 2023

dfangl mentioned this issue Aug 31, 2023

Fix usage of a single object instances for multiple concurrent requests #6746

Merged

bblommers closed this as completed in #6746 Aug 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Moto uses a single instance of the ...Response class for all requests for some services #6745

Moto uses a single instance of the ...Response class for all requests for some services #6745

dfangl commented Aug 31, 2023 •

edited

Loading

bblommers commented Aug 31, 2023

Moto uses a single instance of the ...Response class for all requests for some services #6745

Moto uses a single instance of the ...Response class for all requests for some services #6745

Comments

dfangl commented Aug 31, 2023 • edited Loading

Motivation

The problem

Impact

Proposed fixes

Affected Services

bblommers commented Aug 31, 2023

dfangl commented Aug 31, 2023 •

edited

Loading