Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add common logic for creating and cleaning buckets in tests #277

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kgwizdz-reef
Copy link

module _test_manager has been created from which an Api class containing common logic for creating and deleting buckets can be imported. This way, in B2-CLI there will be no need to duplicate the code related to name generation, creation and deletion of buckets

much of the logic was used from this PR: Fix leaking buckets in integration tests #829

Copy link

@kkalinowski-reef kkalinowski-reef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the change is going in the right direction.

CHANGELOG.md Outdated
@@ -6,6 +6,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it more like an infrastructure thing? It's for developers, not the actual users of the SDK, right?

@@ -0,0 +1,9 @@
######################################################################

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this directory named with a leading underscore? I don't believe I saw this notation anywhere else. Have you considered doing another package, if that was supposed to be installed with [dev]?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it also seems to me that creating a separate package would be a cleaner approach, something like b2sdk-test?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a moments's thought, I concluded that we could actually go down the most common path I've encountered, which is simply package the test in the main package, so it would be b2sdk/test. Numpy, pandas or django use such a patter. What do you think?


def clean_buckets(self) -> None:
for bucket in self.buckets:
with contextlib.suppress(BucketIdNotFound, v3BucketIdNotFound, NonExistentBucket):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per chance, could you explain to me the difference between v2 and v3 versions of this exception? My memory is foggy and I was wondering whether we could just use one of them.

# Name fraction is used for runners inside the same matrix, time fraction is used for runners in different runs.
# To avoid collision when the same runners are fired in different commits at the same time we also use GITHUB_SHA
random.seed(
environ.get('RUNNER_NAME', 'local') + environ.get('GITHUB_SHA', 'local') + str(time.time_ns())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we skip all that and use str(uuid.uuid4())? I've checked and Python library uses system uuid.h which in turns is said to be good enough to provide random value. This would remove reliance on the clock entirely.
Also, worth noting that SystemRandom would be nice, but I'm not sure whether it's even available on Windows :/
What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes in the new version I got rid of seed and used uui4, we will see how it goes

BUCKET_NAME_LENGTH = 50
BUCKET_NAME_CHARS = string.ascii_letters + string.digits + '-'

BUCKET_NAME_PREFIX = 'b2tst'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a hard limit on the length of the bucket name that we're hitting? If not, why not name it b2test?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently after using the uuid it will normally be b2test-{uuid4} :) As far as i know the maximum limit is 63 char

@@ -11,7 +11,7 @@
import io
from typing import Optional

from .fixtures import b2_auth_data # noqa
from .fixtures import * # noqa

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not, please?

for bucket in buckets:
print(bucket)

def count_and_print_buckets(self) -> int:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple search provides me with no usages of this function. Can you point me to an actual usage, please?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was intended to be a bit of statistic in total as you write in the comment below also we can put this function into cleanup_buckets function to always track how many buckets have been created when closing tests

@backoff.on_exception(
backoff.expo,
TooManyRequests,
max_tries=8,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to extract file removal as a function/method and put a backoff on it? What do you think?

print(f'Total bucket count: {len(buckets)}')

for bucket in buckets:
if not bucket.name.startswith(BUCKET_NAME_PREFIX):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want an operation like that? What about other runners running in parallel on the same set of keys?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, this function is intended as a kind of garbage collector after all the tests have been performed in order to collect the "leftovers". You are right about if several commits run the tests there may be unwanted deletions. It seems to me that it is enough here to add the first 10-15 characters from the SHA of the commit to our BUCKET_NAME_PREFIX. This way we will be sure that after all finished runs if there are any buckets left the garbage collector will remove them. What do you think?

Mixin class for B2Api, which enables bucket tracking.
This mixin will add a `buckets` member to the B2Api instance and will use it track created and
deleted buckets. The main purpose of this are tests -- the `buckets` member can be used in test
teardown to ensure proper bucket cleanup.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a statistic of how many buckets in total is created in tests and how many buckets exists at once when tests are ran? This could help us to determine whether we can use -xdist to parallelise tests or to run several workflows in parallel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants