-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add common logic for creating and cleaning buckets in tests #277
base: master
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
###################################################################### | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this directory named with a leading underscore? I don't believe I saw this notation anywhere else. Have you considered doing another package, if that was supposed to be installed with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. so it also seems to me that creating a separate package would be a cleaner approach, something like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After a moments's thought, I concluded that we could actually go down the most common path I've encountered, which is simply package the test in the main package, so it would be b2sdk/test. Numpy, pandas or django use such a patter. What do you think? |
||
# | ||
# File: b2sdk/_test_manager/__init__.py | ||
# | ||
# Copyright 2023 Backblaze Inc. All Rights Reserved. | ||
# | ||
# License https://www.backblaze.com/using_b2_code.html | ||
# | ||
###################################################################### |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,188 @@ | ||
###################################################################### | ||
# | ||
# File: b2sdk/_test_manager/api.py | ||
# | ||
# Copyright 2023 Backblaze Inc. All Rights Reserved. | ||
# | ||
# License https://www.backblaze.com/using_b2_code.html | ||
# | ||
###################################################################### | ||
import contextlib | ||
import random | ||
import string | ||
import time | ||
|
||
from datetime import datetime | ||
from os import environ | ||
from typing import Union | ||
|
||
import backoff | ||
|
||
from .bucket_tracking import BucketTrackingMixin | ||
from .._v3.exception import BucketIdNotFound as v3BucketIdNotFound | ||
from ..v2 import NO_RETENTION_FILE_SETTING, B2Api, Bucket, InMemoryAccountInfo, InMemoryCache, LegalHold, RetentionMode | ||
from ..v2.exception import BucketIdNotFound, DuplicateBucketName, FileNotPresent, TooManyRequests, NonExistentBucket | ||
|
||
ONE_HOUR_MILLIS = 60 * 60 * 1000 | ||
ONE_DAY_MILLIS = ONE_HOUR_MILLIS * 24 | ||
|
||
BUCKET_NAME_LENGTH = 50 | ||
BUCKET_NAME_CHARS = string.ascii_letters + string.digits + '-' | ||
|
||
BUCKET_NAME_PREFIX = 'b2tst' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a hard limit on the length of the bucket name that we're hitting? If not, why not name it There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. currently after using the uuid it will normally be |
||
|
||
# RUNNER_NAME is the only variable exposed by the GitHub CI that was changing for each matrix entry. | ||
# Example values are "GitHub Actions N" (with N being a whole number, starting from 2) and "Hosted Agent". | ||
# Here, we're using these names as long as time as seeds to start the random number generator. | ||
# Name fraction is used for runners inside the same matrix, time fraction is used for runners in different runs. | ||
# To avoid collision when the same runners are fired in different commits at the same time we also use GITHUB_SHA | ||
random.seed( | ||
environ.get('RUNNER_NAME', 'local') + environ.get('GITHUB_SHA', 'local') + str(time.time_ns()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about we skip all that and use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes in the new version I got rid of seed and used uui4, we will see how it goes |
||
) | ||
|
||
|
||
def generate_bucket_name() -> str: | ||
suffix_length = BUCKET_NAME_LENGTH - len(BUCKET_NAME_PREFIX) | ||
suffix = ''.join(random.choice(BUCKET_NAME_CHARS) for _ in range(suffix_length)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When thinking about it... Wouldn't
What do you think? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. all in all, I assumed in advance that the uuid had been tested before and there were some problems with it :) But yes I would give uuid a chance There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sadly, I cannot neither confirm nor dismiss this assumption. Let's try with |
||
return f"{BUCKET_NAME_PREFIX}{suffix}" | ||
|
||
|
||
def current_time_millis() -> int: | ||
return int(round(time.time() * 1000)) | ||
|
||
|
||
class Api(BucketTrackingMixin, B2Api): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about a name that would reflect the information that it's only for testing purposes? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
""" | ||
B2Api wrapper which should only be used for testing purposes! | ||
""" | ||
|
||
def __init__(self, account_id: str, application_key: str, realm: str, *args, **kwargs): | ||
info = InMemoryAccountInfo() | ||
cache = InMemoryCache() | ||
super().__init__(info, cache=cache, *args, **kwargs) | ||
self.authorize_account(realm, account_id, application_key) | ||
|
||
@backoff.on_exception( | ||
backoff.constant, | ||
DuplicateBucketName, | ||
max_tries=8, | ||
) | ||
def create_test_bucket(self, bucket_type="allPublic", **kwargs) -> Bucket: | ||
bucket_name = generate_bucket_name() | ||
print('Creating bucket:', bucket_name) | ||
try: | ||
return self.create_bucket(bucket_name, bucket_type, **kwargs) | ||
except DuplicateBucketName: | ||
self._duplicated_bucket_name_debug_info(bucket_name) | ||
raise | ||
|
||
@backoff.on_exception( | ||
backoff.expo, | ||
TooManyRequests, | ||
max_tries=8, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wouldn't it be better to extract file removal as a function/method and put a backoff on it? What do you think? |
||
) | ||
def clean_bucket(self, bucket: Union[Bucket, str]) -> None: | ||
if isinstance(bucket, str): | ||
bucket = self.get_bucket_by_name(bucket) | ||
|
||
files_leftover = False | ||
file_versions = bucket.ls(latest_only=False, recursive=True) | ||
|
||
for file_version_info, _ in file_versions: | ||
if file_version_info.file_retention: | ||
if file_version_info.file_retention.mode == RetentionMode.GOVERNANCE: | ||
print('Removing retention from file version:', file_version_info.id_) | ||
self.update_file_retention( | ||
file_version_info.id_, file_version_info.file_name, | ||
NO_RETENTION_FILE_SETTING, True | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know it probably came from previous version, but would you mind using |
||
) | ||
elif file_version_info.file_retention.mode == RetentionMode.COMPLIANCE: | ||
if file_version_info.file_retention.retain_until > current_time_millis(): # yapf: disable | ||
print( | ||
'File version: %s cannot be removed due to compliance mode retention' % | ||
(file_version_info.id_,) | ||
) | ||
files_leftover = True | ||
continue | ||
elif file_version_info.file_retention.mode == RetentionMode.NONE: | ||
pass | ||
else: | ||
raise ValueError( | ||
'Unknown retention mode: %s' % (file_version_info.file_retention.mode,) | ||
) | ||
if file_version_info.legal_hold.is_on(): | ||
print('Removing legal hold from file version:', file_version_info.id_) | ||
self.update_file_legal_hold( | ||
file_version_info.id_, file_version_info.file_name, LegalHold.OFF | ||
) | ||
print('Removing file version:', file_version_info.id_) | ||
try: | ||
self.delete_file_version(file_version_info.id_, file_version_info.file_name) | ||
except FileNotPresent: | ||
print( | ||
'It seems that file version %s has already been removed' % | ||
(file_version_info.id_,) | ||
) | ||
|
||
if files_leftover: | ||
print('Unable to remove bucket because some retained files remain') | ||
else: | ||
print('Removing bucket:', bucket.name) | ||
try: | ||
self.delete_bucket(bucket) | ||
except BucketIdNotFound: | ||
print('It seems that bucket %s has already been removed' % (bucket.name,)) | ||
print() | ||
|
||
def clean_buckets(self) -> None: | ||
for bucket in self.buckets: | ||
with contextlib.suppress(BucketIdNotFound, v3BucketIdNotFound, NonExistentBucket): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Per chance, could you explain to me the difference between |
||
self.clean_bucket(bucket) | ||
self.buckets = [] | ||
|
||
def clean_all_buckets(self) -> None: | ||
buckets = self.list_buckets() | ||
print(f'Total bucket count: {len(buckets)}') | ||
|
||
for bucket in buckets: | ||
if not bucket.name.startswith(BUCKET_NAME_PREFIX): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we really want an operation like that? What about other runners running in parallel on the same set of keys? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As I understand it, this function is intended as a kind of garbage collector after all the tests have been performed in order to collect the "leftovers". You are right about if several commits run the tests there may be unwanted deletions. It seems to me that it is enough here to add the first 10-15 characters from the SHA of the commit to our |
||
print(f'Skipping bucket removal: "{bucket.name}"') | ||
continue | ||
|
||
print(f'Removing bucket: "{bucket.name}"') | ||
try: | ||
self.clean_bucket(bucket) | ||
except (BucketIdNotFound, v3BucketIdNotFound): | ||
print(f'It seems that bucket "{bucket.name}" has already been removed') | ||
|
||
buckets = self.list_buckets() | ||
print(f'Total bucket count after cleanup: {len(buckets)}') | ||
for bucket in buckets: | ||
print(bucket) | ||
|
||
def count_and_print_buckets(self) -> int: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A simple search provides me with no usages of this function. Can you point me to an actual usage, please? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It was intended to be a bit of statistic in total as you write in the comment below also we can put this function into |
||
buckets = self.list_buckets() | ||
count = len(buckets) | ||
print(f'Total bucket count at {datetime.now()}: {count}') | ||
for i, bucket in enumerate(buckets, start=1): | ||
print(f'- {i}\t{bucket.name} [{bucket.id_}]') | ||
return count | ||
|
||
def _duplicated_bucket_name_debug_info(self, bucket_name: str) -> None: | ||
# Trying to obtain as much information as possible about this bucket. | ||
print(' DUPLICATED BUCKET DEBUG START '.center(60, '=')) | ||
bucket = self.get_bucket_by_name(bucket_name) | ||
|
||
print('Bucket metadata:') | ||
bucket_dict = bucket.as_dict() | ||
for info_key, info in bucket_dict.items(): | ||
print('\t%s: "%s"' % (info_key, info)) | ||
|
||
print('All files (and their versions) inside the bucket:') | ||
ls_generator = bucket.ls(recursive=True, latest_only=False) | ||
for file_version, _directory in ls_generator: | ||
# as_dict() is bound to have more info than we can use, | ||
# but maybe some of it will cast some light on the issue. | ||
print('\t%s (%s)' % (file_version.file_name, file_version.as_dict())) | ||
|
||
print(' DUPLICATED BUCKET DEBUG END '.center(60, '=')) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
###################################################################### | ||
# | ||
# File: b2sdk/_test_manager/bucket_tracking.py | ||
# | ||
# Copyright 2023 Backblaze Inc. All Rights Reserved. | ||
# | ||
# License https://www.backblaze.com/using_b2_code.html | ||
# | ||
###################################################################### | ||
from b2sdk.v2 import Bucket | ||
|
||
|
||
class BucketTrackingMixin: | ||
""" | ||
Mixin class for B2Api, which enables bucket tracking. | ||
This mixin will add a `buckets` member to the B2Api instance and will use it track created and | ||
deleted buckets. The main purpose of this are tests -- the `buckets` member can be used in test | ||
teardown to ensure proper bucket cleanup. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about adding a statistic of how many buckets in total is created in tests and how many buckets exists at once when tests are ran? This could help us to determine whether we can use |
||
""" | ||
|
||
def __init__(self, *args, **kwargs): | ||
self.buckets = [] | ||
super().__init__(*args, **kwargs) | ||
|
||
def create_bucket(self, name: str, *args, **kwargs) -> Bucket: | ||
bucket = super().create_bucket(name, *args, **kwargs) | ||
self.buckets.append(bucket) | ||
return bucket | ||
|
||
def delete_bucket(self, bucket: Bucket): | ||
super().delete_bucket(bucket) | ||
self.buckets = [b for b in self.buckets if b.id_ != bucket.id_] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know there's only 100 buckets tops. But I still find this total rewrite of a list kinda ugly... But maybe that's just me? |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
backoff>=1.4.0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about adding also an upper limit on the library version? |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -108,7 +108,10 @@ def read_requirements(extra=None): | |
# dependencies). You can install these using the following syntax, | ||
# for example: | ||
# $ pip install -e .[dev,test] | ||
extras_require={'doc': read_requirements('doc')}, | ||
extras_require={ | ||
'doc': read_requirements('doc'), | ||
'dev': read_requirements('dev') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add a comma on the end of this line, please. This way, if another line is added, this line is not changed. I believe it's explained in the handbook. |
||
}, | ||
setup_requires=['setuptools_scm<6.0'], | ||
use_scm_version=True, | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,10 +13,7 @@ | |
|
||
import pytest | ||
|
||
from b2sdk.v2 import current_time_millis | ||
from b2sdk.v2.exception import DuplicateBucketName | ||
from .bucket_cleaner import BucketCleaner | ||
from .helpers import GENERAL_BUCKET_NAME_PREFIX, BUCKET_NAME_LENGTH, BUCKET_CREATED_AT_MILLIS, bucket_name_part, authorize | ||
from b2sdk._test_manager.api import Api | ||
|
||
|
||
class IntegrationTestBase: | ||
|
@@ -26,30 +23,11 @@ def set_http_debug(self): | |
http.client.HTTPConnection.debuglevel = 1 | ||
|
||
@pytest.fixture(autouse=True) | ||
def save_settings(self, dont_cleanup_old_buckets, b2_auth_data): | ||
type(self).dont_cleanup_old_buckets = dont_cleanup_old_buckets | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe that this flag had some logic behind it. Could you tell me, in short, if the new code "has something similar" or was it just not needed, please? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. having changed the logic for removing bucket (using mixin) it seem to me that we no longer need to keep this flag |
||
type(self).b2_auth_data = b2_auth_data | ||
|
||
@classmethod | ||
def setup_class(cls): | ||
cls.this_run_bucket_name_prefix = GENERAL_BUCKET_NAME_PREFIX + bucket_name_part(8) | ||
kkalinowski-reef marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
@classmethod | ||
def teardown_class(cls): | ||
BucketCleaner( | ||
cls.dont_cleanup_old_buckets, | ||
*cls.b2_auth_data, | ||
current_run_prefix=cls.this_run_bucket_name_prefix | ||
).cleanup_buckets() | ||
|
||
@pytest.fixture(autouse=True) | ||
def setup_method(self): | ||
self.b2_api, self.info = authorize(self.b2_auth_data) | ||
|
||
def generate_bucket_name(self): | ||
return self.this_run_bucket_name_prefix + bucket_name_part( | ||
BUCKET_NAME_LENGTH - len(self.this_run_bucket_name_prefix) | ||
) | ||
def setup_method(self, b2_auth_data, realm): | ||
self.b2_api = Api(*b2_auth_data, realm) | ||
self.info = self.b2_api.account_info | ||
yield | ||
self.b2_api.clean_buckets() | ||
|
||
def write_zeros(self, file, number): | ||
line = b'0' * 1000 + b'\n' | ||
|
@@ -60,32 +38,4 @@ def write_zeros(self, file, number): | |
written += line_len | ||
|
||
def create_bucket(self): | ||
bucket_name = self.generate_bucket_name() | ||
try: | ||
return self.b2_api.create_bucket( | ||
bucket_name, | ||
'allPublic', | ||
bucket_info={BUCKET_CREATED_AT_MILLIS: str(current_time_millis())} | ||
) | ||
except DuplicateBucketName: | ||
self._duplicated_bucket_name_debug_info(bucket_name) | ||
raise | ||
|
||
def _duplicated_bucket_name_debug_info(self, bucket_name: str) -> None: | ||
# Trying to obtain as much information as possible about this bucket. | ||
print(' DUPLICATED BUCKET DEBUG START '.center(60, '=')) | ||
bucket = self.b2_api.get_bucket_by_name(bucket_name) | ||
|
||
print('Bucket metadata:') | ||
bucket_dict = bucket.as_dict() | ||
for info_key, info in bucket_dict.items(): | ||
print('\t%s: "%s"' % (info_key, info)) | ||
|
||
print('All files (and their versions) inside the bucket:') | ||
ls_generator = bucket.ls(recursive=True, latest_only=False) | ||
for file_version, _directory in ls_generator: | ||
# as_dict() is bound to have more info than we can use, | ||
# but maybe some of it will cast some light on the issue. | ||
print('\t%s (%s)' % (file_version.file_name, file_version.as_dict())) | ||
|
||
print(' DUPLICATED BUCKET DEBUG END '.center(60, '=')) | ||
return self.b2_api.create_test_bucket() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it more like an infrastructure thing? It's for developers, not the actual users of the SDK, right?