remove field _is_sky_managed for intermediate bucket #4545

Open: wants to merge 3 commits into base: master

zpoint
Collaborator

@zpoint zpoint commented Jan 8, 2025

See comment

Test plan

smoke test

pytest -s tests/test_smoke.py::test_managed_jobs_intermediate_storage
pytest -s tests/test_smoke.py::TestStorageWithCredentials::test_bucket_sub_path --aws

custom test

(sky) ➜ cat ~/Desktop/hello-sky/work_dir_1.yaml
name: test_workdir_bucket_name_1

workdir: .

resources:
  cloud: aws
  instance_type: t3.small

file_mounts:
  # this will use the user config
  /checkpoint:
    name: zpoint-filemounts-bucket
    source: ~/Desktop/dir1
    mode: MOUNT
    store: azure

  # these will now all use the same bucket configured under jobs->bucket in ~/.sky/config.yaml
  /dir1: ~/Desktop/dir1
  /dir2: ~/Desktop/dir2
  /dir3/dir3.py: ~/Desktop/dir1/dir1.py

run: |
  sleep 10
  ls /checkpoint
  ls .

Without custom intermediate bucket config

The bucket is deleted.

(sky) ➜ sky jobs launch ~/Desktop/hello-sky/work_dir_1.yaml
(sky) ➜ sky jobs logs --controller 1
(test_workdir_bucket_name_1, pid=2483) I 01-08 08:01:42 storage.py:984] Storage type StoreType.AZURE already exists under storage account 'sky635694a6141885a3'.
(test_workdir_bucket_name_1, pid=2483) I 01-08 08:01:42 storage.py:987] Storage type StoreType.S3 already exists.
(test_workdir_bucket_name_1, pid=2483) I 01-08 08:01:46 storage.py:1409] Deleted S3 bucket skypilot-filemounts-zepingguo-4f14dd72.
(test_workdir_bucket_name_1, pid=2483) I 01-08 08:01:46 storage.py:1409] S3 bucket skypilot-filemounts-zepingguo-4f14dd72 may have been deleted externally. Removing from local state.
(test_workdir_bucket_name_1, pid=2483) I 01-08 08:01:47 storage.py:1409] S3 bucket skypilot-filemounts-zepingguo-4f14dd72 may have been deleted externally. Removing from local state.
(test_workdir_bucket_name_1, pid=2483) I 01-08 08:01:47 storage.py:1409] S3 bucket skypilot-filemounts-zepingguo-4f14dd72 may have been deleted externally. Removing from local state.

With custom intermediate bucket config

The bucket persists; only the sub path is deleted.

(sky) ➜  cat ~/.sky/config.yaml
jobs:
  bucket: s3://bucket-admin-test/
(sky) ➜ sky jobs launch ~/Desktop/hello-sky/work_dir_1.yaml
(sky) ➜ sky jobs logs --controller 1


(test_workdir_bucket_name_1, pid=2483) I 01-08 09:39:32 controller.py:523] Killing controller process 2779.
(test_workdir_bucket_name_1, pid=2483) I 01-08 09:39:32 controller.py:531] Controller process 2779 killed.
(test_workdir_bucket_name_1, pid=2483) I 01-08 09:39:32 controller.py:533] Cleaning up any cluster for job 1.
(test_workdir_bucket_name_1, pid=2483) I 01-08 09:39:33 storage.py:990] Storage type StoreType.S3 already exists.
(test_workdir_bucket_name_1, pid=2483) I 01-08 09:39:33 storage.py:990] Storage type StoreType.S3 already exists.
(test_workdir_bucket_name_1, pid=2483) I 01-08 09:39:33 storage.py:990] Storage type StoreType.S3 already exists.
(test_workdir_bucket_name_1, pid=2483) I 01-08 09:39:36 storage.py:987] Storage type StoreType.AZURE already exists under storage account 'sky635694a6141885a3'.
(test_workdir_bucket_name_1, pid=2483) I 01-08 09:39:36 storage.py:990] Storage type StoreType.S3 already exists.
(test_workdir_bucket_name_1, pid=2483) I 01-08 09:39:39 storage.py:1424] Removed objects from S3 bucket bucket-admin-test/job-48d27dc6/workdir.
(test_workdir_bucket_name_1, pid=2483) I 01-08 09:39:40 storage.py:1424] Removed objects from S3 bucket bucket-admin-test/job-48d27dc6/local-file-mounts/0.
(test_workdir_bucket_name_1, pid=2483) I 01-08 09:39:41 storage.py:1424] Removed objects from S3 bucket bucket-admin-test/job-48d27dc6/local-file-mounts/1.
(test_workdir_bucket_name_1, pid=2483) I 01-08 09:39:41 storage.py:1424] Removed objects from S3 bucket bucket-admin-test/job-48d27dc6/tmp-files.
(test_workdir_bucket_name_1, pid=2483) I 01-08 09:39:41 controller.py:542] Cluster of managed job 1 has been cleaned up.
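
The two runs above suggest a simple cleanup rule: a bucket created by sky is deleted outright, while a user-configured bucket is kept and only the job's sub path is cleared. A minimal sketch of that decision follows. `CleanupPlan` and `plan_cleanup` are hypothetical names used for illustration only; they are not part of SkyPilot's storage.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CleanupPlan:
    """Hypothetical description of what cleanup should remove."""
    delete_bucket: bool             # remove the whole bucket
    prefix_to_clear: Optional[str]  # or only the objects under this prefix


def plan_cleanup(is_sky_managed: bool, sub_path: Optional[str]) -> CleanupPlan:
    """Decide between deleting the bucket and clearing a sub path.

    Assumption: a sky-created bucket is safe to delete entirely; any other
    bucket must survive, so at most the job's sub path is cleared.
    """
    if is_sky_managed:
        return CleanupPlan(delete_bucket=True, prefix_to_clear=None)
    return CleanupPlan(delete_bucket=False, prefix_to_clear=sub_path)


# Mirrors the two scenarios in the logs above.
auto_bucket = plan_cleanup(is_sky_managed=True, sub_path=None)
user_bucket = plan_cleanup(is_sky_managed=False,
                           sub_path='job-48d27dc6/workdir')
print(auto_bucket)
print(user_bucket)
```

In this sketch a user bucket with no sub path results in nothing being removed, which is the conservative choice when ownership is uncertain.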

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

Comment on lines +81 to +83
def _is_sky_managed_intermediate_bucket(bucket_name: str) -> bool:
    return re.match(r'skypilot-filemounts-.+-[a-f0-9]{8}',
                    bucket_name) is not None
Collaborator

Relying on the bucket name is not a robust solution. Can't we manually set force_delete accordingly in maybe_translate...?
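
To illustrate the robustness concern, here is a small sketch that applies the same regular expression to a few bucket names. The helper name `is_sky_managed_by_name` and all bucket names other than the two that appear in the logs above are made up for illustration. Note that `re.match` anchors only at the start of the string, so names with trailing characters also match.

```python
import re

# Same pattern as the helper under review.
_PATTERN = r'skypilot-filemounts-.+-[a-f0-9]{8}'


def is_sky_managed_by_name(bucket_name: str) -> bool:
    """Name-based guess; it cannot tell who actually created the bucket."""
    return re.match(_PATTERN, bucket_name) is not None


candidates = [
    # Auto-generated name taken from the logs in this PR.
    'skypilot-filemounts-zepingguo-4f14dd72',
    # Hypothetical bucket a user created by hand with a similar name.
    'skypilot-filemounts-myteam-deadbeef',
    # Hypothetical name with a trailing suffix; re.match does not anchor the end.
    'skypilot-filemounts-x-deadbeef-extra',
    # User-configured bucket from ~/.sky/config.yaml in this PR.
    'bucket-admin-test',
]

results = {name: is_sky_managed_by_name(name) for name in candidates}
for name, guessed in results.items():
    print(f'{name}: {guessed}')
```

The second and third names would be classified as sky-managed even though, in this hypothetical, a user created them, so a name-based check could lead to deleting a bucket that sky does not own.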

Collaborator Author

@zpoint zpoint Jan 9, 2025

_force_delete is already True in our cases. The controller will call delete anyway; it just doesn’t know whether the bucket is sky_managed, that is, whether it should delete the entire bucket or just the sub_path.

Comment on lines +1373 to +1374
_is_sky_managed_intermediate_bucket(
    self.name))
Collaborator

@romilbhardwaj romilbhardwaj Jan 8, 2025

is_sky_managed should not be inferred from the bucket name; it depends on whether the bucket was created by sky or not. Can we instead pass this information to the controller using force_delete, since this field is used mainly for deletion logic?
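
As a rough sketch of the alternative, ownership can be recorded when the bucket is created and carried along explicitly, rather than reconstructed from the name later. Everything below is hypothetical: `BucketRecord`, `created_by_sky`, and the JSON payload are illustrative stand-ins, not SkyPilot's actual storage metadata or its force_delete mechanism.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BucketRecord:
    """Hypothetical metadata recorded at bucket creation time."""
    name: str
    created_by_sky: bool  # set once, by whoever creates or attaches the bucket


def to_controller_payload(record: BucketRecord) -> str:
    """Serialize the record so it can be shipped to the controller."""
    return json.dumps(asdict(record))


def from_controller_payload(payload: str) -> BucketRecord:
    """Restore the record on the controller side."""
    return BucketRecord(**json.loads(payload))


# A user-provided bucket whose name happens to look auto-generated
# (hypothetical name) still round-trips with created_by_sky=False.
user_record = BucketRecord(name='skypilot-filemounts-myteam-deadbeef',
                           created_by_sky=False)
restored = from_controller_payload(to_controller_payload(user_record))
print(restored)
```

The point of the sketch is only that an explicit flag survives serialization unchanged, so the controller does not need to guess ownership from the bucket name.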

Collaborator Author

_force_delete is already True in our cases. The controller will call delete anyway; it just doesn’t know whether the bucket is sky_managed, that is, whether it should delete the entire bucket or just the sub_path.
