
Bucket to bucket "sync" of all versions #166

Open
mlech-reef opened this issue Oct 28, 2020 · 3 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@mlech-reef
Collaborator

Currently, the SDK is able to synchronize files between two B2 buckets (implemented in #165), but it synchronizes only the latest version of each file, because synchronization as a whole operates on files rather than on file versions.
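To make the distinction concrete, here is a minimal Python sketch that is not taken from the SDK: `FileVersion` and `latest_versions` are hypothetical names, and hide markers are ignored for simplicity. A file-oriented sync effectively reduces each bucket to the newest version of every file name, so older versions never take part in the comparison.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class FileVersion:
    """Hypothetical, simplified view of one stored version of a B2 file."""
    file_name: str
    file_id: str
    upload_timestamp: int  # server-side upload time, ms since epoch


def latest_versions(versions: Iterable[FileVersion]) -> List[FileVersion]:
    """Keep only the newest version of each file name, i.e. what a
    file-oriented sync compares and copies."""
    newest: Dict[str, FileVersion] = {}
    for v in versions:
        current = newest.get(v.file_name)
        if current is None or v.upload_timestamp > current.upload_timestamp:
            newest[v.file_name] = v
    return sorted(newest.values(), key=lambda v: v.file_name)


bucket = [
    FileVersion("a.txt", "id1", 1000),
    FileVersion("a.txt", "id2", 2000),
    FileVersion("b.txt", "id3", 1500),
]
print([v.file_id for v in latest_versions(bucket)])  # ['id2', 'id3']
```

Three versions exist, but only two of them (one per file name) would ever be synchronized; `id1` is invisible to a file-oriented sync.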

We may consider adding a feature that syncs every version of the files. It might not be b2 sync but something else, or a special b2 sync mode.

mlech-reef added the enhancement, more-information-needed and question labels and removed the more-information-needed label on Oct 28, 2020
@ppolewicz
Collaborator

sync already has options to filter file versions: versions can be filtered out based on server-side information, and the client-reported times can be used as well (not sure if all of that is implemented right now, but you get the idea). One reason for this is, for example, a backup that uploads encrypted garbage during a ransomware/cryptolocker attack. Restoring a bucket from such a situation by cloning it into a fresh bucket (mostly using server-side copy!) might be a good option, especially if some files had not been encrypted yet and new versions of them were backed up. It's really up to the user how they deal with this, but the tools should be there.
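A minimal Python sketch of the restore scenario, assuming the attack start time is known and that versions are selected by their server-side upload timestamp. All names here are hypothetical and are not b2sdk API. For every file it picks the newest version uploaded before the cutoff; those version IDs would then be the sources for server-side copies into the fresh bucket, which the sketch does not perform.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class FileVersion:
    """Hypothetical, simplified view of one stored version of a B2 file."""
    file_name: str
    file_id: str
    upload_timestamp: int  # server-side upload time, ms since epoch


def restore_point(versions: Iterable[FileVersion], cutoff_ms: int) -> List[FileVersion]:
    """For each file, pick the newest version uploaded strictly before cutoff_ms.
    Files with no version before the cutoff are left out entirely."""
    chosen: Dict[str, FileVersion] = {}
    for v in versions:
        if v.upload_timestamp >= cutoff_ms:
            continue  # uploaded during or after the attack: ignore it
        best = chosen.get(v.file_name)
        if best is None or v.upload_timestamp > best.upload_timestamp:
            chosen[v.file_name] = v
    return sorted(chosen.values(), key=lambda v: v.file_name)


bucket = [
    FileVersion("report.doc", "id1", 1000),  # clean
    FileVersion("report.doc", "id2", 3500),  # encrypted by the attack
    FileVersion("notes.txt", "id3", 2000),   # clean
    FileVersion("notes.txt", "id4", 2500),   # clean, newer
    FileVersion("new.bin", "id5", 4000),     # only exists after the attack
]
print([v.file_id for v in restore_point(bucket, cutoff_ms=3000)])  # ['id4', 'id1']
```

The same selection could just as well be keyed on a client-reported modification time instead of the upload timestamp; which one is trustworthy after an attack is exactly the kind of choice left to the user.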

Since sync already has so many options to tweak the behavior of massive synchronization operations from and to buckets, I think we should add a switch to b2 sync (--mode=versions?) rather than create a new command, which would need most of the b2 sync parameters anyway.

@bwbeach
Contributor

bwbeach commented Oct 28, 2020

I agree that this would be a useful feature. Not sure of the priority.

What, exactly, would this feature do? It will not be able to replicate the upload times of the original files; their upload times will be the times the files were copied. It can preserve the metadata, including the file modification time. It can also preserve the order of the versions of a file.
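The following sketch simulates that behavior in memory rather than calling the real API (`FileVersion` and `copy_all_versions` are hypothetical). Copying each file's versions oldest-first keeps their order in the destination. Every copy receives a new upload timestamp, while the file info, here holding the modification time under the `src_last_modified_millis` key that B2 conventionally uses for it, is carried over unchanged.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class FileVersion:
    """Hypothetical, simplified view of one stored version of a B2 file."""
    file_name: str
    upload_timestamp: int  # set by the server when the version is created (ms)
    file_info: Dict[str, str] = field(default_factory=dict)


def copy_all_versions(source: List[FileVersion], copy_start_ms: int) -> List[FileVersion]:
    """Simulate copying every version of every file into an empty bucket."""
    # Each copy is "uploaded" 1 ms after the previous one, starting at copy_start_ms.
    clock = itertools.count(copy_start_ms)
    # Oldest-first per file, so the newest source version is also newest in the destination.
    ordered = sorted(source, key=lambda v: (v.file_name, v.upload_timestamp))
    return [
        # New upload time, but the file info (incl. modification time) is copied as-is.
        FileVersion(v.file_name, next(clock), dict(v.file_info))
        for v in ordered
    ]


source = [
    FileVersion("a.txt", 2000, {"src_last_modified_millis": "1900"}),
    FileVersion("a.txt", 1000, {"src_last_modified_millis": "900"}),
]
for v in copy_all_versions(source, copy_start_ms=50_000):
    print(v.upload_timestamp, v.file_info["src_last_modified_millis"])
# 50000 900
# 50001 1900
```

Because sync compares modification times rather than upload times, the destination would still look "in sync" with the source even though every server-side timestamp has changed.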

Because the upload times of the file versions in the destination bucket are different, the actions of lifecycle rules will be different in the source and destination buckets.
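As a rough illustration, assume a rule that hides files a fixed number of days after upload (the `daysFromUploadingToHiding` setting of a B2 lifecycle rule). The same content would then be hidden at very different times in the two buckets. The helper below is hypothetical and ignores the fact that the service applies rules on its own schedule, so the result is only an approximate earliest hide time.

```python
from datetime import datetime, timedelta, timezone


def approx_hide_time(upload_time: datetime, days_from_uploading_to_hiding: int) -> datetime:
    """Earliest time a 'hide N days after upload' rule could hide a version
    (simplified: ignores when the service actually runs its rules)."""
    return upload_time + timedelta(days=days_from_uploading_to_hiding)


RULE_DAYS = 30
original_upload = datetime(2020, 1, 1, tzinfo=timezone.utc)  # upload into the source bucket
copy_upload = datetime(2020, 10, 28, tzinfo=timezone.utc)    # same version copied later

print(approx_hide_time(original_upload, RULE_DAYS))  # 2020-01-31 00:00:00+00:00
print(approx_hide_time(copy_upload, RULE_DAYS))      # 2020-11-27 00:00:00+00:00
```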

@ppolewicz
Collaborator

This would, among other things, let users of old buckets clone them into new ones so that they could use the S3 interface with their data.

The server-side times will change, yes, but sync uses modification time.

It seems that to enable this we need just one or two functions: something that iterates over every file version in the bucket, instead of just the most recent version of every file, while respecting the filters we already have.
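One way such a function might look, as a sketch over an in-memory list rather than the real bucket listing. The names are hypothetical, and a single exclude-by-name regex stands in for the existing sync filters. It groups versions by file name and yields each file's full history oldest-first, skipping excluded names.

```python
import re
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class FileVersion:
    """Hypothetical, simplified view of one stored version of a B2 file."""
    file_name: str
    file_id: str
    upload_timestamp: int  # server-side upload time, ms since epoch


def iter_all_versions(
    versions: Iterable[FileVersion],
    exclude: Optional[re.Pattern] = None,
) -> Iterator[Tuple[str, List[FileVersion]]]:
    """Yield (file_name, versions oldest-first) for every file whose name is
    not excluded, instead of only the newest version of each file."""
    ordered = sorted(versions, key=lambda v: (v.file_name, v.upload_timestamp))
    for name, group in groupby(ordered, key=lambda v: v.file_name):
        if exclude is not None and exclude.search(name):
            continue  # filtered out by name, like an existing sync exclude filter
        yield name, list(group)


bucket = [
    FileVersion("a.txt", "id2", 2000),
    FileVersion("tmp/x", "id3", 1500),
    FileVersion("a.txt", "id1", 1000),
    FileVersion("b.txt", "id4", 500),
]
for name, history in iter_all_versions(bucket, exclude=re.compile(r"^tmp/")):
    print(name, [v.file_id for v in history])
# a.txt ['id1', 'id2']
# b.txt ['id4']
```

Yielding whole per-file histories keeps the rest of the sync machinery file-oriented: the caller can still decide per file what to do, but now with every version available in upload order.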

We might specifically not copy the lifecycle rules to force the user to re-apply them appropriately.
