resumableS3 (rs3)

What problem does it solve

When downloading a very large file with aws s3 cp, the AWS CLI may fail and only report an error such as ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer')), without writing the file to the output. In other words, a download with the original AWS CLI is not resumable.

ResumableS3 (rs3) is a Python-based download utility designed to be resumable. It is also multi-threaded, so you should get download speeds similar to the original AWS CLI.
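As a rough illustration of the multi-threaded idea, the sketch below runs one task per chunk in a thread pool and collects failed chunks instead of aborting the whole download. It is not rs3's actual implementation: `download_chunks` and the injected `fetch` callable are hypothetical names, and the fallback to `os.cpu_count()` only mirrors the documented default of the -w/--workers option.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, Optional, Tuple

Range = Tuple[int, int]  # inclusive (start, end) byte offsets


def download_chunks(
    ranges: Iterable[Range],
    fetch: Callable[[Range], None],  # hypothetical: downloads and stores one chunk
    workers: Optional[int] = None,
) -> Tuple[List[Range], List[Range]]:
    """Run fetch for every range in parallel; return (done, failed)."""
    workers = workers or os.cpu_count() or 1
    done: List[Range] = []
    failed: List[Range] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, r): r for r in ranges}
        for future in as_completed(futures):
            r = futures[future]
            try:
                future.result()
            except Exception as exc:
                # A failed chunk is reported and remembered, not fatal.
                print(f"An error occurred while downloading chunk {r[0]}-{r[1]}: {exc}")
                failed.append(r)
            else:
                done.append(r)
    return sorted(done), sorted(failed)
```

Keeping the list of failed chunks is what makes a later retry possible without downloading the whole file again.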

Installation

Install with pip. Python >= 3.8 must be installed first, for example in a conda environment:

conda create -n rs3 python=3.8 -y;
conda activate rs3;
pip install resumables3;

Usage

Only two parameters are mandatory: -i/--input and -o/--output.

Usage: rs3 [OPTIONS]

Options:
  -i, --input TEXT       S3 link, must be a specific downloadable object [required]
  -o, --output TEXT      Path to output  [required]
  -t, --temp TEXT        Path to the record file, default : download_progress.txt in output directory
  -w, --workers INTEGER  Max workers for download, default: max CPU threads in your system
  --chunk-size INTEGER   Chunk size for parallel download in MB [default: 25]
  --id TEXT              AWS access key id, default: None (anonymous)
  --key TEXT             AWS secret access key, default: None (anonymous)
  --region-name TEXT     AWS region name, default: None
  --version              Show the version and exit.
  --help                 Show this message and exit.
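
To make the --chunk-size option more concrete, here is a minimal sketch of how an object could be split into byte ranges for parallel download. It assumes that "MB" means 2**20 bytes, which is consistent with the chunk 964585062400-964611276799 in the log further down (26214400 bytes, i.e. 25 x 2**20). The function names `chunk_ranges` and `range_header` are hypothetical and not part of rs3.

```python
from typing import List, Tuple

MIB = 1024 * 1024  # assumption: "MB" in --chunk-size means 2**20 bytes


def chunk_ranges(total_size: int, chunk_size_mb: int = 25) -> List[Tuple[int, int]]:
    """Split [0, total_size) into inclusive (start, end) byte ranges."""
    if total_size < 0 or chunk_size_mb <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size_mb must be > 0")
    step = chunk_size_mb * MIB
    return [
        (start, min(start + step, total_size) - 1)
        for start in range(0, total_size, step)
    ]


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"
```

Each (start, end) pair can then be requested independently, so a failed chunk only costs one chunk's worth of data.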

Example

rs3 \
-i s3://human-pangenomics/working/HPRC_PLUS/HG01109/assemblies/year1_freeze_assembly_v2/HG01109.maternal.f1_assembly_v2.fa.gz \
-o ./output

How to resume download

While rs3 is downloading a file, a bad Internet connection may cause several chunks to fail (as in the example below). This is fine: rs3 keeps downloading the remaining chunks.

Downloading sectors:  85%|█████████████████████████████████████████████████████████████████████████████████▏              | 36779/43519 [27:27:54<7:41:26,  4.11s/sector]
An error occurred while downloading chunk 964585062400-964611276799: An error occurred while reading from response stream: ('Connection broken: IncompleteRead(18505070 bytes read, 7709330 more expected)', IncompleteRead(18505070 bytes read, 7709330 more expected))
Downloading sectors:  87%|███████████████████████████████████████████████████████████████████████████████████▋            | 37960/43519 [28:16:53<6:08:05,  3.97s/sector]
An error occurred while downloading chunk 996225843200-996252057599: An error occurred while reading from response stream: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
Downloading sectors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████▉| 43514/43519 [32:06:58<00:15,  3.10s/sector]
File downloaded to ./HG02723/HG02723_3.fast5.tar.gz

After the first rs3 command finishes, rerun the same rs3 command and rs3 will try to download the missing parts again:

Downloading sectors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 201/201 [08:00<00:00,  2.39s/sector]
File downloaded to ./HG02723/HG02723_3.fast5.tar.gz
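
The resume step can be pictured as follows: a record file lists the chunks that finished, and a rerun only fetches the ranges that are not in it. This is a sketch under stated assumptions, not rs3's real logic. The on-disk format (one "start-end" pair per line) is invented for illustration, since the actual layout of the record file given by -t/--temp (default download_progress.txt) is not described here, and all function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Set, Tuple

Range = Tuple[int, int]  # inclusive (start, end) byte offsets


def load_completed(record_path: Path) -> Set[Range]:
    """Read finished ranges from the record file (assumed 'start-end' per line)."""
    if not record_path.exists():
        return set()
    completed: Set[Range] = set()
    for line in record_path.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        start, end = line.split("-", 1)
        completed.add((int(start), int(end)))
    return completed


def mark_completed(record_path: Path, chunk: Range) -> None:
    """Append one finished range to the record file."""
    with record_path.open("a") as handle:
        handle.write(f"{chunk[0]}-{chunk[1]}\n")


def missing_ranges(all_ranges: Iterable[Range], completed: Set[Range]) -> List[Range]:
    """Return the ranges that still need to be downloaded, in order."""
    return [chunk for chunk in all_ranges if chunk not in completed]
```

On a rerun, only the output of `missing_ranges` would need to be requested again, which matches the much shorter second progress bar above.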

Validation

rs3 and the AWS CLI (aws s3 cp) were used to download the same file, s3://human-pangenomics/working/HPRC_PLUS/HG01109/assemblies/year1_freeze_assembly_v2/HG01109.maternal.f1_assembly_v2.fa.gz, with the following commands:

# resumable S3
rs3 -i s3://human-pangenomics/working/HPRC_PLUS/HG01109/assemblies/year1_freeze_assembly_v2/HG01109.maternal.f1_assembly_v2.fa.gz -o ./rs3;

# AWS S3
aws s3 --no-sign-request cp s3://human-pangenomics/working/HPRC_PLUS/HG01109/assemblies/year1_freeze_assembly_v2/HG01109.maternal.f1_assembly_v2.fa.gz ./s3;

The downloaded files were compared with md5sum, and their checksums are identical.

f2d8b690d0adeaf28ed3221514f30357  rs3/HG01109.maternal.f1_assembly_v2.fa.gz
f2d8b690d0adeaf28ed3221514f30357  s3/HG01109.maternal.f1_assembly_v2.fa.gz
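
If md5sum is not available, the same comparison can be done from Python with the standard library. The sketch below reads each file in blocks so that very large downloads do not have to fit in memory; `md5_of_file` and `same_content` are hypothetical helper names, not part of rs3.

```python
import hashlib
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def md5_of_file(path: PathLike, block_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it block by block."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def same_content(first: PathLike, second: PathLike) -> bool:
    """True if both files have the same MD5 digest."""
    return md5_of_file(first) == md5_of_file(second)
```

For the files above, this would be called as same_content("rs3/HG01109.maternal.f1_assembly_v2.fa.gz", "s3/HG01109.maternal.f1_assembly_v2.fa.gz").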

Citation

Just cite this repository.
