Utilities for the MetaSUB Consortium. Free software: MIT license.
A collection of utilities to manage the MetaSUB project.
- Athena is a collection of tools to manage data on the Weill-Cornell ICB Compute Cluster
- Bridges is a collection of tools to manage data on the XSEDE Bridges Compute Cluster
- Data Packet contains scripts for building MetaSUB data packets
- Hudson Alpha contains tools for downloading raw sequence data from Hudson Alpha
- Metadata provides access to the MetaSUB Metadata
- Metagenscope is a set of utilities to upload data to Metagenscope
- Wasabi uploads and downloads data from Wasabi Hot Storage, an S3 clone
- Zurich uploads and downloads data from the Zurich MetaSUB SFTP server
- Packet Parser contains utilities to parse the MetaSUB data packet
This package requires Python 3.
Install from PyPI:
pip install metasub_utils
Install from source:
git clone git@github.com:MetaSUB/metasub_utils.git
cd metasub_utils
python setup.py install
The public v1.2.0 release of the MetaSUB data packet may be found here_.
To download raw data or assemblies from Wasabi you will need API credentials. Please contact David Danko ([email protected]) to acquire these keys. Currently, keys may only be obtained by members of the MetaSUB Consortium.
Reads with human DNA removed are publicly available and can be obtained without an API key. To list the relevant files, use the following commands. Various options for filtering samples are available.
$ metasub wasabi list nonhuman-reads --help
$ metasub wasabi list nonhuman-reads --city-name paris
To download these files, use a tool like curl or wget. For example:
$ metasub wasabi list nonhuman-reads --city-name paris | sed 's|s3://|https://s3.wasabisys.com/|g' | xargs -n 1 wget
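The sed step above rewrites each s3:// URI into a path-style HTTPS URL on s3.wasabisys.com. A minimal Python sketch of the same rewrite follows. The function name s3_to_https and the example bucket and key are hypothetical and are not part of metasub_utils; only the host name comes from the command above.

```python
WASABI_ENDPOINT = "https://s3.wasabisys.com/"  # host taken from the sed command above


def s3_to_https(uri: str) -> str:
    """Rewrite the leading s3:// of a URI as a path-style HTTPS URL (hypothetical helper)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3 URI: {uri!r}")
    return WASABI_ENDPOINT + uri[len(prefix):]


# The bucket and key below are made up for illustration.
print(s3_to_https("s3://example-bucket/paris/sample_1.fastq.gz"))
```

Unlike the sed expression, this sketch only rewrites the leading scheme and rejects lines that are not s3:// URIs, which may help catch unexpected output before it is passed to wget.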
A simple list of publicly available read files is available here.
Wasabi is a clone of Amazon S3. To use Wasabi you will need to install the AWS-CLI.
Once you have installed the aws command line tool you need to configure an account.
$ aws configure --profile wasabi
AWS Access Key ID [None]: `your access key`
AWS Secret Access Key [None]: `your secret key`
Default region name [None]:
Default output format [None]:
Once your account is configured you can use this utility package to download files. The following commands will be most useful. If you are downloading a large amount of data it may be faster to use the AWS-CLI directly.
$ metasub wasabi download contigs --help
$ metasub wasabi download raw-reads --help
$ metasub wasabi download kmers --help
Note that all download commands perform a dry run by default. You will need to add the --wetrun flag to actually download data.
You can also list the files without downloading them. This gives cleaner output than a download dry run would.
$ metasub wasabi list contigs --help
$ metasub wasabi list raw-reads --help
$ metasub wasabi list kmers --help
To download data from a specific city, run:
$ metasub wasabi download raw-reads --wetrun --city-name <city_name>
If your city has a large number of samples you may want to split the download into chunks. You can do this with the following script.
metasub metadata samples-from-city <city_name> > all_sample_names.txt
split -l <chunk_size> all_sample_names.txt chunk.
for f in chunk.*; do echo "$f"; metasub wasabi download raw-reads --wetrun --sample-names "$f"; done
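The chunking step can also be sketched in Python, assuming sample names are stored one per line as in all_sample_names.txt. The helper name chunk_lines is hypothetical and not part of metasub_utils; it only mirrors what split -l does.

```python
from typing import Iterator, List


def chunk_lines(lines: List[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most chunk_size lines, like `split -l`."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(lines), chunk_size):
        yield lines[start:start + chunk_size]


# Made-up sample names for illustration.
names = ["sample_a", "sample_b", "sample_c", "sample_d", "sample_e"]
for index, chunk in enumerate(chunk_lines(names, 2)):
    print(index, chunk)
```

Each yielded chunk could then be written to its own file and passed to the download command through --sample-names, as in the shell loop above.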
v0.7.0
- Added command to download and list kmers
- Added download and list sub-commands to wasabi
v0.4.0
- Added a metadata CLI/API to list samples from a particular city
- Added a wasabi CLI/API to list raw reads with a city-specific option
- Added a wasabi CLI/API to download raw reads with a city-specific option
This package is structured as a set of microlibraries.
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.