UNItig construction in PARallel for de novo assembly
UNIPAR is a fast assembly tool that uses de Bruijn graph-based algorithms to assemble short sequencing reads into long unitigs. It runs in parallel on both CPUs and GPUs and scales to multiple compute nodes in a cluster.
CUDA 5 or later, with GPU compute capability 3.5 or higher
GCC 4.9 or later
MPI library
Intel TBB (Threading Building Blocks) library, used for parallel sort and scan on CPUs
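UNIPAR needs a GPU with compute capability 3.5 or higher. Below is a minimal Bash sketch that checks a `MAJOR.MINOR` capability string against that minimum. The helper `cc_ok` is hypothetical and is not part of UNIPAR.

```bash
#!/usr/bin/env bash
# cc_ok MAJOR.MINOR
# Hypothetical helper (not part of UNIPAR): succeeds (exit status 0) when the
# given GPU compute capability is at least 3.5, the minimum UNIPAR requires.
cc_ok() {
  local major=${1%%.*}   # text before the first dot, e.g. "3" from "3.7"
  local minor=${1#*.}    # text after the first dot, e.g. "7" from "3.7"
  (( major > 3 || (major == 3 && minor >= 5) ))
}
```

On drivers whose `nvidia-smi` supports the `compute_cap` query field, every installed GPU can be checked with `for cc in $(nvidia-smi --query-gpu=compute_cap --format=csv,noheader); do cc_ok "$cc" || echo "compute capability $cc is below 3.5"; done`. For reference, Nvidia K40, K80 and P40 GPUs have compute capabilities 3.5, 3.7 and 6.1, so all of them pass.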
git clone https://github.com/ShuangQiuac/UNIPAR
cd <PATH_TO_UNIPAR>
mkdir build
cd build
cmake ..
make
./unipar -i <input file> -r <read length> -k <kmer length> -n <number of partitions> -c <number of CPUs> -g <number of GPUs> -d <intermediate file directory> -o <unitig output directory> -t <cutoff threshold>
mpirun -np <number of processes> [host options] ./unipar [parameter options]
./unipar -i <PATH_TO_UNIPAR>/example/test.fa -r 36 -k 27
mpirun -np 2 ./unipar -i <PATH_TO_UNIPAR>/example/test.fa -r 36 -k 27
-i [STRING]: input file, either a fasta or a fastq file
-r [INT]: read length; only the first r bases of each read are used
-k [INT]: k-mer length; must be less than or equal to the read length; an odd number is suggested
-n [INT]: [Optional] number of partitions, set to 512 by default; a power of 2 is suggested
-c [INT]: [Optional] number of CPUs to use, either 0 or 1, set to 1 by default
-g [INT]: [Optional] number of GPUs to use, either 0 or the number of GPUs detected by UNIPAR, set to the number of GPUs detected by default
-d [STRING]: [Optional] intermediate output directory for partitions, set to ./partitions by default
-o [STRING]: [Optional] unitig output directory, set to the current directory by default
-t [INT]: [Optional] the cutoff threshold for kmer coverage, set to 1 by default
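The constraints on -r, -k and -n can be checked before a long run is launched. Below is a hypothetical Bash pre-flight helper, `check_unipar_args`, which is not part of UNIPAR. It rejects a k-mer length larger than the read length and only warns when k is even or n is not a power of 2, since those two are suggestions rather than hard rules. A common reason to prefer an odd k in de Bruijn graph assemblers is that an odd-length k-mer can never be its own reverse complement.

```bash
#!/usr/bin/env bash
# check_unipar_args READ_LEN KMER_LEN [PARTITIONS]
# Hypothetical pre-flight check (not part of UNIPAR).
# Returns 1 if k > r; prints warnings to stderr for an even k or an n that is
# not a power of 2. PARTITIONS defaults to 512, matching UNIPAR's -n default.
check_unipar_args() {
  local r=$1 k=$2 n=${3:-512}
  if (( k > r )); then
    echo "error: -k ($k) must not exceed -r ($r)" >&2
    return 1
  fi
  if (( k % 2 == 0 )); then
    echo "warning: -k ($k) is even; an odd k-mer length is suggested" >&2
  fi
  # A positive power of 2 has exactly one bit set, so n & (n - 1) == 0.
  if (( n <= 0 || (n & (n - 1)) != 0 )); then
    echo "warning: -n ($n) is not a power of 2" >&2
  fi
  return 0
}
```

For example, `check_unipar_args 36 27 512 && ./unipar -i <PATH_TO_UNIPAR>/example/test.fa -r 36 -k 27 -n 512` runs UNIPAR only when the parameters are consistent.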
Minimizer-based partitioning files [intermediate files]
De Bruijn subgraph files [optional]: Users can choose to output the constructed de Bruijn graph if they need only the raw graph rather than the unitigs. The number of subgraph files is user-defined and set to 512 by default. Output of subgraph files is turned off by default.
Unitig files [the output results of UNIPAR]: The total number of unitig files equals the total number of processors used by UNIPAR.
File name format: contig_<processor id>_<process id>.fa
Together, the unitigs in all of these files make up the final result.
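Because the final result is spread across one file per processor, downstream tools usually want the files concatenated first. Below is a minimal Bash sketch, assuming the unitig files are ordinary FASTA files with one `>` header line per unitig. The helper `merge_unitigs` and the merged file name are hypothetical.

```bash
#!/usr/bin/env bash
# merge_unitigs OUT_DIR MERGED_FASTA
# Hypothetical helper (not part of UNIPAR): concatenates every
# contig_<processor id>_<process id>.fa file in OUT_DIR into MERGED_FASTA
# and prints the number of FASTA records (lines starting with '>').
merge_unitigs() {
  local dir=$1 out=$2
  local files=("$dir"/contig_*_*.fa)
  # Without nullglob an unmatched pattern stays literal, so test the first entry.
  if [[ ! -e ${files[0]} ]]; then
    echo "no contig_*_*.fa files found in $dir" >&2
    return 1
  fi
  cat "${files[@]}" > "$out"
  grep -c '^>' "$out"
}
```

For example, `merge_unitigs . all_unitigs.fasta` merges the files written to the default output directory (the current directory). Choose a merged file name that does not itself match `contig_*_*.fa`, so that a second run does not pick it up as input.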
E. coli on SRA (SRR001665): https://www.ncbi.nlm.nih.gov/sra/?term=SRR001665
Human Chr14 on GAGE: http://gage.cbcb.umd.edu/data/Hg_chr14/
Bumblebee on GAGE: http://gage.cbcb.umd.edu/data/Rhodobacter_sphaeroides/
Whole human genomes on SRA (SRX016231): https://www.ncbi.nlm.nih.gov/sra?term=SRX016231
GPUs: Nvidia K80, Nvidia P40
Total number of GPUs: up to 24 (2 × 12 K40)
Total number of compute nodes: up to 6