Add a joint-wrapper which parallelizes SAMTOOLS MPILEUP/VARSCAN MPILEUP2SNP #163
Comments
Hi @moldach, it seems like parallelizing over chromosomes is a real enhancement here. Have you thought about using the chromosomes as a wildcard? Snakemake would then run each chromosome as a separate job, without the need for xargs and the like. You could also pipe the output from
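To make the chromosome wildcard concrete: its values are simply the sequence names of the reference. Below is a minimal Python sketch, assuming the reference already has a samtools faidx index (ref.fa.fai), whose first tab-separated column is the sequence name. The function name contigs_from_fai is hypothetical.

```python
from pathlib import Path


def contigs_from_fai(fai_path):
    """Return the contig names, in file order, from a samtools faidx index.

    Each line of a .fai file is tab-separated and the first column is the
    sequence name, which is the value a per-chromosome wildcard would take.
    """
    names = []
    for line in Path(fai_path).read_text().splitlines():
        if line.strip():  # skip blank lines
            names.append(line.split("\t", 1)[0])
    return names
```

In a Snakefile, a list like this could be passed to expand() so that each chromosome becomes its own job.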
Hi @jafors, thanks for getting back to me. I've tried your suggestion of piping the output from
and then combine the output of these rules at the end
We don't have to list the temporary files in
For C. elegans, it produces a DAG like this:
However, for the human genome this would require many more rules. Is there a way to do this more succinctly?
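Combining the per-chromosome results at the end amounts to concatenating the outputs in chromosome order while keeping a single header. Below is a minimal sketch, assuming VarScan was run with --output-vcf 1 so that each part is a VCF with identical '#' header lines. The name merge_per_contig_vcfs is hypothetical, and in practice bcftools concat is the usual tool for this job.

```python
def merge_per_contig_vcfs(parts, merged_path):
    """Concatenate per-contig VCF files in the given order.

    The '#' header lines are kept from the first file only; every other
    file contributes just its records. Assumes all parts share the same
    header (same samples, same reference), as they do when one command is
    run per region on the same BAM.
    """
    with open(merged_path, "w") as out:
        for i, part in enumerate(parts):
            with open(part) as handle:
                for line in handle:
                    if i > 0 and line.startswith("#"):
                        continue  # drop the repeated header lines
                    out.write(line)
```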
There is indeed a way to improve on this when you replace the specific contigs with a wildcard.
Then, you need the other
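Replacing the explicit contigs with a wildcard means the final rule only has to list one file per sample and contig. The Python sketch below is a plain stand-in for the Cartesian product that Snakemake's expand() builds by default. The path pattern and the name expand_paths are hypothetical.

```python
from itertools import product


def expand_paths(pattern, **wildcards):
    """Fill a {name}-style pattern with every combination of wildcard values.

    Mimics the default (Cartesian product) behaviour of expand(), e.g. to
    list every per-sample, per-contig file a final "combine" rule needs.
    """
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[n] for n in names))
    ]
```

For example, `expand_paths("calls/{sample}.{contig}.vcf", sample=["s1"], contig=["I", "II"])` gives `["calls/s1.I.vcf", "calls/s1.II.vcf"]`, so adding contigs for a larger genome changes only the list of values, not the number of rules.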
Thanks for the suggestion. I've tried the following but get an error:
Error
Any idea what could be wrong?
There is no need to use expand() in the two mpileup-related rules. Those rules will run once per contig for every sample. Just write
and the contigs will be determined by the
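The reason no expand() is needed in the per-contig rules is that the wildcard values are inferred from the file names requested downstream. Below is a simplified Python sketch of that inference. It assumes each {name} appears only once in the pattern and matches any non-empty text (Snakemake's default wildcard constraint is .+). match_wildcards is a hypothetical helper, not Snakemake's internal code.

```python
import re


def match_wildcards(pattern, path):
    """Infer wildcard values by matching a concrete path to an output pattern.

    Each {name} becomes a named regex group matching '.+'; the literal parts
    of the pattern are escaped. Returns a dict of wildcard values, or None
    if the path does not match. Assumes no wildcard name is repeated.
    """
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += f"(?P<{m.group(1)}>.+)"
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, path)
    return found.groupdict() if found else None
```

For example, `match_wildcards("calls/{sample}.{contig}.vcf", "calls/s1.II.vcf")` returns `{"sample": "s1", "contig": "II"}`. With '.' as a separator, a sample name containing dots would be ambiguous, which is what wildcard_constraints is for.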
There are currently two separate wrappers, one for SAMTOOLS MPILEUP and one for VARSCAN MPILEUP2SNP.
These tools are used sequentially and, unfortunately, are single-threaded.
I'm in the process of converting shell commands to wrappers, so I have not had a chance to benchmark these wrappers specifically; however, I assume the performance is the same as piping the output of samtools mpileup into varscan, e.g.:

When I initially benchmarked the above code on C. elegans, it took 101 minutes.
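Piping does not make either tool multi-threaded, but it lets both processes run at the same time and avoids writing the intermediate pileup to disk. Below is a minimal Python sketch of the same `producer | consumer > out` pipe using subprocess. The name run_pipe is hypothetical, and the command lines in the usage note are illustrative only: the file names are assumed, and `varscan` assumes the bioconda launcher rather than `java -jar VarScan.jar`.

```python
import subprocess


def run_pipe(producer, consumer, out_path):
    """Stream producer's stdout into consumer's stdin, like `a | b > out`.

    Both arguments are argv lists. Returns (producer_rc, consumer_rc).
    """
    with open(out_path, "wb") as out:
        first = subprocess.Popen(producer, stdout=subprocess.PIPE)
        second = subprocess.Popen(consumer, stdin=first.stdout, stdout=out)
        # Close our copy so the producer gets SIGPIPE if the consumer exits early.
        first.stdout.close()
        second.wait()
        first.wait()
    return first.returncode, second.returncode
```

Illustrative usage: `run_pipe(["samtools", "mpileup", "-f", "ref.fa", "sample.bam"], ["varscan", "mpileup2snp", "--output-vcf", "1"], "sample.vcf")`.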
It would be ideal to create a joint wrapper combining the two tools, taking advantage of samtools mpileup's --region parameter and GNU Parallel.

The shell command I'm currently using is:

This parallelized the variant-calling process by running these operations on each chromosome (each on a separate core), reducing computation time to 17 minutes: an 83% decrease in processing time for the lengthiest step in the C. elegans pipeline.
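The per-chromosome approach (one samtools mpileup --region job per chromosome, each on its own core) can be sketched in Python with a thread pool standing in for GNU Parallel, where max_workers plays the role of parallel's -j. Everything here is hypothetical: the function names, the output naming, and the argv in region_commands, which assumes an indexed BAM (region queries need the index) and the bioconda `varscan` launcher. It is not a reconstruction of the shell command above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def region_commands(ref, bam, region):
    """Hypothetical argv pair for one region: samtools mpileup piped to VarScan."""
    mpileup = ["samtools", "mpileup", "-f", ref, "-r", region, bam]
    varscan = ["varscan", "mpileup2snp", "--output-vcf", "1"]
    return mpileup, varscan


def run_regions(regions, make_cmds, out_pattern, max_workers=4):
    """Run one producer|consumer pipe per region, max_workers at a time.

    `regions` should be a list. Threads are enough here: each thread only
    waits on its own child processes, so the heavy lifting happens in
    separate OS processes. Returns {region: (producer_rc, consumer_rc)}.
    """
    def one(region):
        producer, consumer = make_cmds(region)
        with open(out_pattern.format(region=region), "wb") as out:
            first = subprocess.Popen(producer, stdout=subprocess.PIPE)
            second = subprocess.Popen(consumer, stdin=first.stdout, stdout=out)
            first.stdout.close()  # let the producer see SIGPIPE if needed
            second.wait()
            first.wait()
        return first.returncode, second.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(regions, pool.map(one, regions)))
```

Illustrative usage: `run_regions(["I", "II", "III"], lambda r: region_commands("ref.fa", "sample.bam", r), "calls/sample.{region}.vcf")`, followed by concatenating the per-region files in chromosome order.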