Skip to content
/ pasha Public

parallel algorithms for small hitting set approximations

License

Notifications You must be signed in to change notification settings

ekimb/pasha

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PASHA: A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets

PASHA (A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets) is a tool for finding approximations for a small universal hitting set (a small set of k-mers that hits every sequence of length L). A detailed description of the functionality of PASHA, along with results in set size, running time, memory usage and CPU utilization are provided in:

Ekim, B., Berger, B., Orenstein, Y. A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets. bioRxiv (2020).

PASHA is a tool for research and still under development. It is not presented as accurate or free of any errors, and is provided "AS IS", without warranty of any kind.

Installation and setup

Installing the package

Clone the repository to your local machine, and compile via g++ using the commands:

cd src
g++ -O3 -o pasha rand.cpp -fopenmp

The -O3 or -Ofast flag is necessary for efficient parallelization and optimization. The -fopenmp flag is needed to process parallelization via OpenMP.

Example commands

To compute the hitting set for a specified k-mer and sequence length, use the command:

./pasha <kmerlength> <seqlength> <numThreads> <decycPath> <hitPath>

usr@usr:src usr$ ./pasha 8 20 24 decyc8.txt hit.txt

Decycling set size: 8230
Alpha: 2
Delta: 0.0769231
Epsilon: 0.0961538
Max hitting number: 3.71838e+07
Length of longest remaining path: 12
7.72136 seconds.
Hitting set size: 5287

License

PASHA is freely available under the MIT License.

References

If you find PASHA useful, please cite the following papers:

  • Ekim, B., Berger, B., Orenstein, Y. A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets. bioRxiv (2020).

  • Orenstein, Y., Pellow, D., Marçais, G., Shamir, R., Kingsford, C. Compact universal k-mer sets. 16th International Workshop on Algorithms in Bioinformatics (2016).

  • Orenstein, Y., Pellow, D., Marçais, G., Shamir, R., Kingsford, C. Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing. PLoS Computational Biology (2017).

Contributors

Development of PASHA is led by Barış Ekim, collaboratively in the labs of Bonnie Berger at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute of Technology (MIT), and Yaron Orenstein at the Department of Electrical and Computer Engineering at Ben-Gurion University of the Negev (BGU).

Contact

All of the software executables, manuals, and quick-start scripts, in addition to the calculated universal hitting sets are provided here. Should you have any inquiries, please contact Barış Ekim at baris [at] mit [dot] edu.

About

parallel algorithms for small hitting set approximations

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published