PASHA (A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets) is a tool for finding approximations for a small universal hitting set (a small set of k-mers that hits every sequence of length L). A detailed description of the functionality of PASHA, along with results in set size, running time, memory usage and CPU utilization are provided in:
Ekim, B., Berger, B., Orenstein, Y. A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets. bioRxiv (2020).
PASHA is a tool for research and still under development. It is not presented as accurate or free of any errors, and is provided "AS IS", without warranty of any kind.
Clone the repository to your local machine, and compile via g++
using the commands:
cd src
g++ -O3 -o pasha rand.cpp -fopenmp
The -O3
or -Ofast
flag is necessary for efficient parallelization and optimization. The -fopenmp
flag is needed to process parallelization via OpenMP.
To compute the hitting set for a specified k-mer and sequence length, use the command:
./pasha <kmerlength> <seqlength> <numThreads> <decycPath> <hitPath>
usr@usr:src usr$ ./pasha 8 20 24 decyc8.txt hit.txt
Decycling set size: 8230
Alpha: 2
Delta: 0.0769231
Epsilon: 0.0961538
Max hitting number: 3.71838e+07
Length of longest remaining path: 12
7.72136 seconds.
Hitting set size: 5287
PASHA is freely available under the MIT License.
If you find PASHA useful, please cite the following papers:
-
Ekim, B., Berger, B., Orenstein, Y. A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets. bioRxiv (2020).
-
Orenstein, Y., Pellow, D., Marçais, G., Shamir, R., Kingsford, C. Compact universal k-mer sets. 16th International Workshop on Algorithms in Bioinformatics (2016).
-
Orenstein, Y., Pellow, D., Marçais, G., Shamir, R., Kingsford, C. Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing. PLoS Computational Biology (2017).
Development of PASHA is led by Barış Ekim, collaboratively in the labs of Bonnie Berger at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute of Technology (MIT), and Yaron Orenstein at the Department of Electrical and Computer Engineering at Ben-Gurion University of the Negev (BGU).
All of the software executables, manuals, and quick-start scripts, in addition to the calculated universal hitting sets are provided here. Should you have any inquiries, please contact Barış Ekim at baris [at] mit [dot] edu.