HOPE in Memory

This repo contains the source code of Hirschberg algorithm implemented in OpenMP that is Efficient (HOPE) in Memory. The algorithm goal is to globally align sequences of characters of big sizes. The algorithm uses Levenshtein Distance and Hirschberg algorithm to find the best alignment, while using the smallest amount of memory. The algorithm has been succesfully tested with up to two input strings of 14MB each. Classical implementations of the Smith-Waterman-Gotoh algoritm cannot process these inputs, because it would need 196 Exabytes of RAM to allocate the entire score matrix.

This implementation does not only focus on memory efficiency, but also in performance. There are four versions of the code, each one introduce incremental steps to vectorize and parallelize the source code. The final version of the code is able to achieve 20 GCUPS (Million of Cell Updates per second) on a Intel Xeon CPU E5645 with 24 cores.

Source Tree Organization

HSeq: Hirschberg Sequential is the baseline implementation that executes the algorithm without parallelization or vectorization.
HVec: Hirschberg Vector is a vectorized version that exploits anti-diagonal independency.
VecRec_Tasks: Vector & Recursive Tasks is a vector and parallel version that incrementally adds OpenMP recursive tasks. The parallelism exploited by these tasks is the decomposition of the score matrix done by Hirchberg Algorithm.
Parallel_Vec_RecTasks: This is the final version that adds fine grained parallelism using parallel blocks computing the anti-diagonals.
data: This directory contains some examples to execute with the algorithm. It also contains some reference outputs (tiny.out)

Compilation and run

To compile the different source codes use the make files with the two possible variants hirchberg and hirchberg_omp. For example:

cd Parallel_Vec_RecTasks
make hirschberg_omp
./hirschberg_omp ../data/seq1_tiny.txt ../data/seq2_tiny.txt

To launch the experiments use SLURM with the submit.sh files:

sbatch submit-omp.sh hirschberg_omp ../data/seq1_tiny.txt ../data/seq2_tiny.txt $NUM_THREADS

To compare the output use diff against the reference output inside data. The different versions are named short.out, medium.out, large.out, double_large.out and quadra_large.out

./hirschberg_omp ../data/seq1_tiny.txt ../data/seq2_tiny.txt >tiny.out
diff tiny.out ../data/tiny.out

Extrae Support

Each code version supports Tracing through Extrae, for more information visit BSC tools

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HOPE in Memory

Source Tree Organization

Compilation and run

Extrae Support

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
HSeq		HSeq
HVec		HVec
Parallel_Vec_RecTasks		Parallel_Vec_RecTasks
VecRec_Tasks		VecRec_Tasks
data		data
README.md		README.md

vsoriap/HOPE-in-Memory

Folders and files

Latest commit

History

Repository files navigation

HOPE in Memory

Source Tree Organization

Compilation and run

Extrae Support

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages