Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scores are possibly off #9

Open
giuliaguidi opened this issue Feb 24, 2020 · 0 comments
Open

Scores are possibly off #9

giuliaguidi opened this issue Feb 24, 2020 · 0 comments

Comments

@giuliaguidi
Copy link
Contributor

Hi,

I ran libgaba using a linear gap penalty and regardless of the sequences similarity and scoring matrix, the scores are way off with respect to the ground truth.

Could you please let me know if I'm doing something wrong?

Here's the program I'm using:

int gabaAlign(int mat, int mis, int gap, int k, int xdrop, 
    std::string& targetSeg, std::string& querySeg)
{
gaba_t *ctx = gaba_init(GABA_PARAMS(
		// match award, mismatch penalty, gap open penalty (G_i), and gap extension penalty (G_e)
		GABA_SCORE_SIMPLE(mat, -mis, 0, -gap),
		gfa : 0,
		gfb : 0,
		xdrop : (int8_t)xdrop,
		filter_thresh : 0,
	));

	char const t[64] = {0};	// tail array
	gaba_section_t asec = gaba_build_section(0, (uint8_t const *)targetSeg.c_str(), (uint32_t)targetSeg.size());
	gaba_section_t bsec = gaba_build_section(2, (uint8_t const *)querySeg.c_str(),  (uint32_t)querySeg.size());
	gaba_section_t tail = gaba_build_section(4, t, 64);

	// create thread-local object
	gaba_dp_t *dp = gaba_dp_init(ctx);	// dp[0] holds a 64-cell-wide context

	// init section pointers
	gaba_section_t const *ap = &asec, *bp = &bsec;
	gaba_fill_t const *f = gaba_dp_fill_root(dp, // dp -> &dp[_dp_ctx_index(band_width)] makes the band width selectable
		ap, 0,					// a-side (reference side) sequence and start position
		bp, 0,					// b-side (query)
		UINT32_MAX				// max extension length
	);

	// until x-drop condition is detected
	gaba_fill_t const *m = f;

	while((f->status & GABA_TERM) == 0) {
		// substitute the pointer by the tail section's if it reached the end
		if(f->status & GABA_UPDATE_A) { ap = &tail; }
		if(f->status & GABA_UPDATE_B) { bp = &tail; }

		f = gaba_dp_fill(dp, f, ap, bp, UINT32_MAX);	// extend the banded matrix
		m = f->max > m->max ? f : m;					// swap if maximum score was updated
	}

	// alignment path
	gaba_alignment_t *r = gaba_dp_trace(dp,
		m,		// section with the max
		NULL	// custom allocator: see struct gaba_alloc_s in gaba.h
	);

    int score = r->score;
    
	// clean up
	gaba_dp_res_free(dp, r); gaba_dp_clean(dp);
	gaba_clean(ctx);    

    return score;
}

For example, using seq1.seq and seq2.seq as input sequences, x = 50, match = 1, mismatch = -1, and gap = -1, the score should be 3105, while libgaba return 4197.

You can find the benchmark code here: https://github.com/giuliaguidi/xavier/tree/master/benchmark. It can be compiled simply via:

make

and run as:

./test-score-exe <x-drop> <seq1-file> <seq2-file>

The ground truth can be generated using PyAligner and running it in the https://github.com/giuliaguidi/xavier/tree/master/benchmark folder as:

python3 run_pyaligner.py <seq1-file> <seq2-file> -x <x-drop>

Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant