Return subseq from correct side of seq when on reverse strand using a… #454

Open
wants to merge 4 commits into
base: main

@aparton (Contributor) commented Sep 13, 2019

… fasta file

Initial bug reported by Fergal:

Basically, the core API returns the wrong sequence for introns when the is_splice_canonical subroutine is called, if the splice sites for introns on the -1 strand are fetched from a fasta flatfile. If the sequence is fetched from the db, it is correct. I have also added an alternative subroutine from the genebuild API that works slightly differently from the core API call but seems to get the sequence right in all cases (so it might be useful in determining which specific behaviour of the core API isn't replicated correctly by flatfile seq fetching).

I believe that most core API calls actually fetch the sequence on the 1 strand and then reverse complement it, so it could be the case that this is an example of a call that does not do that and thus does not work with the flatfile seq fetching. I assume there are not many cases like this, as we would have probably noticed them.

Anyway, I thought I should flag it up, as it could have significant consequences that are hard to spot (for example, we would likely have flagged some transcripts as bad due to the lack of canonical splicing, while in reality they were just on the -1 strand in most cases).

The underlying issue seems to be that when $slice->subseq(1,2) is called on the intron to access the donor sequence, FastaSequence.pm takes the requested coordinates from the forward strand first and then calculates the reverse complement. For an intron on the reverse strand, however, positions 1-2 of the forward-strand sequence are the acceptor end, so taking the sequence from the forward strand and then complementing it returns the donor and acceptor sequences the wrong way around.
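To make the coordinate problem concrete, here is a minimal sketch in Python rather than the Perl API. It is not the code in FastaSequence.pm or in this PR: the function names (`subseq_buggy`, `subseq_fixed`, `splice_sites`) and the example intron are hypothetical, and the "fixed" variant only illustrates the general idea of mirroring the coordinates onto the other end of the sequence before reverse complementing.

```python
# Illustrative sketch only; names and the example sequence are hypothetical
# and do not correspond to functions in the Ensembl core API.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def subseq_buggy(forward: str, start: int, end: int, strand: int) -> str:
    """Slice the forward strand at start..end (1-based, inclusive) and only
    then reverse complement. On strand -1 this reads the wrong end."""
    piece = forward[start - 1:end]
    return revcomp(piece) if strand == -1 else piece


def subseq_fixed(forward: str, start: int, end: int, strand: int) -> str:
    """On strand -1, mirror the coordinates onto the far end of the forward
    sequence before slicing and reverse complementing."""
    if strand == -1:
        length = len(forward)
        start, end = length - end + 1, length - start + 1
        return revcomp(forward[start - 1:end])
    return forward[start - 1:end]


def splice_sites(forward: str, strand: int, subseq) -> tuple:
    """Return (donor, acceptor): the first and last two bases of the intron
    as seen through the given subseq implementation."""
    n = len(forward)
    return subseq(forward, 1, 2, strand), subseq(forward, n - 1, n, strand)


# A made-up canonical GT...AG intron, written in the sense of a -1 strand gene.
intron_sense = "GTAAGTCCTTTCAG"
# What a flatfile would hold for that region: the forward-strand sequence.
forward = revcomp(intron_sense)  # "CTGAAAGGACTTAC"

buggy = splice_sites(forward, -1, subseq_buggy)  # ("AG", "GT"): swapped
fixed = splice_sites(forward, -1, subseq_fixed)  # ("GT", "AG"): canonical
```

With the first variant the intron appears to start with AG and end with GT, so a canonical-splice check would reject it even though the intron itself is a normal GT...AG intron. On the forward strand both variants return the same result.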

Suggested fix attached. No idea what the consequences of this could be, but there's every chance they'll be wide-ranging.

@coveralls commented Sep 16, 2019

Coverage increased (+0.1%) to 82.707% when pulling f68290d on aparton:bugfix/fastamaster into 53fc0d8 on Ensembl:master.
