Return subseq from correct side of seq when on reverse strand using a… #454
… fasta file
Initial bug reported by Fergal:
Basically, the core API returns the wrong intron sequence when the is_splice_canonical subroutine is called to fetch the splice sites for introns on the -1 strand from a fasta flatfile. If the sequence is fetched from the db then it’s correct. I have also added an alternative subroutine from the genebuild API that works slightly differently to the core API call, but seems to get the sequence correctly in all cases (so this might be useful in determining the specific behaviour of the core API that isn’t replicated correctly via flatfile seq fetching).
I believe that most core API calls actually fetch the sequence on the 1 strand and then reverse complement it, so it could be the case that this is an example of a call that does not do that and thus does not work with the flatfile seq fetching. I assume there are not many cases like this, as we would have probably noticed them.
Anyway, I thought I should flag it up as it could have significant consequences that are hard to spot (for example, we would likely have flagged some transcripts as bad due to the lack of canonical splicing, while in reality they were just on the -1 strand in most cases).
The underlying issue seems to be that when $slice->subseq(1,2) is called on the intron to access the donor sequence, FastaSequence.pm takes the subsequence from the forward strand first and then calculates the reverse complement. Because the coordinates are applied to the forward strand before complementing, the donor and acceptor sequences end up the wrong way around for introns on the reverse strand.
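To make the coordinate problem concrete, here is a minimal Python sketch of the behaviour described above. It is an illustration only, not the code in FastaSequence.pm or the attached fix: the function names `subseq_forward_first` and `subseq_strand_aware` and the toy 12 bp intron are hypothetical, and coordinates are assumed to be 1-based and inclusive, relative to the slice.

```python
# Illustrative only: hypothetical helpers, not the Ensembl API.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def subseq_forward_first(forward_seq, start, end, strand):
    """Reported behaviour: cut start..end out of the forward-strand
    sequence, then reverse complement if the slice is on -1."""
    sub = forward_seq[start - 1:end]
    return revcomp(sub) if strand == -1 else sub


def subseq_strand_aware(forward_seq, start, end, strand):
    """Assumed intended behaviour: on -1, count start..end from the
    far end of the forward-strand sequence before complementing."""
    if strand == -1:
        length = len(forward_seq)
        fwd_start = length - end + 1
        fwd_end = length - start + 1
        return revcomp(forward_seq[fwd_start - 1:fwd_end])
    return forward_seq[start - 1:end]


# Toy intron as read along the transcript: donor GT ... acceptor AG.
intron_in_transcript_orientation = "GTAAGTTTTCAG"
# What a fasta file would hold for a -1 strand intron (forward strand).
forward = revcomp(intron_in_transcript_orientation)

donor_buggy = subseq_forward_first(forward, 1, 2, -1)   # "AG"
donor_fixed = subseq_strand_aware(forward, 1, 2, -1)    # "GT"
acceptor_buggy = subseq_forward_first(forward, 11, 12, -1)  # "GT"
acceptor_fixed = subseq_strand_aware(forward, 11, 12, -1)   # "AG"
```

With the forward-first approach, asking for bases 1 to 2 of the reverse-strand intron returns the acceptor dinucleotide rather than the donor, which would match the swapped splice sites reported above. On the forward strand both functions agree.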
Suggested fix attached. No idea what the consequences of this could be, but there's every chance they'll be wide-ranging.