Handling BAM files with polyA trimmed #256

Open
omarelgarwany opened this issue Nov 5, 2024 · 1 comment
Labels
question Further information is requested


@omarelgarwany

Hello

I have a question/issue regarding the polyA requirement. So far, I have been using BAM files that were aligned using the ISOSEQ pipeline. As part of this pipeline, polyA tails are trimmed (isoseq refine). I would like to keep using the ISOSEQ pipeline, as it handles a few other single-cell-related tasks that I think IsoQuant does not (e.g. extracting barcode tags, barcode correction, real cell calling). However, because the polyA tails are trimmed, I keep getting a warning that too few reads have polyA tails (< 1%).

I am not worried about whether these reads had polyA tails, because isoseq refine requires at least 20 As. I know that IsoQuant complains because the tails have been trimmed. I have considered adding them back, but I am not sure it is safe to add these sequences back to the mapped BAM file, as that would require modifying the CIGAR string and potentially other related BAM tags. Are there any other workarounds? While I agree that the polyA requirement is a sensible check, I wish it were a little more flexible. Perhaps there could be an option to check for a user-defined tag that specifies the length of the polyA tail (e.g. PA:i:34); the information could then be supplied without modifying the sequence and going through all the trouble that comes with it.
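
To make the tag idea concrete, here is a minimal sketch in plain Python of how such a tag could be read from a SAM-formatted record. The `PA` tag name is only the hypothetical example from above, not an existing convention, and `polya_length_from_sam` is an illustrative helper, not part of IsoQuant or the ISOSEQ pipeline.

```python
def polya_length_from_sam(line, tag="PA"):
    """Return the integer value of a hypothetical PA:i:<n> tag, or None if absent."""
    fields = line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3 and parts[0] == tag and parts[1] == "i":
            return int(parts[2])
    return None


# A made-up alignment record carrying the hypothetical PA tag.
record = "\t".join([
    "read1", "0", "chr1", "100", "60", "10M", "*", "0", "0",
    "ACGTACGTAC", "IIIIIIIIII", "NM:i:0", "PA:i:34",
])
print(polya_length_from_sam(record))
```

The point is that the tail length travels with the read as metadata, so the sequence, CIGAR string and other tags stay untouched.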

Do you have any thoughts on this?

Best wishes
Omar

@andrewprzh
Collaborator

Dear @omarelgarwany

Yes, you are not the first to ask, since many people use both tools.
PolyA tails are needed only to ensure correct positions of discovered novel transcripts. When the percentage of reads with polyA tails is low, their presence is not required.
Moreover, if you are sure that every read initially had a polyA tail, it is safe to use the --fl_data flag. I think the results should be very similar, if not identical, to those that would have been obtained with non-truncated reads. So I doubt modifying the BAM files is necessary.
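
As a rough illustration, the sketch below assembles an IsoQuant command line that includes --fl_data. All file names, the output directory and the `isoquant_command` helper are placeholders made up for this example, and the `pacbio_ccs` data type is an assumption about the input; adjust them to your own setup.

```python
import shlex


def isoquant_command(bam, reference, genedb, outdir, data_type="pacbio_ccs"):
    """Build an IsoQuant argument list that treats reads as full-length."""
    return [
        "isoquant.py",
        "--reference", reference,   # reference genome FASTA
        "--genedb", genedb,         # gene annotation
        "--bam", bam,               # BAM produced by the upstream pipeline
        "--data_type", data_type,   # assumed here: PacBio CCS reads
        "--fl_data",                # reads are known to be full-length
        "-o", outdir,
    ]


# Placeholder paths, for illustration only.
cmd = isoquant_command("isoseq_aligned.bam", "genome.fa", "annotation.gtf", "isoquant_out")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point to real files.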

As far as I know, the IsoSeq pipeline can also provide information on whether a read had a polyA tail or not, right? We planned to incorporate that into IsoQuant as well, but we are a bit short on manpower right now.

Best
Andrey

@andrewprzh andrewprzh added the question Further information is requested label Nov 18, 2024