Check if PHMM-based alignment necessarily outputs an MSA #35
Comments
This should be handled by the 'extra' argument in
But definitely worth validating, as it's been a while since this was tested |
Thinking about it further, the problem is the PHMM alignment is done in chunks (by default 1000 sequences at a time) for efficiency reasons, so different insertions could produce different coordinate systems for the resultant MSAs and they won’t be compatible when concatenated. So unfortunately I don't think that function argument helps
I’ve already started implementing an optional alignment step at the end of the pipeline and will soon add in dealignment after the PHMM alignment step to produce a default unaligned output
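To make the chunking problem concrete, here is a minimal Python sketch with made-up toy sequences (the chunk contents and the helper name `is_rectangular` are hypothetical, not taken from the pipeline). It assumes each chunk is aligned independently against the same model and that insertions add extra columns to that chunk's output only.

```python
def is_rectangular(rows):
    """Return True if every aligned row has the same number of columns."""
    return len({len(row) for row in rows}) <= 1


# Hypothetical chunk outputs for a model with 5 match states.
# chunk_a: one sequence has an insertion after position 3, so 6 columns.
chunk_a = ["ACG-TA", "ACGGTA"]
# chunk_b: no insertions, so 5 columns.
chunk_b = ["ACGTA", "AC-TA"]
# chunk_c: an insertion after position 1, also 6 columns.
chunk_c = ["A-CGTA", "ATCGTA"]

# Each chunk is a valid alignment on its own.
print(is_rectangular(chunk_a), is_rectangular(chunk_b))

# Concatenating chunks with different widths is not a valid MSA.
print(is_rectangular(chunk_a + chunk_b))

# Equal widths are not enough: chunk_a and chunk_c have the same number
# of columns, but their insertion columns sit at different positions,
# so the concatenated columns are not homologous.
print(is_rectangular(chunk_a + chunk_c))
```

The last case is the awkward one: a width check alone would accept the concatenation even though the coordinate systems differ, which is why dealigning after the PHMM step is the safer default.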
|
Ahh yes I get the issue, good point. I was initially thinking it might make more sense to merge the chunks before going into Instead, there may be something we can output from Conceptually I prefer the idea of outputting an aligned FASTA at the end and just unaligning for IDTAXA (can be just a regex removing any |
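A regex-based dealignment along those lines could look roughly like the Python sketch below. The gap characters (`-` and `.`) are an assumption, since the comment above is cut off before naming them, and the helper names `dealign` and `dealign_fasta` are hypothetical.

```python
import re

# Assumed gap characters: '-' and '.'; adjust to whatever the aligner emits.
GAP_CHARS = re.compile(r"[-.]")


def dealign(seq):
    """Strip gap characters from a single aligned sequence string."""
    return GAP_CHARS.sub("", seq)


def dealign_fasta(lines):
    """Yield FASTA lines with gaps removed from sequence lines only.

    Header lines (starting with '>') are passed through unchanged so that
    hyphens or dots in sequence names are not touched.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield dealign(line)


aligned = [">s1", "AC--GT", ">s2 desc-with-hyphen", "A.C"]
for out_line in dealign_fasta(aligned):
    print(out_line)
```

Leaving headers alone matters because sequence identifiers often contain hyphens or dots that a blanket substitution would corrupt.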
Yes, the inefficiency of waiting for all sequences to download before aligning would be bad, but an additional problem is that
I think this is possible but I imagine it would require writing a lot of code outside of existing tools, which could be risky/slow. But something to keep in mind for the future if bulk alignment proves to be an unreasonable bottleneck for the entire pipeline
I agree aligned output is very useful -- easy to make that the default if we like in the future |
I don't think this is true if the aligned sequence has an insertion relative to the PHMM
If this is false, FILTER_PHMM sequences will need to be dealigned, and aligned output (#34) will need to be produced by an independent process (if model training à la #4 requires unaligned input, make sure both processes can run in parallel).