ENH: dereplicate-sequences expose parameter to disable sequence hash IDs #55
Comments
Should we request this as a feature of vsearch? Vsearch currently supports:
Colin
Wait... there are several, nested feature requests here!
Is this what we want?
Having a record whose ID is identical to its sequence seems a little silly to me, but if both dada2 and deblur implement this natively, then I'm comfortable requesting it for vsearch. However, if this is an option within the q2 plugins, maybe we should implement it within q2-vsearch.
Colin
Hello @torognes, what do you think about a
Hi @colinbrislawn, yes, that's a feature that should be easy to add to vsearch. I'll add it to issues for vsearch and implement it soon.
Thanks @torognes! @nbokulich I'll add
As far as I can see, the reads will be hashed with SHA-1, which conflicts with the MD5 used by dada2...
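To make the SHA-1 versus MD5 mismatch concrete, here is a minimal sketch using only Python's standard `hashlib`. The helper name `hash_ids` and the example sequence are made up for illustration; the sketch only shows that the two digests of the same sequence differ in value and length, so IDs produced one way cannot be matched against IDs produced the other way. It does not reproduce how vsearch or dada2 prepare the sequence before hashing.

```python
import hashlib


def hash_ids(sequence: str) -> dict:
    """Return SHA-1 and MD5 hex digests of a sequence (hypothetical helper)."""
    data = sequence.encode("ascii")
    return {
        "sha1": hashlib.sha1(data).hexdigest(),  # 40 hex characters
        "md5": hashlib.md5(data).hexdigest(),    # 32 hex characters
    }


ids = hash_ids("ACGTACGT")  # made-up example sequence
print(ids["sha1"])
print(ids["md5"])
```

Because the digests differ, feature tables keyed by one hash would need to be re-keyed (or keyed by the raw sequence) before they could be compared with tables keyed by the other.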
Thanks @colinbrislawn! Looks like VSEARCH has both
So
Looks like both this issue and #48 can't be closed until the vsearch version is bumped. While we wait for the bump, I'll try to get this PR submitted before the October 18th deadline.
It looks like removing the hashes breaks this section:
id_map = {e.metadata['description']: e.metadata['id']
for e in skbio.io.read(str(dereplicated_sequences),
With just a sample ID, instead of hash + sample ID, this section breaks. What's the recommended way to build this id_map without hashes?
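One possible way to handle both cases is sketched below. This is not the plugin's actual code: `Record`, `build_id_map`, and the `hashed` flag are hypothetical names, and plain tuples stand in for the objects that `skbio.io.read` would yield. The sketch assumes that with hashing enabled each header carries the hash as the ID and the original read ID as the description (the "hash + sample ID" layout mentioned above), and that without hashing the header carries only the original read ID, in which case the sequence itself is used as the feature ID.

```python
from typing import Dict, Iterable, NamedTuple


class Record(NamedTuple):
    """Stand-in for a parsed FASTA record (hypothetical, not an skbio type)."""
    id: str           # first token of the FASTA header
    description: str  # remainder of the header, possibly empty
    sequence: str


def build_id_map(records: Iterable[Record], hashed: bool) -> Dict[str, str]:
    """Map each original read ID to the feature ID it should be counted under."""
    id_map = {}
    for rec in records:
        if hashed:
            # Assumed layout ">hash original_id": the description is the
            # original read ID and the record ID is the hash.
            id_map[rec.description] = rec.id
        else:
            # Assumed layout ">original_id": no hash is available, so the
            # sequence itself serves as the feature ID.
            id_map[rec.id] = rec.sequence
    return id_map


# Made-up records illustrating both header layouts.
with_hashes = [Record("a1b2c3", "sampleA_1", "ACGT")]
without_hashes = [Record("sampleA_1", "", "ACGT")]

print(build_id_map(with_hashes, hashed=True))
print(build_id_map(without_hashes, hashed=False))
```

Keying on the record ID rather than the description in the unhashed case avoids relying on a header field that is empty once the hash is dropped.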
Improvement Description
Similar to q2-dada2 and q2-deblur, there should be an option to use the unhashed sequences as their own IDs instead of using a hash ID in dereplicate-sequences.
Current Behavior
Sequence hashes are used as IDs by default.
Proposed Behavior
Expose a --p-hashed-feature-ids parameter to choose how sequence IDs get handled.
References