Parsing of genus/species names #37

Open
mattjmeier opened this issue Oct 7, 2019 · 1 comment
Labels
enhancement, Python (Bug or fix related to the Python scripts.)

Comments

@mattjmeier

Hello,

I've been using the SAMSA2 pipeline and it works great for my application.

One thing I've noticed is that the genus/species names reported in the Step 5 outputs are built from the final two space-separated words of the taxonomy string. Most of the time this works well enough (e.g., the output is something like Bacillus subtilis, a proper genus and species pair).

But I have quite a few cases where the output is something like "sp. Root239" or "sp. NRRL". The latter is particularly uninformative because NRRL is a culture collection, so it could be pointing to almost anything.

Is there a way to modify the script so that the user can get the full taxonomy in the output? I see that the DIAMOND_general_RefSeq_analysis_counter.py Python script handles this (around line 132, if I'm reading it correctly). An option to add a taxid column to the output would also be useful.
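To make the request concrete, here is a minimal sketch of the difference between the two behaviours. It is not the code from DIAMOND_general_RefSeq_analysis_counter.py. It assumes that each RefSeq entry title ends with the organism name in square brackets, as in `WP_... protein name [Genus species strain]`, and the function names and the example title are made up for illustration.

```python
# Hypothetical helpers; not taken from the SAMSA2 scripts.
# Assumption: a RefSeq title ends with "[organism name]".

def full_organism_name(title):
    """Return the text of the last bracketed group in a title, or None.

    Scans backwards and counts bracket depth so that names containing
    their own brackets, e.g. "[[Clostridium] scindens]", stay intact.
    """
    title = title.rstrip()
    if not title.endswith("]"):
        return None
    depth = 0
    for i in range(len(title) - 1, -1, -1):
        if title[i] == "]":
            depth += 1
        elif title[i] == "[":
            depth -= 1
            if depth == 0:
                return title[i + 1:-1].strip()
    return None  # unbalanced brackets


def last_two_words(name):
    """Mimic the behaviour described above: keep only the final two words."""
    return " ".join(name.split()[-2:])


if __name__ == "__main__":
    # Made-up example title for illustration only.
    title = "WP_000000001.1 hypothetical protein [Rhizobium sp. Root239]"
    name = full_organism_name(title)
    print(last_two_words(name))  # sp. Root239
    print(name)                  # Rhizobium sp. Root239
```

Keeping the whole bracketed name preserves the genus in cases like the "sp." entries above, at the cost of longer labels that also include strain identifiers.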

Thanks for any input you have on this!
Matt

@transcript
Owner

Hi Matt,

This is a good suggestion, and I'll tag this as enhancement - it shouldn't be too difficult to add another parameter to capture the full name when extracting from the RefSeq database. Feel free to submit a PR if you want to tackle this, or I'll work on it when I have a chance and will update this ticket.

Best,
Sam

@transcript added the enhancement and Python (Bug or fix related to the Python scripts.) labels on Oct 11, 2019