Genome names do not match annotation #13
Hi @Marlinski95, When a directory is given as input, emapper2gbk expects the files for the same organism in the ANNOTATION, FASTAPROT and FASTNUCLEIC folders to have the same name. That does not seem to be the case for your data. Your input seems to be:
With these names, emapper2gbk cannot map the different files to the same organism and returns an error. They should be formatted like this:
Renaming the files should fix this issue. Best regards,
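A minimal Python sketch of the renaming suggested above. It assumes the organism name is everything before the `.emapper.genepred.faa` and `.emapper.annotations` suffixes shown in this issue. The target extensions (`.faa`, `.tsv`) and the helper names `rename_plan` and `apply_plan` are illustrative assumptions, not part of emapper2gbk, so check the README for the extensions the tool actually expects.

```python
from pathlib import Path


def rename_plan(filenames, old_suffix, new_suffix):
    """Return (old, new) name pairs for files ending with old_suffix."""
    plan = []
    for name in filenames:
        if name.endswith(old_suffix):
            # Keep the organism part, swap only the suffix.
            plan.append((name, name[: -len(old_suffix)] + new_suffix))
    return plan


def apply_plan(directory, plan, dry_run=True):
    """Print every planned rename; touch the disk only when dry_run is False."""
    for old, new in plan:
        print(f"{old} -> {new}")
        if not dry_run:
            (Path(directory) / old).rename(Path(directory) / new)


# Example with the file names from this issue (target extensions are assumed).
proteins = ["BC-1_bin.100.emapper.genepred.faa", "BC-1_bin.116.emapper.genepred.faa"]
annotations = ["BC-1_bin.100.emapper.annotations"]
protein_plan = rename_plan(proteins, ".emapper.genepred.faa", ".faa")
annotation_plan = rename_plan(annotations, ".emapper.annotations", ".tsv")
```

Running `apply_plan("FASTAPROT", protein_plan)` first with the default `dry_run=True` only prints the planned renames, which makes it easy to check them before changing any file.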
Can you give me the complete error message and the command you used?
This GFF folder seems to be correct and should not produce an error.
Sorry, there is a typo in it, I will fix it. The correct sentence is: So something like this:
And the GFF folder is an independent folder (like FASTAPROT and ANNOTATION), so it must not be inside the nucleotide folder. The location of the GFF folder is given to emapper2gbk with the option
Hello, Creating GFF database (gffutils) for BC-1_bin.15 Am I still missing something? I know that my taxonomic resolution isn't very high, since we suspect a lot of candidate species in my samples, but I don't think I entirely understand how this is tied to reformatting the data. Thanks a thousand for your help and time!
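Hi, The issue here is that BC-1 is too precise a taxonomic resolution for the taxonomic database. The search on https://www.ebi.ac.uk/ena/data/taxonomy/v1/taxon/scientific-name/BC-1 shows no results. You should use a higher taxonomic rank (either species or genus). By adding the taxon name to the address https://www.ebi.ac.uk/ena/data/taxonomy/v1/taxon/scientific-name/, you can check whether it works. For example https://www.ebi.ac.uk/ena/data/taxonomy/v1/taxon/scientific-name/Escherichia%20coli (the %20 replaces the space between the genus and the species names). And do you have only one taxon for all your data?
I ask because you can give a taxonomic file with the option -nf. This option takes as input a .tsv file with 2 columns (the first is the name of the organism and the second is the name of the corresponding taxon). In this way you can give the specific taxon associated with each genome. Best regards,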
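The lookup described above can be sketched in Python with only the standard library. The base URL is the one quoted in this thread. The helper names `taxon_url` and `taxon_is_known` are hypothetical, and how the service answers for an unknown name (HTTP error, empty body, or non-JSON text) is an assumption, so the sketch treats all three as "not found".

```python
import json
import urllib.error
import urllib.parse
import urllib.request

ENA_URL = "https://www.ebi.ac.uk/ena/data/taxonomy/v1/taxon/scientific-name/"


def taxon_url(name):
    """Build the lookup address; spaces become %20 as in the example above."""
    return ENA_URL + urllib.parse.quote(name)


def taxon_is_known(name, timeout=30):
    """Return True if the service returns a non-empty JSON answer for name."""
    try:
        with urllib.request.urlopen(taxon_url(name), timeout=timeout) as response:
            body = response.read().decode("utf-8").strip()
    except urllib.error.HTTPError:
        # Assumption: an HTTP error status means the name was not found.
        return False
    if not body:
        return False
    try:
        return bool(json.loads(body))
    except json.JSONDecodeError:
        # Assumption: a plain-text answer means there were no results.
        return False


# Usage (needs network access):
#   taxon_is_known("Escherichia coli")
#   taxon_is_known("BC-1")
```

Checking each candidate name this way before running emapper2gbk shows early whether the value passed as the taxon will resolve.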
Ahaa, I see! Thank you.
Hmm - those bins are from metagenomes so there is all kinds of stuff in
there. I guess I could try to set it to -Bacteria- instead of BC-1 but not
sure if that would fix the problem. I'll play around with it - thank you!
…On Wed, Apr 20, 2022 at 9:15 AM Arnaud Belcour ***@***.***> wrote:
Hi,
The issue here is that BC-1 is too precise a taxonomic resolution for
the taxonomic database.
The search on
https://www.ebi.ac.uk/ena/data/taxonomy/v1/taxon/scientific-name/BC-1
shows no results.
You should use a higher taxonomic rank (either species or genus). By
adding the taxon name to the address
https://www.ebi.ac.uk/ena/data/taxonomy/v1/taxon/scientific-name/, you
should see if this is working.
For example
https://www.ebi.ac.uk/ena/data/taxonomy/v1/taxon/scientific-name/Escherichia%20coli
(The %20 here is to replace a space between the genus and the species
names.)
And do you have only one taxon for all your data?
Because you can give a taxonomic file by using the option -nf. This
option takes as input a .tsv file with 2 columns (first is the name of the
organism and the second is the name of the corresponding taxon).
For example:
BC-1_bin.100 Genus species
BC-1_bin.116 Escherichia coli
... ...
In this way you can give the specific taxon associated to each genome.
Best regards,
Arnaud Belcour.
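A small Python sketch of how the two-column file for -nf could be written. It assumes a tab-separated file without a header line, which is how the example above reads. The function name `write_taxon_table`, the output file name, and the taxon given for BC-1_bin.100 are placeholders.

```python
import csv
import io


def write_taxon_table(handle, taxa):
    """Write one 'organism<TAB>taxon' row per entry of the taxa mapping."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for organism, taxon in taxa.items():
        writer.writerow([organism, taxon])


# Organism names must match the file names in the input folders.
taxa = {
    "BC-1_bin.100": "Bacteria",  # placeholder taxon, replace with the real one
    "BC-1_bin.116": "Escherichia coli",
}

# Written to memory here; use open("taxon.tsv", "w", newline="") for a real file.
buffer = io.StringIO()
write_taxon_table(buffer, taxa)
table_text = buffer.getvalue()
```

The resulting file would then be passed with -nf in place of the single taxon name.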
Hi,
I am trying to convert my emapper annotations into GenBank format using your tool. I have the following directories set up:
ANNOTATION/ FASTAPROT/ FASTNUCLEIC/ GENBANK/ GFF/ HITS/ ORTHOLOGS/
(emapper2gbk) [mjensen2$] ls FASTNUCLEIC/
BC-1_bin.100.fna BC-1_bin.116.fna BC-1_bin.14.fna etc.
(emapper2gbk) [mjensen2$] ls FASTAPROT/
BC-1_bin.100.emapper.genepred.faa BC-1_bin.116.emapper.genepred.faa BC-1_bin.14.emapper.genepred.faa etc.
(emapper2gbk) [mjensen2$] ls ANNOTATION/
BC-1_bin.100.emapper.annotations BC-1_bin.116.emapper.annotations BC-1_bin.14.emapper.annotations etc.
When I run the following command, however, I get an error saying that the genome names do not match the annotation names.
(emapper2gbk) [mjensen2$] emapper2gbk genes -fn ./FASTNUCLEIC/ -fp ./FASTAPROT/ -o ./GENBANK/ -a ./ANNOTATION/ -c 10 -n BC-1 -go gobasic -g ./GFF/
Since the file names did not seem to be the cause, I checked the file contents and noticed that emapper added an extra number to the identifier when it predicted and annotated the genes, e.g.
Contig ID: >bin.1.fak127_1021
Prot ID: >bin.1.fak127_1021_1
Annotation ID: bin.1.fak127_1021_1
I believe this is the problem but I don't know how to work around this as this is something emapper added. Have you encountered this before? I might just be missing a flag of some sort but I am unsure and would appreciate your help!
Cheers,
Marlene
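The relation between the three identifiers above can be checked with a short Python sketch. It assumes the extra number is a gene index appended after the last underscore of the contig ID, as the example IDs suggest. The helper names `contig_of`, `fasta_ids` and `orphan_proteins` are hypothetical.

```python
def contig_of(protein_id):
    """Strip a trailing '_<number>' gene index to recover the contig ID."""
    contig, separator, gene_index = protein_id.rpartition("_")
    if separator and gene_index.isdigit():
        return contig
    return protein_id


def fasta_ids(lines):
    """Return the ID (first word after '>') of every FASTA header line."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    ]


def orphan_proteins(contig_ids, protein_ids):
    """List protein IDs whose contig is absent from the nucleotide file."""
    known = set(contig_ids)
    return [pid for pid in protein_ids if contig_of(pid) not in known]


# IDs taken from this issue.
contigs = fasta_ids([">bin.1.fak127_1021", "ACGT"])
proteins = fasta_ids([">bin.1.fak127_1021_1", "MKV"])
orphans = orphan_proteins(contigs, proteins)
```

An empty `orphans` list means every protein ID maps back to a contig once the gene index is removed, so the extra number alone would not explain a mismatch.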