Incorrect gbk files when genes identifiers are numbers #10

Open
cfrioux opened this issue Mar 8, 2022 · 3 comments

cfrioux commented Mar 8, 2022

  • eggnog2gbk version: 0.1.0
  • Python version: 3.7.7
  • Operating System: MacOS 10.15.7

Description

Running emapper2gbk in genes mode with gene identifiers that consist only of digits does not create all the GBK features (translation, etc.). There is no crash: a GBK file is created, but it lacks the annotation qualifiers (gene name, EC number, KEGG cross-references) and the translation.

What I Did

emapper2gbk genes -fn bin.fna -fp bin.faa -o bin.gbk -n "Prevotella" -a bin.tsv
LOCUS       _10007119               3225 bp    DNA              BCT 08-MAR-2022
DEFINITION  Prevotella genome.
ACCESSION   10007119
VERSION     10007119
KEYWORDS    Prevotella.
SOURCE      .
  ORGANISM  Prevotella
            Bacteria; Bacteroidetes; Bacteroidia; Bacteroidales; Prevotellaceae.
FEATURES             Location/Qualifiers
     source          1..3225
                     /scaffold="10007119"
                     /db_xref="taxon:838"
     gene            2..3225
                     /locus_tag="gene_10007119"
     CDS             2..3225
                     /locus_tag="gene_10007119"
ORIGIN
        1 atgaaagatc aaaatattaa gaaggtgttg ctcctcggct ccggtgcgtt gaagatcggt
       61 gaggccggcg agttcgacta ttccggttca caggcactca aggcgctgcg tgaggaaggc
      121 gtctacacgg tgctcatcaa tcctaatatc gccaccgtgc agacctccga gggcgtggcc
     [...]
//
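
Since nothing crashes, the incomplete record is easy to miss. A minimal sketch of a check that lists CDS features without a /translation qualifier, assuming the usual GenBank flat-file layout seen above (feature keys indented by 5 spaces, qualifiers by 21). The function name is hypothetical and not part of emapper2gbk:

```python
def cds_missing_translation(gbk_text):
    """Return the locus_tag of every CDS feature lacking /translation."""
    features = []  # list of (feature_key, [qualifier lines])
    in_features = False
    for line in gbk_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            in_features = False
            continue
        if not in_features or not line.strip():
            continue
        # Assumption: a feature key line has exactly 5 leading spaces,
        # anything indented deeper belongs to the current feature.
        if line.startswith(" " * 5) and not line.startswith(" " * 6):
            features.append((line.split()[0], []))
        elif features:
            features[-1][1].append(line.strip())

    missing = []
    for key, qualifiers in features:
        if key != "CDS":
            continue
        if any(q.startswith("/translation=") for q in qualifiers):
            continue
        tag = next(
            (q.split("=", 1)[1].strip('"')
             for q in qualifiers if q.startswith("/locus_tag=")),
            None,
        )
        missing.append(tag)
    return missing
```

Calling it on the text of the file above (for example `cds_missing_translation(open("bin.gbk").read())`) would report the CDS `gene_10007119`.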

When adding a prefix to all identifiers, a correct gbk is created:

LOCUS       g10007119               3225 bp    DNA              BCT 08-MAR-2022
DEFINITION  Prevotella genome.
ACCESSION   g10007119
VERSION     g10007119
KEYWORDS    Prevotella.
SOURCE      .
  ORGANISM  Prevotella
            Bacteria; Bacteroidetes; Bacteroidia; Bacteroidales; Prevotellaceae.
FEATURES             Location/Qualifiers
     source          1..3225
                     /scaffold="g10007119"
                     /db_xref="taxon:838"
     gene            2..3225
                     /locus_tag="g10007119"
     CDS             2..3225
                     /locus_tag="g10007119"
                     /gene="carB"
                     /EC_number="6.3.5.5"
                     /dbxref="KEGG:R00256"
                     /dbxref="KEGG:R00575"
                     /dbxref="KEGG:R01395"
                     /dbxref="KEGG:R10948"
                     /dbxref="KEGG:R10949"
                     /translation="MKDQNIKKVLLLGSGALKIGEAGEFDYSGSQALKALREEGVYTVL
                     INPNIATVQTSEGVADQIYFLP[...]"
ORIGIN
        1 atgaaagatc aaaatattaa gaaggtgttg ctcctcggct ccggtgcgtt gaagatcggt
       61 gaggccggcg agttcgacta ttccggttca caggcactca aggcgctgcg tgaggaaggc
      121 gtctacacgg tgctcatcaa tcctaatatc gccaccgtgc agacctccga gggcgtggcc
      [...]
//
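
For reference, the prefixing workaround can be scripted. This is only a sketch, assuming that the FASTA identifier directly follows `>`, that the annotation TSV holds the gene identifier in its first column, and that its header/comment lines start with `#`. The function names and the output file names are hypothetical:

```python
def prefix_fasta_ids(lines, prefix="g"):
    """Yield FASTA lines with the prefix added to each record identifier."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line


def prefix_annotation_ids(lines, prefix="g"):
    """Yield TSV lines with the prefix added to the first column."""
    for line in lines:
        # Assumption: '#' lines are headers/comments and are left untouched.
        if line.startswith("#") or not line.strip():
            yield line
        else:
            yield prefix + line


def rewrite(src, dst, transform, prefix="g"):
    """Apply one of the transforms above to src and write the result to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(transform(fin, prefix))


# Example usage (file names are placeholders):
# rewrite("bin.fna", "bin_prefixed.fna", prefix_fasta_ids)
# rewrite("bin.faa", "bin_prefixed.faa", prefix_fasta_ids)
# rewrite("bin.tsv", "bin_prefixed.tsv", prefix_annotation_ids)
```

The same prefix has to be applied to all three inputs so that the identifiers still match each other.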

cfrioux commented Mar 8, 2022

This case is partly accounted for already in
https://github.com/AuReMe/emapper2gbk/blob/master/emapper2gbk/genes_to_gbk.py#L131

ArnaudBelcour added a commit that referenced this issue Sep 20, 2022
But the issue is still present in emapper2gbk genomes.
@ArnaudBelcour

This should be fixed in commit 9025e08 for emapper2gbk genes.

Work is still needed to fix it for emapper2gbk genomes.

ArnaudBelcour added a commit that referenced this issue Sep 21, 2022
But it is not stable. It could need more work.
@ArnaudBelcour

A first fix for both genomes and genes modes has been made in 0.2.0.
