Formatting Issue? #72

Open
remiketchum opened this issue May 11, 2021 · 2 comments

remiketchum commented May 11, 2021

Hi,

I am currently trying to convert a GTF or GFF3 file output by Augustus into a format that PASA can read. However, when I try to run this command:

convert_augustus_to_gff3.py -i augustus.hints.gtf -o test.gff3

I consistently get an error like this one:

  File "/users/rketchu1/.local/bin/convert_augustus_to_gff3.py", line 133, in main
    raise Exception("ERROR: GTF detected but gene row has bad 9th column format: {0}".format(cols[8]))
Exception: ERROR: GTF detected but gene row has bad 9th column format: jg33035
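For context: in GTF, the 9th column is normally a list of `key "value";` pairs, while the error message shows that this gene row's 9th column is just the bare ID `jg33035`. The following is a minimal sketch of that kind of check. The regex and the helper names `parse_gtf_attributes` and `check_gene_row` are illustrative assumptions, not the code inside convert_augustus_to_gff3.py.

```python
import re

# Matches one GTF attribute pair such as: gene_id "jg33035"
# (illustrative pattern; not taken from convert_augustus_to_gff3.py)
ATTR_PAIR = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(col9: str) -> dict:
    """Return the key/value pairs found in a GTF 9th column, or {} if there are none."""
    return {key: value for key, value in ATTR_PAIR.findall(col9)}


def check_gene_row(col9: str) -> str:
    """Return the gene_id of a gene row, raising if no gene_id attribute is present."""
    attrs = parse_gtf_attributes(col9)
    if "gene_id" not in attrs:
        raise ValueError(f"gene row has bad 9th column format: {col9}")
    return attrs["gene_id"]


if __name__ == "__main__":
    print(check_gene_row('gene_id "jg33035";'))  # prints jg33035
    try:
        check_gene_row("jg33035")
    except ValueError as err:
        print(err)  # prints gene row has bad 9th column format: jg33035
```

A bare ID yields no key/value pairs at all, so any parser that expects `gene_id "..."` on gene rows has nothing to extract.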

I have rerun Augustus with the -gff3 flag and tried running the same command above on the GFF3 file, but got the following error:

Traceback (most recent call last):
  File "/users/rketchu1/.local/bin/convert_augustus_to_gff3.py", line 189, in <module>
    main()
  File "/users/rketchu1/.local/bin/convert_augustus_to_gff3.py", line 173, in main
    raise Exception("ERROR: Found CDS column with parent ({0}) mRNA not yet in the file".format(parent_id))
Exception: ERROR: Found CDS column with parent (jg33035.t1) mRNA not yet in the file

I'm not entirely sure what is wrong with the file formatting. I am running Augustus 3.4.0 through Braker 2.1.5.

jorvis self-assigned this May 11, 2021
jorvis added the bug label May 11, 2021
jorvis (Owner) commented May 11, 2021

Hmm, the documentation of the convert_augustus_to_gff3.py script includes an example gene block from Augustus. Can you check how yours compares with that structure? Maybe the Augustus output format has been updated. Regarding the second error: after you've used the -gff3 flag, it seems the features are out of order, with some being referenced before they are actually defined. This may be another Augustus issue, but it can possibly be corrected with this script:

biocode/gff/correct_gff_feature_order.pl
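One way to confirm whether a GFF3 file has this ordering problem is to scan it once and report every row whose Parent has not yet appeared as an ID. The following is a minimal Python sketch under the assumption that columns are tab-delimited, as the GFF3 specification requires. `find_forward_references` is a hypothetical helper and is not part of biocode.

```python
import sys


def parse_gff3_attributes(col9: str) -> dict:
    """Split a GFF3 9th column such as 'ID=a;Parent=b;' into a dict."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def find_forward_references(lines):
    """Yield (line_number, feature_type, parent_id) for each Parent used before its ID row."""
    seen_ids = set()
    for line_number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature row
        attrs = parse_gff3_attributes(cols[8])
        # Parent may list several IDs separated by commas
        for parent_id in attrs.get("Parent", "").split(","):
            if parent_id and parent_id not in seen_ids:
                yield line_number, cols[2], parent_id
        if "ID" in attrs:
            seen_ids.add(attrs["ID"])


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for line_number, ftype, parent_id in find_forward_references(handle):
            print(f"line {line_number}: {ftype} references {parent_id} before it is defined")
```

Saved under a hypothetical name such as `check_order.py`, it would be run as `python check_order.py augustus.gff3`; no output means every Parent is defined before it is referenced.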

Could you copy/paste a block containing all of the rows for one entire gene into the ticket?

remiketchum (Author) commented May 11, 2021

Here is the first GTF that I get an error on (I put a - in front of each # only so the comment lines would paste correctly below):

-# overlap start --------------------------------------------------------------------------------
-# this overlap has 1 different transcripts
-# This transcript jg33035.t1 is derived from g24141.t1 from the input file /scratch/rketchu1/Dovetail_Genome_EM/ANNOTATION/BRAKER3/augustus.Ppri5.gtf
-# It is supported by 0 other predicted genes
-# the core of this joined transcript has priority 2
spez_2 AUGUSTUS gene 44188381 44210443 . + . jg33035
spez_2 AUGUSTUS transcript 44188381 44210443 . + . transcript_id "jg33035.t1"; gene_id "jg33035"
spez_2 AUGUSTUS start_codon 44188381 44188383 . + 0 transcript_id "jg33035.t1"; gene_id "jg33035";
spez_2 AUGUSTUS CDS 44188381 44188722 1 + 0 transcript_id "jg33035.t1"; gene_id "jg33035";
spez_2 AUGUSTUS exon 44188381 44188722 . + . transcript_id "jg33035.t1"; gene_id "jg33035";
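The gene row in this block has only `jg33035` in its 9th column, while the transcript, start_codon, CDS and exon rows use `key "value";` pairs. One possible workaround, assuming convert_augustus_to_gff3.py accepts gene rows written as `gene_id "...";` (this is not confirmed in the thread), is to rewrite such bare-ID gene rows before running the converter. This sketch assumes tab-delimited columns; `normalize_gene_row` and the file name `normalize_gtf.py` are hypothetical.

```python
import sys


def normalize_gene_row(line: str) -> str:
    """Rewrite a GTF gene row whose 9th column is a bare ID into the form gene_id "ID";.

    Comment lines, non-gene rows and rows that already contain key/value
    attributes are returned unchanged.
    """
    if line.startswith("#"):
        return line
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != "gene":
        return line
    col9 = cols[8].strip()
    # A bare ID contains no spaces, quotes or semicolons (e.g. jg33035)
    if col9 and not any(ch in col9 for ch in ' ";'):
        cols[8] = f'gene_id "{col9}";'
        return "\t".join(cols) + "\n"
    return line


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical file names): python normalize_gtf.py augustus.hints.gtf fixed.gtf
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for line in src:
            dst.write(normalize_gene_row(line))
```

Only gene rows are touched, so if the converter also rejects other row types, this alone would not be enough.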

The GFF3 file looks like this:

spez_2 AUGUSTUS gene 44188381 44210443 . + . ID=jg33035;
spez_2 AUGUSTUS mRNA 44188381 44210443 . + . ID=jg33035.t1;Parent=jg33035;
spez_2 AUGUSTUS start_codon 44188381 44188383 . + 0 ID=jg33035.t1.start1;Parent=jg33035.t1;
spez_2 AUGUSTUS CDS 44188381 44188722 1 + 0 ID=jg33035.t1.CDS1;Parent=jg33035.t1;
spez_2 AUGUSTUS exon 44188381 44188722 . + . ID=jg33035.t1.exon1;Parent=jg33035.t1;
spez_2 AUGUSTUS intron 44188723 44189267 . + . ID=jg33035.t1.intron1;Parent=jg33035.t1;
spez_2 AUGUSTUS CDS 44189268 44189333 1 + 0 ID=jg33035.t1.CDS2;Parent=jg33035.t1;
spez_2 AUGUSTUS exon 44189268 44189333 . + . ID=jg33035.t1.exon2;Parent=jg33035.t1;
spez_2 AUGUSTUS intron 44189334 44189619 . + . ID=jg33035.t1.intron2;Parent=jg33035.t1;
spez_2 AUGUSTUS CDS 44189620 44189671 1 + 0 ID=jg33035.t1.CDS3;Parent=jg33035.t1;

I definitely see some differences, but I'm not sure how to resolve the issue.

I am trying to locate correct_gff_feature_order.pl; it looks like my install of biocode is missing this script.
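If correct_gff_feature_order.pl is unavailable, a rough stand-in is to reorder the feature rows so that each row comes after the row defining its Parent. The following is a hypothetical Python sketch, not a port of the biocode script. It assumes tab-delimited GFF3, uses only the first ID listed in a multi-valued Parent, and drops comment lines (including any sequence comments Augustus writes).

```python
import sys
from collections import defaultdict


def parse_gff3_attributes(col9: str) -> dict:
    """Split a GFF3 9th column such as 'ID=a;Parent=b;' into a dict."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def reorder_features(lines):
    """Return feature rows reordered so every row follows the row defining its Parent.

    Root rows (no Parent, or a Parent never defined) keep their original relative
    order, and children keep their original order beneath their parent.
    """
    rows, ids = [], set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # comments and directives are dropped
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_gff3_attributes(cols[8])
        rows.append((line if line.endswith("\n") else line + "\n", attrs))
        if "ID" in attrs:
            ids.add(attrs["ID"])

    children = defaultdict(list)  # parent ID -> indices of child rows
    roots = []
    for index, (_, attrs) in enumerate(rows):
        parent = attrs.get("Parent", "").split(",")[0]  # first parent only
        if parent and parent in ids:
            children[parent].append(index)
        else:
            roots.append(index)

    ordered, emitted = [], set()

    def emit(index):
        if index in emitted:
            return
        emitted.add(index)
        line, attrs = rows[index]
        ordered.append(line)
        for child in children.get(attrs.get("ID", ""), []):
            emit(child)

    for index in roots:
        emit(index)
    for index in range(len(rows)):  # catch anything unreachable from a root
        emit(index)
    return ordered


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(reorder_features(handle))
```

Traversing depth-first from each root keeps each gene's rows together, with the gene first, then each mRNA followed by its own child features.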
