
gff3_to_gtf_converter.pl #1

lmanchon opened this issue Jun 24, 2014 · 4 comments

@lmanchon

Hi,

I recently used the gff3_to_gtf_converter.pl script (https://github.com/vipints/converters/blob/master/gfftools/codebase/gff3_to_gtf_converter.pl) to convert a GFF3 file to GTF:

./gff3_to_gtf_converter.pl Strongylocentrotus_purpuratus.gff3 Strongylocentrotus_purpuratus.gtf

and it returned this error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Object Bio::Annotation::SimpleValue=HASH(0x237fa30) was not valid with key type. If you were adding new keys in, perhaps you want to make use
of the archetype method to allow registration to a more basic type
STACK: Error::throw
STACK: Bio::Root::Root::throw /tools/perl5/modules/lib/perl5/Bio/Root/Root.pm:472
STACK: Bio::Annotation::Collection::add_Annotation /tools/perl5/modules/lib/perl5/Bio/Annotation/Collection.pm:360
STACK: Bio::SeqFeature::Annotated::add_Annotation /tools/perl5/modules/lib/perl5/Bio/SeqFeature/Annotated.pm:608
STACK: Bio::FeatureIO::gff::_handle_non_reserved_tag /tools/perl5/modules/lib/perl5/Bio/FeatureIO/gff.pm:830
STACK: Bio::FeatureIO::gff::_handle_feature /tools/perl5/modules/lib/perl5/Bio/FeatureIO/gff.pm:785
STACK: Bio::FeatureIO::gff::next_feature /tools/perl5/modules/lib/perl5/Bio/FeatureIO/gff.pm:174

STACK: ./gff3_to_gtf_converter.pl:35

I don't know what's wrong.

My GFF3 input file is available here:
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-22/gff3/strongylocentrotus_purpuratus/Strongylocentrotus_purpuratus.GCA_000002235.2.22.gff3.gz

Laurent --

@vipints
Owner

vipints commented Jun 25, 2014

Hi Laurent,
I have ported this script from Perl to Python. The new repository and associated files can be found at https://github.com/vipints/GFFtools-GX.
Please let me know if you have any issues accessing the above repo.

@lmanchon
Author

No problem accessing the new repository. I tested your Python script gff_to_gtf.py, but it failed again:

GFFtools-GX/gff_to_gtf.py Strongylocentrotus_purpuratus.gff3 > out.gtf
Traceback (most recent call last):
File "/save/dpiquemal/GFFtools-GX/gff_to_gtf.py", line 76, in
printGTF(Transcriptdb)
File "/save/dpiquemal/GFFtools-GX/gff_to_gtf.py", line 48, in printGTF
for idz, ex_cod in enumerate(exons):
TypeError: iteration over a 0-d array

@vipints
Owner

vipints commented Jun 27, 2014

I have looked at your case. It looks like the original GFF3 file has some issues in how it defines its parent and child features. For example, the gene and transcript (parent) features are defined with the source field ensembl, but the corresponding exon, CDS and UTR (child) features do not carry the same source; instead they have . as the source field. The GFF3 specification allows multiple feature annotations with different source fields, which means the same region of the genome can be represented in a single GFF3 file, for example annotated both by the Ensembl pipeline and by the RefSeq pipeline.
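One way to check for this kind of parent/child source mismatch in a given file is to compare each child feature's source (column 2) with its parent's. The sketch below is a hypothetical standalone check, not part of GFFtools-GX; it assumes tab-separated GFF3 with ID= and Parent= attributes in column 9 and does not decode percent-escaped attribute values.

```python
"""Sketch: report GFF3 child features whose source (column 2) differs from
their parent's source. Hypothetical helper, not part of GFFtools-GX; assumes
tab-separated GFF3 with ID=/Parent= attributes in column 9."""

import sys


def parse_attributes(field):
    """Split a GFF3 column-9 string like 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def find_source_mismatches(lines):
    """Return (child_label, child_source, parent_id, parent_source) tuples."""
    source_by_id = {}
    children = []  # (child_label, child_source, [parent IDs])
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        source, attrs = cols[1], parse_attributes(cols[8])
        if "ID" in attrs:
            source_by_id[attrs["ID"]] = source
        if "Parent" in attrs:
            # Fall back to the feature type as a label when the child has no ID;
            # Parent may list several comma-separated IDs.
            label = attrs.get("ID", cols[2])
            children.append((label, source, attrs["Parent"].split(",")))
    # Compare only after all IDs are known, so line order does not matter.
    mismatches = []
    for label, child_source, parent_ids in children:
        for pid in parent_ids:
            parent_source = source_by_id.get(pid)
            if parent_source is not None and parent_source != child_source:
                mismatches.append((label, child_source, pid, parent_source))
    return mismatches


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for row in find_source_mismatches(handle):
            print("\t".join(row))
```

Run on a GFF3 file, each output line names a child feature, its source, its parent's ID and the parent's source; an empty output means every child shares its parent's source.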

In your case, to fix the error, you can replace the . source in the original GFF file with ensembl. Since I don't see any other source in your GFF file, this will not affect the feature annotations. You can use the following command:

sed -i 's/\./ensembl/' file_name

Essentially, add the missing source to the rest of the annotated features in the file.
Hope that helps; please let me know if it doesn't work.
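Note that s/\./ensembl/ replaces the first . on each line, whatever column it is in: on lines whose source is already ensembl it rewrites the score column (usually .) instead, and it also changes any comment line that contains a dot. A column-aware alternative, sketched here in Python under the assumption that the file is tab-separated GFF3 and that ensembl is the right default source, only touches column 2 of feature lines:

```python
"""Sketch: set a default source (column 2) on GFF3 feature lines whose source
is '.', leaving every other column untouched. The default 'ensembl' and the
script name in the usage note are assumptions for illustration."""

import sys


def fill_missing_source(line, default="ensembl"):
    """Return the line with column 2 set to `default` if it is '.'."""
    if line.startswith("#") or not line.strip():
        return line  # pragmas, comments and blank lines pass through unchanged
    newline = "\n" if line.endswith("\n") else ""  # keep the original line ending
    cols = line.rstrip("\n").split("\t")
    if len(cols) == 9 and cols[1] == ".":
        cols[1] = default
        return "\t".join(cols) + newline
    return line


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for line in handle:
            sys.stdout.write(fill_missing_source(line))
```

Usage, with a hypothetical script name: python fill_source.py Strongylocentrotus_purpuratus.gff3 > fixed.gff3. Writing to a new file rather than editing in place keeps the original available for comparison.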

@lmanchon
Author

Okay, good: your script works fine now, after I first ran the sed command. The resulting GTF file looks similar to the Ensembl GTF file:

ftp://ftp.ensemblgenomes.org/pub/metazoa/release-22/gtf/strongylocentrotus_purpuratus/Strongylocentrotus_purpuratus.GCA_000002235.2.22.gtf.gz

wc -l Strongylocentrotus_purpuratus.GCA_000002235.2.22.gtf
431563 Strongylocentrotus_purpuratus.GCA_000002235.2.22.gtf

/GFFtools-GX/gff_to_gtf.py Strongylocentrotus_purpuratus.gff3 | wc -l
432859

best,
Laurent --
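A raw wc -l comparison also counts header and comment lines, and it doesn't show where the two files differ. The sketch below tallies GTF lines by feature type (column 3) in each file and reports only the types whose counts differ; the script name is hypothetical, and it assumes the converter's output has been saved to a file such as out.gtf.

```python
"""Sketch: compare per-feature-type line counts (GTF column 3) between two
GTF files, ignoring comment lines. Script and file names are placeholders."""

import sys
from collections import Counter


def feature_counts(lines):
    """Count feature lines by their type (third tab-separated column)."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers, comments and blank lines
        cols = line.split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


def compare(reference, candidate):
    """Return {feature_type: (reference_count, candidate_count)} where they differ."""
    return {
        ftype: (reference[ftype], candidate[ftype])
        for ftype in sorted(set(reference) | set(candidate))
        if reference[ftype] != candidate[ftype]
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as ref, open(sys.argv[2]) as cand:
        diffs = compare(feature_counts(ref), feature_counts(cand))
    for ftype, (n_ref, n_cand) in diffs.items():
        print(f"{ftype}\t{n_ref}\t{n_cand}")
```

Usage: python compare_gtf.py Strongylocentrotus_purpuratus.GCA_000002235.2.22.gtf out.gtf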

