Cumbersome features #1

Proginski · 2021-12-06T08:54:38Z

Some GFF files contain features that cover most of a sequence.
For example: GCF_000247795.1
In the related GFF file (enclosed), there is a feature named "match" that spans the entire first chromosome:
NC_032650.1 RefSeq region 1 161108492 . + . ID=NC_032650.1:1..161108492;Dbxref=taxon:9915;Name=1;breed=Nelore;chromosome=1;country=Brazil;gb-synonym=Bos taurus indicus;gbkey=Src;genome=chromosome;isolate=QUIL7308;mol_type=genomic DNA;note=animal owned by Agropecuaria Quilombo Inc.;sex=male;tissue-type=peripheral blood mononuclear cells
Line 37235:
NC_032650.1 RefSeq match 1 161108492 . + . ID=aln0;Target=NC_032650.1 1 161108492 +;gap_count=0;num_mismatch=0;pct_coverage=100;pct_identity_gap=100

As a consequence, orfget is not able to define any purely intergenic ORF:

NC_032650.1

ORF type                  Quantity   Average length (aa)
c_CDS                     7649       100.45
nc_ovp_opp-CDS            19987      58.68
nc_ovp_opp-cDNA_match     201        39.65
nc_ovp_opp-match          1983772    46.8
nc_ovp_same-CDS           11740      52.03
nc_ovp_same-cDNA_match    713        39.64
nc_ovp_same-lnc_RNA       15831      42.05
nc_ovp_same-mRNA          439133     44.33
nc_ovp_same-match         2449854    46.35
nc_ovp_same-pseudogene    10750      48.33
nc_ovp_same-tRNA          16         68.0
nc_ovp_same-transcript    281        65.47

Would it be possible, as a preliminary step in orftrack, to exclude features that cover more than, let's say, 90% of their sequence, to avoid this behavior?
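To make the idea concrete, here is a minimal sketch of such a coverage filter. It is not orftrack code: the function name `drop_cumbersome`, the tuple layout, and the `seq_lengths` dictionary are assumptions made for illustration, and the CDS coordinates are made up.

```python
def drop_cumbersome(features, seq_lengths, max_coverage=0.9):
    """Keep only features spanning at most max_coverage of their sequence.

    features    : iterable of (seqid, feature_type, start, end), 1-based inclusive
    seq_lengths : dict mapping seqid -> sequence length in nucleotides
    """
    kept = []
    for seqid, feature_type, start, end in features:
        # Fraction of the sequence covered by this single feature.
        coverage = (end - start + 1) / seq_lengths[seqid]
        if coverage <= max_coverage:
            kept.append((seqid, feature_type, start, end))
    return kept


features = [
    ("NC_032650.1", "region", 1, 161108492),
    ("NC_032650.1", "match", 1, 161108492),
    ("NC_032650.1", "CDS", 1000, 1300),  # hypothetical coordinates
]
seq_lengths = {"NC_032650.1": 161108492}

# Only the CDS survives: "region" and "match" both cover 100% of the sequence.
print(drop_cumbersome(features, seq_lengths))
```

A threshold on coverage would not depend on the feature name, so it should also catch whole-sequence features of types other than "match".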

Meanwhile, since the only 6 genomes in which I have identified this error so far all contain a 'match' feature, I suggest simply adding 'match' to line 597 of gff_parser.py:
if element_type not in ['chromosome', 'region','match']:
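For context, a rough sketch of how such a type-based exclusion might behave while reading GFF lines. This is not the actual code of gff_parser.py: the function `iter_features` and the constant `EXCLUDED_TYPES` are hypothetical, and only the condition on `element_type` mirrors the line above.

```python
EXCLUDED_TYPES = ["chromosome", "region", "match"]


def iter_features(gff_lines, excluded=EXCLUDED_TYPES):
    """Yield the columns of each GFF line whose type is not excluded."""
    for line in gff_lines:
        # Skip comments, pragmas and empty lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        element_type = cols[2]  # third GFF column holds the feature type
        if element_type not in excluded:
            yield cols


gff_lines = [
    "##gff-version 3\n",
    "NC_032650.1\tRefSeq\tregion\t1\t161108492\t.\t+\t.\tID=NC_032650.1:1..161108492\n",
    "NC_032650.1\tRefSeq\tmatch\t1\t161108492\t.\t+\t.\tID=aln0\n",
    "NC_032650.1\tRefSeq\tCDS\t1000\t1300\t.\t+\t0\tID=cds0\n",  # hypothetical feature
]

kept_types = [cols[2] for cols in iter_features(gff_lines)]
print(kept_types)
```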

nchenche · 2022-05-04T09:16:47Z

Hi Paul,

This is an old and resolved issue now, but yes, you were right.

Thanks !

Fadwa7 pushed a commit that referenced this issue Mar 28, 2024

#1 #2 Fix issues and Add data/scerevisae

5a3bfbb
