GFF to BB #44

EdwinvZon · 2017-10-26T13:12:04Z

I have been following the instructions on the docker hub (https://hub.docker.com/r/ave2/allelic-variation-explorer/) to input gff data in to the system.
First I get an error when running the bedToBigBed script:
pass1 - making usageList (7 chroms): 141 millis
Trailing characters parsing signed integer in field 5 line 1 of /data/glimmer.bed, got .

This might be related to the 6th column of the GFF file. So I changed that column to '1'.
And repeat the steps, then again when I get to the bedToBigBed script I get a list of errors:
Traceback (most recent call last):
File "/opt/conda/envs/ave2/bin/avedata", line 11, in
load_entry_point('avedata', 'console_scripts', 'avedata')()
File "/opt/conda/envs/ave2/lib/python3.5/site-packages/click/core.py", line 722, in call
return self.main(*args, **kwargs)
File "/opt/conda/envs/ave2/lib/python3.5/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/opt/conda/envs/ave2/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/envs/ave2/lib/python3.5/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/envs/ave2/lib/python3.5/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/app/avedata/commands.py", line 48, in register
register_file(datatype, filename, genome, species)
File "/app/avedata/commands.py", line 61, in register_file
genes_2_whoosh(filename, whoosh_dir)
File "/app/avedata/genes.py", line 29, in genes_2_whoosh
gene_id=fields[12],
IndexError: list index out of range

Have you seen this before and how do I fix it?

sverhoeven · 2017-10-26T15:04:53Z

I have not seen this error before.

So you have a created a bigbed file using bedToBigBed.
From the traceback I see you did a avedata register ... --datatype genes ... command and that your bigbed files does not have all the columns that is expected for bigbed file with genes.

Gene bigbed files are expected to have start/stop codon and exons for each gene. For example in bed format:

SL2.40ch06	3687	7998	Solyc06g005000.2.1	0	+	3705	7789	0	3	720,752,336,	0,1165,3975,	Solyc06g005000.2	IBR finger domain-containing protein (AHRD V1 *--- C1GML6_PARBD)%3B contains Interpro domain(s)  IPR002867  Zinc finger%2C C6HC-type

If you used the bedToBigBed -tab -type=bed6+4 -as=gff3.as ... command.

Could you try to use the avedata register ... --datatype features ... command instead?

EdwinvZon · 2017-10-27T08:53:23Z

Thanks for the suggestion, using the 'features' datatype finish without error's but did not show anything in the viewer.
This are the files I have used.
glimmer.zip

sverhoeven · 2017-10-30T09:09:22Z

Good to hear. The bigbed file looks correct.
I suspect the viewer can't read the bigbed file and does not give a proper error.

The viewer fetches the bigbed file using a http(s) url. During the registration you need to use a url to the big bed file that the viewer running in a web browser can read.

What is the command you used to register?
And what does the web api endpoint http://<aveserver>/api/genomes/<genome id> return?

EdwinvZon · 2017-10-31T15:49:41Z

Thank you for your time, so far.
This is what I get:
{
"accessions": [
"487_206a",
"499_206B",
"500_206C",
"501_206D",
"502_206E",
"503_206F"
],
"chromosomes": [
{
"chrom_id": "Chr1",
"length": 30427671
},
{
"chrom_id": "Chr2",
"length": 19698289
},
{
"chrom_id": "Chr3",
"length": 23459830
},
{
"chrom_id": "Chr4",
"length": 18585056
},
{
"chrom_id": "Chr5",
"length": 26975502
},
{
"chrom_id": "chloroplast",
"length": 154478
},
{
"chrom_id": "mitochondria",
"length": 366924
}
],
"feature_tracks": [
{
"label": "glimmer",
"url": "/data/glimmer.bb"
}
],
"gene_track": "/data/glimmer.bb",
"genome_id": "TAIR10",
"reference": "/data/genome.2bit",
"species": {
"name": "A.tal",
"species_id": "A.tal"
}
}

Hopefully this helps.

sverhoeven · 2017-11-07T14:07:00Z

To find out why the non-gene features are not showing up, I need halve a day to reproduce and fix. To write a script which can convert a gff file with gene features (exon, cds) I will need 2 days.

JorisBenschop · 2017-11-08T10:16:55Z

Lets revisit whether such scripts are within scope of AVE2

sverhoeven · 2018-01-23T12:57:43Z

There are discussions with similar problem, for example https://www.biostars.org/p/85869/
They point to https://github.com/vipints/converters/blob/master/gfftools/codebase/gff3_to_bed_converter.py which could be tested and enhanced with geneId and name columns.

sverhoeven · 2018-01-23T13:00:31Z

Tested sample gff from RijkZwaan with gff3_to_bed_converter.py, but did not output anything.

sverhoeven · 2018-01-25T14:56:26Z

Tried UCSC utilities on sorted gff got OK looking bed file with the following commands:

wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/gff3ToGenePred http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/genePredToBed
chmod +x gff3ToGenePred genePredToBed

# Make sure column seperator is tab
perl -p -e 's/ /\t/g' foo.gff > foo.tabbed.gff

# prepend `##gff-version 3` to foo.tabbed.gff file

./gff3ToGenePred foo.tabbed.gff foo.tabbed.gp
./genePredToBed foo.tabbed.gp foo.tabbed.bed

# ave expects additonal columns geneId and Name, for now copy 4th column to 13th and 14th
perl -n -e 'chomp;@F=split("\t", $_);push(@F,$F[3]);push(@F,$F[3]);print join("\t", @F),"\n"' foo.tabbed.bed > foo.tabbed.bed14

# Use the foo.tabbed.bed14 file as input for the bedToBigBed

EdwinvZon · 2018-01-31T13:18:15Z

I have had a lot of problems running your workflow, making it necessary to change somethings in the start file. But now I have run into something I can not fix.
I include all the files I have used and made and also the workflow and commands I used to make them.
The files are in 1 tar file which you can download here: https://rijkzwaan.sharefile.eu/d-s5cac9a290a54437a
Hopefully you can help me out?

sverhoeven · 2018-02-02T09:50:27Z

You where using the wrong bedToBigBed command, that command was for non-gene features for genes you have to use the -type=bed12+2 flag.

The bed to big bed command should be

bedToBigBed -tab -type=bed12+2 genemodels_sorted_TAB.bed14_v2 chrom.sizes genemodels_sorted_TAB.bb

Output:

pass1 - making usageList (226 chroms): 20 millis
pass2 - checking and writing primary data (56819 records, 14 fields): 326 millis

sverhoeven · 2018-11-01T09:51:56Z

A screenshot was received where the genes bb file with 242 chromosomes was requested with Range: bytes=2411691-NaN causing a error response with 416 Requested Range Not Satisfiable.

This error is similar as hammerlab/pileup.js#446 and should be fixed in the application as nlesc-ave/ave-app#37.

sverhoeven added 2day halveday labels Nov 7, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GFF to BB #44

GFF to BB #44

EdwinvZon commented Oct 26, 2017

sverhoeven commented Oct 26, 2017

EdwinvZon commented Oct 27, 2017

sverhoeven commented Oct 30, 2017 •

edited

Loading

EdwinvZon commented Oct 31, 2017

sverhoeven commented Nov 7, 2017

JorisBenschop commented Nov 8, 2017

sverhoeven commented Jan 23, 2018

sverhoeven commented Jan 23, 2018 •

edited

Loading

sverhoeven commented Jan 25, 2018 •

edited

Loading

EdwinvZon commented Jan 31, 2018

sverhoeven commented Feb 2, 2018

sverhoeven commented Nov 1, 2018

GFF to BB #44

GFF to BB #44

Comments

EdwinvZon commented Oct 26, 2017

sverhoeven commented Oct 26, 2017

EdwinvZon commented Oct 27, 2017

sverhoeven commented Oct 30, 2017 • edited Loading

EdwinvZon commented Oct 31, 2017

sverhoeven commented Nov 7, 2017

JorisBenschop commented Nov 8, 2017

sverhoeven commented Jan 23, 2018

sverhoeven commented Jan 23, 2018 • edited Loading

sverhoeven commented Jan 25, 2018 • edited Loading

EdwinvZon commented Jan 31, 2018

sverhoeven commented Feb 2, 2018

sverhoeven commented Nov 1, 2018

sverhoeven commented Oct 30, 2017 •

edited

Loading

sverhoeven commented Jan 23, 2018 •

edited

Loading

sverhoeven commented Jan 25, 2018 •

edited

Loading