Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GFF to BB #44

Open
EdwinvZon opened this issue Oct 26, 2017 · 12 comments
Open

GFF to BB #44

EdwinvZon opened this issue Oct 26, 2017 · 12 comments

Comments

@EdwinvZon
Copy link

I have been following the instructions on the docker hub (https://hub.docker.com/r/ave2/allelic-variation-explorer/) to input gff data in to the system.
First I get an error when running the bedToBigBed script:
pass1 - making usageList (7 chroms): 141 millis
Trailing characters parsing signed integer in field 5 line 1 of /data/glimmer.bed, got .

This might be related to the 6th column of the GFF file. So I changed that column to '1'.
And repeat the steps, then again when I get to the bedToBigBed script I get a list of errors:
Traceback (most recent call last):
File "/opt/conda/envs/ave2/bin/avedata", line 11, in
load_entry_point('avedata', 'console_scripts', 'avedata')()
File "/opt/conda/envs/ave2/lib/python3.5/site-packages/click/core.py", line 722, in call
return self.main(*args, **kwargs)
File "/opt/conda/envs/ave2/lib/python3.5/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/opt/conda/envs/ave2/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/envs/ave2/lib/python3.5/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/envs/ave2/lib/python3.5/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/app/avedata/commands.py", line 48, in register
register_file(datatype, filename, genome, species)
File "/app/avedata/commands.py", line 61, in register_file
genes_2_whoosh(filename, whoosh_dir)
File "/app/avedata/genes.py", line 29, in genes_2_whoosh
gene_id=fields[12],
IndexError: list index out of range

Have you seen this before and how do I fix it?

@sverhoeven
Copy link
Contributor

I have not seen this error before.

So you have a created a bigbed file using bedToBigBed.
From the traceback I see you did a avedata register ... --datatype genes ... command and that your bigbed files does not have all the columns that is expected for bigbed file with genes.

Gene bigbed files are expected to have start/stop codon and exons for each gene. For example in bed format:

SL2.40ch06	3687	7998	Solyc06g005000.2.1	0	+	3705	7789	0	3	720,752,336,	0,1165,3975,	Solyc06g005000.2	IBR finger domain-containing protein (AHRD V1 *--- C1GML6_PARBD)%3B contains Interpro domain(s)  IPR002867  Zinc finger%2C C6HC-type

If you used the bedToBigBed -tab -type=bed6+4 -as=gff3.as ... command.

Could you try to use the avedata register ... --datatype features ... command instead?

@EdwinvZon
Copy link
Author

Thanks for the suggestion, using the 'features' datatype finish without error's but did not show anything in the viewer.
This are the files I have used.
glimmer.zip

@sverhoeven
Copy link
Contributor

sverhoeven commented Oct 30, 2017

Good to hear. The bigbed file looks correct.
I suspect the viewer can't read the bigbed file and does not give a proper error.

The viewer fetches the bigbed file using a http(s) url. During the registration you need to use a url to the big bed file that the viewer running in a web browser can read.

What is the command you used to register?
And what does the web api endpoint http://<aveserver>/api/genomes/<genome id> return?

@EdwinvZon
Copy link
Author

Thank you for your time, so far.
This is what I get:
{
"accessions": [
"487_206a",
"499_206B",
"500_206C",
"501_206D",
"502_206E",
"503_206F"
],
"chromosomes": [
{
"chrom_id": "Chr1",
"length": 30427671
},
{
"chrom_id": "Chr2",
"length": 19698289
},
{
"chrom_id": "Chr3",
"length": 23459830
},
{
"chrom_id": "Chr4",
"length": 18585056
},
{
"chrom_id": "Chr5",
"length": 26975502
},
{
"chrom_id": "chloroplast",
"length": 154478
},
{
"chrom_id": "mitochondria",
"length": 366924
}
],
"feature_tracks": [
{
"label": "glimmer",
"url": "/data/glimmer.bb"
}
],
"gene_track": "/data/glimmer.bb",
"genome_id": "TAIR10",
"reference": "/data/genome.2bit",
"species": {
"name": "A.tal",
"species_id": "A.tal"
}
}

Hopefully this helps.

@sverhoeven
Copy link
Contributor

To find out why the non-gene features are not showing up, I need halve a day to reproduce and fix. To write a script which can convert a gff file with gene features (exon, cds) I will need 2 days.

@JorisBenschop
Copy link

Lets revisit whether such scripts are within scope of AVE2

@sverhoeven
Copy link
Contributor

There are discussions with similar problem, for example https://www.biostars.org/p/85869/
They point to https://github.com/vipints/converters/blob/master/gfftools/codebase/gff3_to_bed_converter.py which could be tested and enhanced with geneId and name columns.

@sverhoeven
Copy link
Contributor

sverhoeven commented Jan 23, 2018

Tested sample gff from RijkZwaan with gff3_to_bed_converter.py, but did not output anything.

@sverhoeven
Copy link
Contributor

sverhoeven commented Jan 25, 2018

Tried UCSC utilities on sorted gff got OK looking bed file with the following commands:

wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/gff3ToGenePred http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/genePredToBed
chmod +x gff3ToGenePred genePredToBed

# Make sure column seperator is tab
perl -p -e 's/ /\t/g' foo.gff > foo.tabbed.gff

# prepend `##gff-version 3` to foo.tabbed.gff file

./gff3ToGenePred foo.tabbed.gff foo.tabbed.gp
./genePredToBed foo.tabbed.gp foo.tabbed.bed

# ave expects additonal columns geneId and Name, for now copy 4th column to 13th and 14th
perl -n -e 'chomp;@F=split("\t", $_);push(@F,$F[3]);push(@F,$F[3]);print join("\t", @F),"\n"' foo.tabbed.bed > foo.tabbed.bed14

# Use the foo.tabbed.bed14 file as input for the bedToBigBed

@EdwinvZon
Copy link
Author

I have had a lot of problems running your workflow, making it necessary to change somethings in the start file. But now I have run into something I can not fix.
I include all the files I have used and made and also the workflow and commands I used to make them.
The files are in 1 tar file which you can download here: https://rijkzwaan.sharefile.eu/d-s5cac9a290a54437a
Hopefully you can help me out?

@sverhoeven
Copy link
Contributor

You where using the wrong bedToBigBed command, that command was for non-gene features for genes you have to use the -type=bed12+2 flag.

The bed to big bed command should be

bedToBigBed -tab -type=bed12+2 genemodels_sorted_TAB.bed14_v2 chrom.sizes genemodels_sorted_TAB.bb

Output:

pass1 - making usageList (226 chroms): 20 millis
pass2 - checking and writing primary data (56819 records, 14 fields): 326 millis

@sverhoeven
Copy link
Contributor

A screenshot was received where the genes bb file with 242 chromosomes was requested with Range: bytes=2411691-NaN causing a error response with 416 Requested Range Not Satisfiable.

This error is similar as hammerlab/pileup.js#446 and should be fixed in the application as nlesc-ave/ave-app#37.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants