Support GFF with sequences in separate FASTA file #13

Open
peterjc opened this issue Sep 26, 2018 · 6 comments

Comments

@peterjc

peterjc commented Sep 26, 2018

The README says "The GFF must contain the reference sequence in Fasta format"

This seems to explain why our first attempt to use SnpEffWrapper failed (snpEff build could not find the FASTA files in the temporary directory). It would be nice to optionally allow passing a FASTA file for the assembly separately from the GFF file.

@peterjc
Author

peterjc commented Sep 26, 2018

Should I file a separate issue about the unclear failure when the wrapper is used with a GFF file that has no embedded FASTA sequences? Looking at the code, it seems to try to give a warning, https://github.com/sanger-pathogens/SnpEffWrapper/blob/v0.2.5/snpEffWrapper/wrapper.py#L297 - but wouldn't it be better to actually abort without calling snpEff build?
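
For illustration, the abort-early behaviour suggested above might look roughly like the Python sketch below. It is not taken from SnpEffWrapper: the names `has_embedded_fasta`, `require_embedded_fasta`, and `MissingFastaError` are hypothetical, and the sketch assumes the embedded sequences are introduced by a `##FASTA` directive line followed by a `>` header.

```python
class MissingFastaError(ValueError):
    """Hypothetical error type for this sketch; not part of SnpEffWrapper."""


def has_embedded_fasta(lines):
    """Return True if a '##FASTA' line is followed by a '>' FASTA header.

    `lines` is any iterable of text lines, such as an open file handle.
    """
    seen_directive = False
    for line in lines:
        stripped = line.strip()
        if not seen_directive:
            seen_directive = stripped == "##FASTA"
        elif stripped:
            # The first non-blank line after the directive decides the result.
            return stripped.startswith(">")
    return False


def require_embedded_fasta(gff_path):
    """Raise instead of warning, so that no database build is attempted."""
    with open(gff_path) as handle:
        if not has_embedded_fasta(handle):
            raise MissingFastaError(
                "%s has no '##FASTA' section; refusing to build" % gff_path
            )
```

Calling something like `require_embedded_fasta` before the build step would turn the current warning into a clear, early error message.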

@GBeattie

Hey peterjc,

Might I ask how you merged your FASTA and GFF in the end? I used cat .gff .fasta > .fasta.gff and I am getting the following error (which may be unrelated to how I combined the files, but I just want to check!):

[2018-10-19 15:45:10,601] INFO: Checking that the VCF and GFF contigs are consistent
[2018-10-19 15:45:11,724] INFO: Building snpeff database
Traceback (most recent call last):
  File "/home/manager/miniconda3/envs/ddocent_env/lib/python3.6/site-packages/snpEffWrapper/wrapper.py", line 222, in _snpeff_build_database
    subprocess.check_call(command, stdout=stdout, stderr=stderr)
  File "/home/manager/miniconda3/envs/ddocent_env/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/java', '-Xmx4g', '-jar', '/media/sf_SharedDrive/Download/snpEff/snpEff.jar', 'build', '-gff3', '-verbose', 'data', '-c', '/home/manager/WGS/WGS analysis/snpeff_data_dir_hrk9sc02/config']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/manager/miniconda3/envs/ddocent_env/bin/snpEffBuildAndRun", line 45, in
    annotate_vcf(args)
  File "/home/manager/miniconda3/envs/ddocent_env/lib/python3.6/site-packages/snpEffWrapper/wrapper.py", line 343, in annotate_vcf
    args.vcf_file, config_filename, args.debug)
  File "/home/manager/miniconda3/envs/ddocent_env/lib/python3.6/site-packages/snpEffWrapper/wrapper.py", line 275, in run_snpeff
    build_stderr)
  File "/home/manager/miniconda3/envs/ddocent_env/lib/python3.6/site-packages/snpEffWrapper/wrapper.py", line 224, in _snpeff_build_database
    raise BuildDatabaseError("Problem building the database from your GFF")
snpEffWrapper.wrapper.BuildDatabaseError: Problem building the database from your GFF

@peterjc
Author

peterjc commented Oct 19, 2018

What you are likely missing is the special line ##FASTA, which must sit on a line of its own before the FASTA records (starting with ">") begin.

Luckily I had documented this locally. I did it once by hand and then came up with the following as a reproducible alternative:

bash -c "cat annotation_only.gff; echo '##FASTA' ; cat reference.fasta" > annotation_with_fasta.gff

You could make a dummy file containing only the magic line and then concatenate the three files (in order) to make your combined file, but I used the echo command here instead.
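
The same concatenation can also be sketched in Python, which may be handy inside a pipeline. This is a minimal sketch, not part of SnpEffWrapper: the function name `merge_gff_and_fasta` is hypothetical, and the file names in the usage note are the ones from the shell command above. One deliberate difference from plain `cat` is that it adds a newline if the GFF file does not end with one, so that `##FASTA` always lands on its own line.

```python
import shutil


def merge_gff_and_fasta(gff_path, fasta_path, out_path):
    """Write the GFF, then a '##FASTA' line, then the FASTA to out_path."""
    with open(out_path, "w") as out:
        last_line = "\n"  # an empty GFF needs no extra newline
        with open(gff_path) as gff:
            for line in gff:
                out.write(line)
                last_line = line
        # Guard against a GFF with no trailing newline, which would
        # otherwise glue '##FASTA' onto the end of the last feature line.
        if not last_line.endswith("\n"):
            out.write("\n")
        out.write("##FASTA\n")
        with open(fasta_path) as fasta:
            shutil.copyfileobj(fasta, out)
```

For example, `merge_gff_and_fasta("annotation_only.gff", "reference.fasta", "annotation_with_fasta.gff")` should produce the same combined file as the shell command whenever the GFF already ends with a newline.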

@GBeattie

Hey peter!

Thanks for that. Unfortunately, the same error message appears after using your cat command to put in that extra line, so I feel I'm back at square one!

Thanks again,

Gordon

@peterjc
Author

peterjc commented Oct 22, 2018

I suspect there is something else "wrong" with your GFF file then - I would suggest opening a new issue, and offering to share the files directly with the tool authors (or if you can, posting them online, e.g. via https://gist.github.com).
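
Before sharing the files, one quick check that might narrow things down is whether every sequence ID used in the GFF feature lines also appears as a FASTA header in the embedded section. The thread does not establish that this is the cause here; it is only one possibility. A minimal sketch follows, with a hypothetical function name, assuming tab-separated feature lines and a `##FASTA` directive line.

```python
def seqids_missing_from_fasta(lines):
    """Return feature seqids that have no matching '>' header after '##FASTA'.

    `lines` is any iterable of text lines from a combined GFF+FASTA file.
    """
    feature_ids = set()
    fasta_ids = set()
    in_fasta = False
    for line in lines:
        line = line.rstrip("\r\n")
        if not in_fasta:
            if line.strip() == "##FASTA":
                in_fasta = True
            elif line and not line.startswith("#"):
                # Column 1 of a GFF feature line is the sequence ID.
                feature_ids.add(line.split("\t", 1)[0])
        elif line.startswith(">"):
            # Assumption: the FASTA ID is the header text up to the
            # first whitespace.
            parts = line[1:].split()
            if parts:
                fasta_ids.add(parts[0])
    return feature_ids - fasta_ids
```

An empty result means every feature line refers to a sequence that is present; any IDs returned would be worth mentioning in the new issue.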

@GBeattie

Yes, I will open another issue, as it now seems the problem is unrelated to how the FASTA and GFF are merged.

Thanks,

G
