HTSlib not parsing CRAM files #67

Open
chrisclarkson opened this issue Nov 19, 2024 · 0 comments

chrisclarkson commented Nov 19, 2024

Hello, I am trying to process thousands of CRAM files with EHDN (latest version, installed via conda).
It works fine for BAM files, but HTSlib fails to parse CRAM-formatted data. This appears to relate to the explanation at the bottom of this page (see "The REF_PATH and REF_CACHE" section): https://www.htslib.org/workflow/cram.html

Indeed, the following command:

ExpansionHunterDenovo profile --reads in.cram --reference GRCh38.fa --output-prefix out

results in the error:

[W::find_file_url] Failed to open reference "http://www.ebi.ac.uk/ena/cram/md5/b0397179e5a92bb7a3300b68e45a9f72" Permission denied
.... 
[E::cram_next_slice] Failure to decode slice

The "Permission denied" error occurs because I am on a closed server and cannot download anything onto it.

It has been suggested that I convert the files to BAM format and then analyse them, but this isn't really an option for me, as the files are large and there are thousands of them. I was hoping for a longer-term solution.

I have tried the instructions at https://www.htslib.org/workflow/cram.html
using the following commands:

./seq_cache_populate.pl -root /some_dir/cache GRCh38.fasta #works fine
export REF_PATH=/some_dir/cache/%2s/%2s/%s:http://www.ebi.ac.uk/ena/cram/md5/%s
export REF_CACHE=/some_dir/cache/%2s/%2s/%s

Although the MD5 reference cache is built correctly, this still results in the same error; the conda-installed binary seems to have its own internal environment variables.
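One way to confirm that the cache really holds the sequence named in the warning is to map the MD5 onto the `%2s/%2s/%s` layout by hand and look for the file. The sketch below assumes the layout splits the MD5 into its first two characters, the next two, and the remainder; `cache_path_for_md5` and `check_md5_in_cache` are hypothetical helper names, not part of HTSlib or EHDN.

```shell
# Hypothetical helpers (not part of HTSlib or EHDN).

# Print the file path a given MD5 maps to under a %2s/%2s/%s cache layout.
cache_path_for_md5() {
    cp_root="$1"
    cp_md5="$2"
    cp_rest="${cp_md5#??}"              # MD5 without its first two characters
    cp_first="${cp_md5%"$cp_rest"}"     # first two characters
    cp_tail="${cp_rest#??}"             # MD5 without its first four characters
    cp_second="${cp_rest%"$cp_tail"}"   # third and fourth characters
    printf '%s/%s/%s/%s\n' "$cp_root" "$cp_first" "$cp_second" "$cp_tail"
}

# Report whether the sequence with that MD5 is present in the cache.
# Returns non-zero when the file is missing.
check_md5_in_cache() {
    cm_path="$(cache_path_for_md5 "$1" "$2")"
    if [ -f "$cm_path" ]; then
        echo "found: $cm_path"
    else
        echo "missing: $cm_path"
        return 1
    fi
}
```

For the MD5 in the warning above, this would be called as `check_md5_in_cache /some_dir/cache b0397179e5a92bb7a3300b68e45a9f72`. A "missing" result would suggest the reference used to build the cache does not contain the sequence the CRAM was encoded against, rather than a problem with the environment variables.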

Hence I am wondering whether I could edit the code before installing EHDN from source.
Is there somewhere in the code where I could set the environment variable ($REF_PATH) to point to the MD5 cache folder?
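Before patching the source, it may be worth passing the variables to the one command directly. The sketch below assumes that HTSlib reads REF_PATH and REF_CACHE from the process environment at run time, as the CRAM workflow page describes; whether the conda build of EHDN behaves this way is not confirmed here. `run_with_local_refs` is a hypothetical wrapper name. It also drops the `http://www.ebi.ac.uk/...` fallback from REF_PATH, on the assumption that a local-only search path keeps HTSlib from attempting a download on a closed server.

```shell
# Hypothetical wrapper: run any command with REF_PATH and REF_CACHE pointing
# only at a local MD5 cache (no remote URL fallback).
run_with_local_refs() {
    rl_root="$1"    # cache root, e.g. the directory given to seq_cache_populate.pl -root
    shift
    env REF_PATH="$rl_root/%2s/%2s/%s" \
        REF_CACHE="$rl_root/%2s/%2s/%s" \
        "$@"
}
```

With the cache from above, the call would look like `run_with_local_refs /some_dir/cache ExpansionHunterDenovo profile --reads in.cram --reference GRCh38.fa --output-prefix out`. If the EBI URL still appears in the warnings, that would point to the variables not reaching HTSlib inside the binary.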
