samtools collate fails when /tmp runs out of disk space #148

Open
benjschiller opened this issue May 7, 2024 · 1 comment

@benjschiller

benjschiller commented May 7, 2024

Hi,

We're seeing the convertCRAMtoFASTQ task fail for large CRAMs (> 60 GB). It seems that the samtools collate command is writing intermediate files to /tmp, which is outside the /cromwell_root volume in the container (EDIT: we're running this in Terra using the version from Dockstore). I have seen this behavior elsewhere and had to either set TMPDIR or provide an explicit prefix with a path inside the desired volume (as described here: https://www.htslib.org/doc/samtools-collate.html).

Here is an example log message before the container is reclaimed (usually repeated several times):

samtools collate: Couldn't write to intermediate file "/tmp/collatef.0004.bam": No such file or directory

Relevant code:

if [ ~{in_paired_reads} == true ]
then
  samtools collate -@ ~{half_cores} --reference ~{in_ref_file} -Ouf ~{in_cram_file} | samtools fastq -@ ~{half_cores} -1 reads.R1.fastq.gz -2 reads.R2.fastq.gz -0 reads.o.fq.gz -s reads.s.fq.gz -c 1 -N -
else
  samtools fastq -@ ~{in_cores} -o reads.R1.fastq.gz -c 1 --reference ~{in_ref_file} ~{in_cram_file}
fi
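
For reference, the TMPDIR workaround mentioned above could look roughly like the sketch below. It assumes the task's working directory sits on the large volume (for example under /cromwell_root) and that the tool in question honors TMPDIR; the directory name tmp_scratch and both helper functions are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Report free space (in KiB) on the filesystem that holds a given path.
free_kib() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Create a scratch directory under the current working directory and point
# TMPDIR at it, so tools that honor TMPDIR stop writing to /tmp.
# Assumption: the working directory lives on the large task volume.
use_workdir_tmp() {
  local scratch="${PWD}/tmp_scratch"   # hypothetical directory name
  mkdir -p "${scratch}"
  export TMPDIR="${scratch}"
}

echo "free in /tmp: $(free_kib /tmp) KiB"
use_workdir_tmp
echo "free in TMPDIR (${TMPDIR}): $(free_kib "${TMPDIR}") KiB"
```

Printing the free space on both filesystems before the samtools step makes it easier to confirm from the task log which volume is actually filling up.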

@jjfarrell

I am not that familiar with Terra and the Google environment, but there are probably a couple of approaches to address this by modifying the WDL script.

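  1. Point samtools collate at a different temp location: edit the WDL script so that the samtools command writes its temporary files to a writable volume mounted in the Docker container. The location can be given either as a trailing prefix argument or with -T (samtools treats the value as a filename prefix for its temporary files):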

samtools collate -@ ~{half_cores} --reference ~{in_ref_file} -Ouf ~{in_cram_file} ~{tmp_dir} |samtools fastq -@ ~{half_cores} -1 reads.R1.fastq.gz -2 reads.R2.fastq.gz -0 reads.o.fq.gz -s reads.s.fq.gz -c 1 -N -
or
samtools collate -T ~{tmp_dir} -@ ~{half_cores} --reference ~{in_ref_file} -Ouf ~{in_cram_file} |samtools fastq -@ ~{half_cores} -1 reads.R1.fastq.gz -2 reads.R2.fastq.gz -0 reads.o.fq.gz -s reads.s.fq.gz -c 1 -N -
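
Outside of WDL, the -T variant might be sketched as below. This is only an illustration: IN_CRAM, IN_REF and THREADS are hypothetical environment variables standing in for the WDL inputs, make_collate_prefix is a made-up helper, and it assumes the current working directory is on the large volume rather than on /tmp.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a temp-file prefix for samtools collate under a chosen base directory.
# Assumption: the base directory is on the large task volume, not on /tmp.
make_collate_prefix() {
  local base_dir="$1"
  local tmp_dir
  tmp_dir="$(mktemp -d "${base_dir}/collate_tmp.XXXXXX")"
  # The value is used as a filename prefix, so add a basename to the directory.
  printf '%s\n' "${tmp_dir}/collate"
}

prefix="$(make_collate_prefix "${PWD}")"
echo "collate temp prefix: ${prefix}"

# IN_CRAM, IN_REF and THREADS are hypothetical stand-ins for the WDL inputs;
# the pipeline is skipped when they or samtools are missing.
if command -v samtools >/dev/null 2>&1 && [ -f "${IN_CRAM:-}" ] && [ -f "${IN_REF:-}" ]; then
  samtools collate -T "${prefix}" -@ "${THREADS:-2}" --reference "${IN_REF}" -Ouf "${IN_CRAM}" \
    | samtools fastq -@ "${THREADS:-2}" -1 reads.R1.fastq.gz -2 reads.R2.fastq.gz \
        -0 reads.o.fq.gz -s reads.s.fq.gz -c 1 -N -
fi

# Remove the scratch directory once the pipeline has finished.
rm -rf "$(dirname "${prefix}")"
```

Using a fresh mktemp directory per run keeps intermediate files from separate attempts apart and makes cleanup a single rm.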

  2. Modify the WDL runtime section to mount a disk at /tmp with enough storage:
    runtime {
      disks: "local-disk 100 SSD, /tmp 180 SSD"
    }
    This allocates 180 GB for /tmp, which is 3x the size of a 60 GB CRAM.
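
The sizing arithmetic can be sketched as a small shell helper. The function name tmp_disk_gb is made up for illustration, and the factor of 3 is just the rule of thumb used above; the real space needed depends on the data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Suggest a /tmp disk size in whole GB: input size times a safety factor,
# rounded up. The default factor of 3 follows the sizing used above; it is a
# rule of thumb, not a guarantee.
tmp_disk_gb() {
  local bytes="$1"
  local factor="${2:-3}"
  echo $(( (bytes * factor + 999999999) / 1000000000 ))
}

# Example: a 60 GB CRAM (taking 1 GB as 10^9 bytes).
echo "suggested /tmp size: $(tmp_disk_gb 60000000000) GB"
# prints "suggested /tmp size: 180 GB"
```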
