graphtyper genotype produces too many files for graphtyper vcf_concatenate to combine #156

Open
zachary-foster opened this issue Oct 17, 2024 · 1 comment

@zachary-foster

graphtyper genotype outputs many small VCFs. With a long reference and many samples, it creates so many files that the command line passed to graphtyper vcf_concatenate is too long to execute:

.command.sh: line 2: /usr/local/bin/graphtyper: Argument list too long

For the dataset that caused this error, the graphtyper vcf_concatenate command was 3 million characters long.
I know this is probably an unusual dataset and there are workarounds, such as combining files in batches and then combining the results, but we use graphtyper in an automated pipeline that has to handle these cases, so it would be nice if this case were handled.

One option would be for graphtyper genotype to produce fewer but larger files; another would be for graphtyper vcf_concatenate to accept a file-of-filenames like graphtyper genotype does.
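For anyone who wants to check whether they are near the same limit, here is a minimal sketch. The "Argument list too long" error is raised when the arguments plus the environment exceed the system's ARG_MAX value, which `getconf` reports. The helper name `fits_on_command_line` and the 50% headroom are assumptions made for illustration, not anything graphtyper provides.

```shell
# Upper bound, in bytes, on the combined size of the arguments and the
# environment that can be passed to a single program invocation.
getconf ARG_MAX

# Hypothetical helper: succeeds if the paths listed in file $1 (one per line)
# would plausibly fit on a single command line.
fits_on_command_line() {
    # Byte count of the list; the arithmetic expansion strips any padding
    # that some wc implementations print.
    list_bytes=$(( $(wc -c < "$1") ))
    limit=$(getconf ARG_MAX)
    # Keep half of the limit free for the environment and per-argument
    # overhead (an arbitrary safety margin chosen for this sketch).
    [ "$list_bytes" -lt $(( limit / 2 )) ]
}
```

On many Linux systems the reported value is around 2 MiB, so a 3-million-character command cannot fit regardless of the shell used.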

@hannespetur
Member

Hey, thanks for the suggestion. As a workaround, you could pass a file list to bcftools concat to concatenate the VCFs:

# First create vcf_filelist.txt with one VCF path per line
bcftools concat --naive -Oz -o all.vcf.gz --file-list vcf_filelist.txt

Best, Hannes
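The "make a vcf_filelist.txt" step in the workaround above could be sketched as follows. This assumes the graphtyper genotype output sits under a single directory and that the file names sort into genomic order byte-wise; the helper name `make_vcf_filelist` and the `results` directory in the usage note are assumptions, not part of graphtyper or bcftools.

```shell
# Hypothetical helper: write every *.vcf.gz path found under directory $1
# to file $2, one path per line, in byte-wise sorted order.
make_vcf_filelist() {
    # find streams the paths through a pipe, so they never become
    # command-line arguments and the argument-length limit does not apply.
    find "$1" -type f -name '*.vcf.gz' | LC_ALL=C sort > "$2"
}
```

It would be called as `make_vcf_filelist results vcf_filelist.txt` before running the bcftools command. It is worth inspecting the resulting order first, since a plain byte-wise sort places a name like chr10 before chr2.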
