high coverage fail #4

Open
calkan opened this issue Oct 18, 2018 · 10 comments

Comments

calkan commented Oct 18, 2018

Hi

I am trying to simulate aDNA data at high coverage. I assume the "-c" parameter sets the overall depth of coverage. Is this correct, or does it set the endogenous coverage? I do this:

./gargammel.pl -c 30 --comp 0.7,0.05,0.25 -l 110 -rl 100 -SS HS25 -o data/70-5-25-40x data/

after quite a long time gargammel fails:

....
Produced 2,147,400,000
ERROR: Cannot add thousandSeparator to non-integer 2147500000
system cmd /mnt/compgen/homes/calkan/projects/ancient/gargammel/src/adptSim -f AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATTCGATCTCGTATGCCGTCTTCTGCTTG -s AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTT -l 100 -artp data/70-5-25-40x_a.fa data/70-5-25-40x_d.fa.gz failed: 256 at ./gargammel.pl line 79.

Owner

grenaud commented Oct 19, 2018

I upgraded isInt to accommodate values up to uint64 in my little library libgab. Can you do a:

cd libgab
git status
git pull origin master
make clean 
make 
cd ..
make clean 
make

I hope this will not overflow past 4 billion fragments. And yes, -c is the endogenous coverage.
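A small illustration of the limits involved, for readers following along. This is only a sketch: the actual check inside libgab's isInt is not shown in this thread, so the helper `fits` and the constants below are illustrative names, not libgab code. The two counter values are taken from the log above.

```python
# Illustrative only: these names are not from libgab.
INT32_MAX = 2**31 - 1    # 2,147,483,647
UINT32_MAX = 2**32 - 1   # 4,294,967,295 (the "4 billion" mark)
UINT64_MAX = 2**64 - 1

def fits(value, max_value):
    """Return True if a non-negative counter fits below max_value."""
    return 0 <= value <= max_value

last_reported = 2_147_400_000   # "Produced 2,147,400,000"
failed_value = 2_147_500_000    # value in the ERROR line

print(fits(last_reported, INT32_MAX))   # still within signed 32-bit range
print(fits(failed_value, INT32_MAX))    # just past the signed 32-bit limit
print(fits(failed_value, UINT64_MAX))   # comfortably within uint64
```

The failing value sits between the last reported count and the signed 32-bit maximum, which is consistent with a 32-bit integer check being the cause.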

Author

calkan commented Oct 22, 2018

that problem is now gone, I think. I now have this error with ART though:

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
system cmd /mnt/compgen/homes/calkan/projects/ancient/gargammel/art_src_MountRainier/art_illumina -ss HS25 -amp -na -p -i data/70-5-25-40x_a.fa -l 100 -c 1 -qs 0 -qs2 0 -o data/70-5-25-40x_s failed: 134 at ./gargammel.pl line 79.

Author

calkan commented Oct 22, 2018

OK, that is probably because the data/70-5-25-40x_s file is 917 GB for some reason. Am I doing this wrong?

./gargammel.pl -c 30 --comp 0.7,0.05,0.25 -l 110 -rl 100 -SS HS25 -o data/70-5-25-40x data/

What I want to get is a total of 30X human genome coverage with 100 bp paired-end reads (fragment length 110). That should translate to 900M reads (450M pairs) of length 100 bp. Of this data set, 70% should be bacterial, 25% endogenous, and 5% present-day contamination. That's what I'm trying to get anyway, but I guess I misinterpreted the -c parameter.
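The arithmetic behind these numbers can be sketched as follows. Everything here is an assumption made for illustration: a round 3 Gbp human genome, and a simple model in which the endogenous fragment count is coverage × genome size / fragment length and the other components are scaled up from the endogenous fraction. How gargammel actually derives its fragment counts is not stated in this thread.

```python
GENOME_SIZE = 3_000_000_000  # assumed round figure for the human genome (bp)
READ_LEN = 100               # from -rl 100
FRAG_LEN = 110               # from -l 110
ENDO_FRACTION = 0.25         # endogenous share in --comp 0.7,0.05,0.25

def reads_for_coverage(coverage, genome_size, read_len):
    """Reads needed so that reads * read_len = coverage * genome_size."""
    return coverage * genome_size // read_len

def total_fragments(endo_coverage, endo_fraction, genome_size, frag_len):
    """Total fragments if endo_coverage applies to the endogenous part only."""
    endo_fragments = endo_coverage * genome_size / frag_len
    return endo_fragments / endo_fraction

def endo_coverage_for_total(total_frags, endo_fraction, genome_size, frag_len):
    """Inverse of total_fragments: endogenous coverage for a fragment budget."""
    return total_frags * endo_fraction * frag_len / genome_size

# Intended data set: 30X total with 100 bp reads.
reads = reads_for_coverage(30, GENOME_SIZE, READ_LEN)
pairs = reads // 2
print(f"{reads:,} reads, {pairs:,} pairs")

# If -c 30 is instead read as endogenous coverage at 25% of the mix:
print(f"{total_fragments(30, ENDO_FRACTION, GENOME_SIZE, FRAG_LEN):,.0f} fragments")

# Endogenous coverage that would correspond to 450M fragments in total:
print(endo_coverage_for_total(pairs, ENDO_FRACTION, GENOME_SIZE, FRAG_LEN))
```

Under these assumptions, treating 30 as the endogenous coverage gives a total well above the 450M pairs intended, and above the signed 32-bit limit that the earlier error ran into.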

Owner

grenaud commented Oct 22, 2018

The ART package cannot take zipped files. Hence we have to use plain files.

Can you do an ls -al in the directory data/70-5-25-40x data/

Owner

grenaud commented Oct 22, 2018

Can you also try running ART on a subset? Do you still get the std::bad_alloc?
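One way to cut such a subset, as a minimal sketch: copy the first few FASTA records of the large input into a smaller file and point art_illumina at that file instead. The function name `head_fasta` and the output file name in the usage comment are hypothetical, not part of gargammel or ART.

```python
def head_fasta(src_path, dst_path, n_records):
    """Copy the first n_records FASTA records from src_path to dst_path.

    Reads line by line, so the full input never has to fit in memory.
    Returns the number of records actually written.
    """
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                if written == n_records:
                    break          # enough records copied, stop reading
                written += 1
            if written > 0:        # skip anything before the first header
                dst.write(line)
    return written

# Hypothetical usage (file names are examples only):
# head_fasta("data/70-5-25-40x_a.fa", "data/subset_a.fa", 1_000_000)
```

The subset file can then replace data/70-5-25-40x_a.fa after -i in the art_illumina command from the error message, with the other options left unchanged.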

Author

calkan commented Oct 22, 2018

ls -l data/70-5-25-40x*
-rw-rw-r-- 1 calkan compgen 984014037789 Oct 21 22:50 data/70-5-25-40x_a.fa
-rw-rw-r-- 1 calkan compgen 91693121260 Oct 20 14:17 data/70-5-25-40x.b.fa.gz
-rw-rw-r-- 1 calkan compgen 7782171239 Oct 20 05:07 data/70-5-25-40x.c.fa.gz
-rw-rw-r-- 1 calkan compgen 177612870941 Oct 21 08:27 data/70-5-25-40x_d.fa.gz
-rw-rw-r-- 1 calkan compgen 78137570969 Oct 20 04:22 data/70-5-25-40x.e.fa.gz
-rw-rw-r-- 1 calkan compgen 0 Oct 21 22:50 data/70-5-25-40x_s1.fq

Author

calkan commented Oct 22, 2018

ART works well with a small subset, no std::bad_alloc.

Owner

grenaud commented Oct 22, 2018 via email

Author

calkan commented Oct 22, 2018

OK. There are _b and _c files as well; should I repeat this with them? What happens after that? Is the ART output the final output?

Owner

grenaud commented Oct 22, 2018 via email
