most input reads mysteriously discarded from xenome output #14
Comments
I was able to track down a copy of xenome 1.0.1 via a co-worker and am having the same issue with that version.
Thank you! That helps a lot. I've been looking for an incompatibility, but it sounds like it's actually a longstanding bug. Everyone has probably worked out by now that you can make the bug go away, at the cost of it taking longer, by running the "index" command single-threaded:
So, that did not seem to fix the problem. I tried running […]. The most interesting part is that the file size of the output index files does not change after the program starts outputting lines of text, i.e. […]
So the file size of the output files did not change between 0 and 100 being printed to stdout, even though at least 40 minutes to an hour passed. EDIT: I am now trying to run […]
UPDATE: Using the "dna_sm" references from Ensembl, the size of the index files is different, but the output index files also do not change during/after […]. Subsequently running […]
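For anyone who wants to record the file-size observation above rather than eyeball it, here is a minimal Python sketch. It is not part of xenome; the function names are made up for illustration, and the paths you pass in are whatever index files your run produces. Note that an unchanged size is only a hint: a program can preallocate or memory-map its output, so this does not prove that no work is being done.

```python
import os
import time


def snapshot_sizes(paths):
    """Return {path: size in bytes}, or None for a file that does not exist yet."""
    return {p: (os.path.getsize(p) if os.path.exists(p) else None) for p in paths}


def watch_sizes(paths, interval_s=60.0, rounds=5, sleep=time.sleep):
    """Take `rounds` snapshots, `interval_s` seconds apart.

    Returns a list of (timestamp, sizes) tuples. `sleep` is injectable so the
    function can be exercised without actually waiting.
    """
    history = []
    for i in range(rounds):
        history.append((time.time(), snapshot_sizes(paths)))
        if i < rounds - 1:
            sleep(interval_s)
    return history


def changed(history):
    """Map each path to True if its size differs between first and last snapshot."""
    first, last = history[0][1], history[-1][1]
    return {p: first[p] != last[p] for p in first}
```

Calling `changed(watch_sizes(list_of_index_files))` while the index step is running gives a per-file answer that can be pasted into a report alongside the progress output.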
The problem is a concurrency issue. And it's a nasty one. This is why it works on short reference genomes: there isn't enough work for multithreading to be an issue.
I have a co-worker who has said he has run xenome without issue using human/mouse reference genomes (not sure exactly which ones) in the past (two years or so ago). I've now tried running xenome index with -T 1 and -T 8, could it potentially work with other values of -T? Are you certain that this is a problem that is due to xenome index and not xenome classify? Do you think it could work if I tried running this on a different server? Our cluster is running […]. Is there anything else I could do that may resolve this? Is this a bug with the program or could recompiling it fix the issue? If the former, is there any outlook for a fix? Apologies for the many questions, thanks for your help!
"I've now tried running xenome index with -T 1 and -T 8, could it potentially work with other values of -T?"
If the bug is what we think it is, higher values should make it go faster and also increase the probability of xenome hanging.
"Are you certain that this is a problem that is due to xenome index and not xenome classify?"
Now that you mention it, no, I'm not certain. The only reason I said this is that nobody had reported an issue with classify yet, and we hadn't observed an issue with classify yet.
"Do you think it could work if I tried running this on a different server?"
It probably won't make a difference.
"Is this a bug with the program or could recompiling it fix the issue? If the former, is there any outlook for a fix?"
It's a bug. We are still working on it when we can. And, of course, we would eagerly accept patches!
Weird. A co-worker tipped me to uncompress the fastq ([…]
Same issue with […]
I am trying to use Xenome to filter xenograft mouse reads from graft human reads. I am running into the issue that xenome classify finishes in seconds and returns kilobyte-sized output files without any discernible error message. Most of the reads seem to have been discarded and I can't figure out where they have gone or why this may have happened.
Is there any reason you can think of why this may have happened? I am using xenome 1.0.0.
This is the command I ran:
xenome classify --tmp-dir /fastscratch/XXX -v -P index --pairs -i /XXXX/Sample_17-86718-3/17-86718-3_S7_L002_R1_001.fastq.gz -i /XXXX/Sample_17-86718-3/17-86718-3_S7_L002_R2_001.fastq.gz --output-filename-prefix /XXX/86718-3 --graft-name human --host-name mouse --log-file logfile > statsfile
This is the output to the log file:
This is the stdout:
The input fastq files are large:
The output files are tiny:
The index took surprisingly little time to create, and also seems a bit small perhaps, but I don't know what it should be...
This is the command I used to create the index:
xenome index -M 24 -T 8 -P index -H Mus_musculus.GRCm38.dna_rm.primary_assembly.fa.gz -G Homo_sapiens.GRCh38.dna_rm.primary_assembly.fa.gz
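To put a number on "most of the reads seem to have been discarded", one option is to count the records in the input FASTQ and in each output file and compare them. The Python sketch below is not part of xenome; the helper names are hypothetical, and it assumes plain four-line FASTQ records, either gzip-compressed or uncompressed.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly four lines per record."""
    path = Path(path)
    # Pick the opener from the file extension; both return binary line iterators.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def fraction_kept(input_path, output_paths):
    """Fraction of input reads that appear across the given output files."""
    total = count_fastq_reads(input_path)
    kept = sum(count_fastq_reads(p) for p in output_paths)
    return kept / total if total else 0.0
```

Running `fraction_kept` on the R1 input and the list of R1 output files written under the chosen output prefix shows how far the totals diverge; a value well below 1.0 confirms that reads are missing rather than merely classified into a different category.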