Representative sequences after deduplication not consistent between different runs #440

When I run `dedup` on the same BAM files twice, even with the same `--random-seed`, the returned deduped BAM files have different sets of reads. This has a very small but non-zero effect on downstream analysis. Would it be possible to have completely consistent results between runs when the random seed is the same?

For context, the input BAM was coordinate-sorted and generated using STAR. `dedup` was run with `--random-seed 100 --spliced-is-unique --multimapping-detection-method=NH`.
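For illustration only (this sketch is not from the thread and is not UMI-tools code): with string keys, which element comes "first" when iterating a set or dict depends on Python's per-process hash salt, so the choice of a representative read can differ between runs even when `random` is seeded — the cause diagnosed in the comments below.

```python
# illustrate_nondeterminism.py -- hypothetical sketch, not UMI-tools code.
# Even with random.seed() fixed, iteration order over a set of strings
# depends on the per-run hash salt, so the "first" element (and anything
# derived from that choice) can differ between interpreter invocations.
import random

random.seed(100)
umis = {"ACGT", "TTAG", "GGCA", "CATC"}  # toy stand-ins for UMI sequences
representative = next(iter(umis))  # order varies with the hash salt
print(representative)
```

Running this script several times typically prints different UMIs, despite the fixed `random` seed.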
Comments
If you set the bash variable PYTHONHASHSEED to a fixed value, the results should be consistent. Since Python 3.3, the hashing used in e.g. dictionary keys is non-deterministic: hashes are 'salted' with an unpredictable random value (https://docs.python.org/3.4/reference/datamodel.html#object.__hash__). I understand this is to prevent DoS attacks. @IanSudbery - Should we add the above to the FAQ?
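A minimal demonstration of this salting (an illustrative sketch, not from the issue): the hash of a `str` changes between interpreter invocations unless `PYTHONHASHSEED` is pinned in the environment.

```python
# hash_demo.py -- run this twice; the printed value differs between
# interpreter invocations because str hashes are salted per process
# (see the datamodel link above). Example usage:
#   python hash_demo.py                    # value A
#   python hash_demo.py                    # usually a different value
#   PYTHONHASHSEED=0 python hash_demo.py   # stable across runs
print(hash("ACGTACGT"))
```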
Yes. I guess there is no way to hardcode this?
Seems like it is possible: https://stackoverflow.com/questions/32538764/unable-to-see-or-modify-value-of-pythonhashseed-through-a-module. I think it would make sense from a user point of view if this was set within UMI-tools.
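The approach in that StackOverflow answer boils down to re-executing the interpreter, since `PYTHONHASHSEED` is only read at start-up. A hedged sketch (hypothetical; `ensure_fixed_hash_seed` and the seed value are illustrative, not UMI-tools code):

```python
# Hypothetical sketch, not UMI-tools code: pin the hash seed by
# re-executing the current script, because PYTHONHASHSEED is only
# consulted when the interpreter starts.
import os
import sys

def ensure_fixed_hash_seed(seed: str = "0") -> None:
    if os.environ.get("PYTHONHASHSEED") != seed:
        env = dict(os.environ, PYTHONHASHSEED=seed)
        # execve replaces the current process; execution never returns.
        os.execve(sys.executable, [sys.executable] + sys.argv, env)

ensure_fixed_hash_seed()
print(hash("ACGT"))  # now identical across runs
```

Pending a fix inside UMI-tools itself, the same effect is available from the shell, e.g. `PYTHONHASHSEED=0 umi_tools dedup -I input.bam -S deduped.bam --random-seed 100` (file names illustrative).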
Has this actually been fixed in a release? I'm seeing the same non-deterministic behaviour in …
Hi @SPPearce - Sorry for the time you've spent digging into how to make UMI-tools deterministic. We have two open PRs to deal with this (#365 & #470), and I have a separate idea I wanted to try as well. I'm optimistically hoping to decide which route to take this week and then issue a new version. I've been saying that for the past few weeks though 😬
See the outstanding #550 |