Reduces memory usage and improves speed #48
base: bloody
The current slicing was concatenating the whole chromosome sequence when only a slice was required. That required 2-3 GB of memory in some cases, and it was slow.
If I remember correctly, I changed this part of the code so that it returns the correct sequence when an insertion, deletion, or indel is present. For example, if we apply the slice to data (old version), the function returns only (X-1) nucleotides on the left, because one of them is deleted. But as you mentioned, applying the slice to sequence (new version, 3bb0518 and 2fd3fbe) means loading the whole chromosome, which increases memory usage and run time. There is certainly a more efficient way to do it, but going back to the old version just moves the problem somewhere else.
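To make the trade-off described above concrete, here is a minimal toy sketch. It assumes a hypothetical per-position list named `data` in which a deleted position is stored as an empty string; this is an illustration only and is not pyGeno's actual internal representation.

```python
# Toy model (assumption, not pyGeno internals): one list entry per
# reference position; the deleted position 2 is an empty string.
data = ["A", "C", "", "G", "T", "A", "C"]

start, end = 1, 5  # we want end - start = 4 nucleotides

# "Old" strategy: slice the per-position list, then join only that slice.
# Cheap in memory, but the deleted position contributes nothing,
# so one nucleotide is missing from the result.
slice_then_join = "".join(data[start:end])

# "New" strategy: join every position first, then slice the string.
# The length is right, but the whole sequence is materialized in memory.
join_then_slice = "".join(data)[start:end]

print(slice_then_join, len(slice_then_join))
print(join_then_slice, len(join_then_slice))
```

On a real chromosome, the second strategy builds a string with one character per nucleotide before slicing, which is consistent with the memory usage reported in this pull request.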
Also, have you tried scanning your mutations by chromosome? Something like:
I'm already processing by chromosome, and it reloads the sequence for every transcript.
Even with something like:
It's my last suggestion to help you. I'm working on another project at the moment; sorry, but I don't have the time to improve this specific case.
I can't apply this since I don't want to load the whole chromosome sequence into memory. Anyway, I have solved my problem. Thanks for your help.
What is your solution?
I don't have a solution that fixes both the high memory usage and @ericloud's bug. I found a way in my script to skip the call to _getSequence.
I have also noticed this drop in performance. This is something that would need to be improved in the future.
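One possible direction, sketched under assumptions rather than offered as a fix for pyGeno: the joined-then-sliced result can be produced lazily, so that only the requested slice is ever held as a string. The function name `lazy_sequence_slice` and the per-position iterable `data` are hypothetical and do not exist in pyGeno.

```python
from itertools import chain, islice


def lazy_sequence_slice(data, start, end):
    """Return "".join(data)[start:end] without building the full string.

    `data` is a hypothetical iterable of per-position strings (an empty
    string for a deletion, several characters for an insertion).
    Only non-negative indices are supported in this sketch.
    """
    if start < 0 or end < 0:
        raise ValueError("negative indices are not supported")
    # Flatten the per-position strings into a stream of single characters.
    nucleotides = chain.from_iterable(data)
    # Skip `start` characters, then keep characters up to index `end`.
    return "".join(islice(nucleotides, start, end))


data = ["A", "C", "", "G", "T", "A", "C"]  # position 2 is deleted
print(lazy_sequence_slice(data, 1, 5))
```

This keeps the extra memory proportional to the slice length, but it still walks the data from the beginning up to `end`, so it addresses the memory cost and not the run time. It also does nothing about the sequence being reloaded for every transcript.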
The current slicing/join was concatenating the whole chromosome sequence when only a slice was required. That required 2-3 GB of memory in some cases and was slow in my use case:
Before
I suggest reverting to the original pyGeno code that was changed by @ericloud.
After
@ericloud
Is this a problem for the bug you reported here:
2fd3fbe#diff-9a0352f44c9b0e9b00e4e2df44eae54b