processing time streamlined #47
@vpnagraj can you comment on this? would it be possible? It would dramatically speed up compute time if containers could pre-load the databases.
@nsheff well i think this would have serious implications for the memory usage, right? i just tried loading a couple of region dbs outside of the app, Core hg19 and LOLAroadmap hg19, and they were ~1 GB and ~3 GB respectively.
so for this to work as you've described, we'd need to preload all of the region dbs, and if just those two alone are hogging >4 GB of memory, i'm not sure we'll have enough memory left in the containers to do the rest of the computing.
yeah, makes sense... so it's a tradeoff of memory and time. What if we kept a few containers pre-loaded, sacrificing a few gigs of memory, so users would get the compute done in just a few seconds? The only advantage of doing it the other way is that you save a few gigs of memory when the process isn't being used... but if multiple people are using it, you're consuming that memory in duplicate.

But look, even better: we could scale with user growth while keeping only a single representation of the data in memory, because you can share memory between R processes. Sweet... this video is really worth watching, it is great. We would use this idea for the containers, giving us not only a speed advantage but a memory advantage. This opens up all kinds of possibilities.
also, why would we need to preload all of them?
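For reference, the memory-sharing idea discussed later in this thread centers on svSocket. A minimal sketch of what a pre-loaded server process and a lightweight client could look like (the port, the path, and the use of LOLA's `loadRegionDB()` here are illustrative, not how LOLAweb is actually wired):

```r
## Server process: load the big region database once and serve it over
## a socket, so lightweight client sessions never load their own copy.
## Sketch only -- port and paths are made up for illustration.
library(svSocket)
library(LOLA)

regionDB <- loadRegionDB("reference/regions/LOLACore/hg19")  # ~1 GB, loaded once

# Runs a non-blocking TCP server inside this R session.
startSocketServer(port = 8888)
```

```r
## Client process (e.g. per-request container): query the server's
## copy of regionDB instead of loading its own.
library(svSocket)

con <- socketConnection(host = "localhost", port = 8888)
nCollections <- evalServer(con, "length(regionDB$regionGRL)")
close(con)
```

The appeal is that every client shares the one in-memory copy; the open question is whether the socket round-trips are fast enough to be worth it.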
@nsheff That video is interesting (okay, yes, I got pulled in by your "this video is really worth watching" line). Redis might be worth considering as a pool into which we dump all reference datasets -- everything remains in memory, it's super fast, it's already containerized, and it can hold quite a lot. Just another way to share things between containers.
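A rough sketch of that Redis pool, assuming the redux R client and a Redis container reachable under the hostname `redis` (the host, path, and key names are all illustrative):

```r
## Sketch: dump a serialized reference dataset into Redis once, then
## let any app container pull it back by key. Assumes the 'redux'
## package and a running Redis server.
library(redux)

r <- redux::hiredis(host = "redis", port = 6379)

# Producer (run once at startup): serialize and store the region DB.
regionDB <- readRDS("reference/hg19-core.rds")  # hypothetical path
r$SET("regionDB:hg19:Core", redux::object_to_bin(regionDB))

# Consumer (any container, any time): fetch and deserialize.
regionDB <- redux::bin_to_object(r$GET("regionDB:hg19:Core"))
```

Worth noting: this shares storage, not address space -- each consumer still materializes its own copy of the object after the `GET` -- so it buys load time rather than resident memory.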
@vpnagraj where are we on the sharing-processes/redis idea?
tl;dr ... i think we should skip the socket idea and just use redis.

i've been looking into the […]. i am able to successfully use the […], but the performance with even a moderate sized vector ([…]) is poor. i've written up a gist with some code to benchmark the methods, but basically, to retrieve […], the package manual speaks to this:

i haven't dug too deep on the "Rserver" option, but i think that might be referring to the […]. either way, at this point i'm not convinced that this memory sharing concept is worth the overhead. @nsheff ... 👍 to move forward with […]?
Interesting. I don't suppose it would make sense to do the inverse (move the smaller query files over and then perform the computation in the svSocket server that already has the big files loaded)? Or, even better -- don't even read the uploaded file into the client session; just use […].

It looks like you're right about the […].
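If I'm reading the svSocket manual right, that inverse flow would lean on `evalServer()`'s `send` argument, which assigns an object into the server session. Something like the following sketch (`runLOLA()` is LOLA's enrichment function; the variable names and port are illustrative):

```r
## Sketch of the "inverse" idea: push the small user query into the
## server session and compute next to the pre-loaded regionDB there.
library(svSocket)

con <- socketConnection(host = "localhost", port = 8888)

# With 'send' supplied, evalServer() assigns the object on the server
# under the given name, so only the small query crosses the socket.
evalServer(con, "userSets", send = userSets)
evalServer(con, "userUniverse", send = userUniverse)

# Run the heavy computation server-side; only the result comes back.
res <- evalServer(con, "runLOLA(userSets, userUniverse, regionDB)")
close(con)
```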
One other thought: would this method dramatically reduce the memory required, by making it so all the little child processes didn't even need to load the database?

In any case, I think it's fine to just move forward with the redis method. I have two thoughts in that regard:

1. I think this functionality should be put in simpleCache, not in LOLAweb, so it's more universally useful.
2. is it possible that something like mongoDB could be used for this instead?

Mongo actually has an even smaller limit -- 16 MB per document, I think, not 512 MB -- so that would probably make things more difficult. Plus Mongo would have to do actual disk reads rather than having values available in memory like Redis does.

It sounds smart to work this into simpleCache instead of LOLA, so I think your idea is the right one.

Neal
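Point 1 above (putting this in simpleCache rather than LOLAweb) might look roughly like the wrapper below. To be clear, this is a hypothetical sketch: simpleCache's real API caches objects to disk, and the Redis layer here is an imagined extension, not existing API.

```r
## Hypothetical: a simpleCache-style helper that checks Redis before
## recomputing. None of this is existing simpleCache API.
library(redux)

cachedOrCompute <- function(key, compute, r = redux::hiredis()) {
  hit <- r$GET(key)
  if (!is.null(hit)) {
    return(redux::bin_to_object(hit))     # cache hit: deserialize and return
  }
  obj <- compute()                        # cache miss: build the object...
  r$SET(key, redux::object_to_bin(obj))   # ...and store it for everyone else
  obj
}

# Illustrative use: any container gets the region DB without re-building it.
regionDB <- cachedOrCompute(
  "regionDB:hg19:Core",
  function() LOLA::loadRegionDB("reference/regions/LOLACore/hg19")
)
```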
Is the key size a problem? or do you mean value size? but good point about the memory thing.
Yeah, sorry -- it's the value of a key that is limited. Oddly, when heavy users talk about limits they just refer to them as "keys," but the actual keys themselves have to be super small.
Neal
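For anything that bumps against that per-value cap, one possible workaround (a sketch, not something settled in this thread) is to chunk the serialized blob across several keys:

```r
## Sketch: split one large serialized object across multiple Redis
## values so no single value exceeds the 512 MB cap. Assumes 'redux';
## the key scheme and chunk size are made up for illustration.
library(redux)

storeChunked <- function(r, key, obj, chunkBytes = 256 * 1024^2) {
  blob <- serialize(obj, NULL)                    # one big raw vector
  idx <- split(seq_along(blob),
               ceiling(seq_along(blob) / chunkBytes))
  r$SET(paste0(key, ":n"), length(idx))           # record the chunk count
  for (i in seq_along(idx)) {
    r$SET(paste0(key, ":", i), blob[idx[[i]]])    # each chunk under 512 MB
  }
}

fetchChunked <- function(r, key) {
  n <- as.integer(r$GET(paste0(key, ":n")))
  parts <- lapply(seq_len(n), function(i) r$GET(paste0(key, ":", i)))
  unserialize(do.call(c, parts))                  # reassemble and deserialize
}
```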
If we can load all the libraries in a global area, then what stops us from also doing this with the database caches, so they don't need to be re-loaded in each container?