-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Silly question about ZODB #73
Comments
BTrees work fine in-memory without being persisted to ZODB (but note that you typically can't pickle them). One place where they're especially useful is if you're storing integers or floating point keys or values, since those are unwrapped to the C data type and as such use much less memory than Python objects. Your application may have different trade-offs than those that persist BTrees, so you might want to check the node sizes and possibly adust them. |
I'm curious about the node sizes and if I should use different values if I know the tree will never be persisted. Does it mean that for an Object-Object tree, the max amount of values to be set on the tree is 30*250? Seems quite low. Basically I'm loading some possibly large sorted list of objects indexed by a datetime value that needs to be queried thousands of times by a date range (reason it can't hit the database every time) and this BTree implementation seems to fit the job perfectly. |
IIRC the node sizes given are the maximum before the node splits into another level, thus impacting the height of the tree. Nodes split automatically when needed and AFAIK there is no artificial maximum number of keys one tree can hold.
My app has a similar index. We turn the datetimes into their floating point timestamps so we can use an optimized tree (actually, we turn them into an integer representation of their floating point timestamp so we can normalize and use an even more optimized tree; this is a common approach, although it may not directly work for you). |
I see, I guess I will need to test various values although I think it shouldn't really matter. Using float or ints rather than datetimes is a good idea, the overhead of converting from datetime might be too much and negate any improvements since it's more about speed than actual memory usage, but I will give it a try, thanks for the help! |
Probably not much in your case, it's more about persistence, but there is a performance component to it (e.g., larger buckets mean less pointer chasing means better cache locality, but maybe more searches)?
Right, as in all things optimization: profile first! (Actually, profile third!) I'm going to go ahead and close this now. |
Reading the docs seems this library is made to work to ZODB, however, I'm trying to use this simply as an in memory BTree to store some ordered data and be able to efficiently query it after, since it's implemented in C (or the critical parts at least) it seems to yield slightly better results than using a simple python list with a bisect algorithm, not to mention a much cleaner syntax for range searches.
Am I using this wrong since it's optimized around ZODB? Should I consider some other implementation?
The text was updated successfully, but these errors were encountered: