Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Silly question about ZODB #73

Closed
cristianocca opened this issue Jun 7, 2017 · 5 comments
Closed

Silly question about ZODB #73

cristianocca opened this issue Jun 7, 2017 · 5 comments

Comments

@cristianocca
Copy link

Reading the docs seems this library is made to work to ZODB, however, I'm trying to use this simply as an in memory BTree to store some ordered data and be able to efficiently query it after, since it's implemented in C (or the critical parts at least) it seems to yield slightly better results than using a simple python list with a bisect algorithm, not to mention a much cleaner syntax for range searches.

Am I using this wrong since it's optimized around ZODB? Should I consider some other implementation?

@jamadden
Copy link
Member

jamadden commented Jun 7, 2017

BTrees work fine in-memory without being persisted to ZODB (but note that you typically can't pickle them). One place where they're especially useful is if you're storing integers or floating point keys or values, since those are unwrapped to the C data type and as such use much less memory than Python objects.

Your application may have different trade-offs than those that persist BTrees, so you might want to check the node sizes and possibly adust them.

@cristianocca
Copy link
Author

I'm curious about the node sizes and if I should use different values if I know the tree will never be persisted. Does it mean that for an Object-Object tree, the max amount of values to be set on the tree is 30*250? Seems quite low.

Basically I'm loading some possibly large sorted list of objects indexed by a datetime value that needs to be queried thousands of times by a date range (reason it can't hit the database every time) and this BTree implementation seems to fit the job perfectly.

@jamadden
Copy link
Member

jamadden commented Jun 7, 2017

Does it mean that for an Object-Object tree, the max amount of values to be set on the tree is 30*250?

IIRC the node sizes given are the maximum before the node splits into another level, thus impacting the height of the tree. Nodes split automatically when needed and AFAIK there is no artificial maximum number of keys one tree can hold.

possibly large sorted list of objects indexed by a datetime value

My app has a similar index. We turn the datetimes into their floating point timestamps so we can use an optimized tree (actually, we turn them into an integer representation of their floating point timestamp so we can normalize and use an even more optimized tree; this is a common approach, although it may not directly work for you).

@cristianocca
Copy link
Author

I see, I guess I will need to test various values although I think it shouldn't really matter.

Using float or ints rather than datetimes is a good idea, the overhead of converting from datetime might be too much and negate any improvements since it's more about speed than actual memory usage, but I will give it a try, thanks for the help!

@jamadden
Copy link
Member

jamadden commented Jun 7, 2017

I think it shouldn't really matter.

Probably not much in your case, it's more about persistence, but there is a performance component to it (e.g., larger buckets mean less pointer chasing means better cache locality, but maybe more searches)?

Using float or ints rather than datetimes is a good idea, the overhead of converting from datetime might be too much and negate any improvements since it's more about speed than actual memory usage, but I will give it a try

Right, as in all things optimization: profile first! (Actually, profile third!)

I'm going to go ahead and close this now.

@jamadden jamadden closed this as completed Jun 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants