Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support sets of 8-bit and 16-bit integers #19

Open
kytans opened this issue Dec 1, 2022 · 3 comments
Open

Support sets of 8-bit and 16-bit integers #19

kytans opened this issue Dec 1, 2022 · 3 comments

Comments

@kytans
Copy link

kytans commented Dec 1, 2022

Support sets of 8-bit and 16-bit integers

@droundy
Copy link
Owner

droundy commented Dec 1, 2022

These are already supported with Set64<u8> and Set64<u16>. If you are hoping for better optimized versions than that, it might be worth benchmarking the size and speed of what we've already got.

@kytans
Copy link
Author

kytans commented Dec 2, 2022

At least in the heap case, not having SetU16 and SetU8 seems like it would result in 2x-4x overhead, which is obviously not acceptable.

It seems like it should be possible to use a macro to generate all SetU#, or maybe even have a generic version for anything integer-like.

@droundy
Copy link
Owner

droundy commented Dec 4, 2022

No, there wouldn't be overhead, tinyset is optimized for size and for small values. I could do better for u8, e.g. always store a 32-byte dense bitmap when forced onto the heap, but probably not a factor of two better in size.

A Set64<u8> that is forced to the heap would either hold a dense bitmap when an extra 16 bytes storing the maximum value and the number of elements, or would store a few u64 holding bitmaps in their most significant bits, and an offset in the least significant bits, plus a 16 bytes to hold the number of elements and the maximum value. Which it holds would depend on how many elements there are relative to the maximum value in the set. I'm not sure off the top of my head how big the result would be, which is why I suggested starting with a benchmark.

You could also do better when not on the heap (for u8 since Set64 can only store 7 values without going to the heap, but you could do a little better if you knew no values exceeded 255.

For u16 it's very hard to imagine improving matters with a custom implementation, depending on the pattern of numbers stored. To do better than Set64<u16> you'd need to store numbers that are spaced by more than 60 or so, so that the bitmaps mostly hold only a single element. One assumption on Set64 is that values will tend to be clumped. If they aren't clumped then we will indeed take 64 bits per element.

A macro to generate similar implementations for each integer type wouldn't be practical because you aren't going to be using your SetU8 on an 8-bit computer, so you'd still want 64-but bitmaps. SetU32 is primarily implemented to enable SetUsize to work efficiently on 32-bit computers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants