Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QUESTION] support custom types (i256) #93

Open
spebern opened this issue Mar 19, 2023 · 5 comments
Open

[QUESTION] support custom types (i256) #93

spebern opened this issue Mar 19, 2023 · 5 comments
Assignees
Labels
enhancement New feature or request question Further information is requested
Milestone

Comments

@spebern
Copy link

spebern commented Mar 19, 2023

Question

What is the best way of using the lexical optimized parsing routines for
a custom type (arbitrary size integers e.g. 256bit)?

Background:
In arrow the decimal type uses an underlying i256 (https://github.com/apache/arrow-rs/blob/7bf7ea5e341c15dbd8653b16413459f5fa4784eb/arrow-buffer/src/bigint.rs#LL26-L29C2). However, lexical only
supports the rust native types (as far as I understand it). What is the best
way of parsing into arbitrary size integers/byte arrays using lexical?

Thanks!

@spebern spebern added the question Further information is requested label Mar 19, 2023
@spebern spebern changed the title [QUESTION] [QUESTION] support custom types (i256) Mar 20, 2023
@Alexhuszagh
Copy link
Owner

Alexhuszagh commented Sep 9, 2024

Let me think about a good way to do this: we might be able to use 2 128-bit high/low values and parse from there, since multiplication and addition is quite easy like that. We'd have to benchmark the performance. Is there still interest in this? I could add a i256 feature.

Ideally, this would be done internally with a custom i256 type.

@Alexhuszagh Alexhuszagh added this to the 1.0 milestone Sep 9, 2024
@Alexhuszagh Alexhuszagh added the enhancement New feature or request label Sep 9, 2024
@spebern
Copy link
Author

spebern commented Sep 11, 2024

I am not working on anything related to arrow currently, but lexical is still being used there. Maybe @tustvold as a maintainer of arrow can answer this. This was related to a pull request: apache/arrow-rs#3854

@Alexhuszagh
Copy link
Owner

Sounds good this seems like a very doable request so I'll put it as an enhancement after our release later this week.

@Alexhuszagh
Copy link
Owner

I'm making serious progress on this, sorry for the delay but we have most of the base logic and then we can implement the parsing logic quite easily into this.

@Alexhuszagh
Copy link
Owner

Alexhuszagh commented Dec 30, 2024

I've got the higher order integers, which I needed to optimize from other big integer implementations here. It's got very efficient multiplication and division, especially with scalars, which since we'll be breaking these operations into groups of 19 digits lets us do this very efficiently.

For example, multiplying u256 * u256 is ~2ns on my machine, but u256 * u64 is ~1ns, and this is true no matter the size of the value. So, if we're processing ~75 digits (78 is the max for a 256-bit integer), we can break that into 4 chunks of u64, which would make the overhead of big integer operations much less prohibitive.

This is especially true for division, where I've decreased the time from ~150ns from other libraries (in bad cases, these can even go to ~400us) to ~7ns (slow for integer operations, but not unworkably so, and highly optimal cases can go down to ~2ns), which means all the base logic should be there.

I'll then implement it for the types present, add some benchmarks, and we'll go from there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants