Grapheme segmentation is 1.2x-8x slower than unicode-segmentation in benchmarks #179
Comments
Here are my results:
And in a much easier to read format:
So this crate's implementation is universally slower, but it's not 5-7x slower. It's 1.2x-8x slower, which means it depends on the specific workload. I personally don't have any plans to work on this in the immediate future, but other folks are welcome to work on it. If you think it will take significant changes to fix, we should discuss first. But there are perhaps some easy wins. Not sure.
I'm not sure how this ever worked? Apparently, a mutable ref to `corpus` was being taken and thus emptied out after the first use. We fix it by re-assigning for each run. Ref #179
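For anyone wondering how a benchmark input can get "emptied out", here is a minimal std-only sketch of one way it can happen. It assumes `corpus` was a `&[u8]` handed to a reader-style API via `&mut corpus`; the commit message doesn't show the benchmark code, so that shape is a guess, and `consume`, `run_twice_shared` and `run_twice_reassigned` are hypothetical names made up for illustration. The relevant fact is that `Read` is implemented for `&[u8]` and reading advances the slice itself.

```rust
use std::io::Read;

/// Stand-in for one benchmark iteration (hypothetical): drain the reader
/// and report how many bytes it actually saw.
fn consume<R: Read>(mut rdr: R) -> usize {
    let mut buf = Vec::new();
    rdr.read_to_end(&mut buf).expect("reading from a slice cannot fail");
    buf.len()
}

/// Buggy pattern: both runs share one `corpus` binding through `&mut`.
/// The first read advances the slice to its end, so the second run
/// sees an empty input.
fn run_twice_shared(data: &[u8]) -> (usize, usize) {
    let mut corpus = data;
    let first = consume(&mut corpus);
    let second = consume(&mut corpus);
    (first, second)
}

/// Fixed pattern: re-assign `corpus` before every run so each run
/// starts from the full input again.
fn run_twice_reassigned(data: &[u8]) -> (usize, usize) {
    let mut corpus = data;
    let first = consume(&mut corpus);
    corpus = data;
    let second = consume(&mut corpus);
    (first, second)
}
```

With the shared binding, every iteration after the first measures work on an empty input, which would make those numbers meaningless; re-assigning per run avoids that.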
This is fixed on |
... and except for at least several
Otherwise I'm not really sure what to do with this issue. It seems more like a general complaint "hey some things aren't as fast as std" instead of something actionable. So I'm going to close it. And note that some benchmarks are likely impossible for |
Came over from unicode-rs/unicode-segmentation#46 looking for something that can do single pass lossy UTF-8 decode and grapheme iteration. Thank you for your efforts, of course.
Unfortunately it's quite significantly behind unicode-segmentation in performance, despite the benchmarks using only valid UTF-8 inputs.
Benchmarks were done on a Debian 12 Linux machine with an Intel i5-7500T CPU and 16 GB of RAM, using stable rustc 1.75.0.
graphemes_bench.txt
Other benchmarks also tended to trail ~3-4x behind `std`, except for `to_str` and `to_str_lossy_valid`.
The benchmark also panicked partway through: