Speed up encode() #20
Conversation
Co-authored-by: Kartikaya Gupta (kats) <[email protected]>
Question - would you consider a bit of unsafe code to speed this up further? I can get another 50% speedup with something like this:
That's better than pushing each byte onto the result one by one. It's safe if the index-finding logic is right and if the code ensures that the range contains only valid UTF-8 bytes. I believe we can rely on both, but I do understand if unsafe code doesn't seem worth it. After that change, this version is just as fast as zoomascii, which uses a C implementation.
Sorry, but no. I'd like to stay away from adding any unsafe code. I spent some more time looking at this patch and while I believe it works correctly, it is still somewhat confusing to me. The confusion I have arises from the fact that
In the end I left your structure, as I couldn't find a cleaner way to combine the if+while, but I did adjust
Thank you for the patch!
Thank you! This was my first Rust project - happy it turned out to be useful!
This PR improves the speed of encode() by more than 2x. On a corpus of test data consisting of a mix of text and HTML totaling 476 kB, I get these results before my change:

Encoding: 1000 runs took 2.4 sec - 414.72 iterations per second

And after my change:

Encoding: 1000 runs took 0.9 sec - 1054.39 iterations per second
The test data is in the data directory here: https://github.com/samtregar/zoomascii
The way this change works is by scanning ahead for runs of characters that do not need to be encoded and pushing them directly into the output. This is very common in the kinds of text that QP is used on, so it provides a big speedup. Encoding speed for QP encoding is important for applications like mass email sending, for example.
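The run-scanning approach described above can be sketched in safe Rust roughly like this. This is a simplified illustration of the technique, not the crate's actual implementation: the predicate is a stripped-down version of the real quoted-printable rules (no line-length limits, no trailing-whitespace handling), and the function names are made up for the example.

```rust
// Does this byte need quoted-printable escaping? Simplified rule:
// '=' and anything outside printable ASCII must be encoded.
fn needs_encoding(b: u8) -> bool {
    b == b'=' || b < b' ' || b > b'~'
}

// Simplified sketch of run-scanning QP encoding: instead of deciding
// byte by byte, find the longest run that needs no escaping and append
// the whole run at once.
fn encode(input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        // Scan ahead over bytes that can be copied verbatim.
        let run_start = i;
        while i < input.len() && !needs_encoding(input[i]) {
            i += 1;
        }
        // The run is pure printable ASCII, so from_utf8 cannot fail here.
        out.push_str(std::str::from_utf8(&input[run_start..i]).unwrap());
        // Escape one byte that needs encoding, then resume scanning.
        if i < input.len() {
            out.push_str(&format!("={:02X}", input[i]));
            i += 1;
        }
    }
    out
}

fn main() {
    assert_eq!(encode(b"hello=world"), "hello=3Dworld");
    assert_eq!(encode(b"plain text"), "plain text");
}
```

Because typical email and HTML bodies are mostly printable ASCII, the inner scan usually covers long stretches of input, turning many small pushes into a few large appends.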
If it seems useful I can also commit my benchmarking binary; or, if you'd like to play with it but don't think it's worth keeping in the repo, I can just attach it to this PR.