Improve performance of parsing simple dates #8386
JMH allows per-method benchmarks. See jmh/README for more details
Simple dates are common in many cases. Performance analysis showed that parsing them can be a hotspot during harvesting. This is due to the design of UUIDMapper, which loads all metadata for a harvester for every new batch. This cannot be easily changed, but we can improve the performance here, which also helps other code paths using this method. This change improves the performance of the method by a factor of about 1.3x.
This change improves the performance of parseDate by removing as many string operations as possible. With this and some other minor optimisations, the method is about 5x faster than the original.
Hi @juanluisrp, I hope you’re doing well! I noticed this PR has been waiting for a while, and I wanted to follow up. The core changes are relatively straightforward, but if they feel too complex, I can roll back to the first attempt, as that version is much simpler and already yields measurable results. However, the current version is more polished and brings additional improvements. If the JMH tests are a concern, I’m happy to separate them into another PR to speed up the review process. Please let me know if there’s anything I can do to help get this PR merged. Thanks a lot for your time!
Hi @tobias-hotz. The enhancement looks good to me, since it still passes the ISODate tests. However, I'm not sure if we should add this new performance test module to the source code. Do you plan to keep using it for other improvements?
Hi @juanluisrp,
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-4.2.x 4.2.x
# Navigate to the new working tree
cd .worktrees/backport-4.2.x
# Create a new branch
git switch --create backport-8386-to-4.2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick a00732a80c76dbc3046390107169122f8b70a4f7 8d5e5e64305d055147d972223aef0500e5fce70f 3c90f55e8c82e5b97ebcfa537d65a319f25f89e5 b0ebf075236076678b66a265ffe661e1d27bb5cd a754b4957356dde13735698869510837400ef49e
# Push it to GitHub
git push --set-upstream origin backport-8386-to-4.2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-4.2.x
Then, create a pull request where the
Simple dates are common in many cases. Performance analysis showed that parsing them can be a hotspot during harvesting. This is due to the design of UUIDMapper, which loads all metadata for a harvester for every new batch. This cannot be easily changed, but we can improve the performance here, which also helps other code paths using this method. This change improves the performance of parseDate by removing as many string operations as possible. With this and some other minor optimisations, the method is about 5x faster than the original.
This PR aims to improve the performance of ISODate when handling "simple" dates. The method is not very slow, but it is called very often (e.g. by UUIDMapper during harvesting), which makes it show up in performance measurements. See #8007, where parsing simple dates took as much as 10-20% of the entire harvesting CPU time.
To validate the performance improvement, JMH benchmarks were added.
This is the baseline performance before the patch:
Notice that the overall execution time is very small, but due to the extremely high number of calls, this is still a relevant metric.
The first idea was to precompile the pattern used for the regex (see the first number), which resulted in the following numbers:
This means a minor improvement, but there is still room to improve.
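As an illustration of the precompilation idea, here is a minimal sketch. The regex and the class and method names are hypothetical and are not taken from the PR; the point is only that `String.matches` compiles the pattern on every call, while a `static final Pattern` is compiled once and reused.

```java
import java.util.regex.Pattern;

final class PrecompiledPatternSketch {
    // Hypothetical regex for a "simple" date such as 2024-01-31.
    // This is an assumption for illustration, not the regex used by ISODate.
    private static final String SIMPLE_DATE_REGEX = "\\d{4}-\\d{2}-\\d{2}";

    // Compiled once when the class is initialised and reused for every call.
    private static final Pattern SIMPLE_DATE = Pattern.compile(SIMPLE_DATE_REGEX);

    // Slow variant: String.matches compiles the regex again on each call.
    static boolean isSimpleDateRecompiling(String input) {
        return input.matches(SIMPLE_DATE_REGEX);
    }

    // Faster variant: only a Matcher is created per call.
    static boolean isSimpleDatePrecompiled(String input) {
        return SIMPLE_DATE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSimpleDatePrecompiled("2024-01-31"));
        System.out.println(isSimpleDatePrecompiled("2024-01-31T10:00:00"));
    }
}
```

Both variants return the same result for the same input; only the per-call cost differs, which matters when the method is called as often as described above.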
The second commit changes the algorithm to use as few string operations as possible. With this, I get the following numbers:
The speedup is about 5x compared to the baseline performance.
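To give a rough idea of what avoiding string operations can look like, the sketch below parses a `yyyy-MM-dd` value by reading characters directly, with no regex, `split`, or `substring`. It is an assumption-laden illustration, not the code from this PR: the class name, the method `parseSimpleDate`, and the choice to return `null` for anything that is not a simple date (so a caller could fall back to a general parser) are all hypothetical.

```java
import java.time.LocalDate;
import java.time.YearMonth;

final class SimpleDateParseSketch {
    // Hypothetical fast path: returns the date for input shaped exactly like
    // "yyyy-MM-dd", or null so the caller can fall back to a slower parser.
    static LocalDate parseSimpleDate(String s) {
        if (s == null || s.length() != 10 || s.charAt(4) != '-' || s.charAt(7) != '-') {
            return null;
        }
        int year = digits(s, 0, 4);
        int month = digits(s, 5, 7);
        int day = digits(s, 8, 10);
        // digits() returns -1 for non-digit input, which these checks reject.
        if (year < 0 || month < 1 || month > 12 || day < 1) {
            return null;
        }
        if (day > YearMonth.of(year, month).lengthOfMonth()) {
            return null;
        }
        return LocalDate.of(year, month, day);
    }

    // Reads s[from, to) as a non-negative decimal number without allocating
    // substrings; returns -1 if any character is not an ASCII digit.
    private static int digits(String s, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return -1;
            }
            value = value * 10 + (c - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseSimpleDate("2024-02-29"));
        System.out.println(parseSimpleDate("2024-02-29T10:00:00"));
    }
}
```

The design choice in this sketch is to keep the fast path strict and cheap: anything that does not match the exact shape is handed back to the caller rather than handled here.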
Checklist
- `main` branch, backports managed with label
- `README.md` files
- `pom.xml` dependency management. Update build documentation with intended library use and library tutorials or documentation

Funded by LGL BW