Improve performance of parsing simple dates #8386

tobias-hotz · 2024-09-24T10:49:14Z

This PR aims to improve the performance of IsoDate when handling "simple" Dates. This method is not very slow, but it is called very often (e.g. by UUIDMapper during harvesting), which makes it show up in some performance measurements. See #8007 where parsing simple Dates would take as much as 10-20% of the entire harvesting CPU time.

To validate the performance improvement, JMH benchmarks were added.
This is the baseline performance of the patch:

Benchmark                               (arg)  Mode  Cnt    Score     Error  Units
ISODateBenchmark.measureIsoSimple  1976-06-03  avgt    5  604.618 ± 69.233  ns/op
ISODateBenchmark.measureIsoSimple  1976/06/03  avgt    5  608.454 ± 44.684  ns/op
ISODateBenchmark.measureIsoSimple    24-06-06  avgt    5  874.100 ± 55.160  ns/op

Notice that the overall execution time is very small, but due to the extremly high number of calls, this is still a relevant metric
The first idea was to precompile the pattern used for the regex (see the first number), which resulted in the following numbers:

Benchmark                               (arg)  Mode  Cnt    Score     Error  Units
ISODateBenchmark.measureIsoSimple  1976-06-03  avgt    5  466.098 ±  32.134  ns/op
ISODateBenchmark.measureIsoSimple  1976/06/03  avgt    5  456.157 ±  52.184  ns/op
ISODateBenchmark.measureIsoSimple    24-06-06  avgt    5  716.912 ± 112.579  ns/op

This means a minor improvement, but there is still room to improve.
The second commit changes the algorithm to use as little string operations as possible. Whith this. I get the following numbers:

Benchmark                               (arg)  Mode  Cnt    Score    Error  Units
ISODateBenchmark.measureIsoSimple  1976-06-03  avgt    5  105.912 ± 11.120  ns/op
ISODateBenchmark.measureIsoSimple  1976/06/03  avgt    5  115.320 ±  9.542  ns/op
ISODateBenchmark.measureIsoSimple    24-06-06  avgt    5  193.689 ± 14.505  ns/op

The speedup is about 5x compared to the baseline performance.

Checklist

Funded by LGL BW

JMH allows per-method benchmarks. See jmh/README for more details

Simples dates are common in many cases. Performance analysis showed that this can be a performance hotspot during harvesting. This is due to the design of UUIDMapper, which loads all metadata for a harvester for every new batch. This can not be easily changed, but we can improve the performance here, which also helps other code paths using this method. This change improves the performance of the method by a factor of about 1.3x

This change improves the performance of parseDate by removing as many string operations as possible. Doing this and some other minor optimisations, we can improve the performance so it is about 5x faster than the original.

CLAassistant · 2024-12-08T03:43:30Z

All committers have signed the CLA.

tobias-hotz · 2025-01-28T07:19:51Z

Hi @juanluisrp,

I hope you’re doing well! I noticed this PR has been waiting for a while, and I wanted to follow up. The core changes are relatively straightforward, but if they feel too complex, I can roll back to the first attempt, as that version is much simpler and already yields measurable results. However, the current version is more polished and brings additional improvements.

If the JMH tests are a concern, I’m happy to separate them into another PR to speed up the review process. Please let me know if there’s anything I can do to help get this PR merged.

Thanks a lot for your time!

juanluisrp · 2025-01-28T08:15:46Z

Hi @tobias-hotz. The enhancement looks good to me since it still passing the ISODate tests. However I'm not sure if we should add this new performance test module to the source code. Do you plan to keep using it for other improvements?

This reverts commit a00732a.

tobias-hotz · 2025-01-28T08:27:28Z

Hi @juanluisrp,
I’ve removed the JMH part, as I don’t currently plan to use it for anything else. It was a useful method for measuring performance in this PR, but we can always implement ad-hoc tests for any future JMH benchmarking if needed.

geonetworkbuild · 2025-01-28T08:58:46Z

The backport to 4.2.x failed:

The process '/usr/bin/git' failed with exit code 128

stderr

error: commit a754b4957356dde13735698869510837400ef49e is a merge but no -m option was given.
fatal: cherry-pick failed

stdout

Auto-merging pom.xml
[backport-8386-to-4.2.x 1bbc7fb189] Add initial JMH infrastructure and benchmarks
 Author: Tobias Hotz <[email protected]>
 Date: Fri Sep 6 13:34:05 2024 +0200
 5 files changed, 145 insertions(+)
 create mode 100644 jmh/.gitignore
 create mode 100644 jmh/README.md
 create mode 100644 jmh/pom.xml
 create mode 100644 jmh/src/main/java/org/fao/geonet/domain/ISODateBenchmark.java
[backport-8386-to-4.2.x 5107b50ce4] Improve the performance of simple dates, take 1
 Author: Tobias Hotz <[email protected]>
 Date: Tue Sep 24 12:34:57 2024 +0200
 1 file changed, 3 insertions(+), 1 deletion(-)
[backport-8386-to-4.2.x afa8ece847] Improve the performance of simple dates, take 2
 Author: Tobias Hotz <[email protected]>
 Date: Tue Sep 24 12:41:51 2024 +0200
 1 file changed, 65 insertions(+), 42 deletions(-)
Auto-merging pom.xml
[backport-8386-to-4.2.x c0e730286e] Revert "Add initial JMH infrastructure and benchmarks"
 Author: Tobias Hotz <[email protected]>
 Date: Tue Jan 28 09:24:05 2025 +0100
 5 files changed, 145 deletions(-)
 delete mode 100644 jmh/.gitignore
 delete mode 100644 jmh/README.md
 delete mode 100644 jmh/pom.xml
 delete mode 100644 jmh/src/main/java/org/fao/geonet/domain/ISODateBenchmark.java

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-4.2.x 4.2.x
# Navigate to the new working tree
cd .worktrees/backport-4.2.x
# Create a new branch
git switch --create backport-8386-to-4.2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick a00732a80c76dbc3046390107169122f8b70a4f7,8d5e5e64305d055147d972223aef0500e5fce70f,3c90f55e8c82e5b97ebcfa537d65a319f25f89e5,b0ebf075236076678b66a265ffe661e1d27bb5cd,a754b4957356dde13735698869510837400ef49e
# Push it to GitHub
git push --set-upstream origin backport-8386-to-4.2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-4.2.x

Then, create a pull request where the base branch is 4.2.x and the compare/head branch is backport-8386-to-4.2.x.

Simples dates are common in many cases. Performance analysis showed that this can be a performance hotspot during harvesting. This is due to the design of UUIDMapper, which loads all metadata for a harvester for every new batch. This can not be easily changed, but we can improve the performance here, which also helps other code paths using this method. This change improves the performance of parseDate by removing as many string operations as possible. Doing this and some other minor optimisations, we can improve the performance so it is about 5x faster than the original.

tobias-hotz added 3 commits September 6, 2024 13:41

Add initial JMH infrastructure and benchmarks

a00732a

JMH allows per-method benchmarks. See jmh/README for more details

Improve the performance of simple dates, take 2

3c90f55

This change improves the performance of parseDate by removing as many string operations as possible. Doing this and some other minor optimisations, we can improve the performance so it is about 5x faster than the original.

tobias-hotz changed the title ~~Simple datetime perf~~ Improve performance of parsing simple Dates Sep 24, 2024

tobias-hotz changed the title ~~Improve performance of parsing simple Dates~~ Improve performance of parsing simple dates Sep 24, 2024

fxprunayre requested a review from juanluisrp October 9, 2024 13:11

fxprunayre added this to the 4.4.7 milestone Oct 9, 2024

juanluisrp added the backport 4.2.x label Jan 28, 2025

juanluisrp self-assigned this Jan 28, 2025

juanluisrp added enhancement changelog labels Jan 28, 2025

tobias-hotz added 2 commits January 28, 2025 09:24

Revert "Add initial JMH infrastructure and benchmarks"

b0ebf07

This reverts commit a00732a.

Merge remote-tracking branch 'upstream/main' into simple_datetime_perf

a754b49

juanluisrp approved these changes Jan 28, 2025

View reviewed changes

juanluisrp merged commit 3e3dacb into geonetwork:main Jan 28, 2025
7 checks passed

geonetworkbuild added the failed backport label Jan 28, 2025

tobias-hotz deleted the simple_datetime_perf branch January 28, 2025 09:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance of parsing simple dates #8386

Improve performance of parsing simple dates #8386

tobias-hotz commented Sep 24, 2024

CLAassistant commented Dec 8, 2024 •

edited

Loading

tobias-hotz commented Jan 28, 2025

juanluisrp commented Jan 28, 2025

tobias-hotz commented Jan 28, 2025

geonetworkbuild commented Jan 28, 2025

Improve performance of parsing simple dates #8386

Improve performance of parsing simple dates #8386

Conversation

tobias-hotz commented Sep 24, 2024

Checklist

CLAassistant commented Dec 8, 2024 • edited Loading

tobias-hotz commented Jan 28, 2025

juanluisrp commented Jan 28, 2025

tobias-hotz commented Jan 28, 2025

geonetworkbuild commented Jan 28, 2025

CLAassistant commented Dec 8, 2024 •

edited

Loading