Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Prometheus fast regexp #4329

Merged
merged 11 commits into from
Nov 20, 2024
Merged

Use Prometheus fast regexp #4329

merged 11 commits into from
Nov 20, 2024

Conversation

joe-elliott
Copy link
Member

@joe-elliott joe-elliott commented Nov 14, 2024

What this PR does:
Uses prometheus' fast regexp package instead of the standard go regexp for performance. This does create a breaking change in Tempo as prometheus regexps are always anchored.

This { span.foo =~ "bar" } used to be a substring search. After this PR it will be an exact match.

My primary concern with this PR is that we are now memoizing at the fetch and engine layers. This will cause extra allocations for certain queries. Should we remove this for the engine layer to match current behavior?

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@joe-elliott
Copy link
Member Author

joe-elliott commented Nov 16, 2024

BenchmarkBlockTraceql was adjusted to:

{"plain", `{span:name=~"HTTP GET" }`},
{"||ed", `{span:name=~"HTTP GET|HTTP POST|HTTP PUT" }`},
{"prefix", `{span:name=~"HTTP GET.*" }`},
{"postfix", `{span:name=~".*GET" }`},
{"substring", `{span:name=~".*HTTP GET.*" }`},
{"caseinsensitive", `{span:name=~"(?i:HTTP GET)" }`},
{"rando", `{span:name=~".*G([A-Z]+)T.*" }`},
{"mutiple", `{span:name=~".*HTTP.*GET.*" }`},
{"all", `{span:name=~".*" }`},
Benches!
goos: darwin
goarch: arm64
pkg: github.com/grafana/tempo/tempodb/encoding/vparquet4
cpu: Apple M3 Pro
                                       │ before-anchored.txt │           after-cheat.txt           │
                                       │       sec/op        │   sec/op     vs base                │
BackendBlockTraceQL/plain-11                     33.40m ± 1%   30.24m ± 0%   -9.44% (p=0.000 n=10)
BackendBlockTraceQL/||ed-11                      43.04m ± 1%   40.25m ± 1%   -6.47% (p=0.000 n=10)
BackendBlockTraceQL/prefix-11                    53.21m ± 1%   50.18m ± 0%   -5.69% (p=0.000 n=10)
BackendBlockTraceQL/postfix-11                   33.69m ± 0%   31.82m ± 0%   -5.56% (p=0.000 n=10)
BackendBlockTraceQL/substring-11                 54.63m ± 0%   54.13m ± 0%   -0.92% (p=0.002 n=10)
BackendBlockTraceQL/caseinsensitive-11           33.66m ± 1%   32.35m ± 0%   -3.90% (p=0.000 n=10)
BackendBlockTraceQL/rando-11                     54.79m ± 0%   52.45m ± 0%   -4.28% (p=0.000 n=10)
BackendBlockTraceQL/mutiple-11                   55.67m ± 0%   52.47m ± 0%   -5.75% (p=0.000 n=10)
BackendBlockTraceQL/all-11                       442.6m ± 1%   271.9m ± 1%  -38.56% (p=0.000 n=10)
geomean                                          57.04m        51.50m        -9.73%

                                       │ before-anchored.txt │           after-cheat.txt            │
                                       │         B/s         │     B/s       vs base                │
BackendBlockTraceQL/plain-11                    415.5Mi ± 1%   458.8Mi ± 0%  +10.43% (p=0.000 n=10)
BackendBlockTraceQL/||ed-11                     322.4Mi ± 1%   344.7Mi ± 1%   +6.92% (p=0.000 n=10)
BackendBlockTraceQL/prefix-11                   260.8Mi ± 1%   276.5Mi ± 0%   +6.03% (p=0.000 n=10)
BackendBlockTraceQL/postfix-11                  411.9Mi ± 0%   436.1Mi ± 0%   +5.89% (p=0.000 n=10)
BackendBlockTraceQL/substring-11                254.0Mi ± 0%   256.4Mi ± 0%   +0.93% (p=0.001 n=10)
BackendBlockTraceQL/caseinsensitive-11          412.2Mi ± 1%   429.0Mi ± 0%   +4.06% (p=0.000 n=10)
BackendBlockTraceQL/rando-11                    253.3Mi ± 0%   264.6Mi ± 0%   +4.47% (p=0.000 n=10)
BackendBlockTraceQL/mutiple-11                  249.3Mi ± 0%   264.5Mi ± 0%   +6.10% (p=0.000 n=10)
BackendBlockTraceQL/all-11                      31.35Mi ± 1%   51.03Mi ± 1%  +62.77% (p=0.000 n=10)
geomean                                         243.3Mi        269.5Mi       +10.78%

                                       │ before-anchored.txt │           after-cheat.txt           │
                                       │      MB_io/op       │  MB_io/op   vs base                 │
BackendBlockTraceQL/plain-11                      14.55 ± 0%   14.55 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/||ed-11                       14.55 ± 0%   14.55 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/prefix-11                     14.55 ± 0%   14.55 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/postfix-11                    14.55 ± 0%   14.55 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/substring-11                  14.55 ± 0%   14.55 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/caseinsensitive-11            14.55 ± 0%   14.55 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/rando-11                      14.55 ± 0%   14.55 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/mutiple-11                    14.55 ± 0%   14.55 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/all-11                        14.55 ± 0%   14.55 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                           14.55        14.55       +0.00%
¹ all samples are equal

                                       │ before-anchored.txt │            after-cheat.txt            │
                                       │    traces_found     │ traces_found  vs base                 │
BackendBlockTraceQL/plain-11                      119.0 ± 0%     119.0 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/||ed-11                       747.0 ± 0%     747.0 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/prefix-11                    2.090k ± 0%    2.090k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/postfix-11                    120.0 ± 0%     120.0 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/substring-11                 2.090k ± 0%    2.090k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/caseinsensitive-11            119.0 ± 0%     119.0 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/rando-11                     2.090k ± 0%    2.090k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/mutiple-11                   2.090k ± 0%    2.090k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockTraceQL/all-11                       2.975k ± 0%    2.975k ± 0%       ~ (p=1.000 n=10) ¹
geomean                                           746.6          746.6       +0.00%
¹ all samples are equal

                                       │ before-anchored.txt │           after-cheat.txt           │
                                       │        B/op         │     B/op      vs base               │
BackendBlockTraceQL/plain-11                    6.192Mi ± 4%   6.245Mi ± 4%       ~ (p=0.853 n=10)
BackendBlockTraceQL/||ed-11                     10.33Mi ± 3%   10.41Mi ± 4%       ~ (p=0.280 n=10)
BackendBlockTraceQL/prefix-11                   15.12Mi ± 3%   15.19Mi ± 3%       ~ (p=0.315 n=10)
BackendBlockTraceQL/postfix-11                  6.645Mi ± 5%   6.467Mi ± 6%       ~ (p=0.218 n=10)
BackendBlockTraceQL/substring-11                15.42Mi ± 3%   15.32Mi ± 3%       ~ (p=0.481 n=10)
BackendBlockTraceQL/caseinsensitive-11          6.640Mi ± 6%   6.570Mi ± 7%       ~ (p=0.165 n=10)
BackendBlockTraceQL/rando-11                    15.48Mi ± 3%   15.24Mi ± 4%       ~ (p=0.436 n=10)
BackendBlockTraceQL/mutiple-11                  15.60Mi ± 3%   15.45Mi ± 3%       ~ (p=0.315 n=10)
BackendBlockTraceQL/all-11                      339.2Mi ± 1%   340.5Mi ± 0%       ~ (p=0.739 n=10)
geomean                                         15.57Mi        15.50Mi       -0.50%

                                       │ before-anchored.txt │          after-cheat.txt           │
                                       │      allocs/op      │  allocs/op   vs base               │
BackendBlockTraceQL/plain-11                     89.36k ± 0%   88.86k ± 0%  -0.56% (p=0.000 n=10)
BackendBlockTraceQL/||ed-11                      131.3k ± 0%   130.8k ± 0%  -0.38% (p=0.000 n=10)
BackendBlockTraceQL/prefix-11                    196.6k ± 0%   196.2k ± 0%  -0.18% (p=0.000 n=10)
BackendBlockTraceQL/postfix-11                   89.48k ± 0%   89.12k ± 0%  -0.40% (p=0.000 n=10)
BackendBlockTraceQL/substring-11                 196.6k ± 0%   196.2k ± 0%  -0.18% (p=0.000 n=10)
BackendBlockTraceQL/caseinsensitive-11           89.39k ± 0%   89.02k ± 0%  -0.41% (p=0.000 n=10)
BackendBlockTraceQL/rando-11                     196.6k ± 0%   196.3k ± 0%  -0.16% (p=0.000 n=10)
BackendBlockTraceQL/mutiple-11                   196.6k ± 0%   196.3k ± 0%  -0.16% (p=0.000 n=10)
BackendBlockTraceQL/all-11                       1.989M ± 0%   1.989M ± 0%  -0.02% (p=0.000 n=10)
geomean                                          186.9k        186.4k       -0.27%

@joe-elliott
Copy link
Member Author

My primary concern with this PR is that we are now memoizing at the fetch and engine layers. This will cause extra allocations for certain queries. Should we remove this for the engine layer to match current behavior?

To respond to my own question. I believe the implementation is fine as is. It would be preferable to share the memoization, but this is such a small percentage of total allocs and cpu and the solutions to sharing memoization are so gnarly that I believe this path is the best for now.

Perhaps some future developer can find a way to cleanly share memoization between the fetch layer and engine.

Copy link
Contributor

@knylander-grafana knylander-grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating the docs!

pkg/regexp/regexp_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@ie-pham ie-pham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM nice improvement just small nit

joe-elliott and others added 11 commits November 20, 2024 10:06
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
@joe-elliott joe-elliott merged commit b61c3ce into grafana:main Nov 20, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants