Blocklist update fix #4446
Conversation
```go
}
// add new if they don't already exist and weren't also removed
for _, b := range add {
	if slices.ContainsFunc(final, hasID(b.BlockID)) ||
		slices.ContainsFunc(remove, hasID(b.BlockID)) { // continuation reconstructed: the quoted hunk is truncated; presumably the second check is against the remove list
		continue
	}
	final = append(final, b)
```
before we built a map for lookups, and it looks like we're just doing linear searches here. wdyt about sorting and doing a binary search?
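For reference, a rough sketch of what the suggested approach might look like using Go 1.21's `slices` package; `blockMeta` and the helpers here are illustrative stand-ins, not Tempo's actual types:

```go
package main

import (
	"bytes"
	"slices"
)

// blockMeta stands in for Tempo's BlockMeta; only the ID matters here.
type blockMeta struct{ BlockID [16]byte }

// cmpIDs orders metas by BlockID so the blocklist can be binary-searched.
func cmpIDs(a, b blockMeta) int { return bytes.Compare(a.BlockID[:], b.BlockID[:]) }

// containsSorted replaces the linear slices.ContainsFunc scan with an
// O(log n) lookup over a list pre-sorted by BlockID.
func containsSorted(sorted []blockMeta, id [16]byte) bool {
	_, ok := slices.BinarySearchFunc(sorted, id, func(m blockMeta, target [16]byte) int {
		return bytes.Compare(m.BlockID[:], target[:])
	})
	return ok
}

func main() {
	list := []blockMeta{{BlockID: [16]byte{2}}, {BlockID: [16]byte{1}}}
	slices.SortFunc(list, cmpIDs)         // the up-front O(n log n) cost discussed below
	_ = containsSorted(list, [16]byte{2}) // true
}
```

Whether the up-front sort pays for itself depends on the list size and how many lookups each update actually does, which is what the numbers below get at.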
Sorry, I forgot to include the benchmarks, but this version is already significantly faster, mainly by eliminating the maps, which reduces allocs.
```
pkg: github.com/grafana/tempo/tempodb/blocklist
cpu: Apple M4 Pro
         │  before.txt   │              after.txt              │
         │    sec/op     │    sec/op     vs base               │
Update      5.485m ± 2%     1.212m ± 2%  -77.91% (p=0.002 n=6)

         │  before.txt   │              after.txt              │
         │     B/op      │     B/op      vs base               │
Update      6.345Mi ± 0%    1.746Mi ± 0%  -72.48% (p=0.002 n=6)

         │  before.txt   │              after.txt              │
         │   allocs/op   │  allocs/op    vs base               │
Update     4079.000 ± 0%    4.000 ± 0%   -99.90% (p=0.002 n=6)
```
Since the number of metas to add/remove is low (max 4?), sorting/binsearch may not be much better than linear, but I can check.
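To illustrate where the alloc savings likely come from, here's a self-contained toy benchmark (not Tempo's actual one; `blockMeta` and the benchmark names are made up) contrasting a per-call map build against a handful of linear scans:

```go
package blocklist

import (
	"slices"
	"testing"
)

type blockMeta struct{ BlockID int }

var (
	existing = func() []blockMeta {
		m := make([]blockMeta, 100_000)
		for i := range m {
			m[i].BlockID = i
		}
		return m
	}()
	add = []blockMeta{{BlockID: 5}, {BlockID: 100_001}}
)

// BenchmarkMapLookup rebuilds a lookup map on every update, as the old
// code did; the map construction dominates both time and allocations.
func BenchmarkMapLookup(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		seen := make(map[int]struct{}, len(existing))
		for _, m := range existing {
			seen[m.BlockID] = struct{}{}
		}
		for _, a := range add {
			_, _ = seen[a.BlockID]
		}
	}
}

// BenchmarkLinearScan does a few O(n) scans instead and allocates nothing;
// with only a handful of metas per update this wins despite the worse
// per-lookup complexity.
func BenchmarkLinearScan(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		for _, a := range add {
			_ = slices.ContainsFunc(existing, func(m blockMeta) bool {
				return m.BlockID == a.BlockID
			})
		}
	}
}
```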
oh, ok. This is called with just a few metas each compaction to pass the results of the cycle into the polled blocklist. And then each polling cycle it's called once with all accumulated modifications that cycle b/c it assumes that the tenant index is still a bit behind.
given everything the compactor does this may not be noticeable with either a linear or binary search. i'm good as is and will approve. up to you if you want to explore binary search.
Tested a bit, and sorting the current blocklist outweighs the benefits of the quicker lookups (benchmarked with a 100K blocklist): with only a handful of metas per update, the one-time O(n log n) sort costs more than a few O(n) scans. I checked, and even the current implementation isn't showing up in profiles, so I don't think it's worth digging more in here.
What this PR does:
Fixes an issue where the blocklist could end up with duplicate entries for the same block after quick compactions, leading to incorrect results for TraceQL metrics queries. Searches are unaffected because they dedupe, but the metrics path requires RF1.
This issue is only present in the SingleBinary, where the compactor and the read paths run in the same process. In distributed mode this isn't a problem; the indexes shared through object storage are correct.
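To make the failure mode concrete, a minimal sketch (illustrative names, not the PR's exact code) of the dedupe guard: without it, a meta delivered by both the compaction result and the next poll's accumulated modifications lands in the list twice:

```go
package main

import "slices"

type blockMeta struct{ BlockID [16]byte }

// applyAdds appends new metas, skipping blocks the list already knows about.
func applyAdds(list, add []blockMeta) []blockMeta {
	for _, b := range add {
		if slices.ContainsFunc(list, func(m blockMeta) bool {
			return m.BlockID == b.BlockID
		}) {
			continue // already present: don't double the entry
		}
		list = append(list, b)
	}
	return list
}

func main() {
	meta := blockMeta{BlockID: [16]byte{1}}
	list := applyAdds(nil, []blockMeta{meta})  // compaction cycle delivers the meta
	list = applyAdds(list, []blockMeta{meta})  // next poll delivers it again
	_ = len(list)                              // still 1, not 2
}
```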
Example:
Notes:
The tempodb Retention test had to be updated because it relied on the previous bug: it expected that the same block couldn't be compacted and retained within the same polling cycle, which was only true because the blocklist didn't behave as expected.
Which issue(s) this PR fixes:
Fixes #
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]