Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blocklist update fix #4446

Merged
merged 3 commits into from
Dec 17, 2024
Merged

Blocklist update fix #4446

merged 3 commits into from
Dec 17, 2024

Conversation

mdisibio
Copy link
Contributor

@mdisibio mdisibio commented Dec 13, 2024

What this PR does:
Fixes an issue where the blocklist could get doubled entries for the same block after quick compactions, leading to incorrect results for TraceQL metrics queries. Searches are unaffected because they dedupe, but the metrics path requires RF1.

This issue is only present for the SingleBinary, where the compactor and the read paths are in the same process. In distributed mode this isn't a problem, the indexes shared through object storage are correct.

Example:

  • block A -> compacted to B
  • block B -> compacted to C
  • both B and C would be in the in-memory blocklist

image

Notes:
tempodb Retention Test had to be updated because it was relying on previous bug. It expected the same block couldn't be compacted and retained within the same polling cycle, which was because the blocklist didn't work as expected.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

}
// add new if they don't already exist and weren't also removed
for _, b := range add {
if slices.ContainsFunc(final, hasID(b.BlockID)) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before we built a map for lookups and it looks like we're just doing linear searches here. wdyt think about sorting and doing a binary search?

Copy link
Contributor Author

@mdisibio mdisibio Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I forgot to include the benchmarks but this version is already significantly faster, mainly by eliminating the maps, which reduces allocs.

pkg: github.com/grafana/tempo/tempodb/blocklist
cpu: Apple M4 Pro
       │ before.txt  │             after.txt              │
       │   sec/op    │   sec/op     vs base               │
Update   5.485m ± 2%   1.212m ± 2%  -77.91% (p=0.002 n=6)

       │  before.txt  │              after.txt              │
       │     B/op     │     B/op      vs base               │
Update   6.345Mi ± 0%   1.746Mi ± 0%  -72.48% (p=0.002 n=6)

       │  before.txt   │             after.txt             │
       │   allocs/op   │ allocs/op   vs base               │
Update   4079.000 ± 0%   4.000 ± 0%  -99.90% (p=0.002 n=6)

Since the number of metas to add/remove is low (max 4?), sorting/binsearch may not be much better linear, but I can check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, ok. This is called with just a few metas each compaction to pass the results of the cycle into the polled blocklist. And then each polling cycle it's called once with all accumulated modifications that cycle b/c it assumes that the tenant index is still a bit behind.

given everything the compactor does this may not be noticeable with either a linear or binary search. i'm good as is and will approve. up to you if you want to explore binary search.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested a bit and sorting the current blocklist is outweighing the benefits of the quicker lookups (benchmark using 100K blocklist). I checked and the even the current implementation isn't showing up in profiles, so I think it's not worth digging more in here.

}
// add new if they don't already exist and weren't also removed
for _, b := range add {
if slices.ContainsFunc(final, hasID(b.BlockID)) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, ok. This is called with just a few metas each compaction to pass the results of the cycle into the polled blocklist. And then each polling cycle it's called once with all accumulated modifications that cycle b/c it assumes that the tenant index is still a bit behind.

given everything the compactor does this may not be noticeable with either a linear or binary search. i'm good as is and will approve. up to you if you want to explore binary search.

@mdisibio mdisibio merged commit f00ed6a into grafana:main Dec 17, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants