Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace unsafe interning with unique package #4264

Closed

Conversation

zalegrala
Copy link
Contributor

@zalegrala zalegrala commented Nov 1, 2024

What this PR does:

Here we remove the use of unsafe from the intern package and replace it with the unique package. The cleanup of the internal map is now handled by the unique package in a background go routine.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@zalegrala zalegrala changed the title Replace *pq.Value interning with unique package Replace *pq.Value interning with unique package Nov 1, 2024
@zalegrala zalegrala changed the title Replace *pq.Value interning with unique package Replace *pq.Value interning with unique package Nov 1, 2024
@zalegrala
Copy link
Contributor Author

This is currently increasing the allocations.

@zalegrala zalegrala changed the title Replace *pq.Value interning with unique package Replace unsafe interning with unique package Nov 1, 2024
@zalegrala
Copy link
Contributor Author

goos: linux
goarch: amd64
pkg: github.com/grafana/tempo/pkg/parquetquery
cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
                             │  main.txt   │              wip.txt               │
                             │   sec/op    │   sec/op     vs base               │
ColumnIterator/async-16        45.60m ± 1%   44.55m ± 1%  -2.32% (p=0.000 n=10)
ColumnIterator/sync-16         14.20m ± 1%   13.90m ± 1%  -2.13% (p=0.000 n=10)
ColumnIterator/internSync-16   14.71m ± 1%   14.48m ± 2%  -1.58% (p=0.007 n=10)
geomean                        21.20m        20.77m       -2.01%

                             │   main.txt   │               wip.txt               │
                             │     B/op     │     B/op      vs base               │
ColumnIterator/async-16        79.75Mi ± 0%   79.77Mi ± 0%       ~ (p=0.529 n=10)
ColumnIterator/sync-16         6.787Ki ± 3%   6.753Ki ± 3%       ~ (p=0.839 n=10)
ColumnIterator/internSync-16   6.827Ki ± 3%   6.788Ki ± 4%       ~ (p=0.436 n=10)
geomean                        155.8Ki        155.3Ki       -0.35%

                             │  main.txt   │               wip.txt                │
                             │  allocs/op  │  allocs/op   vs base                 │
ColumnIterator/async-16        300.3k ± 0%   300.3k ± 0%       ~ (p=0.515 n=10)
ColumnIterator/sync-16          48.00 ± 0%    48.00 ± 0%       ~ (p=1.000 n=10) ¹
ColumnIterator/internSync-16    50.00 ± 0%    48.00 ± 0%  -4.00% (p=0.000 n=10)
geomean                         896.6         884.5       -1.35%

Copy link
Contributor

@javiermolinar javiermolinar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

pkg/parquetquery/intern/intern.go Outdated Show resolved Hide resolved
Copy link
Member

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup 👍

@zalegrala
Copy link
Contributor Author

zalegrala commented Nov 5, 2024

I've reverted the drop of the intern package. A test was failing in the linkCollector when validating the block queries matched against the traceID. I believe this is because nothing in the test was holding the value and is it was getting GC'd. Holding the reference on the SyncIterator maintains the reference so it doesn't immediately get cleaned up during tests.

I've updated the package doc and renamed UnsafeClone to just Clone.

@mdisibio
Copy link
Contributor

mdisibio commented Nov 5, 2024

Do you have any tracesql search benchmarks? Since this loses the handle of the unique value, it means the interned values only last until the next garbage collection. Whereas the previous interner map would last as long as needed (I think cleared on each new row group).

@mdisibio
Copy link
Contributor

mdisibio commented Nov 5, 2024

For reference: https://go.dev/blog/unique#a-footnote-about-interning-strings

In the meantime, unique.Make("my string").Value() is one possible workaround. Even though failing to retain the handle will allow the string to be deleted from unique’s internal map, map entries are not deleted immediately. In practice, entries will not be deleted until at least the next garbage collection completes, so this workaround still allows for some degree of deduplication in the periods between collections.

Interested to see how the benefits look, since I'm considering doing the same for deduping span data in the generators, where a lot of duplicated span attribute data lives in the live traces map.

@zalegrala
Copy link
Contributor Author

What I don't like about the change currently is the lack of control about when objects are cleaned. I'd like to see if we can avoid impacting traceql.

@zalegrala
Copy link
Contributor Author

I'm going to take this back to draft while I work up a change I like better.

@zalegrala zalegrala marked this pull request as draft November 6, 2024 13:32
@mdisibio
Copy link
Contributor

mdisibio commented Nov 7, 2024

What do you think about returning to the map-based implementation, but populate it with the unique handles? That way we keep the reference and the deduped strings are kept alive across GCs. The lifetime of the interner hasn't changed, so I think that would work very closely to the previous implementation, and let us remove the unsafe code.

@zalegrala zalegrala force-pushed the uniquePackageInternReplacement branch from 064d96b to e40dfe3 Compare November 7, 2024 14:46
@zalegrala
Copy link
Contributor Author

I was thinking about bringing back the map also, to give us more control of the memory lifecycle.

@zalegrala
Copy link
Contributor Author

I don't presently understand the test failure.

@zalegrala zalegrala closed this Nov 20, 2024
@zalegrala zalegrala deleted the uniquePackageInternReplacement branch November 20, 2024 14:32
@zalegrala
Copy link
Contributor Author

It doesn't appear to be enough to maintain the unique.Handle reference. As seen in the last test failure, values can change underneath you even while those unique.Handle references are in the map. I believe the reason for this is that the "weak" pointer approach used in the unique package means that we are pointing to the underlying array of bytes rather that the struct, so when the struct is garbage collected, we still maintain the reference to the array, which is now something else entirely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants