Ingester memory improvements by adjusting prealloc #4344

Merged: 6 commits, Nov 19, 2024

Conversation

Member

@joe-elliott joe-elliott commented Nov 19, 2024

What this PR does:

Adjusts the use of prealloc for some nice memory improvements. Two adjustments were made:

  • Remove the use of prealloc for trace ids. We were alloc'ing 500 bytes for every trace id and never recouping them in the sync pool.
  • Change the traces pool to use linear buckets to reduce wasted allocs. In larger clusters millions of these slices are alloc'ed per second. We will have no problem with bucket collisions. Bucket size was tuned using an internal cluster and a metric was added to help diagnose future issues.
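
The linear-bucket idea in the second bullet can be sketched as follows. This is a minimal illustration, not Tempo's actual code: the `linearPool` type, its constructor, and the `misses` counter (a stand-in for the new metric) are hypothetical names, and counting a miss only when a request exceeds the largest bucket is an assumption. The bucket arithmetic follows the `sz / p.bktSize` and `(bkt + 1) * p.bktSize` lines visible in the review below.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// linearPool is a hypothetical byte-slice pool with fixed-width (linear) buckets.
// Bucket i hands out slices whose capacity is exactly (i+1)*bktSize.
type linearPool struct {
	buckets []sync.Pool
	bktSize int
	misses  atomic.Int64 // assumed stand-in for the miss metric added in the PR
}

func newLinearPool(numBuckets, bktSize int) *linearPool {
	return &linearPool{buckets: make([]sync.Pool, numBuckets), bktSize: bktSize}
}

// Get returns an empty slice with room for at least sz bytes.
func (p *linearPool) Get(sz int) []byte {
	if sz < 0 {
		sz = 0
	}
	// Find the right bucket.
	bkt := sz / p.bktSize
	if bkt >= len(p.buckets) {
		// Too large for any bucket: count a miss and allocate directly.
		p.misses.Add(1)
		return make([]byte, 0, sz)
	}
	if b, ok := p.buckets[bkt].Get().([]byte); ok {
		return b
	}
	// Bucket was empty: allocate a slice sized for this bucket.
	return make([]byte, 0, (bkt+1)*p.bktSize)
}

// Put returns a slice to its bucket. Slices whose capacity is not an exact
// bucket size are dropped and left to the garbage collector.
// Note: storing a slice value in sync.Pool boxes it in an interface; a
// production pool may prefer storing pointers to avoid that extra allocation.
func (p *linearPool) Put(b []byte) {
	c := cap(b)
	if c == 0 || c%p.bktSize != 0 {
		return
	}
	bkt := c/p.bktSize - 1
	if bkt >= len(p.buckets) {
		return
	}
	p.buckets[bkt].Put(b[:0])
}

func main() {
	p := newLinearPool(4, 400) // 4 buckets of width 400 are arbitrary example values
	b := p.Get(1000)
	fmt.Println(cap(b)) // 1200: bucket 2 holds slices of capacity 3*400
	p.Put(b)
	_ = p.Get(5000) // larger than any bucket, so it is counted as a miss
	fmt.Println(p.misses.Load())
}
```

With linear buckets, the wasted capacity per slice is bounded by one bucket width, whereas exponentially sized buckets can waste up to roughly half of each allocation.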

Seeing ~30% working set reduction in ingesters.
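
To illustrate why dropping prealloc for trace IDs helps, here is a rough sketch comparing a pool-sized buffer with an exact-size copy. The 500-byte figure is taken from the description above; the function names and the 16-byte ID length (the OpenTelemetry trace ID size) are assumptions for illustration only.

```go
package main

import "fmt"

// minPooledCap is the assumed smallest capacity a pooled buffer would have;
// the value 500 comes from the PR description.
const minPooledCap = 500

// copyIDPooled mimics copying a trace ID into a pool-sized buffer.
func copyIDPooled(id []byte) []byte {
	buf := make([]byte, 0, minPooledCap)
	return append(buf, id...)
}

// copyIDExact allocates only as many bytes as the ID needs.
func copyIDExact(id []byte) []byte {
	out := make([]byte, len(id))
	copy(out, id)
	return out
}

func main() {
	id := make([]byte, 16) // assumed 16-byte (128-bit) trace ID
	fmt.Println(cap(copyIDPooled(id)), cap(copyIDExact(id))) // 500 16
}
```

If the oversized buffer is never returned to the pool, every live trace ID pins the full capacity rather than the 16 bytes it uses.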

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <[email protected]>
Contributor

@zalegrala zalegrala left a comment

Looks good to me. The new metric ought to be interesting.

return p.make(sz)

// Find the right bucket.
bkt := sz / p.bktSize
Contributor

I suppose it's a trade-off here to alloc or do the math twice.

Member Author

this doesn't really "alloc" in any way we are concerned about. we're always conscious of allocating memory that escapes to the heap. everything else is cheap. whether this is stored on the stack or just stays in a register, it would likely be undetectable from a tempo perf standpoint.


b := p.buckets[bkt].Get()
if b == nil {
sz := (bkt + 1) * p.bktSize
Contributor

Does this alloc, or is the compiler smart enough to see it's only used once?

Member Author

@joe-elliott joe-elliott Nov 19, 2024

you're really testing my knowledge of compilers :). my guess is this value would never leave a register

Member

You can do escape analysis in Go with go build -gcflags "-m" <file>. It does not escape :P
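
As a small sketch of what that check looks at, the following contrasts a local integer like the `sz` in the snippet above with a deliberately escaping variant. The `pool` type and both function names are hypothetical. When built with `go build -gcflags "-m"`, the compiler typically reports `moved to heap: sz` for `leak` and nothing comparable for `bucketSize`, though the exact diagnostics depend on the Go version.

```go
package main

import "fmt"

// pool is a hypothetical stand-in carrying only the bucket width.
type pool struct{ bktSize int }

// bucketSize mirrors the arithmetic under review: sz is a plain int local
// that is returned by value, so it has no reason to leave the stack or a register.
func (p *pool) bucketSize(bkt int) int {
	sz := (bkt + 1) * p.bktSize
	return sz
}

// leak returns the address of its local, which forces sz to outlive the
// call and therefore to be allocated on the heap.
func leak(bkt, bktSize int) *int {
	sz := (bkt + 1) * bktSize
	return &sz
}

func main() {
	p := &pool{bktSize: 400} // 400 is an arbitrary example width
	fmt.Println(p.bucketSize(2), *leak(2, 400)) // 1200 1200
}
```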

@joe-elliott joe-elliott added type/bug backport r175 labels Nov 19, 2024
Contributor

This PR must be merged before a backport PR will be created.

@joe-elliott joe-elliott merged commit f71c4c6 into grafana:main Nov 19, 2024
20 checks passed
github-actions bot pushed a commit that referenced this pull request Nov 19, 2024
* remove trace ids

Signed-off-by: Joe Elliott <[email protected]>

* linear buckets

Signed-off-by: Joe Elliott <[email protected]>

* changelog

Signed-off-by: Joe Elliott <[email protected]>

* tuney tune

Signed-off-by: Joe Elliott <[email protected]>

* metric misses and increase pool size

Signed-off-by: Joe Elliott <[email protected]>

* lint

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>
(cherry picked from commit f71c4c6)
joe-elliott added a commit that referenced this pull request Nov 19, 2024
Co-authored-by: Joe Elliott <[email protected]>
Contributor

The backport to r174 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-4344-to-r174 origin/r174
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x f71c4c6ccbd81b75826b434d6bf2b82174a757c5

When the conflicts are resolved, stage and commit the changes:

git add . && git cherry-pick --continue

If you have the GitHub CLI installed:

# Push the branch to GitHub:
git push --set-upstream origin backport-4344-to-r174
# Create the PR body template
PR_BODY=$(gh pr view 4344 --json body --template 'Backport f71c4c6ccbd81b75826b434d6bf2b82174a757c5 from #4344{{ "\n\n---\n\n" }}{{ index . "body" }}')
# Create the PR on GitHub
echo "${PR_BODY}" | gh pr create --title '[r174] Ingester memory improvements by adjusting prealloc' --body-file - --label 'type/bug' --label 'backport' --base r174 --milestone r174 --web

Or, if you don't have the GitHub CLI installed (we recommend you install it!):

# Push the branch to GitHub:
git push --set-upstream origin backport-4344-to-r174

# Create a pull request where the `base` branch is `r174` and the `compare`/`head` branch is `backport-4344-to-r174`.

# Remove the local backport branch
git switch main
git branch -D backport-4344-to-r174

joe-elliott added a commit that referenced this pull request Nov 20, 2024
joe-elliott added a commit that referenced this pull request Nov 20, 2024
@joe-elliott joe-elliott mentioned this pull request Nov 22, 2024
3 tasks
mapno added a commit that referenced this pull request Nov 25, 2024
* chore: remove gofakeit dependency (#4274)

* Further reduce Labes() calls in the metrics registry (#4283)

* Respect passed headers in read path requests (#4287)

* Ingester: Validate completed blocks (#4256)

* Add validate method to block

Signed-off-by: Joe Elliott <[email protected]>

* Add Validate usage in the ingester

Signed-off-by: Joe Elliott <[email protected]>

* changelog

Signed-off-by: Joe Elliott <[email protected]>

* add test and fix replay

Signed-off-by: Joe Elliott <[email protected]>

* increment metric

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>

* Add `invalid_utf8` to reasons spans could be rejected (#4293)

* Add `invalid_utf8` to reasons spans could be rejected

* Update changelog

* Update docs

* Ensure test covers invalid UTF-8 and not slack time

* add signals for duplicate rf1 data (#4296)

Signed-off-by: Joe Elliott <[email protected]>

* Bump anchore/sbom-action from 0.17.5 to 0.17.7 (#4307)

Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.17.5 to 0.17.7.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](anchore/sbom-action@v0.17.5...v0.17.7)

---
updated-dependencies:
- dependency-name: anchore/sbom-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs: Update readme with explore traces info (#4263)

* docs: Update readme with explore traces info


Co-authored-by: Kim Nylander <[email protected]>

* chore: remove spanlogger (#4312)

* chore: remove spanlogger

* Query-Frontend: Add middleware to drop headers (#4298)

* header strip ware

Signed-off-by: Joe Elliott <[email protected]>

* comment

Signed-off-by: Joe Elliott <[email protected]>

* changelog

Signed-off-by: Joe Elliott <[email protected]>

* remove header strip wear from metrics summary

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>

* Increase length of time compactions have to fail (#4315)

* increase length of time compactions have to fail

Signed-off-by: Joe Elliott <[email protected]>

* gen

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>

* docs: mark serverless as deprecated (#4017)

* docs: mark serverless as deprecated

* Changelog + readme

* docs: Remove duplicated examples (#4295)

This removes duplicates examples from the Configure TraceQL
metrics page.

Signed-off-by: Alex Bikfalvi <[email protected]>

* tempo-cli: support dropping multiple traces in a single operation (#4266)

* tempo-cli: support dropping multiple traces in a single operation

* update final log message

---------

Co-authored-by: Suraj Nath <[email protected]>

* [DOC] Add clarification for metrics summary and traceQL metrics (#4316)

* Add clarification for metrics summary and traceQL metrics

* Apply suggestions from code review

Co-authored-by: Jennifer Villa <[email protected]>

* Update docs/sources/tempo/api_docs/metrics-summary.md

---------

Co-authored-by: Jennifer Villa <[email protected]>

* TraceQL metrics time range fixes (#4325)

* Disconnect job time range filtering from step, so that results in split backend/recent range is accurate

* changelog

* Fix to assert metrics query range before alignment because alignment may increase it, which is not the responsibility of the caller to account for (#4331)

* Add doc about configuring TLS with Helm (#4328)

* Add doc about configuring TLS with Helm

* Add memberlist and readinessProbe to example

* Include server config for listening on TLS

* Add note about scraping

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Markus Toivonen <[email protected]>

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Kim Nylander <[email protected]>

* Add memcached config for TLS

---------

Co-authored-by: Markus Toivonen <[email protected]>
Co-authored-by: Kim Nylander <[email protected]>

* [DOC] Add TLS info to Helm chart doc (#4334)

* fix deprecation warning by switching to DoBatchWithOptions (#4343)

Signed-off-by: Daniel Strobusch <[email protected]>

* bump dskit to v0.0.0-20241115082728-f2a7eb3aa0e9 to leverage benefits for context causes for DoBatch calls. (#4341)

See grafana/dskit#576

Signed-off-by: Daniel Strobusch <[email protected]>

* Bump github.com/minio/minio-go/v7 from 7.0.70 to 7.0.80 (#4282)

* Bump github.com/minio/minio-go/v7 from 7.0.70 to 7.0.80

Bumps [github.com/minio/minio-go/v7](https://github.com/minio/minio-go) from 7.0.70 to 7.0.80.
- [Release notes](https://github.com/minio/minio-go/releases)
- [Commits](minio/minio-go@v7.0.70...v7.0.80)

---
updated-dependencies:
- dependency-name: github.com/minio/minio-go/v7
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Update serverless vendor

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Zach Leslie <[email protected]>

* update default config values to better align with production workloads (#4340)

* update default config values to better align with production workloads

* Update CHANGELOG.md and config docs

* Ingester memory improvements by adjusting prealloc (#4344)

* remove trace ids

Signed-off-by: Joe Elliott <[email protected]>

* linear buckets

Signed-off-by: Joe Elliott <[email protected]>

* changelog

Signed-off-by: Joe Elliott <[email protected]>

* tuney tune

Signed-off-by: Joe Elliott <[email protected]>

* metric misses and increase pool size

Signed-off-by: Joe Elliott <[email protected]>

* lint

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>

* Bump github.com/Azure/azure-sdk-for-go/sdk/azcore from 1.13.0 to 1.16.0 (#4302)

* Bump github.com/Azure/azure-sdk-for-go/sdk/azcore from 1.13.0 to 1.16.0

Bumps [github.com/Azure/azure-sdk-for-go/sdk/azcore](https://github.com/Azure/azure-sdk-for-go) from 1.13.0 to 1.16.0.
- [Release notes](https://github.com/Azure/azure-sdk-for-go/releases)
- [Changelog](https://github.com/Azure/azure-sdk-for-go/blob/main/documentation/release.md)
- [Commits](Azure/azure-sdk-for-go@sdk/azcore/v1.13.0...sdk/azcore/v1.16.0)

---
updated-dependencies:
- dependency-name: github.com/Azure/azure-sdk-for-go/sdk/azcore
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* Update serverless vendor

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Zach Leslie <[email protected]>

* Use Prometheus fast regexp (#4329)

* basic integration

Signed-off-by: Joe Elliott <[email protected]>

* patch tests for new meaning

Signed-off-by: Joe Elliott <[email protected]>

* patch up more tests

Signed-off-by: Joe Elliott <[email protected]>

* add basic tests

Signed-off-by: Joe Elliott <[email protected]>

* changelog + docs

Signed-off-by: Joe Elliott <[email protected]>

* remove benches

Signed-off-by: Joe Elliott <[email protected]>

* Cleaned up + tests

Signed-off-by: Joe Elliott <[email protected]>

* comment

Signed-off-by: Joe Elliott <[email protected]>

* lint

Signed-off-by: Joe Elliott <[email protected]>

* Update docs/sources/tempo/traceql/_index.md

Co-authored-by: Kim Nylander <[email protected]>

* comment

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>
Co-authored-by: Kim Nylander <[email protected]>

* Fix broken link in service-graphs docs (#4351)

* Fix minor typo in TraceQL docs (#4356)

* Bump default memcached version (#4363)

* Exemplar fixes (#4366)

* Fix exemplars based on duration to convert to seconds, fix various other issues

* changelog

* fix: initialize histogram buckets to 0 to avoid them being downsampled (#4368)

* initialized histogram buckets to 0 to avoid them being downsampled

* Ingester/Generator Live trace cleanup (#4365)

* moved trace sizes somewhere shareable

Signed-off-by: Joe Elliott <[email protected]>

* use tracesizes in ingester

Signed-off-by: Joe Elliott <[email protected]>

* make tests work

Signed-off-by: Joe Elliott <[email protected]>

* trace bytes in generator

Signed-off-by: Joe Elliott <[email protected]>

* remove traceCount

Signed-off-by: Joe Elliott <[email protected]>

* live trace shenanigans

Signed-off-by: Joe Elliott <[email protected]>

* changelog

Signed-off-by: Joe Elliott <[email protected]>

* Update modules/generator/processor/localblocks/livetraces.go

Co-authored-by: Mario <[email protected]>

* Update modules/ingester/instance.go

Co-authored-by: Mario <[email protected]>

* Test cleanup. Add sz test, restore commented out and fix e2e

Signed-off-by: Joe Elliott <[email protected]>

* remove todo comment

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>
Co-authored-by: Mario <[email protected]>

* Bump anchore/sbom-action from 0.17.7 to 0.17.8 (#4371)

Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.17.7 to 0.17.8.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](anchore/sbom-action@v0.17.7...v0.17.8)

---
updated-dependencies:
- dependency-name: anchore/sbom-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update for IDs change

* Only run blockbuilder if ingest enabled

---------

Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Alex Bikfalvi <[email protected]>
Signed-off-by: Daniel Strobusch <[email protected]>
Co-authored-by: Javier Molina Reyes <[email protected]>
Co-authored-by: Zach Leslie <[email protected]>
Co-authored-by: Joe Elliott <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ryan Perry <[email protected]>
Co-authored-by: Kim Nylander <[email protected]>
Co-authored-by: Suraj Nath <[email protected]>
Co-authored-by: Alex Bikfalvi <[email protected]>
Co-authored-by: Andrey Karpov <[email protected]>
Co-authored-by: Jennifer Villa <[email protected]>
Co-authored-by: Martin Disibio <[email protected]>
Co-authored-by: Markus Toivonen <[email protected]>
Co-authored-by: Daniel Strobusch <[email protected]>
Co-authored-by: Carles Garcia <[email protected]>
3 participants