Ingester: Validate completed blocks #4256
Signed-off-by: Joe Elliott <[email protected]>
// Validate will do a basic sanity check of the state of the parquet file. This can be extended to do more checks in the future.
// This method should lean towards being cost effective over complete.
func (b *backendBlock) Validate(ctx context.Context) error {
I wonder if it would not be simpler trying to open the file directly with parquet-go:
    o := []parquet.FileOption{
        parquet.SkipBloomFilters(true),
        parquet.SkipPageIndex(true),
        parquet.FileSchema(parquetSchema),
        parquet.FileReadMode(parquet.ReadModeAsync),
    }
    reader := NewBackendReaderAt(ctx, b.r, DataFileName, b.meta)
    _, err := parquet.OpenFile(reader, int64(b.meta.Size_), o...)
    if err != nil {
        return fmt.Errorf("failed to read parquet file: %w", err)
    }
OpenFile seems to do essentially the same plus some other validations. Is this difference noticeable enough? Since this is at startup time maybe it isn't.
OpenFile also unmarshals the thrift footer which is more costly. I was trying to avoid paying those allocs on startup.
// Validate will do a basic sanity check of the state of the parquet file. This can be extended to do more checks in the future.
// This method should lean towards being cost effective over complete.
func (b *backendBlock) Validate(ctx context.Context) error {
    if b.meta == nil {
Is there a way to reassemble this information from data on disk and still avoid paying the allocs on startup like you mention above?
possibly? what do you mean by "this information"?
Looks good to me.
What this PR does:
Adds a Validate() method to backend blocks and uses it in ingester startup to do basic confirmation of block health before reloading them. This prevents corrupt blocks from being loaded and eventually pushed to object storage.

Which issue(s) this PR fixes:
Fixes #4166 as near as we can tell. At least it covers the instance seen internally.
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]