
Implement partition compaction grouper #6172

Merged
merged 12 commits into from
Dec 31, 2024

Conversation

alexqyle
Contributor

@alexqyle alexqyle commented Aug 20, 2024

What this PR does:

This PR implements partition compaction grouper.

Introduced new files for partition compaction:

  • partitioned_group_info: This file acts like a compaction plan. It contains the information about how source blocks from a compaction time range are assigned to partitions for compaction. The partitionedGroupID in the file is unique for a particular time range.
  • partition_visit_marker: Visit marker file for each partition under compaction. It prevents multiple compactors from working on the same partition compaction, similar to the block visit marker.

Here is the high-level algorithm of the partition compaction grouper:

  1. Group blocks by time range
  2. Load existing partitioned_group_info files
  3. Gather information for each time range and check which time ranges the grouper can take compaction jobs from
  4. Create partitioned groups from grouped blocks
  5. Sanitize partitions in each partitioned group
  6. Return ready-to-compact partitioned groups to Thanos for compaction
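The grouping and partition-assignment steps above can be sketched in Go as follows. The helper names and the round-robin assignment are illustrative assumptions; the actual grouper in pkg/compactor/partition_compaction_grouper.go is more involved.

```go
package main

import "fmt"

// Block is a minimal stand-in for a TSDB block's metadata.
type Block struct {
	ID      string
	MinTime int64
	MaxTime int64
}

// groupBlocksByTimeRange buckets blocks into compaction time ranges
// (step 1). rangeMs would come from the configured compaction ranges.
func groupBlocksByTimeRange(blocks []Block, rangeMs int64) map[int64][]Block {
	groups := map[int64][]Block{}
	for _, b := range blocks {
		rangeStart := (b.MinTime / rangeMs) * rangeMs
		groups[rangeStart] = append(groups[rangeStart], b)
	}
	return groups
}

// assignToPartitions splits one group's blocks across partitionCount
// partitions (step 4). The real assignment also consults partition info
// stored in each block's meta.json; round-robin here is illustrative.
func assignToPartitions(blocks []Block, partitionCount int) [][]Block {
	partitions := make([][]Block, partitionCount)
	for i, b := range blocks {
		p := i % partitionCount
		partitions[p] = append(partitions[p], b)
	}
	return partitions
}

func main() {
	blocks := []Block{
		{ID: "a", MinTime: 0, MaxTime: 7200000},
		{ID: "b", MinTime: 0, MaxTime: 7200000},
		{ID: "c", MinTime: 7200000, MaxTime: 14400000},
	}
	for rangeStart, group := range groupBlocksByTimeRange(blocks, 7200000) {
		parts := assignToPartitions(group, 2)
		fmt.Println(rangeStart, len(parts))
	}
}
```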

Introduced meta_extensions to save partition information of the result block in meta.json. This information can be used to better assign blocks to the proper partition in the next round of compaction.

Which issue(s) this PR fixes:
NA

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@danielblando
Contributor

danielblando commented Dec 20, 2024

Overall looking good to me. Just a few comments.

For a user to migrate, would it just work by changing the configuration and deploying? I would imagine yes, as we would not find partition data for any block and would treat them all as partitionId 0. Correct?

@alexqyle
Contributor Author

Overall looking good to me. Just a few comments.

For a customer to migrate, would it just work by changing the configuration and deploying? I would imagine yes, as we would not find partition data for any block and would treat them all as partitionId 0. Correct?

Yes. Switching back and forth between partitioning and non-partitioning should not cause any issues. At most, the largest time range blocks would be recompacted one more time.
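The migration behavior described here, where blocks without partition data are all treated as partition 0, could be sketched like this. The helper is hypothetical, not code from this PR.

```go
package main

import "fmt"

// PartitionInfo is a sketch of the partition data stored in meta.json.
type PartitionInfo struct {
	PartitionID    int
	PartitionCount int
}

// partitionInfoOrDefault falls back to a single partition (ID 0) for
// blocks produced before partitioning was enabled, which is why simply
// changing the configuration and deploying should just work.
func partitionInfoOrDefault(info *PartitionInfo) PartitionInfo {
	if info == nil {
		return PartitionInfo{PartitionID: 0, PartitionCount: 1}
	}
	return *info
}

func main() {
	legacy := partitionInfoOrDefault(nil) // block created without partitioning
	fmt.Println(legacy.PartitionID, legacy.PartitionCount)
}
```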

@danielblando
Contributor

How does it work while a deployment is happening? Since we can have some compactors creating blocks with partitions and others creating blocks without, and they see different visit markers, would it create duplicate compactions while the deployment is happening?

@alexqyle
Copy link
Contributor Author

How does it work while a deployment is happening? Since we can have some compactors creating blocks with partitions and others creating blocks without, and they see different visit markers, would it create duplicate compactions while the deployment is happening?

If both are compacting the largest time range blocks, it would create duplicate blocks. Any lower-level blocks would be compacted into higher levels properly after the deployment.

Signed-off-by: Alex Le <[email protected]>
Contributor

@danielblando danielblando left a comment


Thanks for this work

Contributor

@yeya24 yeya24 left a comment


Thanks. Mostly nits.
We should also update https://cortexmetrics.io/docs/configuration/v1guarantees/ to mention that this is an experimental feature, but we can do that after you finish all the partition compactor PRs.

pkg/compactor/compactor_metrics.go
pkg/util/validation/limits.go
}

func UpdatePartitionedGroupInfo(ctx context.Context, bkt objstore.InstrumentedBucket, logger log.Logger, partitionedGroupInfo PartitionedGroupInfo) (*PartitionedGroupInfo, error) {
existingPartitionedGroup, _ := ReadPartitionedGroupInfo(ctx, bkt, logger, partitionedGroupInfo.PartitionedGroupID)
Contributor


Is it fine to ignore the error here?

Contributor Author


The error is ignored in order to always update the partitioned group info. There is no harm in writing the latest version of the partitioned group info, which is supposed to be the correct grouping based on the latest bucket store. We skip updating when the file exists so we can try our best to finish the previously generated plan. Even if the previous partitioned group info gets updated in the middle, the new file should take already compacted partitions into account.
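The behavior being discussed could be sketched as follows, with a hypothetical in-memory store standing in for the object storage bucket (this is a simplification, not the PR's actual implementation):

```go
package main

import (
	"errors"
	"fmt"
)

type PartitionedGroupInfo struct {
	PartitionedGroupID uint32
	PartitionCount     int
}

var errNotFound = errors.New("not found")

// store stands in for the object storage bucket.
var store = map[uint32]PartitionedGroupInfo{}

func readPartitionedGroupInfo(id uint32) (*PartitionedGroupInfo, error) {
	info, ok := store[id]
	if !ok {
		return nil, errNotFound
	}
	return &info, nil
}

// updatePartitionedGroupInfo keeps an existing plan if one is readable,
// so a previously generated plan gets finished; any read error is
// deliberately ignored so a fresh plan can always be written.
func updatePartitionedGroupInfo(info PartitionedGroupInfo) PartitionedGroupInfo {
	existing, _ := readPartitionedGroupInfo(info.PartitionedGroupID) // error intentionally ignored
	if existing != nil {
		return *existing // finish the previously generated plan
	}
	store[info.PartitionedGroupID] = info
	return info
}

func main() {
	first := updatePartitionedGroupInfo(PartitionedGroupInfo{PartitionedGroupID: 1, PartitionCount: 4})
	second := updatePartitionedGroupInfo(PartitionedGroupInfo{PartitionedGroupID: 1, PartitionCount: 8})
	fmt.Println(first.PartitionCount, second.PartitionCount) // the second call keeps the existing plan
}
```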

Contributor


Makes sense. Can you add a comment explaining the reason?

pkg/compactor/compactor_metrics.go
pkg/compactor/partition_compaction_grouper.go
pkg/compactor/partition_visit_marker.go
Signed-off-by: Alex Le <[email protected]>
Contributor

@yeya24 yeya24 left a comment


Thanks. I think we need some documentation about how this works, but it can be done after everything is implemented.

Signed-off-by: Alex Le <[email protected]>
@yeya24 yeya24 merged commit 4953086 into cortexproject:master Dec 31, 2024
17 checks passed