Add progress status for partition rebalances #140

Open
wants to merge 21 commits into
base: main

kyguy
Member

@kyguy kyguy commented Nov 26, 2024

This proposal introduces a new feature to monitor the progression of an ongoing partition rebalance executed by a Strimzi-managed Cruise Control instance via a KafkaRebalance custom resource. Implementation of this proposal should help to address strimzi/strimzi-kafka-operator#10278

@tomncooper tomncooper left a comment

Had a first pass. A lot of my comments are optional style/grammar/formatting suggestions, so feel free to ignore them.

My main comments are:

  • @scholzj makes a very good point about avoiding infinite reconciliation after a status update. You will need to solve that.
  • I think we should include a minimum estimated time for optimization proposals. Even if it is a ballpark figure, it is a very useful guide. But let's see what others think.

Contributor

@fvaleri fvaleri left a comment

Thanks @kyguy, this seems to be useful.

I left a few comments for your consideration. Please also fix the formatting.

Contributor

@katheris katheris left a comment

Generally the proposal looks good to me. I agree with the comments from others and just had one comment about the field name `percentageComplete` and a suggestion for an additional field we could include.

@kyguy kyguy force-pushed the kr-exec-progress branch 7 times, most recently from d294906 to ff9df7e Compare December 11, 2024 00:07
@kyguy kyguy force-pushed the kr-exec-progress branch 8 times, most recently from a310c4d to b55824e Compare December 18, 2024 02:10
Signed-off-by: Kyle Liberti <[email protected]>
kyguy added 2 commits January 27, 2025 14:54
Signed-off-by: Kyle Liberti <[email protected]>
Signed-off-by: Kyle Liberti <[email protected]>
@kyguy
Member Author

kyguy commented Jan 29, 2025

Thanks again to everyone for the other rounds of review; we are getting pretty close to a proposal that everyone feels comfortable with. I have gone through the threads and marked those addressed as "resolved" so we can focus on the open threads and most recent discussions. If you feel that any threads marked as "resolved" have not been addressed thoroughly, feel free to mark them as "unresolved" and I will take another look.

Right now, what's left:

  • A review of the updated text in the motivation section.
  • A review of the updated notation of the formulas.
  • A review of the updated progress fields we plan to display in each KafkaRebalance state.

Signed-off-by: Kyle Liberti <[email protected]>
Signed-off-by: Kyle Liberti <[email protected]>
Member

@ppatierno ppatierno left a comment

LGTM. There are a couple of nits but I am fine with the proposal overall, don't need another pass from my side. Thanks Kyle, nice work!

Contributor

@tinaselenge tinaselenge left a comment

Thanks @kyguy. The proposal looks good to me. I left just one question to clarify.

@tomncooper tomncooper left a comment

Just a couple of small nits and some clarifying comments that need adding IMHO. Then happy to approve.

The rebalance is complete so we hardcode the value to `0`
This emphasizes that the rebalance is complete and helps clear up ambiguity surrounding what the `Ready` state means in the `KafkaRebalance` resource.

### Field: `completedByteMovementPercentage`

The "byte" part of this smells wrong, why not "data"?

Member Author

@kyguy kyguy Feb 14, 2025

It was raised in an earlier comment that "data" movement could be misinterpreted as "partition movement" instead of "byte movement". Naming this field with "byte" removes ambiguity surrounding what is being measured.

Not sure it would be easy to mix up data and partitions personally, but ok. I still think it would be better to have completedDataMovementPercentage and completedPartitionMovementPercentage then you can't mix them up.

Member Author

> I still think it would be better to have completedDataMovementPercentage and completedPartitionMovementPercentage then you can't mix them up.

I was worried that information from these two fields would be too similar, therefore, I was hoping to only supply one of them. However, I want the field name(s) to be as clear as possible to everyone. I am open to including both, it was something which @katheris raised in an earlier review too.

I would be interested in what @scholzj thinks of this.

$$

**Notes:**
- $DMP$: The percentage of byte data that has been moved as a rounded down integer in the range [0-100], the value of the `completedByteMovementPercentage` field.

"byte data" sounds weird? You can probably just drop it and use data instead?

Member Author

It does sound a little weird, the addition of "byte" here is related to the comment above, to clear up any ambiguity surrounding what kind of data is being moved.

What if we dropped "data" and just used "bytes" here instead?
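
To make the rounding described in the $DMP$ note concrete, here is a minimal Python sketch of a rounded-down percentage in the range [0-100]. The function name, the parameter names, and the handling of a zero total are assumptions for illustration only; the proposal's actual formula and the Cruise Control values it reads are not shown in this thread.

```python
def completed_byte_movement_percentage(moved_bytes: int, total_bytes: int) -> int:
    """Return the share of bytes moved as a rounded-down integer in [0, 100].

    Hypothetical helper: the names and edge-case handling are assumptions,
    not taken from the proposal.
    """
    if total_bytes <= 0:
        # Assumption: with nothing to move, report the movement as complete.
        return 100
    # Integer floor division rounds down without floating-point error.
    percentage = (moved_bytes * 100) // total_bytes
    # Clamp in case the inputs are momentarily inconsistent.
    return max(0, min(100, percentage))


print(completed_byte_movement_percentage(199, 200))  # 99, since 99.5% rounds down
```

Because the value is rounded down, it only reaches 100 once every byte has been moved.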

Contributor

@fvaleri fvaleri left a comment

Thanks @kyguy for addressing my comments and refining the proposal.

I left a few more to consider, but the overall approach looks good.

Comment on lines +70 to +71
Since the progress information is constant, we can safely add it to the existing `ConfigMap` maintained for and tied to the `KafkaRebalance` resource.
This keeps `KafkaRebalance` information organized in one place, simplifies the proposal implementation, and has insignificant impact on the storage of the `ConfigMap`.
Contributor

Are we sure that mixing unrelated information in the same CM is actually a good idea? What's the complication of having a dedicated progress CM?

Member Author

> Are we sure that mixing unrelated information in the same CM is actually a good idea?

Is the load and progress information really unrelated? Couldn't we as easily think of the information as being related to a specific rebalance?

> What's the complication of having a dedicated progress CM?

We were debating whether or not to have a dedicated progress CM. One of the arguments against having a dedicated CM was that it would require extra API calls and code wrangling in the KafkaRebalanceAssemblyOperator class, all while we already maintain a ConfigMap for a KafkaRebalance resource with plenty of space for the progress information. We would also need to change the name of the existing ConfigMap to differentiate it from a new one, since the existing name matches the name of the KafkaRebalance resource.

Member

Yeah, having just one simplifies the implementation as well. At the same time, it gives the user one place to look. Taking into account the amount of information we are adding, there didn't seem to be any advantage in having two CMs.

Contributor

Thanks for the clarifications. I can live with the single CM, but then I would prefer a single status field with a meaningful name, deprecating the old one. Have you considered this alternative design?

Member Author

> I can live with the single CM, but then I would prefer a single status field with a meaningful name, deprecating the old one. Have you considered this alternative design?

As in deprecating .status.optimizationResult.afterBeforeLoadConfigMap in the KafkaRebalance resource in favor of some new field like .status.rebalanceConfigMap, right?

This is a fair point.

API changes like this are definitely something to keep in mind before we move to our first major version of Strimzi. This comment, along with the comment below, is making me think more about the field name progress.rebalanceProgressConfigMap. Maybe it would be more prudent and future-proof to name the field something more generic like status.rebalanceConfigMap, especially if we plan on keeping the information from a rebalance, broker load and progress information, consolidated in a single ConfigMap. I guess it comes down to whether or not we plan on adding any additional rebalance information in the future that would require more space than a single ConfigMap could handle.

If users ever wanted the additional verbose output from the executor state we would definitely need the space of another ConfigMap and a separate field to point to that ConfigMap in the status. For this reason I am still leaning towards keeping separate, distinct fields but I am open to having a single field.

What do you think @fvaleri? Do you still think it would be better to have a single field?

Interested in what @ppatierno and @tomncooper think about this too.

Contributor

@fvaleri fvaleri Feb 17, 2025

Thanks for considering this change. As always, naming things is hard.

The following schema should give more flexibility:

# this proposal
status:
  loadAndProgressConfigMaps:
    - my-rebalance

# glimpse into a possible future
status:
  loadAndProgressConfigMaps:
    - my-rebalance-load
    - my-rebalance-progress
    - my-rebalance-progress-verbose

With this proposal we have 2 fields (load and progress) referencing the same CM. If that CM ever becomes too large, then we could have each reference its own CM. If either of them becomes too big, we probably need a whole different way of communicating that information to the user.

So I think it is better to stick with the current plan than deprecate an existing field and add 2 more.

Member

Yeah, I agree with Tom and subscribe to what he said. The proposal from Fede looks to be more "complicated" imho.

Member Author

@kyguy kyguy Feb 18, 2025

Unless there are any objections, let's stick with the current plan, we should have enough flexibility for now and the future. We can revisit if that changes in the future.
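
As a rough illustration of the plan settled on in this thread, the Python sketch below models a `KafkaRebalance` status in which the load field and the progress field both reference one `ConfigMap`. The field paths `optimizationResult.afterBeforeLoadConfigMap` and `progress.rebalanceProgressConfigMap`, the `completedByteMovementPercentage` key, and the name `my-rebalance` come from the discussion; the `brokerLoad.json` key, the sample values, and the helper function are hypothetical.

```python
# Plain dictionaries standing in for Kubernetes objects (illustration only).
config_map = {
    "metadata": {"name": "my-rebalance"},
    "data": {
        # Hypothetical key and placeholder value for the broker load JSON string.
        "brokerLoad.json": "{}",
        # Progress field discussed in this thread; the value is made up.
        "completedByteMovementPercentage": "42",
    },
}

kafka_rebalance_status = {
    "optimizationResult": {"afterBeforeLoadConfigMap": "my-rebalance"},
    "progress": {"rebalanceProgressConfigMap": "my-rebalance"},
}


def referenced_config_maps(status: dict) -> set:
    """Collect the distinct ConfigMap names that the status points at."""
    return {
        status["optimizationResult"]["afterBeforeLoadConfigMap"],
        status["progress"]["rebalanceProgressConfigMap"],
    }


# Both fields resolve to the same ConfigMap under the current plan.
print(referenced_config_maps(kafka_rebalance_status))  # {'my-rebalance'}
```

If the `ConfigMap` ever grew too large, each field could be pointed at its own `ConfigMap` without changing the status schema, which is the flexibility argued for above.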

Comment on lines +58 to +59
progress: [1]
  rebalanceProgressConfigMap: my-rebalance [2]
Contributor

What's the value of having this nested structure compared to just .status.rebalanceProgressConfigMap?

Member Author

To further organize/distinguish the progress information from the other status fields.

Signed-off-by: Kyle Liberti <[email protected]>
Contributor

@fvaleri fvaleri left a comment

LGTM. Left a suggestion for improving the status part.

@tomncooper tomncooper left a comment

LGTM @kyguy. My only comment is that completedByteMovement sounds weird and using data would be better, but that is not a blocker for me.

Contributor

@PaulRMellor PaulRMellor left a comment

Sounds like a very useful addition. I left a few minor comments as I read through.

[3] The “non-verbose” JSON payload from [/kafkacruisecontrol/state?substates=executor](https://github.com/linkedin/cruise-control/wiki/REST-APIs#query-the-state-of-cruise-control) endpoint.

[4] The broker load from the optimization proposal as a JSON string that already maintained in the `ConfigMap`.
Contributor

Suggested change
[4] The broker load from the optimization proposal as a JSON string that already maintained in the `ConfigMap`.
[4] The broker load from the optimization proposal as a JSON string that is already maintained in the `ConfigMap`.

Contributor

I guess this is available for all states, but should we describe this in the table?

Member Author

The broker load information is available for all these states, but I don't think we should include it in the table since it isn't progress information that we are adding as part of this proposal. That being said, I think we should add a table like this in the documentation, including the broker load information, for the implementation of this proposal.

Contributor

@tinaselenge tinaselenge left a comment

Thanks @kyguy for the proposal.

Signed-off-by: Kyle Liberti <[email protected]>