-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add progress status for partition rebalances #140
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Kyle Liberti <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a first past. A lot of my comments are optional style/grammar/formatting suggestions, so feel free to ignore them.
My main comments are:
- @scholzj makes a very good point about avoiding infinite reconciliation after a status update. You will need to solve that.
- I think we should include a minimum estimated time for optimization proposals. Even if it is a ball park figure it is very useful guide. But lets see what others think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kyguy, this seems to be useful.
I left few comments for your consideration. Please, also fix formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally the proposal looks good to me. I agree with the comments from others and just had one comment about the field name of percentageComplete and a suggestion for an additional field we could include
d294906
to
ff9df7e
Compare
Signed-off-by: Kyle Liberti <[email protected]>
ff9df7e
to
0f58cbb
Compare
Signed-off-by: Kyle Liberti <[email protected]>
a310c4d
to
b55824e
Compare
Signed-off-by: Kyle Liberti <[email protected]>
b55824e
to
bc7f1ed
Compare
Signed-off-by: Kyle Liberti <[email protected]>
Signed-off-by: Kyle Liberti <[email protected]>
Thanks again to everyone for the other rounds of review, we are getting pretty close to a proposal which everyone feels comfortable with. I have gone through the threads and marked those addressed as "resolved" so we can focus on the open threads and most recent discussions. If you feel the any threads marked as "resolved" have not been addressed thoroughly feel free to mark them as "unresolved" and I will take a another look. Right now, what's left:
|
Signed-off-by: Kyle Liberti <[email protected]>
Signed-off-by: Kyle Liberti <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. There are a couple of nits but I am fine with the proposal overall, don't need another pass from my side. Thanks Kyle, nice work!
Signed-off-by: Kyle Liberti <[email protected]>
Signed-off-by: Kyle Liberti <[email protected]>
Signed-off-by: Kyle Liberti <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kyguy. The proposal looks good to me. I left just one question to clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of small nits and some clarifying comments that need adding IMHO. Then happy to approve.
The rebalance is complete so we hardcode the value to `0` | ||
This emphasizes that the rebalance is complete and helps clear up ambiguity surrounding what the `Ready` state means in the `KafkaRebalance` resource. | ||
|
||
### Field: `completedByteMovementPercentage` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "byte" part of this smells wrong, why not "data"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was raised in an earlier comment that "data" movement could be misinterpreted as "partition movement" instead of "byte movement". Naming this field with "byte" removes ambiguity surrounding what is being measured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it would be easy to mix up data and partitions personally, but ok. I still think it would be better to have completedDataMovementPercentage
and completedPartitionMovementPercentage
then you can't mix them up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it would be better to have
completedDataMovementPercentage
andcompletedPartitionMovementPercentage
then you can't mix them up.
I was worried that information from these two fields would be too similar, therefore, I was hoping to only supply one of them. However, I want the field name(s) to be as clear as possible to everyone. I am open to including both, it was something which @katheris raised in an earlier review too.
I would be interested in what @scholzj thinks of this.
088-rebalance-progress-status.md
Outdated
$$ | ||
|
||
**Notes:** | ||
- $DMP$: The percentage of byte data that has been moved as a rounded down integer in the range [0-100], the value of the `completedByteMovementPercentage` field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"byte data" sounds weird? You can probably just drop it an use data instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does sound a little weird, the addition of "byte" here is related to the comment above, to clear up any ambiguity surrounding what kind of data is being moved.
What if we dropped "data" and just used "bytes" here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank @kyguy for addressing my comments and refining the proposal.
I left few more to consider, but the overall approach looks good.
Since the progress information is constant, we can safely add it to the existing `ConfigMap` maintained for and tied to the `KafkaRebalance` resource. | ||
This keeps `KafkaRebalance` information organized in one place, simplifies the proposal implementation, and has insignificant impact on the storage of the `ConfigMap`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that mixing unrelated information in the same CM is actually a good idea? What's the complication of having a dedicated progress CM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that mixing unrelated information in the same CM is actually a good idea?
Is the load and progress information really unrelated? Couldn't we as easily think of the information as being related to a specific rebalance?
What's the complication of having a dedicated progress CM?
We were debating whether or not to have a dedicated progress CM. One of the arguments against having a dedicated CM was that it would require extra API calls and code wrangling in the KafkaRebalanceAssemblyOperator
class all while we already maintain ConfigMap
for a KafkaRebalance
resource with plenty of space for the progress information. We would also need to change the name of the existing ConfigMap
to differentiate it from a new one since the existing name matches the name KafkaRebalance
resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah having just one simplifies the implementation as well. At the same time one place to look at for the user. Taking into account the amount the information we are adding, it sounded not taking any advantage from having two CMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarifications. I can live with the single CM, but then I would prefer a single status field with a meaningful name, deprecating the old one. Have you considered this alternative design?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can live with the single CM, but then I would prefer a single status field with a meaningful name, deprecating the old one. Have you considered this alternative design?
As in deprecating .status.optimizationResult.afterBeforeLoadConfigMap
in the Kafka
resource in favor of some new field like .status.rebalanceConfigMap
, right?
This is a fair point.
API changes like this are definitely something to keep in mind before we move to our first major version of Strimzi. This comment along with the comment below are making me the more about the field name progress.rebalanceProgressConfigMap
. Maybe it would be more prudent and future-proof to name the field something more generic like status.rebalanceConfigMap
, especially if we plan on keeping the information from a rebalance, broker load and progress information, consolidated in a single ConfigMap
. I guess it comes down to whether or not we plan on adding any additional rebalance information in the future that would require more space than a single ConfigMap
could handle.
If users ever wanted the additional verbose output from the executor state we would definitely need the space of another ConfigMap
and a separate field to point to that ConfigMap
in the status. For this reason I am still leaning towards keeping separate, distinct fields but I am open to having a single field.
What do you think @fvaleri? Do you still think it would be better to have a single field?
Interested in what @ppatierno and @tomncooper think about too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for considering this change. As always, naming things is hard.
The following schema should give more flexibility:
# this proposal
status:
loadAndProgressConfigMaps:
- my-rebalance
# glimpse into a possible future
status:
loadAndProgressConfigMaps:
- my-rebalance-load
- my-rebalance-progress
- my-rebalance-progress-verbose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this proposal we have 2 fields (load and progress) referencing the same CM. If that CM every becomes to large then we could have each reference its own CM. If either of them become too big we probably need a whole different way of communicating that information to the user.
So I think it is better to stick with the current plan than deprecate an existing field and add 2 more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree with Tom subscribing what he said. The proposal from Fede looks to be more "complicated" imho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless there are any objections, let's stick with the current plan, we should have enough flexibility for now and the future. We can revisit if that changes in the future.
progress: [1] | ||
rebalanceProgressConfigMap: my-rebalance [2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the value of having this nested structure compared to just .status.rebalanceProgressConfigMap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To further organize/distinguish the progress information from the other status fields.
Signed-off-by: Kyle Liberti <[email protected]>
f971608
to
a3ea194
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left a suggestion for improving the status part.
Signed-off-by: Kyle Liberti <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM @kyguy. My only comment is that completedByteMovement
sounds weird and using data
would be better, but that is not a blocker for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a very useful addition. I left a few minor comments as I read through.
088-rebalance-progress-status.md
Outdated
|
||
[3] The “non-verbose” JSON payload from [/kafkacruisecontrol/state?substates=executor](https://github.com/linkedin/cruise-control/wiki/REST-APIs#query-the-state-of-cruise-control) endpoint. | ||
|
||
[4] The broker load from the optimization proposal as a JSON string that already maintained in the `ConfigMap`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[4] The broker load from the optimization proposal as a JSON string that already maintained in the `ConfigMap`. | |
[4] The broker load from the optimization proposal as a JSON string that is already maintained in the `ConfigMap`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is available for all states, but should we describe this in the table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The broker load information is available for all these states but I don't think we should not include it in the table since it isn't progress information that we are adding as part of this proposal. That being said, I think we should add a table like this in the documentation including the broker load information for the implementation of this proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kyguy for the proposal.
Signed-off-by: Kyle Liberti <[email protected]>
This proposal introduces a new feature to monitor the progression of an ongoing partition rebalance executed by a Strimzi-managed Cruise Control instance via a
KafkaRebalance
custom resource. Implementation of this proposal should help to address strimzi/strimzi-kafka-operator#10278