Add progress status for partition rebalances #140
base: main
Signed-off-by: Kyle Liberti <[email protected]>
088-rebalance-progress-status
Outdated
### Progress Update Cadence

For ease of implementation and minimizing the load on the CruiseControl REST API server, we would only query the CruiseControlState endpoint and update the “progress” section upon `KafkaRebalance` resource reconciliation.
The progress section will never be more out of date longer than the reconciliation period and even if the rebalance runs into an error or “NotReady” state, the “progress” section would still be updated on that KafkaRebalance resource reconciliation along with any error.

How do you avoid a tight reconciliation loop, as an update to the status will trigger a new reconciliation that will update the status, trigger a new reconciliation, etc.?
That is a very good point. Maybe we need to post a timestamp of last progress check and if it is less than the reconciliation period then skip?
The general rule is to not include things like that in the status. Using some timestamp for that would probably need to be handled when getting the progress data and not when updating the status, as that is a shared code and it might be complicated to put it there.
The progress section will never be more out of date longer than the reconciliation period
This part might not always be true. For example, if the CC REST API returned an error for some reason and the executor state could not be retrieved, would we wait for the next reconciliation to retry? In that case, the progress section would be out of date.
That is a very good point. Maybe we need to post a timestamp of last progress check and if it is less than the reconciliation period then skip?
The general rule is to not include things like that in the status. Using some timestamp for that would probably need to be handled when getting the progress data and not when updating the status, as that is a shared code and it might be complicated to put it there.
I should be able to use the existing timestamp in the `metadata.managedFields[].time` field of the `KafkaRebalance` resource to know when the resource was last updated, then only update the progress section if that timestamp is older than the reconciliation period.
I don't think you should rely on `metadata.managedFields[].time` as those are internal Kubernetes fields with completely different purposes.
If we are not allowed to maintain a timestamp in the progress section specifying when it was last changed or rely on the `metadata.managedFields[].time` field of the custom resource then we will either have to find another way of tracking when the resource was last updated or try another approach for preventing tight reconciliation loops.
I'll see what I come up with and get back to you.
I'm not sure you cannot have a timestamp in the status. The question is how you work with the timestamp, how you use it, and when/how you update it. But in general, the easiest solution is to store the progress in a config map which you can simply update in every reconciliation, and as you don't watch it you do not need to be worried about what it triggers. Events might be another option for the progress tracking maybe? I do not like them very much and I think they are pretty useless for tracking the restart events. But if you publish the events to the KafkaRebalance resource, it might be more useful than for Pods.
I'm not sure you cannot have a timestamp in the status. The question is how you work with the timestamp, how you use it, and when/how you update it.
If we were to maintain a timestamp in the status we would add it with the `progress` section upon initial creation, then update it on the next reconciliation when its value was older than the reconciliation period. With a timestamp of when the `progress` was last changed, we could easily avoid triggering unwanted reconciliations.
But in general, the easiest solution is to store the progress in a config map which you can simply update in every reconciliation, and as you don't watch it you do not need to be worried about what it triggers.
This is a really interesting idea. TBH I hadn't thought of storing the progress information in a `ConfigMap` instead of the `KafkaRebalance` status. We were planning on maintaining a `ConfigMap` for executor state information anyway and yes, in this way we could avoid triggering reconciliations upon `progress` updates. We would still need to add a `progress` section with a reference to the `ConfigMap` but this would only need to be added/removed once per state change.
Although maintaining the `progress` information in the `ConfigMap` would be the simplest solution, I still feel that the UX of maintaining the `progress` information in the `KafkaRebalance` status would be worth the added implementation complexity. Any thoughts on this @ppatierno @tomncooper ?
Events might be another option for the progress tracking maybe? I do not like them very much and I think they are pretty useless for tracking the restart events. But if you publish the events to the KafkaRebalance resource, it might be more useful than for Pods.
I hadn't thought of using events either but I'll think more on this. My only concern for storing the `progress` in the events would be the UX of getting the progress information; the initial idea was that the progress information would be easily found and read by users in the `KafkaRebalance` resource.
If we were to maintain a timestamp in the status we would add it with the progress section upon initial creation, then update it on the next reconciliation when its value was older than reconciliation period. With a timestamp of when the progress was last changed, we could easily avoid triggering unwanted reconciliations.
Yes, you would need some custom logic such as if the timestamp is older than X minutes, update the progress. If not, just reuse the old `progress`.
I hadn't thought of using events either but I'll think more on this. My only concern for storing the progress in the events would be the UX of getting the progress information, the initial idea was that the progress information would be easily found and read by users in the KafkaRebalance resource.
`kubectl describe kr` should show you the events I think. Also most UIs would normally show the events when you list the custom resource.
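The "older than X minutes" check discussed in this thread could look roughly like the sketch below. It is only an illustration of the idea, not the operator's code: the function name, the `min_interval` parameter, and where the last-update timestamp is stored are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_refresh_progress(
    last_updated: Optional[datetime],
    now: datetime,
    min_interval: timedelta,
) -> bool:
    """Return True when progress should be fetched again.

    Hypothetical helper: progress is refreshed only if it was never
    recorded or if the recorded timestamp is at least min_interval old.
    Otherwise the caller reuses the previously stored progress, so the
    status is not rewritten on every reconciliation.
    """
    if last_updated is None:
        return True
    return now - last_updated >= min_interval


# Example: with a 2 minute interval, a 30 second old value is reused.
now = datetime(2024, 11, 5, 15, 30, tzinfo=timezone.utc)
print(should_refresh_progress(now - timedelta(seconds=30), now, timedelta(minutes=2)))
```

Skipping the write when the stored value is still fresh is what breaks the update-triggers-reconcile cycle raised at the start of the thread.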
Discussed w/ Paolo and Tom, they agreed storing the `progress` information in the `ConfigMap` would simplify the implementation and that doing so wouldn't significantly change the UX. Given the executor state information is already going to be stored in the `ConfigMap`, it probably makes the most sense to maintain our progress information there as well. Let me update the proposal to show what it would look like.
Had a first pass. A lot of my comments are optional style/grammar/formatting suggestions, so feel free to ignore them.
My main comments are:
- @scholzj makes a very good point about avoiding infinite reconciliation after a status update. You will need to solve that.
- I think we should include a minimum estimated time for optimization proposals. Even if it is a ballpark figure, it is a very useful guide. But let's see what others think.
088-rebalance-progress-status
Outdated
Nit: Can you add the `.md` suffix so GH can apply the right syntax highlighting?
088-rebalance-progress-status
Outdated
In this “progress” section, we include the following fields:

- estimatedTimeToCompletion: The minimum estimated amount time it will take in minutes until partition rebalance is complete.

Judging from the formula used for this, the value is a prediction based on the past average data transfer rate. The rate could increase in the future, so this estimate is not a minimum.
088-rebalance-progress-status
Outdated
### Supported KafkaRebalance States

For initial implementation we will focus on including the “progress” section only in the following KafkaRebalance states:

For initial implementation we will focus on including the “progress” section only in the following KafkaRebalance states:
For the initial implementation, we will focus on including the “progress” section only in the following KafkaRebalance states:
088-rebalance-progress-status
Outdated
helps users understand the cost of an ongoing partition rebalance, decide whether or not they should continue or cancel it, and know when future operations will be able to be safely executed.

Further, having this information readily available and easily accessible via `KafkaRebalance` custom resources allows users and third-party tools like the Kubernetes CLI or Strimzi Console to easily track the progression of a partition rebalance.

Further, having this information readily available and easily accessible via `KafkaRebalance` custom resources allows users and third-party tools like the Kubernetes CLI or Strimzi Console to easily track the progression of a partition rebalance.
Further, having this information readily available and easily accessible via `KafkaRebalance` custom resources, allows users and third-party tools like the Kubernetes CLI or Strimzi Console to easily track the progression of a partition rebalance.
088-rebalance-progress-status
Outdated
- How much time an ongoing partition rebalance has left to take
- How much data an ongoing partition rebalance has left to transfer

helps users understand the cost of an ongoing partition rebalance, decide whether or not they should continue or cancel it, and know when future operations will be able to be safely executed.

Style Nit: I am not sure the bullet points make things clearer? Feels a bit disjointed when you read it. Maybe just make this a single sentence?
088-rebalance-progress-status
Outdated
#### Adding “progress” section for other KafkaRebalance states

In addition to the “progress” the “Rebalancing” and “Stopped” KafkaRebalance states, we could provide the “progress” section for other states as well such as the “ProposalReady” and “Ready” states.

I have previously suggested putting these states in code quotes (``), but double quotes are fine too, so long as they are consistent throughout the doc.
088-rebalance-progress-status
Outdated
In addition to the “progress” the “Rebalancing” and “Stopped” KafkaRebalance states, we could provide the “progress” section for other states as well such as the “ProposalReady” and “Ready” states.
Firstly, this would help emphasize that a rebalance had not started or had completed by having a percentageComplete: 0% on "ProposalReady" and a percentageComplete: 100% on "Ready".
This emphasis could help clear up ambiguity surrounding what the KafkaRebalance “Ready” state or “optimizationResult” field means.

That would be nice.
088-rebalance-progress-status
Outdated
This feature would be of great value to users.
However, providing an accurate estimation for this is non-trivial, namely the “estimatedTimeToCompletion” field for “ProposalReady" state, is non-trivial.

Leveraging the Cruise Control configurations and user-provided network capacity settings, we could provide a rough estimate for “estimatedTimeToCompletetion” field for inter-broker balances.

Leveraging the Cruise Control configurations and user-provided network capacity settings, we could provide a rough estimate for “estimatedTimeToCompletetion” field for inter-broker balances.
Leveraging the Cruise Control configurations and user-provided network capacity settings, we could provide a rough estimate for “estimatedTimeToCompletetion” field for inter-broker movements.
088-rebalance-progress-status
Outdated
# The maximum number of partition movements given CC partition movement cap
max_partition_movements= min(<# of brokers> *
num.concurrent.partition.movements.per.broker)
max_partition_movements=min(max_partition_movements, max.num.cluster.partition.movements)

Why are these two variables called the same thing?
I was wondering the same thing, like are we overwriting the previous value? I think a different name would be better.
Fixed with the reformatting
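For readers following the naming question in this thread, here is one way the movement cap could be written with a distinct name per step. This is a sketch of the calculation in the hunk as it reads, not the reformatted proposal text; the function and parameter names are illustrative, and the two caps correspond to the Cruise Control settings `num.concurrent.partition.movements.per.broker` and `max.num.cluster.partition.movements` named in the hunk.

```python
def max_concurrent_partition_movements(
    num_brokers: int,
    per_broker_movement_cap: int,
    cluster_movement_cap: int,
) -> int:
    """Upper bound on partition movements that can run at the same time.

    Step 1: the limit implied by the per-broker cap across all brokers.
    Step 2: that limit, bounded by the cluster-wide cap.
    Each intermediate value gets its own name so nothing is overwritten.
    """
    broker_limited_movements = num_brokers * per_broker_movement_cap
    return min(broker_limited_movements, cluster_movement_cap)


# Example with made-up numbers: 3 brokers, 5 movements per broker,
# and a cluster-wide cap of 1250 movements.
print(max_concurrent_partition_movements(3, 5, 1250))
```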
088-rebalance-progress-status
Outdated
estimatedTimeToCompletion = intraBrokerDataToMoveMB / throughput
```

Given that its inclusion is not completely necessary and adds significant complexity to the proposal, it is out of scope for this proposal.

We could just ignore intra-broker movements and base the estimate on inter-broker movements (they will take up the bulk of the time anyway). We could document it as a theoretical minimum and state that it will take longer than this. But it would give a ballpark estimate, which is better than the current situation.
We could just ignore intra-broker movements and just base the estimate on inter-broker movements (they will take up the bulk of the time anyway).
Assuming the disk throughput is always faster than the network throughput!
We could document it as a theoretical minimum and state that it will take longer than this. But it would give a ball park estimate, which is better than the current situation.
Let me think more on this
Still looking into this, investigating possible alternatives and hacking it in the prototype to gauge how complicated it would be to implement. If it isn't too complicated, I'll add it into this proposal and we can aim for supporting all `KafkaRebalance` states in one go.
Thanks @kyguy, this seems to be useful.
I left a few comments for your consideration. Please also fix the formatting.
088-rebalance-progress-status
Outdated
[1] The “progress” section will be visible during the KafkaRebalance “Rebalancing” and “Stopped” states.
[2] The minimum estimated time it will take the rebalance to complete.
[3] The percentage complete of the ongoing rebalance in the range [0-100]%
[4] The ConfigMap where “non-verbose” JSON payload from Executor State from CruiseControlState endpoint is stored.

Why do we need to store the state in a config map? Maybe we could simply document how to recover that from the REST endpoint in case it is needed for troubleshooting.
088-rebalance-progress-status
Outdated
##### Rebalancing

```
rate = (finishedDataMovement)/(<task_trigger_time> - <current_time>)

In order to avoid mistakes, we should always specify the unit in the variable's name (e.g. finishedDataMovementMB).
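As an illustration of unit-suffixed names, the rate and the remaining-time estimate might be sketched as below. Several things here are assumptions rather than the proposal's text: the remaining time is taken as remaining data divided by the observed average rate, elapsed time is computed as current time minus trigger time so the rate comes out positive, and all parameter names are hypothetical.

```python
from typing import Optional


def estimate_minutes_to_completion(
    finished_data_movement_mb: float,
    total_data_to_move_mb: float,
    task_trigger_time_s: float,
    current_time_s: float,
) -> Optional[float]:
    """Estimate the minutes left from the average transfer rate so far.

    Returns None when no rate can be derived yet (no elapsed time or
    no data moved), rather than dividing by zero.
    """
    elapsed_s = current_time_s - task_trigger_time_s
    if elapsed_s <= 0 or finished_data_movement_mb <= 0:
        return None
    rate_mb_per_s = finished_data_movement_mb / elapsed_s
    remaining_mb = max(total_data_to_move_mb - finished_data_movement_mb, 0.0)
    remaining_s = remaining_mb / rate_mb_per_s
    return remaining_s / 60.0


# Example: 600 MB of 1800 MB moved in 300 s gives 2 MB/s,
# so the remaining 1200 MB should take about 10 minutes.
print(estimate_minutes_to_completion(600.0, 1800.0, 0.0, 300.0))
```

Because the rate is an average of past throughput, the result is a prediction and not a lower bound, which matches the concern raised earlier about calling it a minimum.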
088-rebalance-progress-status
Outdated
When querying the Executor State of the CruiseControlState endpoint directly, we have the option to add a “verbose” parameter to request additional information surrounding the state.
The additional information could be of interest to third-party UI tools for exposing more details of a rebalance or to users debugging a problematic rebalance at the partition level.
However, to reduce the complexity of this initial enhancement, we have chosen not to use the “verbose” parameter.
One concern is that some of the fields like the “pendingParitionMovements” field can cause the JSON output to grow quite large.

One concern is that some of the fields like the “pendingParitionMovements” field can cause the JSON output to grow quite large.
One concern is that some of the fields like the “pendingPartitionMovements” field can cause the JSON output to grow quite large.
Generally the proposal looks good to me. I agree with the comments from others and just had one comment about the field name of percentageComplete and a suggestion for an additional field we could include
088-rebalance-progress-status
Outdated
provisionRecommendation: ""
provisionStatus: RIGHT_SIZED
recentWindows: 1
progress:

progress:
progress: [1]
088-rebalance-progress-status
Outdated
In this “progress” section, we include the following fields:

- estimatedTimeToCompletion: The minimum estimated amount time it will take in minutes until partition rebalance is complete.
- percentageComplete: The percentage of the partition rebalance that is completed e.g. values in the range [0-100]%

Based on the calculations listed below, I wonder if we should be explicit that this is the percentage based on the data movement, rather than the percentage of partitions done. We could also consider adding a separate field for percentagePartitionMovementComplete, but it depends on whether that would be interesting to people or not.
- percentageComplete: The percentage of the partition rebalance that is completed e.g. values in the range [0-100]%
- percentageDataMovementComplete: The percentage of the partition rebalance that is completed e.g. values in the range [0-100]%
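To show why the distinction matters, the sketch below computes a percentage from data moved and from partitions moved using the same helper. The input names and the choice to truncate and clamp to [0, 100] are assumptions for illustration only.

```python
def percentage_complete(finished: float, total: float) -> int:
    """Return finished/total as a whole percentage clamped to [0, 100].

    A non-positive total yields 0 here; that convention is an assumption.
    """
    if total <= 0:
        return 0
    pct = int(100 * finished / total)
    return max(0, min(100, pct))


# Made-up numbers: many small partitions are done, but most data remains.
data_pct = percentage_complete(finished=450, total=1800)    # MB moved
partition_pct = percentage_complete(finished=30, total=40)  # partitions moved
print(data_pct, partition_pct)
```

With these inputs the two measures differ widely, so a field name that states which one is reported avoids misreading.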
088-rebalance-progress-status
Outdated
- estimatedTimeToCompletion: The minimum estimated amount time it will take in minutes until partition rebalance is complete.
- percentageComplete: The percentage of the partition rebalance that is completed e.g. values in the range [0-100]%
- rebalanceProgressConfigMap: The ConfigMap where “non-verbose” JSON payload from Executor State from CruiseControlState endpoint is stored.

I would not reference the internal class `CruiseControlState` representing this endpoint, but rather the real user-facing REST endpoint, so `/kafkacruisecontrol/state?substates=executor`
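For reference, a request URL for that user-facing endpoint could be assembled as in the sketch below. Only the path and the `substates=executor` and `verbose` parameters come from this discussion; the scheme, host, port, and the `json=true` parameter are assumptions, and the helper itself is hypothetical.

```python
from urllib.parse import urlencode, urlunsplit


def executor_state_url(host: str, port: int, verbose: bool = False) -> str:
    """Build the URL for the executor substate of the state endpoint.

    No request is sent; this only constructs the string.
    """
    query = urlencode(
        {
            "substates": "executor",
            "json": "true",  # assumed parameter for a JSON response
            "verbose": str(verbose).lower(),
        }
    )
    return urlunsplit(
        ("http", f"{host}:{port}", "/kafkacruisecontrol/state", query, "")
    )


# Example with a placeholder host and port.
print(executor_state_url("localhost", 9090))
```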
088-rebalance-progress-status
Outdated
We could provide the “progress” section for other states as well such as the “ProposalReady” and “Ready” states but it is not completely necessary, nor is it trivial.
Further explanation as to why that is and why it should be saved as a future improvement is explained in the Future Improvements section near the bottom of this proposal.

All information required for estimating the values of “estimatedTimeToCompletion” and “percentageComplete” fields can be derived from either Cruise Control server configurations or CruiseControlState endpoint.

Again, let's refer to the user-facing REST endpoint, not the `CruiseControlState` class.
088-rebalance-progress-status
Outdated
Further explanation as to why that is and why it should be saved as a future improvement is explained in the Future Improvements section near the bottom of this proposal.

All information required for estimating the values of “estimatedTimeToCompletion” and “percentageComplete” fields can be derived from either Cruise Control server configurations or CruiseControlState endpoint.
That being said, the method of estimation for these fields depends on the state of the KafkaRebalance resource.

Maybe a language issue on my side but what do you mean by the "method of estimation ... depends on the state ..."?
088-rebalance-progress-status
Outdated
##### Stopped

Once a rebalance has been stopped, it cannot be completed.
Therefore, there is no “estimationTimeToCompletion” for a stopped rebalance, so we set estimatedTimeToCompletion = null to emphasize this.

What does it mean that we set `estimatedTimeToCompletion = null` in terms of the custom resource? Do you really want something like `estimatedTimeToCompletion: null`? Maybe `N/A` or just removing the field? @tomncooper wdyt?
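One of the options raised here, leaving the field out of the `progress` section for a stopped rebalance, could be sketched as follows. This illustrates that single option only and is not the behaviour the proposal settled on; the function name and its arguments are hypothetical.

```python
from typing import Any, Dict, Optional


def build_progress_section(
    state: str,
    percentage_complete: int,
    estimated_minutes: Optional[float] = None,
) -> Dict[str, Any]:
    """Build the progress mapping, omitting the estimate when not meaningful.

    The estimate is included only while the rebalance is running and a
    value is available; a stopped rebalance gets no estimate key at all,
    instead of a null or "N/A" placeholder.
    """
    progress: Dict[str, Any] = {"percentageComplete": percentage_complete}
    if state == "Rebalancing" and estimated_minutes is not None:
        progress["estimatedTimeToCompletion"] = estimated_minutes
    return progress


print(build_progress_section("Rebalancing", 40, 12.0))
print(build_progress_section("Stopped", 40, 12.0))
```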
088-rebalance-progress-status
Outdated
#### rebalanceProgressConfigMap

Will only be present in “Rebalancing” and “Stopped” states.

Does it mean that the ConfigMap is deleted when the rebalance is in the other states? Will this field just be removed from the progress as well? Should we make it clearer?
Added a line to make this clearer
088-rebalance-progress-status
Outdated
[1] The “progress” section will be visible during the KafkaRebalance “Rebalancing” and “Stopped” states.
[2] The minimum estimated time it will take the rebalance to complete.
[3] The percentage complete of the ongoing rebalance in the range [0-100]%
[4] The ConfigMap where “non-verbose” JSON payload from Executor State from CruiseControlState endpoint is stored.

@fvaleri which kind of troubleshooting are you talking about? The verbose information stored in the ConfigMap comes from the `state?substates=executor` endpoint, which has data when something is running; otherwise it just returns `NO_TASK_IN_PROGRESS`, so in case of issues you can't get anything interesting from here AFAIK.
088-rebalance-progress-status
Outdated
The “non-verbose” JSON payload from the ExecutorState is already too verbose to include in the `KafkaRebalance` status in its entirety.
However, having the information available to users is still useful especially when debugging the state of a partition rebalance.
Therefore, we will store the JSON payload in its own ConfigMap, “rebalanceProgressConfigMap”.

We should also define the name of this ConfigMap. It's not configurable and the user cannot decide this name. I think it will be something pre-formatted starting from the `KafkaRebalance` name?
088-rebalance-progress-status
Outdated
Given that its inclusion is not completely necessary and adds significant complexity to the proposal, it is out of scope for this proposal.

#### Configurable verbosity for Executor State

I am not sure we want to mention this and also have it as a future improvement. Today, we cannot specify `verbose` when getting the proposal either. It's not exposed to the user.
Are we worried that users might ask for it if it is included in the proposal? Would we ever want to provide the verbose optimization proposal to the user in the future?
I don't know the answer to either question. We can create a new proposal for the verbosity configuration at some point if someone comes to us and asks for it. I would just avoid making a commitment for the future right now. @tomncooper wdyt?
What if we moved it to the "Rejected Alternatives" section? This way we avoid the commitment and we have the reasons why it was rejected documented there in case users ask for it in the future.
TBH I don't mind stripping the section out completely if we are worried keeping it will result in user fixation or confusion!
I don't think it would be a rejected alternative, just an addition. I am for removing this section.
d294906 to ff9df7e
Signed-off-by: Kyle Liberti <[email protected]>
ff9df7e to 0f58cbb
Signed-off-by: Kyle Liberti <[email protected]>
a310c4d to b55824e
Signed-off-by: Kyle Liberti <[email protected]>
b55824e to bc7f1ed
088-rebalance-progress-status.md
Outdated
## Proposal

This proposal extends the status section of the `KafkaRebalance` custom resource to include a `progress` section with a nested `rebalanceProgressConfigMap` field that references a `ConfigMap` that contains information related to an ongoing partition rebalance.

Do we need a new CM? Can't we use the existing one from the proposal?
We were considering using the existing `ConfigMap`, "afterBeforeLoadConfigMap", to store this progress information, but were concerned the additional data would contribute to hitting the 1 MB `ConfigMap` limit sooner. It is not so much of an issue for the constant "non-verbose" executor state information we plan on providing as part of this proposal. However, if we were to extend the feature to provide the variable "verbose" executor state information in the future, it would increase the chance of hitting the limit for larger production clusters that have a larger number of brokers and partitions.
If we have no requests/plans for providing "verbose" executor state information in the future, I don't see much of a problem with storing the information in the existing `ConfigMap`. At the least, it would simplify the proposal implementation. Any thoughts @ppatierno @tomncooper?
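The size concern can be made concrete with a rough check like the one below. It is a simplified sketch: it compares only the serialized payload plus an already-known byte count against the 1 MiB limit, ignores key names and other object overhead, and the function name and arguments are hypothetical.

```python
import json
from typing import Any

# Kubernetes limits the data stored in a ConfigMap to 1 MiB.
CONFIGMAP_LIMIT_BYTES = 1024 * 1024


def fits_in_configmap(
    existing_bytes: int,
    payload: Any,
    limit_bytes: int = CONFIGMAP_LIMIT_BYTES,
) -> bool:
    """Check whether adding payload keeps the data within the limit.

    existing_bytes is the size of the data already stored; payload is
    serialized compactly to measure what would be added.
    """
    encoded = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return existing_bytes + len(encoded) <= limit_bytes


# A small, fixed-size executor state easily fits next to existing data.
print(fits_in_configmap(200_000, {"state": "NO_TASK_IN_PROGRESS"}))
```

A payload whose size grows with the number of partitions, as the "verbose" output would, is the case where such a check could start to fail on large clusters.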
As mentioned before, I am not sure we'll ever have support for "verbose", so maybe we could think about the present and not the future. From this perspective, using the same ConfigMap seems reasonable.
Anyway you can keep the `rebalanceProgressConfigMap` field pointing to that ConfigMap.
When/if one day we have support for "verbose", that field will just point to a different ConfigMap. It should not be a big issue for users.
You don't have to use the same CM just because I asked about it.
No I just think it's a good compromise. Anyway let's see what @kyguy @tomncooper think?
Discussed offline with Paolo and Tom: since the progress information is constant, we can safely add it to the existing `ConfigMap` maintained for and tied to the `KafkaRebalance` resource. This keeps `KafkaRebalance` information organized in one place, simplifies the proposal implementation, and has insignificant impact on the storage of the `ConfigMap`. Refactored and added this note to the proposal.
088-rebalance-progress-status.md
Outdated
- lastTransitionTime: "2024-11-05T15:28:23.995129903Z" | ||
status: "True" | ||
type: Rebalancing | ||
message: "Failed to retrieve rebalance progress" |
This should be probably a separate warning condition and not be part of the rebalancing condition as it is not clear what it means really?
Yes, you are right, just updated this to a "Warning condition"
088-rebalance-progress-status.md
Outdated
rebalanceProgressConfigMap: my-rebalance-progress | ||
``` | ||
|
||
### Future Improvements |
Is this chapter empty? Or do you need to fix the headers from here down?
Removed
088-rebalance-progress-status.md
Outdated
We could provide this information for other states as well, such as the `ProposalReady` and `Ready` states, but it is not completely necessary, nor is it trivial. | ||
Further discussion on the inclusion of the progress information for these other states can be found in the [Future Improvements](#future-improvements) section near the bottom of this proposal. | ||
|
||
All the information required for estimating the values of `estimatedTimeToCompletion` and `percentageDataMovementComplete` fields can be derived from either the Cruise Control server configurations or the [/kafkacruisecontrol/state?substates=executor](https://github.com/linkedin/cruise-control/wiki/REST-APIs#query-the-state-of-cruise-control) REST API endpoint. |
"can be derived from either the Cruise Control server configurations or the REST API endpoint." ... this sounds like we have a choice from where estimating the values, while we should be clear in the proposal which one we are using, which I guess is REST API endpoint, right?
Updated "or" -> "and" now that the `ProposalReady` state is being implemented as part of the proposal. This statement now makes more sense since the `ProposalReady` state estimations depend on the information from the Cruise Control server configurations, while the other states depend on the information from the `/kafkacruisecontrol/state?substates=executor` endpoint.
088-rebalance-progress-status.md
Outdated
|
||
$$ | ||
\text{executorState} = \langle \text{Previous JSON payload from "/kafkacruisecontrol/state?substates=executor" endpoint} \rangle | ||
$$ |
What's the value of showing the above formulas? I mean we are just saying the `executorState` field will contain the above JSON returned by the state endpoint. I think we already mentioned it a few times, or?
088-rebalance-progress-status.md
Outdated
|
||
it is best if we maintain the progress information somewhere else. | ||
|
||
#### Including “ExecutorState” in “afterBeforeLoadConfigmap” |
if we go with using the same CM we have to remove the section from here.
088-rebalance-progress-status.md
Outdated
- `Stopped` | ||
|
||
These are the states where this progress information will be able to be most accurately calculated and most useful for users. | ||
We could provide this information for other states as well, such as the `ProposalReady` and `Ready` states, but it is not completely necessary, nor is it trivial. |
I still argue that having a lower bound for the estimated time of a `KafkaRebalance` in the `ProposalReady` state, based on the proposed data to move and the throttle settings, would be a useful thing to have. It would obviously not be accurate as there is no disk throughput available.
But it is still a useful guide and helps users gauge the impact of a proposed rebalance, which data-to-move values alone don't give.
088-rebalance-progress-status.md
Outdated
$$ | ||
|
||
Notes | ||
- [1] `finishedDataMovement` is the number of megabytes already moved by rebalance, provided by [/kafkacruisecontrol/state?substates=executor](#field-executorstate) REST API endpoint. |
You don't need the numerical indicators (`[1]`) in the formulas. You state the full name of the variable anyway so they don't help.
088-rebalance-progress-status.md
Outdated
Knowing things like how much time an ongoing partition rebalance has left to take and how much data an ongoing partition rebalance has left to transfer helps users understand the cost of an ongoing partition rebalance. | ||
This information helps users decide whether they should continue or cancel an ongoing rebalance, and know when future operations will be able to be safely executed. | ||
|
||
Further, having this information readily available and easily accessible via Kubernetes primitives, allows users and third-party tools like the Kubernetes CLI or Strimzi Console to easily track the progression of a partition rebalance. |
Can you add a simple example of how the Kubernetes CLI could be used to get the Rebalance progress information?
"Strimzi Console"? I think you mean the StreamsHub Console.
088-rebalance-progress-status.md
Outdated
[2] The `ConfigMap` containing information related to the ongoing partition rebalance, generated with the name "<kafka_rebalance_resource_name>-progress". | ||
|
||
In the `ConfigMap`, we will include the following fields: | ||
- **estimatedTimeToCompletion**: The estimated amount time it will take in minutes until partition rebalance is complete. |
Like in `optimizationResult`, I think that it would be good to add a unit suffix to this key, i.e. `estimatedCompletionTimeInMinutes`. IMO, we should follow this pattern in general, and avoid using unit suffixes in values. This would also make the formulas self-explanatory.
088-rebalance-progress-status.md
Outdated
|
||
In the `ConfigMap`, we will include the following fields: | ||
- **estimatedTimeToCompletion**: The estimated amount time it will take in minutes until partition rebalance is complete. | ||
- **percentageDataMovementComplete**: The percentage of the data movement of the partition rebalance that is completed e.g. values in the range [0-100]% |
Should this be named `completedDataMovementPercentage`, similar to `monitoredPartitionsPercentage` in `optimizationResult`?
088-rebalance-progress-status.md
Outdated
"triggeredUserTaskId":"0230d401-6a36-430e-9858-fac8f2edde93" | ||
} | ||
``` | ||
[1] The estimated time it will take the rebalance to complete based on the average rate of data transfer. |
Which time unit? There are other places in which the time unit is not defined.
Signed-off-by: Kyle Liberti <[email protected]>
088-rebalance-progress-status.md
Outdated
estimatedTimeToCompletionInMinutes: 5m [1] | ||
completedDataMovementPercentage: 80% [2] |
You should either have the unit in the value and have users parse it or have it in the name and use an integer / double only. I'm fine with both ways, but you should pick one.
My view is ...
For the `completedDataMovementPercentage` field it doesn't make much sense to have the `%` symbol in the value, so I would just remove it and have `completedDataMovementPercentage: 80`.
Regarding `estimatedTimeToCompletionInMinutes`, it could depend on whether we want the flexibility of showing the value in a different unit, but I don't see any value in it. I mean we could have `estimatedTimeToCompletion: 300000ms` or `estimatedTimeToCompletion: 5m` to say the same thing, but does it really make sense? For this reason I would be more for just `estimatedTimeToCompletionInMinutes: 5`. The rebalancing is a long process and showing 1 minute or just 0 minutes for a remaining time which is less than a minute could make sense (instead of something like `estimatedTimeToCompletion: 36s`).
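To make the "unit in the key name, plain number in the value" option concrete, here is a minimal Python sketch of how the two fields could be rendered. The helper name `format_progress` and the choice to round partial minutes up (so 36 seconds shows as 1) are assumptions for illustration only; the thread leaves the rounding rule open, and the real logic would live in the Cluster Operator.

```python
import math


def format_progress(remaining_seconds: float, completed_fraction: float) -> dict:
    """Render progress as unit-free values; the unit lives in the key name.

    Hypothetical helper, not part of Strimzi. ConfigMap data values are
    strings, so the numbers are stringified without any "m" or "%" suffix.
    """
    # Assumption: round partial minutes up, so anything under a minute shows as 1.
    minutes = math.ceil(max(remaining_seconds, 0) / 60)
    # Clamp to [0, 1] before converting to a whole-number percentage.
    percentage = round(min(max(completed_fraction, 0.0), 1.0) * 100)
    return {
        "estimatedTimeToCompletionInMinutes": str(minutes),
        "completedDataMovementPercentage": str(percentage),
    }
```

A consumer can then parse both values with a plain integer conversion, without stripping a unit suffix first.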
088-rebalance-progress-status.md
Outdated
- `ProposalReady` | ||
- `Rebalancing` | ||
- `Stopped` |
What will be the actual values in these states?
- Proposal ready -> 0% completion and the estimated time from once it would be approved?
- Rebalancing -> an up to date information?
- Stopped -> the last info before it was stopped?
- Ready -> 100% and 0 minutes remaining?
Maybe you can describe it here in bullet points in a human readable form and leave the formulas below for experts.
I think a summary like this would be useful
+1
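As a starting point for such a summary, the per-state behaviour suggested in the bullets above could be sketched as follows. This is only an illustration of the reviewer's tentative reading (each bullet ends in a question mark, and the thread does not confirm these semantics); the function name `progress_for_state` and its parameters are hypothetical.

```python
from typing import Optional, Tuple

# (estimated minutes remaining, percentage of data movement completed)
Progress = Tuple[Optional[float], Optional[float]]


def progress_for_state(
    state: str,
    proposal_estimate: Optional[float] = None,
    live: Optional[Progress] = None,
    last_known: Optional[Progress] = None,
) -> Progress:
    """Map a KafkaRebalance state to the progress values shown to the user.

    Assumed semantics, taken from the reviewer's questions rather than
    from the proposal text.
    """
    if state == "ProposalReady":
        # Nothing has moved yet; show the estimate for once the proposal is approved.
        return (proposal_estimate, 0.0)
    if state == "Rebalancing":
        # Up-to-date values computed from the executor state.
        return live if live is not None else (None, None)
    if state == "Stopped":
        # Last values recorded before the rebalance was stopped.
        return last_known if last_known is not None else (None, None)
    if state == "Ready":
        # Rebalance finished.
        return (0.0, 100.0)
    raise ValueError(f"no progress information defined for state {state!r}")
```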
088-rebalance-progress-status.md
Outdated
name: my-rebalance | ||
… | ||
data: | ||
estimatedTimeToCompletionInMinutes: 5m [1] |
How is the estimation counted? Is it reliable? How much is it affected by the issues with unknown real network capacity?
> How is the estimation counted?
It depends on the `KafkaRebalance` state; the specific details per state are in the "Field: `estimatedTimeToCompletionInMinutes`" section of the proposal.
> Is it reliable?
In general, yes. The values for the `Stopped` and `Ready` states are hardcoded, and the value for the `Rebalancing` state is based on the average rate of data transfer and easily calculated without the need for any user or capacity settings. The only state that could be potentially problematic is the estimation for the `ProposalReady` state, which relies on accurate network capacity configuration from the user.
> How much is it affected by the issues with unknown real network capacity?
If the default or user-configured network capacity is largely different from the real network capacity, the estimation for the `ProposalReady` state could be inaccurate. If the real network capacity is underestimated, the rebalance could take much less time than `estimatedTimeToCompletionInMinutes` to complete. If the real network capacity is overestimated, the rebalance could take much more time than `estimatedTimeToCompletionInMinutes` to complete. The latter case wouldn't be as much of an issue, as we advertise `estimatedTimeToCompletionInMinutes` to be a theoretical minimum estimation in the `ProposalReady` state. However, the former case would be an issue since the `estimatedTimeToCompletionInMinutes` value wouldn't be a theoretical minimum.
To avoid issues like this, the current plan is to document that users must provide accurate network capacity settings to have accurate `estimatedTimeToCompletionInMinutes` values in the `ProposalReady` state. We already documented that users must provide accurate network capacity settings to have accurate rebalances based on network capacity and distribution anyway.
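The two estimates described above can be sketched in Python under some assumptions: data amounts are in megabytes (as the proposal notes for `finishedDataMovement`), the elapsed time of the rebalance is known, and the configured network capacity is given in MB/s. The function and parameter names are hypothetical, and the proposal's own formulas remain the reference.

```python
from typing import Optional


def estimate_minutes_rebalancing(
    finished_mb: float, total_mb: float, elapsed_seconds: float
) -> Optional[float]:
    """Rebalancing state: remaining time from the average transfer rate so far."""
    if finished_mb <= 0 or elapsed_seconds <= 0:
        return None  # no data moved yet, so no average rate to extrapolate from
    rate_mb_per_s = finished_mb / elapsed_seconds
    remaining_mb = max(total_mb - finished_mb, 0)
    return remaining_mb / rate_mb_per_s / 60


def estimate_minutes_proposal_ready(
    data_to_move_mb: float, network_capacity_mb_per_s: float
) -> float:
    """ProposalReady state: time if data moved at the configured capacity.

    This is a lower bound only when the configured capacity is at least
    the real capacity.
    """
    if network_capacity_mb_per_s <= 0:
        raise ValueError("network capacity must be positive")
    return data_to_move_mb / network_capacity_mb_per_s / 60
```

For example, with 6000 MB to move and a configured capacity of 10 MB/s the `ProposalReady` estimate is 10 minutes. If the real capacity were 20 MB/s, the rebalance could finish in about 5 minutes, so the advertised value would no longer be a theoretical minimum, which is the underestimation case described above.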
> To avoid issues like this, the current plan is to document the users must provide accurate network capacity settings to have accurate estimatedTimeToCompletionInMinutes values in the ProposalReady state. We already documented that users must provide accurate network capacity settings to have accurate rebalances based on network capacity and distribution anyway.
I'm not sure this is a real solution. Do you really believe they configure the accurate network capacity? Do we even know how they would find out the accurate network capacity? Or will we solve it on paper but 99% of users will have it misconfigured and these numbers will be useless?
> Do you really believe they configure the accurate network capacity?
Users that are serious about their network resource usage and distribution do.
> Do we even know how would they find out the accurate network capacity?
I imagine they would use K8s CNI plugins or network performance benchmark tools.
> Or will we solve it on paper but 99% of users will have it misconfigured and these numbers will be useless?
> I'm not sure this is a real solution.
For users that have network capacity properly configured, this feature is still useful. I admit that I don't know how many Strimzi users configure their network capacity settings, but I would like to believe that those that are doing it are doing it accurately. In addition to the network capacity documentation, what if we were to only include this estimation in the `ProposalReady` state for users that explicitly configured their network capacity settings? Would that be a more reasonable solution?
> In addition to the network capacity documentation, what if we were to only include this estimation in the ProposalReady state for users that explicitly configured their network capacity settings? Would that be a more reasonable solution?
I guess it could be a viable solution. If the user is setting the network capacity I would assume they know what to put there; if not and they put a wrong/bad value, they should know that it's going to screw up the estimation. The documentation should state that. If we think it's not a viable solution then we should remove `estimatedTimeToCompletionInMinutes` from the overall proposal. But I am for taking it and documenting it properly.
@tomncooper wdyt about the above discussion?
IMO the `ProposalReady` estimation is useful, because what really matters to users is to know if it would take minutes, hours, or days (see Windows file copy). If we could compute the average bandwidth from Kafka metrics, then we could use this value to provide a more accurate estimation independently from the user configuration.
088-rebalance-progress-status.md
Outdated
|
||
The estimated time it will take in minutes for a rebalance to complete based on the average rate of data transfer. | ||
|
||
The formulas used to calculate field value per `KafkaRebalance` state: |
So the formulas here ... will the CO calculate these? Or does CC calculate this and we just show the numbers?
It's the CO calculating these.
Added a sentence in the section above this sentence to make this more clear:
"All the information required for the Cluster Operator to estimate the values of `estimatedTimeToCompletionInMinutes` and `completedDataMovementPercentage` fields"
The proposal clearly outlines the approach for providing progress status and how the information necessary for the calculations will be provided. I’ve left a couple of questions for clarification and some minor suggestions for consistency.
088-rebalance-progress-status.md
Outdated
progress: [1] | ||
rebalanceProgressConfigMap: my-rebalance [2] | ||
``` | ||
[1] The `progress` section will be visible during the `Ready`, `Rebalancing`, `Stopped` and `Ready` states. |
Why `Stopped` but not `PausedReconciliation` or `NotReady`? Should we explain here?
`PausedReconciliation` is not a valid rebalancing state.
> Why Stopped but not PausedReconciliation or Not Ready? should we explain here?
The `PausedReconciliation` and `NotReady` states are not related to the rebalance operation but more to the proposal generation. Therefore, these states don't have any rebalance progress information associated with them.
> The PausedReconciliation and NotReady states are not related to the rebalance operation but more to the proposal generation.
Well, actually even during a rebalancing you can get errors from CC and the `KafkaRebalance` ends in the `NotReady` state, right? But `PausedReconciliation` is not a valid rebalancing state at all.
> Well, actually even during a rebalancing you can get errors from CC and the KafkaRebalance ends in the NotReady state, right?
Ah, yes you are right, any configuration or rebalance errors will put the `KafkaRebalance` resource in the `NotReady` state. Sorry @PaulRMellor, I was incorrect, `NotReady` should be supported as well for the same reason the `Stopped` state is supported, to show how far the rebalance got before it failed. That was a nice spot!
> But PausedReconciliation is not a valid rebalancing state at all.
Is that because it is related to the resource and not the rebalance itself? What determines whether it is a valid rebalancing state? I am confused because there is an enum for `PausedReconciliation` listed in the `KafkaRebalanceState` class [1]
> Is that because it is related to the resource and not the rebalance itself? What determines whether it is a valid rebalancing state? I am confused because there is an enum for PausedReconciliation listed in the KafkaRebalanceState class [1]
But it's `ReconciliationPaused` not `PausedReconciliation`! :-P
Joking apart ... I was trying to defend myself because I totally missed this state in the rebalance FSM :-D
I didn't know about it either until Paul mentioned it!
088-rebalance-progress-status.md
Outdated
``` | ||
[1] The `progress` section will be visible during the `Ready`, `Rebalancing`, `Stopped` and `Ready` states. | ||
|
||
[2] The `ConfigMap` containing information related to the ongoing partition rebalance |
[2] The `ConfigMap` containing information related to the ongoing partition rebalance | |
[2] The `ConfigMap` containing information related to the ongoing partition rebalance. |
088-rebalance-progress-status.md
Outdated
In the `ConfigMap`, we will add the following fields: | ||
- **estimatedTimeToCompletionInMinutes**: The estimated amount time it will take in minutes until partition rebalance is complete. | ||
- **completedDataMovementPercentage**: The percentage of the data movement of the partition rebalance that is completed e.g. values in the range [0-100]% | ||
- **executorState**: The “non-verbose” JSON payload from the `/kafkacruisecontrol/state?substates=executor` endpoint. |
Maybe we should be more specific about the contents of the executorState field?
I would not provide a detailed list of everything, maybe just a link to the OpenAPI definition of it on the Cruise Control repo?
Yeah, I agree you should give a brief summary of what this is and link to the OpenAPI def upstream.
Added the link and the text suggested by Paul, which should be a fair compromise.
088-rebalance-progress-status.md
Outdated
In the `ConfigMap`, we will add the following fields: | ||
- **estimatedTimeToCompletionInMinutes**: The estimated amount time it will take in minutes until partition rebalance is complete. | ||
- **completedDataMovementPercentage**: The percentage of the data movement of the partition rebalance that is completed e.g. values in the range [0-100]% | ||
- **executorState**: The “non-verbose” JSON payload from the `/kafkacruisecontrol/state?substates=executor` endpoint. |
- **executorState**: The “non-verbose” JSON payload from the `/kafkacruisecontrol/state?substates=executor` endpoint. | |
- **executorState**: The “non-verbose” JSON payload from the `/kafkacruisecontrol/state?substates=executor` endpoint, providing details about the executor's current status, including partition movement progress, concurrency limits, and total data to move. |
Added
088-rebalance-progress-status.md
Outdated
``` | ||
[1] The estimated time it will take in minutes for the rebalance to complete based on the average rate of data transfer. | ||
|
||
[2] The percentage complete of the ongoing rebalance in the range [0-100]% |
[2] The percentage complete of the ongoing rebalance in the range [0-100]% | |
[2] The percentage complete of the ongoing rebalance in the range [0-100]%. |
|
||
The percentage of the data movement of the partition rebalance that is completed. | ||
|
||
The formulas used to calculate field value per `KafkaRebalance` state: |
The formulas used to calculate field value per `KafkaRebalance` state: | |
The formulas used to calculate the field value differ for each applicable `KafkaRebalance` state: |
088-rebalance-progress-status.md
Outdated
progress: | ||
rebalanceProgressConfigMap: my-rebalance-progress | ||
``` | ||
[1] Error message from failed Cruise Control REST API call |
[1] Error message from failed Cruise Control REST API call | |
[1] Error message from failed Cruise Control REST API call. |
conditions: | ||
- lastTransitionTime: "2024-11-05T15:28:23.995129903Z" | ||
status: "True" | ||
type: Warning |
Should we call out this property to highlight and distinguish between `NotReady` and (new?) `Warning`?
The type of condition in the status could be any one of the `KafkaRebalance` states; it could also be `Warning`. Why would the `NotReady` type be tied/associated to the `Warning` type?
088-rebalance-progress-status.md
Outdated
### Accessing progress fields using Kubernetes CLI | ||
|
||
The progress information will be stored in a `ConfigMap` with the same name as the `KafkaRebalance` resource. | ||
Using the name of the ConfigMap we can view its data from the command line using the Kubernetes CLI. |
Using the name of the ConfigMap we can view its data from the command line using the Kubernetes CLI. | |
Using the name of the `ConfigMap`, we can view its data from the command line using the Kubernetes CLI. |
088-rebalance-progress-status.md
Outdated
|
||
### Rejected Alternatives | ||
|
||
#### Maintaining progress fields in KafkaRebalance resource status |
#### Maintaining progress fields in KafkaRebalance resource status | |
#### Maintaining progress fields in `KafkaRebalance` resource status |
@kyguy I had a pass leaving some comments.
I think this proposal is missing the usual "Affected/not affected projects" section.
088-rebalance-progress-status.md
Outdated
@@ -0,0 +1,374 @@ | |||
# Partition Rebalance Progress Status |
Should be more "Cluster rebalance progress status" (or rebalancing) ... not sure if "partition" (even using the singular) sounds really fine because it's about a cluster rebalancing and about moving one or more partitions across the cluster.
Updated to the "Adding progress updates for Cruise Control rebalances"
088-rebalance-progress-status.md
Outdated
Knowing things like how much time an ongoing partition rebalance has left to take and how much data an ongoing partition rebalance has left to transfer helps users understand the cost of an ongoing partition rebalance. | ||
This information helps users decide whether they should continue or cancel an ongoing rebalance, and know when future operations will be able to be safely executed. | ||
|
||
Further, having this information readily available and easily accessible via Kubernetes primitives, allows users and third-party tools like the Kubernetes CLI or StreamsHub Console to easily track the progression of a partition rebalance. |
How many people in the Strimzi community would know about the "StreamsHub Console"? Maybe it deserves a link to the repo or not mentioning it at all?
088-rebalance-progress-status.md
Outdated
## Proposal | ||
|
||
This proposal extends the status section of the `KafkaRebalance` custom resource to include a `progress` section with a nested `rebalanceProgressConfigMap` field. | ||
This field will reference the `KafkaRebalance`'s existing `ConfigMap`, which will be enhanced to contain information related to an ongoing partition rebalance. |
"the `KafkaRebalance`'s existing `ConfigMap`" ... maybe we should mention this ConfigMap beforehand to explain what it contains currently and from where it is already referenced (see the `afterBeforeLoadConfigMap` field).
088-rebalance-progress-status.md
Outdated
|
||
The estimated time it will take in minutes for a rebalance to complete based on the average rate of data transfer. | ||
|
||
The formulas used to calculate field value per `KafkaRebalance` state: |
It's the CO calculating these.
088-rebalance-progress-status.md
Outdated
**Estimation for intra-broker rebalance:** | ||
|
||
It is challenging to provide an accurate estimate for intra-broker rebalances without an estimate for disk read/write throughput and getting disk throughput is non-trivial for Strimzi. | ||
However, by using the network bandwidth in place of the disk throughput, we can provide a rough estimate of how long the rebalance would take. |
"by using the network bandwidth in place of the disk throughput" why do you think that we can do this "replacement" to get a rough estimation? I am not sure about that. I was wondering if we should just avoid this estimation if we don't have a good way for that.
Assuming the disk throughput is always greater than the network bandwidth, an estimate using the network bandwidth could serve as an upper bound (theoretical maximum) on how long an intra-broker rebalance would take, i.e. the rebalance won't take longer than this. However, this contradicts the definition provided for the inter-broker rebalance and may cause confusion. We could simply set the value to `N/A` for now to avoid confusion/inaccuracy.
Would you mind if we left this estimate out for intra-broker rebalances, @tomncooper?
I thought that is what we were going to do? Basically we don't include the intra estimate as we can't reliably calculate it. So the time to completion is always a theoretical minimum, it WILL take longer than this.
Yes I would leave this estimation out imho.
Updated proposal to explain this and suggest that we set it to "N/A"
088-rebalance-progress-status.md
Outdated
$$ | ||
|
||
Notes | ||
- [1] The number of megabytes already moved by rebalance, provided by [/kafkacruisecontrol/state?substates=executor](#field-executorstate) REST API endpoint. |
Can we put the specific field we are using from the returned JSON?
088-rebalance-progress-status.md
Outdated
Notes | ||
- [1] The number of megabytes already moved by rebalance, provided by [/kafkacruisecontrol/state?substates=executor](#field-executorstate) REST API endpoint. | ||
- [2] The time when the rebalance task was started, extracted from `triggeredTaskReason` field from the [/kafkacruisecontrol/state?substates=executor](#field-executorstate) for that task. | ||
- [3] The total number of megabytes planned to be moved for rebalance, provided from json payload of the [/kafkacruisecontrol/state?substates=executor](#field-executorstate) REST API endpoint. |
Can we put the specific field we are using from the returned JSON?
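For reference while reviewing notes [1] to [3], here is a minimal Python sketch of an average-rate estimate. It only assumes that the estimate divides the data still to be moved by the average transfer rate observed since the task started; the function and parameter names are hypothetical and are not Cruise Control JSON field names, and the exact formula in the proposal may differ.

```python
from typing import Optional


def estimated_minutes_remaining(finished_mb: float,
                                total_mb: float,
                                elapsed_seconds: float) -> Optional[float]:
    """Estimate the minutes left from the average transfer rate so far.

    All parameter names are illustrative assumptions:
      finished_mb     - megabytes already moved (note [1])
      total_mb        - megabytes planned to be moved (note [3])
      elapsed_seconds - time since the task started (derived from note [2])
    Returns None when no average rate can be computed yet.
    """
    if elapsed_seconds <= 0 or finished_mb <= 0:
        return None  # nothing moved yet, so the average rate is undefined
    rate_mb_per_second = finished_mb / elapsed_seconds
    remaining_mb = max(total_mb - finished_mb, 0.0)
    return remaining_mb / rate_mb_per_second / 60.0
```

Returning `None` before any data has moved is a design choice for this sketch; the operator could equally report a placeholder value in that case.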
088-rebalance-progress-status.md
Outdated
|
||
Notes | ||
- [1] The number of megabytes already moved by rebalance, provided by [/kafkacruisecontrol/state?substates=executor](#field-executorstate) REST API endpoint. | ||
- [2] The total number of megabytes planned to be moved for rebalance, provided by [/kafkacruisecontrol/state?substates=executor](#field-executorstate) REST API endpoint. |
Can we put the specific fields we are using from the returned JSON for both the above?
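To illustrate how notes [1] and [2] could combine into `completedDataMovementPercentage`, here is a small Python sketch. It assumes the percentage is simply the moved data over the planned data, rounded down and clamped to the range 0 to 100; the parameter names are hypothetical, and treating an empty plan as 100% is an assumption of this sketch rather than something the proposal states.

```python
def completed_data_movement_percentage(finished_mb: float, total_mb: float) -> int:
    """Return the share of planned data already moved, as an integer in [0, 100].

    finished_mb and total_mb are illustrative names for the values described
    in notes [1] and [2]; they are not confirmed Cruise Control field names.
    """
    if total_mb <= 0:
        # Assumption: a plan with no data to move is reported as complete.
        return 100
    percentage = finished_mb / total_mb * 100
    # Clamp so that inconsistent inputs never leave the documented range.
    return int(min(max(percentage, 0), 100))
```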
088-rebalance-progress-status.md
Outdated
|
||
For ease of implementation and minimizing the load on the CruiseControl REST API server, the operator will only query the `/kafkacruisecontrol/state?substates=executor` endpoint and update the `ConfigMap` upon `KafkaRebalance` resource reconciliation. | ||
|
||
In the event that Cruise Control runs into an error when rebalancing, the operator will transition the `KafkaRebalance` resource to the `NotReady` state, remove the `progress` section, and delete the progress `ConfigMap`. |
"delete the progress `ConfigMap`" or "delete the progress section of the `ConfigMap`"?
Remember we are using the same `ConfigMap` for two purposes (also storing the before/after load data).
Is this consistent with what currently happens if we fail to get a response from CC in a single reconciliation? Does the CC client retry? Not sure setting the KR CR to `NotReady` for what could be a simple network blip is good UX.
> "delete the progress `ConfigMap`" or "delete the progress section of the `ConfigMap`"?
> Remember we are using the same `ConfigMap` for two purposes (also storing the before/after load data).
I am going to update the behavior described here to retain the progress information and `ConfigMap` when the `KafkaRebalance` resource moves to the `NotReady` state. As discussed in a previous thread with Paul, the progress information in the `NotReady` state may be just as useful for debugging as it is in the `Stopped` state.
> Is this consistent with what currently happens if we fail to get a response from CC in a single reconciliation? Does the CC client retry? Not sure setting the KR CR to `NotReady` for what could be a simple network blip is good UX.
This line was intended to describe how the progress information will be updated when the CC server returns a "CompletedWithError" status for a task. From what I understood when writing this, this was the only situation where the KR resource was moved to the `NotReady` state. But looking closer at the code, it appears I am wrong.
It appears the CO will move the KR resource to the `NotReady` state when it fails to get a response from the CC server. It also looks like the CO CC client code does not retry when failing to get a response (unless I am missing some retry logic in the code). This means that if the CO fails to get a response from the CC server, the KR CR is set to `NotReady`. I thought this only happened when a "CompletedWithError" response was returned by the CC server, not when there was a failed HTTP request.
The proposal suggests attempting to retrieve the executor status, but whether the retrieval succeeds or fails has no effect on the state of the `KafkaRebalance` resource.
Of course, the CC client, wherever it is used, should and will be implemented to retry.
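As a rough illustration of the behaviour described in this reply, the Python sketch below retries a progress lookup a few times and returns `None` if every attempt fails, so the caller can keep the previous progress data and leave the resource state untouched. The function name, the `fetch` callable, and the retry parameters are all hypothetical; this is not the operator's actual client code.

```python
import time
from typing import Callable, Optional


def fetch_executor_state(fetch: Callable[[], dict],
                         attempts: int = 3,
                         delay_seconds: float = 1.0) -> Optional[dict]:
    """Call fetch() up to `attempts` times, pausing between failures.

    `fetch` stands in for a request to the executor state endpoint and is
    expected to raise ConnectionError on a transient failure (an assumption
    made for this sketch). Returns None when all attempts fail, which the
    caller treats as "no progress update this reconciliation".
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionError:
            # Only wait if another attempt is still to come.
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    return None
```

A caller would then update the progress data only when the result is not `None`, without changing the `KafkaRebalance` state on failure.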
As discussed, I think you should not estimate the time to complete intra-broker movements in the proposal and should just exclude them. The estimate will always be a theoretical minimum, but it is useful to have a ballpark.
088-rebalance-progress-status.md
Outdated
@@ -0,0 +1,374 @@ | |||
# Partition Rebalance Progress Status |
# Partition Rebalance Progress Status | |
# Adding progress updates for Cruise Control rebalances |
|
||
At this time, Strimzi users are able to execute partition rebalances via `KafkaRebalance` custom resources but can only monitor the progression of those partition rebalances in two ways: | ||
|
||
- Manually querying the Cruise Control REST API endpoint directly. |
Don't we lock these down with a Network Policy? Would the user have to alter the default set up to get access?
Yes there are a couple of things that would need special configuration to enable a user to access the CC REST API directly
088-rebalance-progress-status.md
Outdated
In the `ConfigMap`, we will add the following fields: | ||
- **estimatedTimeToCompletionInMinutes**: The estimated amount time it will take in minutes until partition rebalance is complete. | ||
- **completedDataMovementPercentage**: The percentage of the data movement of the partition rebalance that is completed e.g. values in the range [0-100]% | ||
- **executorState**: The “non-verbose” JSON payload from the `/kafkacruisecontrol/state?substates=executor` endpoint. |
Yeah, I agree you should give a brief summary of what this is and link to the OpenAPI def upstream.
088-rebalance-progress-status.md
Outdated
|
||
For ease of implementation and minimizing the load on the CruiseControl REST API server, the operator will only query the `/kafkacruisecontrol/state?substates=executor` endpoint and update the `ConfigMap` upon `KafkaRebalance` resource reconciliation. | ||
|
||
In the event that Cruise Control runs into an error when rebalancing, the operator will transition the `KafkaRebalance` resource to the `NotReady` state, remove the `progress` section, and delete the progress `ConfigMap`. |
Is this consistent with what currently happens if we fail to get a response from CC in a single reconciliation? Does the CC client retry? Not sure setting the KR CR to `NotReady` for what could be a simple network blip is good UX.
088-rebalance-progress-status.md
Outdated
progress: [1] | ||
rebalanceProgressConfigMap: my-rebalance [2] | ||
``` | ||
[1] The `progress` section will be visible during the `Ready`, `Rebalancing`, `Stopped` and `Ready` states. |
[1] The `progress` section will be visible during the `Ready`, `Rebalancing`, `Stopped` and `Ready` states. | |
[1] The `progress` section will be visible during the `ProposalReady`, `Rebalancing`, `Stopped` and `Ready` states. |
Signed-off-by: Kyle Liberti <[email protected]>
088-rebalance-progress-status.md
Outdated
This estimate will be a theoretical minimum derived from Cruise Control capacity and throttle configurations. | ||
This means that the cluster rebalance would take at least the estimated amount of time to complete. | ||
|
||
$$\text{maxPartitionMovements}_{[1]} = \min(\text{numberOfBrokers} \times \text{num.concurrent.partition.movements.per.broker}),\text{max.num.cluster.partition.movements})$$ |
I can't follow this calculation, is it calculating the maximum number of non-concurrent partition movements? I can't work out why the number of brokers is needed. Isn't the worst case scenario that all movements have to happen from a single broker, so should it be `max.num.cluster.partition.movements/num.concurrent.partition.movements.per.broker`? Also it looks like you are missing a bracket somewhere, possibly at the beginning of `numberOfBrokers x num.concurrent.partition.movements.per.broker`?
> is it calculating the maximum number of non-concurrent partition movements?
The maximum number of concurrent partition movements. Just updated the variable name to make this clearer.
> I can't work out why the number of brokers is needed. Isn't the worst case scenario that all movements have to happen from a single broker, so should it be
This calculation is meant to be a theoretical minimum: the best case scenario, the least amount of time a rebalance would take to complete given ideal conditions (if the maximum allowed number of partitions per broker were moved concurrently and the available bandwidth was perfectly utilized). In reality, the rebalance will take longer than the theoretical minimum, but it is still useful to know that the rebalance will take at least this estimated amount of time.
In the best case scenario, we are moving as many partitions concurrently as the brokers will allow. To calculate how many partitions can be moved concurrently cluster-wide, we need the number of brokers.
Does that make sense? Would it help if I added annotations/descriptions for the CC configurations that are used in the formulas?
> Also it looks like you are missing a bracket somewhere, possibly at the beginning of `numberOfBrokers x num.concurrent.partition.movements.per.broker`?
Yes, that is a typo! Thanks for spotting!
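To make the best-case reasoning in this thread concrete, here is a minimal Python sketch of the cluster-wide concurrency cap: the per-broker limit multiplied by the number of brokers, bounded by the cluster-wide limit. The function and parameter names are hypothetical stand-ins for `num.concurrent.partition.movements.per.broker` and `max.num.cluster.partition.movements`, and the example values are illustrative rather than defaults.

```python
def max_concurrent_partition_movements(number_of_brokers: int,
                                       per_broker_limit: int,
                                       cluster_limit: int) -> int:
    """Best-case number of partitions that can move at the same time.

    per_broker_limit and cluster_limit are illustrative names for the two
    Cruise Control settings discussed in the thread. Every broker is assumed
    to move as many partitions as it is allowed to, which is why the broker
    count appears in the calculation.
    """
    return min(number_of_brokers * per_broker_limit, cluster_limit)


# Three brokers, five movements each: the per-broker limits are the bottleneck.
print(max_concurrent_partition_movements(3, 5, 1250))
# The same brokers with a tight cluster-wide cap: the cap is the bottleneck.
print(max_concurrent_partition_movements(3, 5, 10))
```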
088-rebalance-progress-status.md
Outdated
It is challenging to provide an accurate estimate for intra-broker rebalances without an estimate for disk read/write throughput and getting disk throughput is non-trivial for Strimzi. | ||
Since we cannot accurately estimate `estimatedTimeToCompletionInMinutes` without knowing the disk throughput, we set `estimatedTimeToCompletionInMinutes` to `N/A`. | ||
|
||
$$\text{maxPartitionMovements}_{[1]} = \min\left(\text{numberOfBrokers} \times \text{num.concurrent.intra.broker.partition.movements.per.broker}),\text{max.num.cluster.movements}\right)$$ |
Similar here to previous comments, you're missing a bracket in the formula, but I'm also not clear what `maxPartitionMovements` is really representing.
Signed-off-by: Kyle Liberti <[email protected]>
This proposal introduces a new feature to monitor the progression of an ongoing partition rebalance executed by a Strimzi-managed Cruise Control instance via a `KafkaRebalance` custom resource. Implementation of this proposal should help to address strimzi/strimzi-kafka-operator#10278.