Add auto deregistration of offline participants after timeout #2932
base: master
Conversation
```java
_participantDeregistrationTimer =
    new Timer("GenericHelixController_" + _clusterName + "_participantDeregistration_Timer", true);
```
Since it is the same pipeline, I suggest leveraging the same timer. This operation also counts as on-demand rebalancing.
Hi @junkaixue, thanks for the review! Just want to clarify your feedback:
(1) Move the ParticipantDeregistrationStage to the end of the rebalance pipeline (currently it belongs to a new, separate ParticipantDeregistrationPipeline).
(2) Leverage scheduleOnDemandRebalance, since participant deregistration would then occur as part of an on-demand rebalance.
Yes.
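For context, a hedged sketch of what option (2) could look like, assuming the scheduleOnDemandPipeline utility that appears later in this thread; the names here are illustrative, not the final implementation:

```java
// Sketch: reuse the on-demand rebalance scheduling path instead of a dedicated
// deregistration timer, so deregistration rides the existing pipeline machinery.
// nextDeregisterTime is assumed to hold the earliest pending deregistration time.
long delay = nextDeregisterTime - System.currentTimeMillis();
RebalanceUtil.scheduleOnDemandPipeline(clusterName, delay);
```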
```java
Timer _participantDeregistrationTimer = null;
AtomicReference<ParticipantDeregistrationTask> _nextParticipantDeregistrationTask = new AtomicReference<>();

class ParticipantDeregistrationTask extends TimerTask {
```
We can still leverage the original task with a specified time. The compute logic does not have to live in the task; it can be part of the pipeline stage. Also, you can use a "schedule pipeline" util of some sort; it is in the code base somewhere.
Left a question in your comment above. Agree that we do not need a new task + timer if we move participant deregistration to be part of the rebalancer.
My view is that increasing the frequency of on-demand rebalances will add more work to the cluster than having a separate pipeline for participant deregistration. There may also be many on-demand rebalances called just to deregister a participant, each of which will then trigger its own rebalance.
Are the main benefits of moving the deregistration stage into the rebalance pipeline reducing code and keeping ZK writes grouped at the end of the rebalancer? Thank you for the advice.
Unifying the ZK write stage and minimizing writes is the major consideration.
But more important is reducing the heaviness of the code: creating a "Task" class for this simple pipeline trigger is too much.
Force-pushed from 7d1c222 to ea94999.
```java
public boolean isParticipantDeregistrationEnabled() {
  return _record.getBooleanField(ClusterConfigProperty.PARTICIPANT_DEREGISTRATION_ENABLED.name(), false);
}

public void setParticipantDeregistrationEnabled(boolean enabled) {
  _record.setBooleanField(ClusterConfigProperty.PARTICIPANT_DEREGISTRATION_ENABLED.name(), enabled);
}

public long getParticipantDeregistrationTimeout() {
  return _record.getLongField(ClusterConfigProperty.PARTICIPANT_DEREGISTRATION_TIMEOUT.name(), -1);
}

public void setParticipantDeregistrationTimeout(long timeout) {
  _record.setLongField(ClusterConfigProperty.PARTICIPANT_DEREGISTRATION_TIMEOUT.name(), timeout);
}
```
Let's simplify this. If the timeout is not set, or is -1, the feature is not enabled. Other cluster configs are resource level, so some resources may not have the configurations; but this one is instance level. I don't think we need a switch to control the feature.
Adjusted to rely only on the timeout value. If the timeout is -1 or lower, the feature is turned off. If no timeout is set, the default value is -1, which is interpreted as disabled.
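For illustration, a minimal sketch of the resulting gate; the getter comes from the diff above, while the helper method itself is hypothetical:

```java
// Hypothetical helper illustrating the simplified gate: no separate enabled
// switch, any timeout below zero (including the -1 default) means disabled.
private boolean isDeregistrationEnabled(ClusterConfig clusterConfig) {
  return clusterConfig.getParticipantDeregistrationTimeout() >= 0;
}
```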
```java
Long offlineTime = entry.getValue();
long deregisterTime = offlineTime + deregisterDelay;

// Skip if instance is still online
```
You should rely on LiveInstances instead of history.
Adjusted this check to be:
```java
if (cache.getLiveInstances().containsKey(instanceName)) {
```
```java
if (deregisterTime < earliestDeregisterTime) {
  earliestDeregisterTime = deregisterTime;
}
```
How can this happen? The else clause condition is deregisterTime > stageStartTime, so this will never be triggered.
Adjusted this to remove the if and clarified the variable name.
The if branch of (deregisterTime <= stageStartTime) should catch all nodes that need to be deregistered during this run.
The else branch will include all nodes that need to be deregistered at some point in the future. We then look for the node that is next in line to be deregistered and use that time to schedule the next deregistration.
The logic is now:
```java
// If deregister time is in the past, deregister the instance
if (deregisterTime <= stageStartTime) {
  participantsToDeregister.add(instanceName);
} else {
  // Otherwise, find the next earliest deregister time
  nextDeregisterTime = Math.min(nextDeregisterTime, deregisterTime);
}
```
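Putting the pieces of this thread together, a hedged sketch of the full selection loop; offlineTimes (a map from instance name to offline timestamp), cache, deregisterDelay, and stageStartTime are assumed from the surrounding snippets:

```java
// Sketch: partition offline instances into "deregister now" vs "deregister later".
long nextDeregisterTime = Long.MAX_VALUE;
List<String> participantsToDeregister = new ArrayList<>();
for (Map.Entry<String, Long> entry : offlineTimes.entrySet()) {
  String instanceName = entry.getKey();
  // Skip if instance is still online (per the LiveInstances check above)
  if (cache.getLiveInstances().containsKey(instanceName)) {
    continue;
  }
  long deregisterTime = entry.getValue() + deregisterDelay;
  if (deregisterTime <= stageStartTime) {
    participantsToDeregister.add(instanceName); // past due: deregister this run
  } else {
    nextDeregisterTime = Math.min(nextDeregisterTime, deregisterTime); // reschedule
  }
}
```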
```java
if (earliestDeregisterTime != Long.MAX_VALUE) {
  long delay = earliestDeregisterTime - stageStartTime;
  scheduleOnDemandPipeline(manager.getClusterName(), delay);
}
```
This is not accurate. If this stage is very slow, the time for the next deregistration may already have passed. Use the next closest deregister instance time.
Good catch, thank you for calling this out.
Updated the calculation of the delay:
```java
long delay = nextDeregisterTime - System.currentTimeMillis();
```
Do you think there is an argument that the check of whether a participant should be deregistered,
```java
if (deregisterTime <= stageStartTime) {
```
should also use currentTime rather than stageStartTime?
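A hedged sketch of how the corrected rescheduling could look; the Math.max clamp is an assumption, not part of the PR, added so a slow stage run schedules an immediate pass instead of passing a negative delay:

```java
// Sketch: reschedule against the wall clock, not the stage start time.
// Clamping to zero is an assumed safeguard for the case the reviewer raised,
// where the stage runs past the next deregistration time.
if (nextDeregisterTime != Long.MAX_VALUE) {
  long delay = Math.max(0, nextDeregisterTime - System.currentTimeMillis());
  scheduleOnDemandPipeline(manager.getClusterName(), delay);
}
```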
Force-pushed from ea94999 to 8e2e7ae.
Hi @junkaixue, sorry for the huge delay in responding to feedback. I had two on-call rotations, then vacation, and then got sick. Please take a look at this when you can. Ideally, we would like this to be incorporated into the upcoming release if possible.
Force-pushed from 8e2e7ae to 45190e9.
Issues
New feature allowing the controller to purge participants that have been offline for longer than a user-defined timeout.
Description
Participants can automatically join a Helix cluster when they start up. However, when they permanently go down, they must be manually removed or purged by an external workflow in order to actually leave the cluster. These stale participants can have a significant negative impact on a cluster in at least two ways:
- `MAX_OFFLINE_INSTANCES_ALLOWED` - If this cluster-level config is exceeded, the cluster will be put into maintenance mode.
- CRUSHED calculations - CRUSHED only guarantees evenness when all nodes in a cluster are online. The more offline nodes in the cluster, the larger the maximum degree of unevenness that is possible.
This causes Helix's view of the cluster's health and the actual health of the cluster to diverge.
This PR introduces a `ParticipantDeregistrationPipeline`, which contains the `ParticipantDeregistrationStage`. When the stage is executed, the controller looks at all offline nodes to determine whether any have exceeded the user-set `PARTICIPANT_DEREGISTRATION_TIMEOUT` in the ClusterConfig. If so, the controller removes the participant from the cluster. If there are offline instances that have not exceeded the timeout, the controller schedules the deregistration pipeline to run again when the next longest-offline node would exceed the deregistration timeout.
There are still areas that need to be discussed:
- `ParticipantDeregistrationStage` could occur at the end of the rebalancer pipeline, but rescheduling deregistrations would have to trigger an entire rebalance pipeline as well. This would then immediately trigger another rebalance pipeline if any instances are dropped, but would keep controller writes to the ZK ensemble logically grouped to one part of the pipeline.
Code Changes:
- Added `ParticipantDeregistrationPipeline` and `ParticipantDeregistrationStage` to handle participant deregistration logic.
- Added `PARTICIPANT_DEREGISTRATION_ENABLED` and `PARTICIPANT_DEREGISTRATION_TIMEOUT` properties to `ClusterConfig` (a usage sketch follows this list).
- Added `ParticipantDeregistrationTask` to trigger a participant deregistration pipeline at a specified time, similar to what is done for scheduling a rebalance.
- Added a `ClusterEventType` of `ParticipantDeregistration` for the scheduled trigger of the deregistration pipeline.
- Updated `ZkTestBase` with addParticipant and dropParticipant methods to be leveraged across different test classes. Updated `TestAddResourceWhenRequireDelayedRebalanceOverwrite.java` and `TestForceKillInstance.java` to leverage these changes.
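For illustration, a minimal sketch of enabling the feature from client code, assuming a standard Helix ConfigAccessor; the setter comes from the diff in this PR, and the 30-minute timeout is only an example value:

```java
import java.util.concurrent.TimeUnit;
import org.apache.helix.ConfigAccessor;
import org.apache.helix.model.ClusterConfig;

// Sketch: per the review discussion, a non-negative timeout alone enables the
// feature. zkAddress and clusterName are assumed to be defined by the caller.
ConfigAccessor configAccessor = new ConfigAccessor(zkAddress);
ClusterConfig clusterConfig = configAccessor.getClusterConfig(clusterName);
clusterConfig.setParticipantDeregistrationTimeout(TimeUnit.MINUTES.toMillis(30));
configAccessor.setClusterConfig(clusterName, clusterConfig);
```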
Tests
The following tests are written for this issue:
- TestParticipantDeregistrationStage.java
The following is the result of the "mvn test" command on the appropriate module:
Changes that Break Backward Compatibility (Optional)
N/A. The feature is optional and off by default.