Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve on_connected callback to avoid reconnect storm #292

Merged
merged 9 commits into from
Jun 17, 2019

Conversation

anson627
Copy link
Collaborator

@anson627 anson627 commented Jun 13, 2019

The on_connected callback was originally introduced to handle an issue that zookeeper session got stuck (will not receive zookeeper watch event forever), during a connection lost and reconnect.

But we discovered some side-effect of this fix, there are multiple on_connected callback get triggered for a single disconnect and reconnect (here seems other people are facing the same issue: zk-ruby/zk-eventmachine#2). And since we do reset zookeeper watcher by read everything. This creates substantial load on zookeeper.

Here are two optimizations introduced by this change:

  1. ensure there is only one watcher refresh within a time window (e.g. 60 seconds) no matter how many on_connected callback gets triggered
  2. optimize the watcher refresh by just re-enable (parent and children) watcher, to avoid zk get storm

@austin-zhu @Jason-Jian @allenlsy @Ramyak

Before (reboot zookeeper twice, got 3 on_connected each time):
image

After (reboot zookeeper server once, got watcher refresh once):
Screen Shot 2019-06-13 at 7 39 21 PM

reboot zookeeper server twice, only got watcher refresh twice
Screen Shot 2019-06-13 at 8 48 56 PM

@anson627 anson627 force-pushed the anson-remove-connected branch from 0c929a4 to ee71aab Compare June 14, 2019 02:57
lib/synapse/service_watcher/zookeeper.rb Outdated Show resolved Hide resolved
@anson627 anson627 changed the title Remove on_connected callback to avoid reconnect storm Improve on_connected callback to avoid reconnect storm Jun 14, 2019
@anson627 anson627 force-pushed the anson-remove-connected branch from 0ada8a0 to 84da546 Compare June 14, 2019 18:14
@anson627 anson627 merged commit 10c783d into master Jun 17, 2019
anson627 added a commit that referenced this pull request Jun 27, 2019
anson627 added a commit that referenced this pull request Jun 28, 2019
#295)

* Revert "Improve on_connected callback to avoid reconnect storm (#292)"

This reverts commit 10c783d.

* Bump up version to 0.16.13
@panchr panchr deleted the anson-remove-connected branch March 11, 2020 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants