This repository has been archived by the owner on May 3, 2022. It is now read-only.
Today our peer selection algorithm only takes into account freshness (identified, ready, fresh, unconnected) and load (a random weighted sum of # connections and # requests), which works well in the happy case, when all peers are alive.
But whenever a remote peer dies (e.g. the server tchannel drains or closes), the corresponding peer in the client tchannel has zero connections and requests and has already been identified, so it gets a higher score and is more likely to be selected for a new request or a request retry.
Ideally the dead peer should be removed from tchannel.
If a peer has no connection, it would not be considered identified. It should have a lower score than identified peers.
On the other hand, liveness also covers connection quality, e.g., error frame count. In general, it is easy to track error frames. The recovery part is a little harder, though: once everything is going well again, how do we reset the error frame count to zero?
You are right. Looking at the code, if there are no connections, getTier returns UNCONNECTED and the score falls in the [0.1, 0.4) range, so the peer won't be selected while there are still identified peers.
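To make the tier argument concrete, here is a rough sketch in Python rather than the actual tchannel code. Only the UNCONNECTED range of [0.1, 0.4) comes from this thread; the IDENTIFIED range, the two-tier simplification, and the names `TIER_RANGES`, `get_tier`, and `score` are assumptions for illustration.

```python
import random

# Hypothetical tier ranges. Only UNCONNECTED = [0.1, 0.4) is taken from the
# discussion; the IDENTIFIED range is an assumed placeholder.
TIER_RANGES = {
    "UNCONNECTED": (0.1, 0.4),
    "IDENTIFIED": (0.4, 1.0),
}


def get_tier(connection_count):
    """Simplified stand-in for getTier: a peer with no connections is UNCONNECTED."""
    return "UNCONNECTED" if connection_count == 0 else "IDENTIFIED"


def score(connection_count, rng=random):
    """Pick a random score inside the range of the peer's tier."""
    low, high = TIER_RANGES[get_tier(connection_count)]
    return low + rng.random() * (high - low)
```

Because the assumed ranges do not overlap, a peer with zero connections never outscores a connected one in this sketch, which matches the observation above.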
Yeah, I would imagine something like a moving window that keeps track of the # of error frames, so that the count fades away over time. Like what you did in the rate limiter.
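A minimal sketch of that moving-window idea, written in Python for illustration only. The class name `ErrorFrameWindow`, the 30-second default, and the injectable `clock` are all assumptions, not anything in tchannel or its rate limiter.

```python
import time
from collections import deque


class ErrorFrameWindow:
    """Counts error frames seen within the last `window_seconds` (hypothetical helper)."""

    def __init__(self, window_seconds=30.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so tests can control time
        self.timestamps = deque()   # arrival times of error frames, oldest first

    def record_error(self):
        """Remember when an error frame arrived."""
        self.timestamps.append(self.clock())

    def count(self):
        """Drop entries that have aged out of the window, then return what is left."""
        cutoff = self.clock() - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)
```

With this shape there is no explicit reset step: once a peer stops producing error frames, its count drops back to zero on its own after one window length, which addresses the recovery question raised above.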
@Raynos @ShanniLi @jcorbin