Skip to content
This repository has been archived by the owner on May 3, 2022. It is now read-only.

Peer selection should consider liveness #1342

Open
anson627 opened this issue Sep 17, 2015 · 6 comments
Open

Peer selection should consider liveness #1342

anson627 opened this issue Sep 17, 2015 · 6 comments

Comments

@anson627
Copy link
Contributor

Today our peer selection algorithm only takes account of freshness (identified, ready, fresh, unconnected) and load (random weighted sum of # connections and # requests), which works well in the happy case - all peers are alive.

But whenever a remote peer dies (e.g server tchannel drain or close), the corresponding peer of client tchannel has zero connections and requests, and has been already identified, and therefore have higher score and more likely be selected for new request or request retry.

Ideally the dead peer should be removed from tchannel.

@Raynos @ShanniLi @jcorbin

@Raynos
Copy link
Contributor

Raynos commented Sep 18, 2015

We have talked about penalizing a Peer for error frames by incrementing its pending count.

@Raynos
Copy link
Contributor

Raynos commented Sep 18, 2015

cc @kriskowal

@ShanniLi
Copy link
Contributor

If a peer has no connection, it would not be considered identified. It should have lower score compared to identified peers.

On the other hand, liveness also means connection qualities, e.g., error frame count. In general, it is easy to track error frames. However, the recovering part is a little harder, i.e., when everything goes on well, how to reset the error frame count to zero.

@anson627
Copy link
Contributor Author

You are right, looking at the code, if there is no connections, getTier returns UNCONNECTED, the score is [0.1, 0.4) range, so it won't be selected if there are still identified peers.

Yeah, I would image some thing like moving window to keep tracking of # of error frames, it fades away along with time. Like what you did in rate limiter.

@Torwegia
Copy link

This makes iterating on a new project really rough.

@Raynos
Copy link
Contributor

Raynos commented Sep 24, 2015

@Torwegia please explain more.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants