gossipsub: feature-request: Optional reason code for why a peer was pruned #555
Comments
I don't think this is particularly useful; most likely you got pruned because of a negative score. |
Correct me if I'm wrong, but I believe you can also get pruned if an existing peer has too many peers (above the high watermark) and prunes a bunch of peers as a result. Being able to tell whether several peers were pruning you due to a negative score, as a result of some activity, would be very useful in debugging issues like #10906, where your node is not receiving blocks and loses sync as a result. |
The metric you care about is the number of prunes, and you already have that.
The idea that all peers are pruning you because of oversubscription is
extremely unlikely.
If you see a massive prune spike in your metrics, you should be thinking
score.
Having the extra bit won't add much.
In short, I don't think having this field will help with your problem, and
it is an information leak of sorts that I am very reluctant to add.
…On Thu, Jun 29, 2023, 9:31 PM Shrenuj Bansal ***@***.***> wrote:
> Correct me if I'm wrong but I believe you can also get pruned if an
> existing peer has too many peers (above the high watermark) and prunes a
> bunch of peers as a result.
>
> Being able to tell if several peers were pruning you due to a negative
> score, as a result of some activity, would become very useful in debugging
> the sort of issues like #10906 where your node is not receiving blocks and
> losing sync as a result.
> If we're able to see this number tick up in a Grafana dashboard, it
> immediately gives us more clues as to what is going on, rather than
> figuring this out via lots of speculation, additional logging and
> experiments.
|
I should add that even if you are pruned because of oversubscription, the
score is factored in; so you are getting pruned because you have a lower
score than your peers. So you see, there is no clear distinction between the
two, and you can't even define the difference without going deep into
internal state, at which point you start to leak.
|
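To make the point above concrete (score is factored in even when a peer prunes because of oversubscription), here is a minimal Go sketch. It is not the go-libp2p-pubsub implementation: the `meshPeer` type, the `selectPrunes` function, and the parameters `d` and `dHi` are hypothetical names chosen for illustration, and the selection is deliberately simplified to "keep the highest-scoring peers".

```go
package main

import (
	"fmt"
	"sort"
)

// meshPeer is a hypothetical stand-in for a mesh member and the score
// the local node has assigned to it.
type meshPeer struct {
	ID    string
	Score float64
}

// selectPrunes returns the IDs of the peers to prune once the mesh has
// grown past dHi, trimming it back to d members by dropping the
// lowest-scoring peers first. Simplified for illustration only.
func selectPrunes(mesh []meshPeer, d, dHi int) []string {
	if len(mesh) <= dHi {
		return nil // not oversubscribed, nothing to prune
	}
	if d < 0 {
		d = 0
	}
	if d > dHi {
		d = dHi
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := make([]meshPeer, len(mesh))
	copy(sorted, mesh)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Score > sorted[j].Score
	})
	pruned := make([]string, 0, len(sorted)-d)
	for _, p := range sorted[d:] {
		pruned = append(pruned, p.ID)
	}
	return pruned
}

func main() {
	mesh := []meshPeer{
		{ID: "a", Score: 5},
		{ID: "b", Score: -1},
		{ID: "c", Score: 2},
		{ID: "d", Score: 0.5},
		{ID: "e", Score: 3},
	}
	// With d=3 and dHi=4 the mesh of five is oversubscribed.
	fmt.Println(selectPrunes(mesh, 3, 4))
}
```

In this sketch peer "d" is pruned despite a positive score, simply because it ranks below the peers that are kept. From the pruned peer's side, that case looks the same as being pruned for a negative score.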
One thing I want to confirm: when you say "number of prunes", do you mean the number of prunes sent by the current node, or the number of times the current node was pruned by others? We want to see the latter if possible. @MarcoPolo, @vyzo mentions that we already have the number-of-prunes metric. Is this already visible in Grafana, or can it easily be made visible? |
It is definitely possible, although lotus might not have the right metric
atm.
|
@MarcoPolo do you have any idea? |
It would go here: https://github.com/filecoin-project/lotus/blob/master/node/modules/lp2p/pubsub.go#L562, alongside the existing stats.Record calls. I don't think this is implemented. |
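As a rough illustration of the metric discussed above, the following Go sketch keeps separate per-topic tallies for prunes this node sent and prunes it received from others. Everything here is an assumption made for illustration: `pruneCounters`, `OnPruneSent`, `OnPruneReceived`, `Snapshot`, and the topic name are hypothetical and are not part of lotus or go-libp2p-pubsub. In a real node the hooks would be called from wherever outgoing and incoming control messages are observed, and the counts would be exported through the node's existing metrics pipeline.

```go
package main

import (
	"fmt"
	"sync"
)

// pruneCounters tallies PRUNE messages per topic, keeping the two
// directions apart: prunes we sent versus prunes we received.
// Hypothetical type, for illustration only.
type pruneCounters struct {
	mu       sync.Mutex
	sent     map[string]int
	received map[string]int
}

func newPruneCounters() *pruneCounters {
	return &pruneCounters{
		sent:     make(map[string]int),
		received: make(map[string]int),
	}
}

// OnPruneSent records that the local node pruned a peer on topic.
func (c *pruneCounters) OnPruneSent(topic string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.sent[topic]++
}

// OnPruneReceived records that a remote peer pruned the local node on topic.
func (c *pruneCounters) OnPruneReceived(topic string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.received[topic]++
}

// Snapshot returns the current sent and received counts for topic.
func (c *pruneCounters) Snapshot(topic string) (sent, received int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.sent[topic], c.received[topic]
}

func main() {
	c := newPruneCounters()
	topic := "/example/blocks" // hypothetical topic name
	c.OnPruneSent(topic)
	c.OnPruneReceived(topic)
	c.OnPruneReceived(topic)
	sent, received := c.Snapshot(topic)
	fmt.Printf("sent=%d received=%d\n", sent, received)
}
```

Keeping the two directions as separate counters is what would let a dashboard show a spike in prunes received, which is the signal asked about in this thread.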
It would be very helpful for debugging and health monitoring of the network to know why a peer pruned us. Even if they only tell us that we were pruned because our score became negative, that would be helpful.
I don't think there's a security issue here since a node can essentially infer it is misbehaving if many peers prune at once. This just makes that explicit.
This came up while debugging filecoin-project/lotus#10906, and @shrenujbansal suggested this. It would be useful to know that a peer gave us a negative score because it would hint that we did something wrong.