-
Notifications
You must be signed in to change notification settings - Fork 973
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update network survey script to support V2 survey #4342
Conversation
Resolves stellar#4302 This change updates the network overlay survey script to support the new V2 survey API. To test this, I also updated the simulator to simulate the new API, then I checked that the resulting graph from simulating an old pubnet survey is structurally isomorphic to the input graph. I also checked that the new fields are present in the output graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes!
endpoint to send requests to. | ||
""" | ||
request_url = url_base + "/surveytopologytimesliced" | ||
logger.info("Requesting survey data from %s peers", len(peer_list)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No more request rate throttling? (previous version of this code limited number of requests per ledger to 10)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the request throttling because stellar-core has its own builtin request throttling (see SurveyManager::topOffRequests
). Specifically, stellar-core sends 10 requests from its queue every 15 seconds.
Even with the old request rate throttling this queue would build up because the script would add 10 nodes every 5 seconds, but stellar-core would only send a batch of 10 every 15 seconds.
That being said, I'm happy to add back in request rate throttling as a extra safety measure if you think it's necessary. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, I'm just trying to recall why we added this throttling in the first place. I agree though that it's weird to have throttling in both places, and ideally core should properly deal with all the requests. Looks like this PR added throttling because survey on the core side used to expire too quickly, I think this was related to the old -d
flag. This shouldn't be the case with the new-style survey, since our reporting phase is now long by default, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that makes sense. Digging into it, I think without the throttling in the script there's a weird race condition that would prematurely end the survey on the stellar-core side immediately after processing a large batch of requests. Trickling in requests causes the expiration to be regularly extended and avoids the premature expiration.
The V2 survey removed the duration
parameter and instead uses the survey phase to determine when the survey is finished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that makes sense. I like that in this version we move the decision to end survey to the script itself after some inactivity rather than try to end in core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
r+ bcf2e59 |
Description
Resolves #4302
This change updates the network overlay survey script to support the new V2 survey API.
To test this, I also updated the simulator to simulate the new API, then I checked that the resulting graph from simulating an old pubnet survey is structurally isomorphic to the input graph. I also checked that the new fields are present in the output graph.
Checklist
clang-format
v8.0.0 (viamake format
or the Visual Studio extension)