Reconsider Design for Registry API Call in Server Ad Registration #1859

Open
Saartank opened this issue Dec 30, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@Saartank
Collaborator

Currently, when accepting a server ad, the director makes an API call to registry/checkNamespaceStatus to verify that the server is approved. We need to reconsider this design.

The issues with the current approach are:

  1. Each server ad registration triggers a call to the registry. If bad server ads are submitted repeatedly, these calls could overload the registry, which exposes it to a DoS attack.

  2. If the registry goes down, the entire system is immediately impacted. We would like to reduce dependency on the registry.

@Saartank Saartank added the enhancement New feature or request label Dec 30, 2024
@Saartank Saartank added this to the parking-lot milestone Dec 30, 2024
@bbockelm
Collaborator

Other ideas to jot down:

  • This can't be "simply solved" by a cache. If a human changes the status of the namespace (for example, approving it), then they will be confused about why their change doesn't take effect while the cached status is still being served.
  • We may want to always try the query but cache the response. Then, if subsequent queries start to fail, we can fall back to the cache and allow the director to continue working for some limited amount of time.
    • This would allow the system to continue functioning across a registry restart or short outage.
  • We should have a deadline for the registry response -- if the server ad registration takes too long, the client will give up and time out. It's better to have the director respond that the registry is broken.
  • If the registry is down and cached information is used, this fact should be sent to the origin and logged in the director. Make sure to note the age of the information.
  • Consider rate limits -- how much load can we put on the registry? I suspect we need two types of rate limits:
    • Limits on refreshing data about a known namespace (or for an origin whose request has been authorized).
    • Limits on unknown namespaces or for anonymous clients. Basically, a malicious, unregistered origin shouldn't be able to cause the director to perform a DoS on the registry.
  • Whatever mechanism we come up with should be made into a standalone module inside Pelican and kept somewhat generic. There are other places in Pelican (including the director) where we have the exact same concerns about rate limits and fallback to cached data on failures; design the safeguards once and use them multiple times so there's predictable behavior within the daemon.
