Reconsider Design for Registry API Call in Server Ad Registration #1859

Saartank · 2024-12-30T17:30:18Z

Currently, while accepting a server ad, the director makes an API call to registry/checkNamespaceStatus to verify if the server is approved. We need to reconsider this design.

The issues with the current approach are:

- Each server ad registration triggers a call to the registry. If bad server ads are submitted repeatedly, it could overload the registry and increase the risk of a DoS attack.
- If the registry goes down, the entire system is immediately impacted. We would like to reduce dependency on the registry.


bbockelm · 2024-12-30T18:31:34Z

Other ideas to jot down:

- This can't be "simply solved" by a cache. If a human changes the status of the namespace (for example, approving it), they will be confused about why their change doesn't take effect.
- We may want to always try the query but cache the response. Then, if subsequent queries start to fail, we can fall back to the cache and allow the director to continue working for some limited amount of time.
  - This would allow the system to continue functioning across a registry restart or short outage.
- We should have a deadline for the registry response -- if the server ad registration takes too long, the client will give up and time out. It's better to have the director respond that the registry is broken.
- If the registry is down and cached information is used, this fact should be sent to the origin and logged in the director. Make sure to note the age of the information.
- Consider rate limits -- how much load can we put on the registry? I suspect we need two types of rate limits:
  - Limits on refreshing data about a known namespace (or for an origin whose request has been authorized).
  - Limits on unknown namespaces or for anonymous clients. Basically, a malicious, unregistered origin shouldn't be able to cause the director to perform a DoS on the registry.
- Whatever mechanism we come up with should be made into a standalone module inside Pelican and kept somewhat generic. There are other places in Pelican (including the director) where we have the exact same concerns about rate limits and fallback to cached data on failures; design the safeguards once and use them multiple times so there's predictable behavior within the daemon.

Saartank added the enhancement (New feature or request) label Dec 30, 2024

Saartank added this to the parking-lot milestone Dec 30, 2024
