Currently, while accepting a server ad, the director makes an API call to registry/checkNamespaceStatus to verify whether the server is approved. We need to reconsider this design.
The issues with the current approach are:
Each server ad registration triggers a call to the registry. If bad server ads are submitted repeatedly, it could overload the registry and increase the risk of a DoS attack.
If the registry goes down, the entire system is immediately impacted. We would like to reduce dependency on the registry.
This can't be "simply solved" by a cache. If a human changes the status of the namespace (for example, approving it), they will be confused about why their change doesn't take effect.
We may want to always try the query but cache the response. Then, if subsequent queries start to fail, we can fall back to the cache and allow the director to continue working for some limited amount of time.
This would allow the system to continue functioning across a registry restart or short outage.
We should set a deadline for the registry's response: if server ad registration takes too long, the client will give up and time out, so it is better for the director to respond explicitly that the registry is broken.
If the registry is down and cached information is used, this fact should be sent to the origin and logged in the director. Make sure to note the age of the information.
Consider rate limits: how much load can we put on the registry? I suspect we need two types of rate limits:
Limits on refreshing data about a known namespace (or for an origin whose request has already been authorized).
Limits on unknown namespaces or for anonymous clients. Basically, a malicious, unregistered origin shouldn't be able to cause the director to perform a DoS on the registry.
Whatever mechanism we come up with should be made into a standalone module inside Pelican and kept somewhat generic. There are other places in Pelican (including the director) where we have exactly the same concerns about rate limits and falling back to cached data on failure; we should design the safeguards once and use them in multiple places so that behavior is predictable throughout the daemon.