-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SSO auth renewal is blocking #232
Comments
overall sounds a good plan. I'ts probably better to still block the request if the access token is completely expired. it doesn't hurt to use a timer on each request. if it times out, it should fail the current request but not impact the renewal thread. |
Access tokens live for 5 min. And it takes a few milliseconds to get it. So we are "paying" a few milliseconds every 5 minutes. I think that's a very small price to pay for the sake of thread safety. |
Good point. However, what if SSO is intermittently down, and we need retries? That takes longer (backoff), and IIUC all this time no API requests will be made, despite auth being good. IOW, we're not as "decoupled" from SSO as we hoped. However, I'm fine with shelving this indefinitely, until actual need is proven 👍. |
an easier change idea: hmm, actually that may already be happening! Lines 152 to 168 in b9d2a1d
This should make backoff.Retry happy and not retry.
So blocking-for-retries scenario won't kick in until tokens are actually expired (at which point we must block anyway)?@tzvatot is this last analysis correct? |
no, I don't think this happens now. when |
token requests latency data from prod: The data is too coarse to take the quantile seriously, but what we know is these requests almost never complete ≤100ms 😞 and almost always complete ≤1sec. |
tokensContext()
grabs a mutex:ocm-sdk-go/token.go
Lines 75 to 79 in 323cd42
IIUC this means that while a refresh/re-auth request to SSO is in progress, all concurrent API requests will be blocked.
This is sub-optimal — usually we renew some time before previous token expires, and could be used meanwhile without delaying requests.
Even the original request that triggered the renewal could proceed without blocking!
A second question is what happens with retries and concurrent requests. The mutex is only held for a single try, but if a retry is needed it means we still don't have valid auth, so the 2nd request will immediately take the mutex; the first retry loop doesn't know that happened, so we can get several retry loops running in parallel. 🔀
Once one of them succeeds, the following
tokensContext()
tries will all see valid auth and succeed immediately, and all retry loops will complete.I propose we move renewal off the requesting goroutine into a separate dedicated goroutine. One per
Connection
object. This will simplify management of retries.We need to decide whether renewal should wait for a request that needs auth (as done now, meaning zero traffic when unused), or be timer-based (minimizes latency).
@igoihman @nimrodshn @tzvatot @vkareh @zgalor @petli-openshift WDYT?
The text was updated successfully, but these errors were encountered: