Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Receiving Decode(MissingKey(...)) error from Google after many hours of running #28

Open
Restioson opened this issue Dec 10, 2021 · 12 comments

Comments

@Restioson
Copy link

Restioson commented Dec 10, 2021

Hi, I've been using openid in my project for a while now (code for it is here) - thanks for the effort you've put into it! I have however run into one small issue, which is that after leaving the server running for a long time (4 days to a week of uptime, perhaps), Google begins returning Decode(MissingKey(...)) errors, so nobody can log in. Upon restarting the server, though, it immediately begins working again. Thus, I'm not really sure which part of the stack is at fault here - my code, Google, or the library. My first thought was maybe it could be this library (something to do with reusing/resuming connections perhaps?) If not, I wonder if you know where I could start investigating this problem or where it's likely to be?

@kilork
Copy link
Owner

kilork commented Dec 10, 2021

I think I know this behaviour, I also have not investigated deep. It seems like metadata from provider has lifetime. Most simple solution would be just refresh client once per day for example.

@Restioson
Copy link
Author

This metadata from the provider - is it tied to the session (client) that is being used to make requests?

@kilork
Copy link
Owner

kilork commented Jan 23, 2022

Yes, it is in client. We do not fetch metadata each time, developers are free to write particular logic to handle this. I recently had a case, there I need to fetch some db description for provider in case of particular exception for particular provider.

But definitely I think it can be done better, because large providers like google are expected to work out of the box without future modifications.

@Restioson
Copy link
Author

For reference, your solution worked! I made it refresh the client once every 24 hours, and I haven't heard any users say that the login isn't working, and nor have I experience it myself. It's a little less pretty, but it works alright :)

@andrewbaxter
Copy link

It would be great if the example showed doing this, assuming it's best practice. I usually rely on examples for idiomatic usage including things like this.

@kilork
Copy link
Owner

kilork commented Nov 13, 2022

@andrewbaxter I do not know, it would be idiomatic only for google. Current example in top level readme is already quite complex. We probably should keep it simple. But more nice examples and building blocks means for me more simple usage for users. Maybe there is a need in separate library or feature to this library, which would have higher level building blocks for typical use-cases. This is why this bug is open :)

@andrewbaxter
Copy link

Are you saying that based on the above report or is there some documentation saying this only applies to google? There are many versioned endpoints for instance and configuration parameters (supported algorithms, etc). Okta docs for instance say that the .well-known endpoint output may be slow to update, implying that the values there can and do change over time.

The response has http caching headers. From Okta the duration is roughly 24h from the time of the request. I think it's unwise in the general case to assume the values are correct longer than that, even if they don't change often in practice.

If there are plans for the client to automatically update that would be great, but otherwise I do feel the example sets a dangerous precedent - the risk is a fairly critical outage, and the error is hard to diagnose without seeing this thread.

@brianmay
Copy link

I am seeing similar errors from Dex also.

@brianmay
Copy link

Would appreciate any help in solving this on Axum. I have no idea how to refresh an Extension value on a regular time based interval.

@brianmay
Copy link

OK, got a solution. Axum does not allow for mutable state values, so I used ArcSwap to create a value with interior mutability. https://github.com/brianmay/robotica-rust/blob/1a2a3e380a6cc419f3e335574729358f6f824f4f/robotica-backend/src/services/http/mod.rs#L122-L143, and I regularly replace the value once an hour.

@teohhanhui
Copy link

There needs to be a mechanism to refetch jwks_uri, periodically and/or on encountering an unrecognized kid (after verifying iss) (but this could still be exploited for DoS, right?)

For example, this is from the documentation of AWS Cognito:

Note
Amazon Cognito might rotate signing keys in your user pool. As a best practice, cache public keys in your app, using the kid as a cache key, and refresh the cache periodically. Compare the kid in the tokens that your app receives to your cache.

If you receive a token with the correct issuer but a different kid, Amazon Cognito might have rotated the signing key. Refresh the cache from your user pool jwks_uri endpoint.

https://docs.aws.amazon.com/cognito/latest/developerguide/amazon-cognito-user-pools-using-tokens-verifying-a-jwt.html

@brianmay
Copy link

Yes, keys do need to be periodically refreshed. Refresh on unrecognized kid might also be sufficient too.

I previously said once an hour before, but that was too long, and I still sometimes get this error. Once every 10 minutes seems to be OK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants