Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors raised by 410 and 404 incorrectly switched #71

Closed
mohamedhafez opened this issue Apr 9, 2019 · 12 comments
Closed

Errors raised by 410 and 404 incorrectly switched #71

mohamedhafez opened this issue Apr 9, 2019 · 12 comments

Comments

@mohamedhafez
Copy link
Contributor

mohamedhafez commented Apr 9, 2019

It looks like a previous commit by @collimarco switched the errors we raise on 410 and 404 in https://github.com/zaru/webpush/blob/master/lib/webpush/request.rb#L29-L32

However according to https://web.dev/articles/push-notifications-web-push-protocol, it was correct initially, with 410 meaning "Subscription No Longer Valid" and 404 meaning "Subscription Expired". RFC 8030 also explicitly says 404 means "Subscription Expired" (410 is more complicated, meaning that the push service couldn't reach the user agent, is done retrying, and won't send anything for this subscription any more)

@mohamedhafez mohamedhafez changed the title meanings of 410 and 404 erroneously switched Errors raised by 410 and 404 incorrectly switched Apr 9, 2019
@collimarco
Copy link
Contributor

Yes, I made that change on purpose, it is not a mistake. Here's the motivations:

  1. At Pushpad (which started in 2015) we have always removed the subscriptions (expired) when they returned 410. The code 404 is very rare (only a few hundreds over the years) and usually is caused by invalid subscriptions (i.e. corrupted subscriptions that have never been valid). This is a practical motivation: i.e. expired subscriptions return 410 and not 404.
  2. Mozilla is very clear and says that 404 is for invalid subscriptions, while 410 is for expired subscriptions: http://autopush.readthedocs.io/en/latest/http.html#error-codes
  3. It also makes sense, based on the HTTP status code definition, to return 410 Gone for expired subscriptions, since the endpoint used to be valid and now is no longer available.

I agree with you that the standard talks about 404 for expired subscriptions... however it seems that currently nobody is doing that.

Note: Recently, in the last weeks, we have noticed a slight increase of 404 compared to the past years. I don't know if something is changing.

@collimarco
Copy link
Contributor

I have opened this related issue for FCM:
https://bugs.chromium.org/p/chromium/issues/detail?id=951131

@mohamedhafez
Copy link
Contributor Author

mohamedhafez commented Apr 9, 2019

Reading that Mozilla link you gave, even that says:

410 - Endpoint Not Valid - The URL specified is no longer valid and should no longer be used. A User has become permanently unavailable at this URL.

errno 103 - Expired URL endpoint
errno 105 - Endpoint became unavailable during request
errno 106 - Invalid subscription

Where it looks like the primary meaning of 410 is "Endpoint Not Valid", and it just lists expired as one of many possible reasons for that. (Granted that that doc says that 404 also means invalid, but that should change to meet RFC 8030 eventually I assume)

My understanding was that an expired subscription has (or at least will have in the near future) a really specific meaning: the push service refreshed the endpoint for a new one, and this old one is now expired, due to no fault or inactivity of either ourselves or the user: its just rotating for extra safety. This isn't happening yet, hence the really low volume of 404's (not sure why we get any at all be we do too). Eventually though, push services will start refreshing the endpoints, and it'll be up to the service worker to give our application server the new endpoint. If we get an "Expired Subscription" 404 message, it means that service worker hasn't given the application server the new endpoint yet (either due to a bug on our end, or maybe its just a bit of a race condition that'll be resolved soon, or maybe in rare cases the endpoint refreshed and the user just stopped visiting the site immediately after that happened). The browsers have been holding off on expiring subscription & rotating subscriptions at all supposedly until the Background Sync API becomes available to make it easier on us to robustly get the new endpoint from the service worker to our application server.

Currently, you'd handle and "expired" subscription and an "invalid" subscription the same way: remove them and never send to them again. But once this new definition of expired subscription comes in to play, you could choose to handle it differently, i.e. waiting for the new one to arrive to update it instead of immediately unsubscribing the user, just in case the service worker is late in relaying the new endpoint for some reason

PS I've also seen an increase in 404's specifically just for Chrome in the last week or two - we shouldn't be getting them at all yet like I said, so who knows whats going on with that:/

@collimarco
Copy link
Contributor

I am pretty sure that FCM changed something in the last weeks: the number of 404s keeps growing, while there are much less 410s. Probably they have switched the code returned... however let's wait for an official confirmation.

For Pushpad at the moment we have decided to delete both the 404s and 410s (using a soft delete as a precaution).

@mohamedhafez
Copy link
Contributor Author

On a side note, @collimarco with the new code do you get many Webpush::Unauthorized errors for Chrome users?

(I remember getting resp.is_a?(Net::HTTPBadRequest) && resp.message == "UnauthorizedRegistration" frequently a couple years ago, but reading the docs I think your new code is correct in this regard, and that this should now only happen if we've implemented something incorrectly on our app server)

@collimarco
Copy link
Contributor

@mohamedhafez Yes, Webpush::Unauthorized should be absent or rare in normal situations. It mainly indicates the usage of the wrong VAPID key for an endpoint, or corrupted subscriptions.

@mohamedhafez
Copy link
Contributor Author

Regarding the increase in 404s, I think I know what is happening. In the most recent version of Chrome, it looks like if a user grants notification permission, and we successfully set up a web push subscription, but then later the user revokes notification permission and chooses to block notifications for our site, then the app server will get a 404 when trying to send to that user.

Firefox sends a 410 in that case, and I believe earlier versions of Chrome did as well. They definitely didn't send 404s in this case, that's for certain.

@collimarco
Copy link
Contributor

the app server will get a 404 when trying to send to that user

Are you sure? Ok, so this has changed recently... I'm 100% sure that Chrome used to return 410 in that case.

Firefox sends a 410 in that case, and I believe earlier versions of Chrome did as well. They definitely didn't send 404s in this case, that's for certain.

Yes, exactly.

@mohamedhafez
Copy link
Contributor Author

@collimarco yup I'm sure, just tried it out right now with the latest version of Chrome on my Mac. Maybe give it a try on your end as well to verify?

@mohamedhafez
Copy link
Contributor Author

@collimarco The 404 issue has been fixed and will be rolled out within the next 2 weeks, and Chrome will once again give a 410 in the case I described above: https://bugs.chromium.org/p/chromium/issues/detail?id=951641&q=404%20app%20server&colspec=ID%20Pri%20M%20Stars%20ReleaseBlock%20Component%20Status%20Owner%20Summary%20OS%20Modified

@collimarco
Copy link
Contributor

@mohamedhafez Perfect. So we can close this issue? Personally I would keep the current naming convention: i.e. call Expired the endpoints that used to be valid, and now are Gone. It makes sense. While Invalid is for subscriptions that don't exist or are invalid (i.e. 404).
If we change the naming convention, we would get unexpected results: for example a corrupted subscription with an invalid token would be called Expired, which doesn't make sense.

@mohamedhafez
Copy link
Contributor Author

I'm mostly just concerned about being able to distinguish between a subscription being deleted / no longer valid because of some action of the user, and expirations where the user still wants to be subscribed and I should be trying to recover from somehow. Seeing as push services aren't yet expiring and rotating endpoints and returning 404 for the old, expired endpoint, I'll close this and we can take it back up when/if the push notification spec is fully implemented by the push services.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants