username uniqueness interpretation issue between database and code #664
Comments
This is meant to be enforced at the code level: the code tries to look up usernames before insertion to see if they exist (that's what that code block supposedly does), but the transaction isolation clearly isn't good enough to actually pull that off. It might be worth changing the unique indexes to enforce this as a backstop against transaction race conditions. The races themselves will also need to be fixed for this to work fully, though, or users will get error pages when this happens rather than polluting the database as they do now (which is an improvement, but not a total fix).
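A minimal sketch of the "index as backstop" idea, using Python's built-in sqlite3 rather than Takahē's actual PostgreSQL/Django stack. The table layout, the index name `identity_username_ci`, and the helper `get_or_create_identity` are hypothetical, chosen for illustration only. The point is that the unique index on `lower(username)` rejects the second insert, and the caller recovers by re-fetching instead of surfacing an error.

```python
import sqlite3

LOOKUP = (
    "SELECT id FROM users_identity "
    "WHERE lower(username) = lower(?) AND domain_id = ?"
)


def get_or_create_identity(conn, username, domain_id):
    """Case-insensitive lookup, then insert; on a lost race, re-fetch."""
    row = conn.execute(LOOKUP, (username, domain_id)).fetchone()
    if row is not None:
        return row[0]
    try:
        cur = conn.execute(
            "INSERT INTO users_identity (username, domain_id) VALUES (?, ?)",
            (username, domain_id),
        )
        conn.commit()
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Another writer inserted the same identity between our lookup
        # and our insert; the unique index kept the table clean.
        conn.rollback()
        return conn.execute(LOOKUP, (username, domain_id)).fetchone()[0]


conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified stand-in for users_identity.
conn.execute(
    "CREATE TABLE users_identity ("
    " id INTEGER PRIMARY KEY, username TEXT, domain_id TEXT)"
)
# The backstop: uniqueness on the lowercased username per domain.
conn.execute(
    "CREATE UNIQUE INDEX identity_username_ci "
    "ON users_identity (lower(username), domain_id)"
)

first = get_or_create_identity(conn, "Alice", "example.com")
second = get_or_create_identity(conn, "alice", "example.com")

# A writer that skips the lookup is stopped by the index itself.
try:
    conn.execute(
        "INSERT INTO users_identity (username, domain_id) "
        "VALUES ('ALICE', 'example.com')"
    )
    backstop_held = False
except sqlite3.IntegrityError:
    backstop_held = True
```

Here `first` and `second` refer to the same row, and the direct insert of a third spelling is refused by the index.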
The transaction isolation is good enough to allow this situation: two concurrent transactions can both pass the code block without seeing each other's records prior to commit, and some of the fetch_actor related paths take fairly long (especially before the sync_pins part was isolated). Because the index doesn't enforce case-insensitive uniqueness, both records get committed. Changing the indexing will enforce the uniqueness independent of transaction isolation, so I'd say that would be the correct action. Users shouldn't see error pages, because it will enforce just one record existing, and the iexact search will then find it. If errors occur, they'd be occurring in stator, not the UI, and a TryAgain should resolve them, as one of the commits should remain.

A clean migration is difficult, though. Many tables refer to users_identity, and if data loss is to be avoided, all those references would need to be updated to point to the record which will be left after deleting duplicates and adding the new unique constraints. Or, if we simply accept that this bug will cause (recoverable) data loss for remote identities, cascade-delete all the offending records first, and then change the indices. In any event, while I know how to do this in SQL, I don't know how to do it in Django migrations...

I deleted the offending record earlier, but I still appear to have one which wasn't caught before. The second case appears to be a bunch of identity records with NULL usernames and domain_ids, a separate issue, perhaps.
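The lossless variant of that migration (repoint references, delete duplicates, then add the constraint) can be sketched roughly as follows. This uses sqlite3 so it runs standalone; it is not a Django migration and not Takahē's real schema. The `posts` table with its `author_id` column stands in for "many tables refer to users_identity", and keeping the lowest `id` per group is an arbitrary assumption about which record should survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified schema for illustration only.
    CREATE TABLE users_identity (
        id INTEGER PRIMARY KEY, username TEXT, domain_id TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER);

    INSERT INTO users_identity VALUES
        (1, 'Alice', 'example.com'),
        (2, 'alice', 'example.com'),
        (3, 'bob',   'example.com'),
        (4, NULL, NULL),
        (5, NULL, NULL);
    INSERT INTO posts VALUES (1, 1), (2, 2), (3, 3);
    """
)

conn.executescript(
    """
    -- 1. Map every duplicate to the record that will be kept.
    --    Rows with NULL username/domain never match the join and are skipped.
    CREATE TEMP TABLE dedupe AS
    SELECT i.id AS old_id, k.keep_id
    FROM users_identity AS i
    JOIN (
        SELECT lower(username) AS uname, domain_id, MIN(id) AS keep_id
        FROM users_identity
        WHERE username IS NOT NULL
        GROUP BY lower(username), domain_id
    ) AS k
      ON lower(i.username) = k.uname AND i.domain_id = k.domain_id
    WHERE i.id <> k.keep_id;

    -- 2. Repoint references (repeat for each referencing table).
    UPDATE posts
    SET author_id = (SELECT keep_id FROM dedupe WHERE old_id = posts.author_id)
    WHERE author_id IN (SELECT old_id FROM dedupe);

    -- 3. Delete the duplicates, then 4. add the case-insensitive constraint.
    DELETE FROM users_identity WHERE id IN (SELECT old_id FROM dedupe);
    CREATE UNIQUE INDEX identity_username_ci
        ON users_identity (lower(username), domain_id);
    """
)

remaining_ids = [r[0] for r in conn.execute(
    "SELECT id FROM users_identity ORDER BY id")]
post_authors = [r[0] for r in conn.execute(
    "SELECT author_id FROM posts ORDER BY id")]
```

The NULL-username rows are left alone: both SQLite and PostgreSQL treat NULLs as distinct in a unique index by default, so they do not block the new constraint and can be handled as the separate issue they appear to be.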
Yeah, this is a pretty nasty problem, because if you take the "correct" ActivityPub view (that everyone is identified purely by their Actor IRI) then it's actually fine to have two things on the same domain with the same username, as those aren't really first-class concepts in AP (and mostly come from WebFinger and the …). It might be worth re-examining the whole usage of this function, which really should just be to link mentions and power searches, and see if it's acceptable to have multiple results and just take the newest one.
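As a rough illustration of "accept multiple results and take the newest one", here is a plain-Python sketch. The `Identity` dataclass, its fields, and the `newest_match` helper are hypothetical and only mirror the idea, not Takahē's model or query code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class Identity:
    actor_uri: str          # the canonical key in the ActivityPub view
    username: Optional[str]
    domain: Optional[str]
    created: datetime


def newest_match(
    identities: Iterable[Identity], username: str, domain: str
) -> Optional[Identity]:
    """Case-insensitive username match; ties are resolved by recency."""
    candidates = [
        i for i in identities
        if i.username is not None
        and i.username.lower() == username.lower()
        and i.domain == domain
    ]
    return max(candidates, key=lambda i: i.created, default=None)


identities = [
    Identity("https://example.com/users/Alice", "Alice", "example.com",
             datetime(2023, 1, 1)),
    Identity("https://example.com/users/alice", "alice", "example.com",
             datetime(2023, 6, 1)),
    Identity("https://example.com/users/bob", "bob", "example.com",
             datetime(2023, 3, 1)),
]

picked = newest_match(identities, "ALICE", "example.com")
missing = newest_match(identities, "carol", "example.com")
```

With this approach duplicates stop being an error condition for mention linking and search, at the cost of silently preferring one actor over another.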
Right. So this becomes a question of whether Takahe is an ActivityPub implementation (thus keying on actor_uri), or a "fediverse" implementation where both WebFinger and ActivityPub are relevant and discovery is based on WebFinger. I'd argue that in the latter case, though the specs are unclear, we should fall back on convention and expect that username local-parts are case insensitive. In any case, I've been running my instance with the additional unique lowercase index on users_identity for the last couple of weeks with no noticeable adverse effects.
I pinged the fedi wisdom on this, but so far haven't gotten any closer to a clear spec. Apparently it is undefined whether acct scheme usernames should be treated as case insensitive or not (but Mastodon is case insensitive), and failing any specs to the contrary, the default for URLs (which "all" of the fedi servers use for actor id IRIs) is to have case-sensitive path parts. Lemmy is particularly stupid in not mapping a canonical form of id even through its WebFinger endpoint.
Yup, welcome to the world of trying to get fediverse agreement on things! At its heart, Takahē is much more of an ActivityPub server than something that specifically cares about WebFinger, so I would rather keep things centered around a canonical Actor IRI and then, if we need username mapping, do it case-insensitively as WebFinger would on Mastodon.
Encountered an error, extract from Sentry below:
https://github.com/jointakahe/takahe/blob/main/users/models/identity.py#L412
The object in question: an actor present in the database twice, once with a `Capitalized` username, once with `lowercase`. Hard to say why the remote server changed the username presentation, but that's out of our control. The search above uses iexact (case insensitive) search, while the `username` column uses PostgreSQL's default collation, which is case sensitive, so the database didn't enforce the uniqueness as the code expected.

I couldn't find a WebFinger spec mention of username case sensitivity, but Mastodon treats them as case insensitive, so the code appears correct, but the database schema is not.
Possible solutions: