Conflicts with JSON-LD specification regarding object identifiers for anonymous objects #476

Open
zotanmew opened this issue Nov 1, 2024 · 23 comments
Labels
Needs Group Input/Decision; Needs Primer Page (Needs a page in the ActivityPub primer); Next version (Normative change, requires new version of spec)

Comments

@zotanmew

zotanmew commented Nov 1, 2024

The AP spec states the following:

3.1 Object Identifiers
All Objects in [ActivityStreams] should have unique global identifiers. ActivityPub extends this requirement; all objects distributed by the ActivityPub protocol MUST have unique global identifiers, unless they are intentionally transient (short lived activities that are not intended to be able to be looked up, such as some kinds of chat messages or game notifications). These identifiers must fall into one of the following groups:

  1. Publicly dereferencable URIs, such as HTTPS URIs, with their authority belonging to that of their originating server. (Publicly facing content SHOULD use HTTPS URIs).
  2. An ID explicitly specified as the JSON null object, which implies an anonymous object (a part of its parent context)

The JSON-LD spec (version 1.0) states the following:

7.4.3) If expanded property is @id and value is not a string, an invalid @id value error has been detected and processing is aborted. Otherwise, set expanded value to the result of using the IRI Expansion algorithm, passing active context, value, and true for document relative.

The JSON-LD spec (version 1.1) states the same thing:

13.4.3) If expanded property is @id:
13.4.3.1) If value is not a string, an invalid @id value error has been detected and processing is aborted. When the frameExpansion flag is set, value MAY be an empty map, or an array of one or more strings.

Since these are in conflict, it is not possible to comply with both the JSON-LD specification and the ActivityPub specification simultaneously.
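To make the conflict concrete, here is a minimal Python sketch of only the quoted `@id` check. It is not a JSON-LD processor; the function name `check_id_value` and the error class are made up for illustration, and it assumes `id` is aliased to `@id` as in the ActivityStreams context.

```python
import json


class InvalidIdValue(ValueError):
    """Stand-in for JSON-LD's "invalid @id value" error (hypothetical class name)."""


def check_id_value(node: dict) -> None:
    """Apply only the quoted rule: if an id key is present, its value must be a string."""
    for key in ("@id", "id"):
        if key in node and not isinstance(node[key], str):
            raise InvalidIdValue(f"{key} must be a string, got {node[key]!r}")


# JSON null becomes Python None, which is not a string.
with_null = json.loads('{"type": "Image", "id": null}')
without_id = json.loads('{"type": "Image"}')

check_id_value(without_id)  # passes: the key is simply absent

try:
    check_id_value(with_null)
    null_rejected = False
except InvalidIdValue:
    null_rejected = True
```

Under these assumptions the object that omits `id` passes the check, while the object with an explicit `null` is rejected.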

This was noticed as AP implementer Akkoma has recently started federating anonymous objects in accordance with the AP specification (explicit nulls), which has broken federation with implementations performing JSON-LD expansion (for example, Iceshrimp.NET).

Some solutions were proposed in this Akkoma PR thread.

@trwnh

trwnh commented Nov 2, 2024

pretty sure this is an error in the text of the AP spec. the bit about "ID explicitly specified as the JSON null object" is clearly an error and should be reworded or removed. the correct behavior is to omit the id entirely, which triggers the "anonymous object" or "blank node" behavior that one would expect.

the section 3.1 text should read something like:

All Objects in [ActivityStreams] should have unique global identifiers. ActivityPub extends this requirement; all objects distributed by the ActivityPub protocol MUST have unique global identifiers, unless they are intentionally transient (short lived activities that are not intended to be able to be looked up, such as some kinds of chat messages or game notifications) or otherwise anonymous objects (an embedded node that is part of its parent context). These unique global identifiers SHOULD be HTTPS URIs for publicly facing content that is intended to be publicly dereferenceable. These identifiers must fall into one of the following groups: [...]

side note: @id should always be an IRI or expand to an IRI against the @base. the use of a string as @id (with no corresponding @base) might be fine on the JSON side, but it will cause all triples with that object as the subject to be removed from the output if converted to RDF (since at best it is interpreted as a "relative URI reference", which is not allowed for subjects)
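A rough Python sketch of that side note, assuming the standard library's `urllib.parse` is a good enough approximation of IRI handling for illustration. The helper names `looks_absolute` and `resolve_id` are hypothetical, and `urljoin` only approximates what a JSON-LD processor does when expanding against `@base`.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def looks_absolute(iri: str) -> bool:
    """Rough check: treat a value as absolute if it starts with a scheme (https:, urn:, ...)."""
    return bool(urlsplit(iri).scheme)


def resolve_id(value: str, base: Optional[str] = None) -> Optional[str]:
    """Return an absolute identifier, or None if the value would stay relative."""
    if looks_absolute(value):
        return value
    if base is not None:
        # Approximation of resolving a relative reference against @base.
        return urljoin(base, value)
    return None


relative_without_base = resolve_id("notes/1")
relative_with_base = resolve_id("notes/1", "https://someone.example/")
```

Here `relative_without_base` is `None`, which corresponds to the case where the subject cannot be used in RDF output, while `relative_with_base` resolves to an absolute HTTPS identifier.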

other side note: this came up in #396 as well regarding "partial updates", except for properties instead of ids.

@TheOneric

the correct behavior is to omit the id entirely, which triggers the "anonymous object" or "blank node" behavior that one would expect.

This removes the ability to distinguish transient from anonymous objects unless they occur on the top-level (cannot be anonymous). I’m fine with this and in fact felt like transient objects only really make sense on the top-level anyway, but to make sure: is there any reason why this distinction should be preserved considering the current spec revision makes an explicit effort for it?

@trwnh

trwnh commented Nov 2, 2024

distinguish transient from anonymous objects

“transient” and “anonymous” are different aspects of the same functionality. a “transient activity” is also an “anonymous object”, because activities are objects, and because the thing that makes the activity transient is that it’s anonymous.

example of a transient activity:

{
  "actor": "https://someone.example",
  "type": "InGameNotification",
  "content": "The payload is nearing the checkpoint!"
}

example of embedded anonymous objects for attributedTo and attachment, part of the parent context of the Note:

{
  "id": "https://imageboard.example/19387428939",
  "type": "Note",
  "attributedTo": {
    "name": "Anonymous"
  },
  "content": ">>19387428935 >>19387428938 take a look y’all",
  "inReplyTo": ["https://imageboard.example/19387428935", "https://imageboard.example/19387428938"],
  "attachment": {
    "type": "Image",
    "name": "IMG_4634.jpeg",
    "url": {
      "href": "https://imageboard.example/attachments/3847374.jpg",
      "mediaType": "image/jpeg",
      "width": 375,
      "height": 667
    }
  },
  "tag": [
    {"type": "Mention", "name": ">>19387428935", "href": "https://imageboard.example/19387428935"},
    {"type": "Mention", "name": ">>19387428938", "href": "https://imageboard.example/19387428938"}
  ]
}
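As a rough illustration of how a receiver might spot such embedded anonymous objects, the Python sketch below walks a parsed document and reports the paths of embedded JSON objects that have no `id` key. The function name `anonymous_paths` and the path notation are assumptions made for this example, not anything defined by ActivityPub, and the sample is a trimmed version of the Note above.

```python
def anonymous_paths(node, path=""):
    """Yield the paths of embedded JSON objects that carry no "id" key.

    The top-level object (empty path) is never reported, since it has no parent context.
    """
    if isinstance(node, dict):
        if path and "id" not in node:
            yield path
        for key, value in node.items():
            yield from anonymous_paths(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from anonymous_paths(item, f"{path}[{index}]")


note = {
    "id": "https://imageboard.example/19387428939",
    "type": "Note",
    "attributedTo": {"name": "Anonymous"},
    "attachment": {
        "type": "Image",
        "name": "IMG_4634.jpeg",
        "url": {"href": "https://imageboard.example/attachments/3847374.jpg"},
    },
}

found = list(anonymous_paths(note))
# found == ["attributedTo", "attachment", "attachment.url"]
```

Only absent keys are considered here; nothing in the sketch needs `id` to be set to `null` for an object to count as anonymous.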

@TheOneric

a “transient activity” is also an “anonymous object”, because activities are objects, and because the thing that makes the activity transient is that it’s anonymous.

While their effect for receiving servers may usually amount to the same thing, the way current AP spec describes them they are distinct. Anonymous objects are defined as being "part of its parent context" (and thus not able to be looked up on its own), while transient objects are “short lived activities that are not intended to be able to be looked up”.

The described purpose and intent are different and importantly, anonymous objects cannot exist on the top-level, since there is no parent context to be part of. Your example transient activity therefore is not an anonymous object.
The quoted bit above also suggests only activities can be transient, though later on it also refers to general “transient objects”.

If there’s no reason to ever distinguish between them, I’d suggest to further amend the wording to actually merge “transient” into “anonymous” (E.g. allow omitting the id for anonymous objects and then just mention embedded objects and transient activities as examples of anonymous objects)

@trwnh

trwnh commented Nov 2, 2024

the way current AP spec describes them they are distinct

the way current AP spec describes them is wrong and misleading. the “id:null” mechanism is invalid and should never have been written.

the purpose of the paragraph is to require dereferenceability except in cases where you explicitly don’t want this. in such cases, you leave out the id.

@TheOneric

the “id:null” mechanism is invalid and should never have been written.

But it was written and provided a distinction between transient and anonymous. This distinction is also the only motivation I can come up with why it was written the way it is in the first place. That’s why I’m asking about whether it is safe to drop the ability to distinguish transient and anonymous objects.

If it is safe to drop, note that your proposed wording still keeps "anonymous" and "transient" distinct in purpose even though they’re no longer distinguishable for receivers, thus my suggestion to explicitly merge the description of those categories.

@trwnh

trwnh commented Nov 2, 2024

so something like this, then?

All Objects in [ActivityStreams] should have unique global identifiers. ActivityPub extends this requirement; all objects distributed by the ActivityPub protocol MUST have unique global identifiers, unless they are not intended to be looked up or referred to. In other words, identifiers must fall into one of the following groups:

  1. Publicly dereferencable URIs, such as HTTPS URIs, with their authority belonging to that of their originating server. (Publicly facing content SHOULD use HTTPS URIs).
  2. An ID that is explicitly omitted; for example, an anonymous object (a part of its parent context) or a transient activity (short lived activities that are not intended to be able to be looked up, such as some kinds of chat messages or game notifications) would omit its ID.

(somewhat un/related, but i think the bit about "authority belonging to that of their originating server" also should be changed, since it's not actually logically implied by "unique global identifier" and it makes all objects owned by an HTTPS server instead of by actors. that's a separate issue, though.)

@TheOneric

seems good; thx

@silverpill

I think "transient activities" should be removed from the spec. It sounds like "looking up" is the only purpose of an identifier, but identifiers can also be used for authentication, authorization, de-duplication of incoming activities and synchronization of collections. There is no good reason for a top-level object to not have an identifier. "Short lived" makes it even more confusing, implying that activities have a duration or a lifetime.

@TallTed
Member

TallTed commented Nov 4, 2024

@zotanmew — Please edit your initial post, and code fence each instance of @id (like `@id`), so that GitHub user isn't spammed with notifications about this discussion in which they did not choose to participate.

@zotanmew
Author

zotanmew commented Nov 4, 2024

@TallTed I'm told that editing it won't remove the mention, though I'm happy to edit it regardless.

@evanp
Collaborator

evanp commented Nov 8, 2024

I'd like to test this with JSON-LD parsers to see what the actual behaviour is. I'm particularly interested in whether there's any daylight whatsoever between the @id property and the id property that would allow this different behaviour for the latter.

The JSON-LD playground does show a null id value as an error: https://json-ld.org/playground/#startTab=tab-expanded&json-ld=%7B%22%40context%22%3A%22https%3A%2F%2Fwww.w3.org%2Fns%2Factivitystreams%22%2C%22id%22%3Anull%7D

I think we have two possible paths forward:

  1. Publish an erratum that this invalid syntax should never have been specified.
  2. Publish a deprecation, accepting the fact that some of our implementers do not use JSON-LD compliant parsers, so a null value is acceptable for those consumers.

There are a few other ways that we could represent "anonymous" or "transient" or otherwise unidentified objects:

  1. Just don't provide an id value; leave it undefined.
  2. Have a specified term for an anonymous object, such as https://www.w3.org/ns/activitystreams#Anonymous.

@evanp
Collaborator

evanp commented Nov 8, 2024

I think an Erratum is necessary here. Taking out the reference to using null, we could have something like the following:

...all objects distributed by the ActivityPub protocol MUST have unique global identifiers, unless they are intentionally transient or anonymous ([examples]) in which case the identifier MAY be omitted. The identifiers must be publicly dereferencable URIs, such as HTTPS URIs, with their authority belonging to that of their originating server. (Publicly facing content SHOULD use HTTPS URIs).

evanp added the labels Needs Primer Page (Needs a page in the ActivityPub primer), Needs errata (We need to add errata for this), and Next version (Normative change, requires new version of spec) on Nov 8, 2024
@zotanmew
Author

zotanmew commented Nov 8, 2024

Sounds good to me (I'd vastly prefer the erratum option over a deprecation).

@evanp
Collaborator

evanp commented Nov 9, 2024

We could also add something like this?

Consumers MAY treat a null value for the id property as if the property was not defined. Publishers SHOULD NOT use null for the id property, as it is not valid JSON-LD.
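A consumer-side sketch of that tolerance in Python, offered as one possible reading rather than a reference implementation: before handing a parsed activity to any JSON-LD processing, drop every `id` or `@id` key whose value is `null`. The function name `strip_null_ids` is hypothetical.

```python
def strip_null_ids(node):
    """Return a copy of `node` with every `id`/`@id` key whose value is None removed.

    Other null-valued properties are left untouched.
    """
    if isinstance(node, dict):
        return {
            key: strip_null_ids(value)
            for key, value in node.items()
            if not (key in ("id", "@id") and value is None)
        }
    if isinstance(node, list):
        return [strip_null_ids(item) for item in node]
    return node


incoming = {
    "type": "Like",
    "id": None,
    "object": {"type": "Image", "id": None, "name": None},
    "tag": [{"id": None, "type": "Mention"}],
}

cleaned = strip_null_ids(incoming)
```

The input is not mutated, so the original payload stays available if it is needed later, for example for logging.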

This gives us a little Postel resilience.

@evanp
Collaborator

evanp commented Nov 9, 2024

And, honestly, I hate the "MUST unless you don't want to" phrasing. Is it too late to just do this?

...all objects distributed by the ActivityPub protocol SHOULD have unique global identifiers. The identifiers must be publicly dereferencable URIs, such as HTTPS URIs, with their authority belonging to that of their originating server. (Publicly facing content SHOULD use HTTPS URIs).

@zotanmew
Author

zotanmew commented Nov 9, 2024

Given that most implementations do not do LD processing for most or even all activities, I’d worry people who are aware of that fact might interpret that as it being fine to send null values, so I’d go for a MUST NOT here, as it makes federation with any implementations that do process activities as JSON-LD impossible when the activity contains such a null @id.

@trwnh

trwnh commented Nov 9, 2024

I hate the "MUST unless you don't want to" phrasing. Is it too late to just

i don't think saying "AS2 says you SHOULD have unique global identifiers; AP extends this such that all objects SHOULD have unique global identifiers." makes sense. the thing is, it's already a SHOULD in AS2. why have the language about "extending the requirement" in that case?

by contrast, the "MUST but MAY" is not simply "you don't want to". that's what a SHOULD is. "SHOULD" is "do this unless you have a reason not to." "MUST but MAY" is "do this in every circumstance, but we have the following enumerated exceptions." in other words, the anti-fulfillment argument changes from "i have a good reason not to" and becomes "i specifically qualify for this exception".

Postel-wise, the behavior we're trying to go for here is "don't do this, ever; and if you're doing this right now, stop it." otherwise, it sounds like the bit about how consumers MAY strip null ids would perhaps be good guidance for the primer, but it's pretty clear that on the spec level we should just go ahead and remove this unfortunate error once we have a WG.

@daenney

daenney commented Dec 22, 2024

Instead of dealing with an @id of null, and the potential unfortunate clashes that arise whether you do or do not do JSON-LD processing, could we not leverage the fact that IRIs don't have to be dereferenceable?

That would allow for the use of something like { "@id": "urn:uuid:<val>"} for anonymous objects. The @id can safely be emitted so JSON-LD processors shouldn't choke on it. Trying to blindly deref it would still trigger an error, but nothing more than that.
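For illustration only, generating such an identifier in Python could look like the sketch below. It uses the standard library's `uuid` module; the helper name `new_urn_id` and the sample object are assumptions, and nothing here is endorsed by the spec.

```python
import uuid


def new_urn_id() -> str:
    """Return a random, non-dereferenceable identifier of the form urn:uuid:<val>."""
    return uuid.uuid4().urn


attachment = {
    "@id": new_urn_id(),
    "type": "Image",
    "name": "IMG_4634.jpeg",
}
```

The value is a string, so it would pass the JSON-LD `@id` check quoted at the top of this issue, but it cannot be fetched over HTTP.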

@Tamschi

Tamschi commented Dec 22, 2024

I'm not sure that's a good idea. That would invite implementations to sometimes persist these IDs, even though it's trivial for an attacker to make them collide (especially since they may be forwarded by some implementations, so they can't be considered private).

Bridgy Fed for example has to persist IDs of many activities that are discarded in other implementations to avoid double-forwarding. It would be possible to additionally key these "anonymous" cases by e.g. signing actor, but it's another security-relevant special case.
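A small Python sketch of what keying by signing actor could mean. This is a hypothetical in-memory structure, not how Bridgy Fed or any other implementation actually works; the class name `SeenActivities` and the example URLs are invented for the example.

```python
class SeenActivities:
    """De-duplication set keyed by (signing actor, activity id) instead of id alone."""

    def __init__(self) -> None:
        self._seen = set()

    def first_time(self, signing_actor: str, activity_id: str) -> bool:
        """Record the pair and report whether it had not been seen before."""
        key = (signing_actor, activity_id)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


seen = SeenActivities()
shared_id = "urn:uuid:00000000-0000-4000-8000-000000000000"

a_first = seen.first_time("https://alice.example/actor", shared_id)
a_again = seen.first_time("https://alice.example/actor", shared_id)
# A different actor reusing the same id does not collide with alice's entry.
b_first = seen.first_time("https://mallory.example/actor", shared_id)
```

With the actor in the key, a colliding identifier from another actor cannot mask the first actor's activity, at the cost of the extra security-relevant special case mentioned above.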

@evanp
Collaborator

evanp commented Dec 27, 2024

I've added an erratum to remove the erroneous null option. We can review this at the next CG meeting.

evanp added the Needs Group Input/Decision label and removed the Needs errata (We need to add errata for this) label on Dec 27, 2024
DarkKirb pushed a commit to DarkKirb/akkoma that referenced this issue Jan 4, 2025
Current AP spec demands anonymous objects to have an id value,
but explicitly set it to JSON null. However, as it turns out this is
incompatible with JSON-LD requiring `@id` to be a string and thus AP
spec is incompatible with the Activity Streams spec it is based on.
This is an issue for (the few) AP implementers actually performing
JSON-LD processing, like IceShrimp.NET.
This was uncovered by IceShrimp.NET’s zotan due to our adoption of
anonymous objects for emoji in f101886.

The issue is being discussed by W3C, and will most likely be resolved
via an erratum redefining anonymous objects to completely omit the id
field just like transient objects already do. See:
w3c/activitypub#476

Fixes: https://akkoma.dev/AkkomaGang/akkoma/issues/848
@sebilasse

Sorry, for the confusion:
@id is not mandatory in JSON-LD and is an "expanded property".

The JSON-LD spec. says "it is important that nodes have an identifier"
I would interpret it as a SHOULD (just like in AS)

The JSON-LD spec. says
"If expanded property is @id "

But in AS a “transient activity” does not have "@id" then (?)

@zotanmew
Author

@sebilasse that would be correct. Instead of sending the activity/object with "@id": null, you have to leave out the property entirely.


9 participants