Recommend URI syntax normalization + scheme normalization for identifiers? (Also consider query component rules) #483
Labels:
- Needs Primer Page: Needs a page in the ActivityPub primer
- Next version: Normative change, requires new version of spec
Related:
ActivityPub developers and implementers using HTTPS identifiers ought to be aware of the "normalization and comparison" considerations for HTTPS URIs.
For HTTPS scheme normalization, refer to RFC 9110 Section 4.2.3: https://datatracker.ietf.org/doc/html/rfc9110#section-4.2.3
For URI syntax normalization, refer to RFC 3986 Section 6: https://datatracker.ietf.org/doc/html/rfc3986#section-6
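As a rough illustration of what these two references ask for, the following Python sketch lowercases the scheme and host, drops an empty or default port, and replaces an empty path with `/`. The function name `normalize_https_uri` and the `DEFAULT_PORTS` table are hypothetical names chosen for this example. It deliberately skips other RFC 3986 steps such as percent-encoding case normalization and dot-segment removal, so treat it as a starting point rather than a complete normalizer.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: only http/https defaults are handled.
DEFAULT_PORTS = {"https": "443", "http": "80"}


def normalize_https_uri(uri: str) -> str:
    """Apply a subset of RFC 3986 / RFC 9110 normalization to a URI."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()

    # Split optional userinfo from host[:port] by hand so that an empty
    # port ("host:") is handled without relying on port validation.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    if hostport.startswith("["):
        # IPv6 literal such as "[::1]:443"
        host, _, rest = hostport.partition("]")
        host += "]"
        port = rest[1:] if rest.startswith(":") else ""
    else:
        host, _, port = hostport.partition(":")

    host = host.lower()  # the host is case-insensitive
    if port == DEFAULT_PORTS.get(scheme, ""):
        port = ""  # the default port carries no information

    netloc = userinfo + at + host + (":" + port if port else "")
    path = parts.path or "/"  # an empty path is equivalent to "/"

    # Query and fragment are passed through untouched.
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


print(normalize_https_uri("HTTPs://Domain.EXAMPLE"))
print(normalize_https_uri("https://domain.example:443"))
print(normalize_https_uri("https://domain.example:"))
```

Under these assumptions, all three calls print the same string, so the three spellings compare equal after normalization.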
Some common considerations in imperative form
- `HTTPs://Domain.EXAMPLE` SHOULD be normalized to `https://domain.example` (excluding other considerations)
- `https://domain.example:443` SHOULD be normalized to `https://domain.example` (excluding other considerations)
- `https://domain.example:` SHOULD be normalized to `https://domain.example` (excluding other considerations)
- An empty path is equivalent to `/` for the path component: `https://domain.example` SHOULD be normalized to `https://domain.example/` (excluding other considerations)

Considerations that do not exist at URI/HTTPS level and must be considered at a protocol level
Query component processing
Per https://datatracker.ietf.org/doc/html/rfc3986#section-3.4:
Query components are by default opaque. At the level of an HTTPS URI, the first unencoded `?` delimits the query component, which ends only when encountering a `#` (delimiting the start of the fragment component) or the end of the URI.

Purely by convention, it is common for application servers to try to parse "query parameters" out of the query component of the URI. Arguably this is a misfeature and an antipattern, since the ordering of such query parameters should not have any bearing on the identity of the resource: `/?foo=1&bar=2` is semantically equivalent to `/?bar=2&foo=1` when being used to extract request parameters. Such "request parameters" should go on the request itself, not on the identifier (which becomes a completely different identifier when the order of the parameters is changed). But the practice of using `=` and `&` to parse a query component as a series of request parameters is (unfortunately) very widespread (although at some point around the era of HTML4 it was recommended that the delimiter between such "parameters" be `;` instead of `&`).

ActivityPub should probably also warn about this, or give guidance that query components in `id` are opaque and SHOULD NOT be parsed as parameters for the purposes of reference or equivalence.

If ActivityPub ever prescribed specific query parameter processing, then the ordering of such query parameters would need to be canonicalized with some kind of normalization algorithm.
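To make the ordering problem concrete, here is one possible canonicalization, sketched in Python. It is not an algorithm prescribed by ActivityPub or any RFC. The function name `canonicalize_query_order` is hypothetical, and the choice to sort by parameter name with a stable sort (so repeated names keep their relative order) is an assumption of this example. The raw `name=value` strings are reordered without decoding, so percent-encoding inside the query is left exactly as it was.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize_query_order(uri: str) -> str:
    """Reorder '&'-delimited query parameters by name (hypothetical scheme)."""
    parts = urlsplit(uri)
    if not parts.query:
        return uri  # nothing to reorder

    # Split on the conventional '&' delimiter without decoding anything.
    raw_params = parts.query.split("&")

    # Stable sort on the text before the first '=': parameters that share
    # a name keep their original relative order.
    raw_params.sort(key=lambda p: p.partition("=")[0])

    return urlunsplit(parts._replace(query="&".join(raw_params)))


a = "https://domain.example/?foo=1&bar=2"
b = "https://domain.example/?bar=2&foo=1"

# Compared as opaque identifiers, the two URIs are different...
print(a == b)
# ...but they agree once the parameter order is canonicalized.
print(canonicalize_query_order(a) == canonicalize_query_order(b))
```

A receiver that treats `id` as opaque would still compare the strings as given; this kind of step only helps if the party minting the identifier applies it before publishing.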
At the very least, implementers who use the query component to encode request parameters SHOULD canonicalize the order of these parameters when normalizing their URIs, before including them as `id` on any object(s).

Recommendations