JSON-LD Validation Scoping #33

bleonar5 · 2023-11-29T16:08:48Z

We need to lay out some clear parameters for what we will be considering a "valid" JSON-LD object, and also possibly make some adjustments with respect to the Psych-DS specification as laid out in the Google Doc.

Here are some of the official requirements of JSON-LD, as found here:

A JSON-LD document MUST be able to express a linked data graph* (elaborated below) []
A JSON-LD document MUST be a valid JSON document [X]
All JSON constructs MUST have semantic meaning in a JSON-LD document: [X]
JSON arrays MUST NOT be interpreted as defining an object ordering. [X]

(There are other bullets in their list, but they are all SHOULDs and MAYs, and we are most interested in the MUSTs.)

Guidelines for a linked data graph: (in the same doc as above)

Subjects, objects, and edges all SHOULD be identified with IRIs

^^^Part of my issue with this combination of requirements is that they seem to bottom out at "being valid JSON", because:

Even though a JSON-LD document must be expressible as a linked data graph, the requirements for a linked data graph are all non-normative, all SHOULDs.
The requirement that JSON constructs must have meaning refers to something intentional rather than technical. That is, from what I can tell, it's not saying that all JSON constructs must be linked to some informative IRI; it's saying that the user must not use values that don't mean anything.
The requirement that arrays must be interpreted as unordered is a matter of interpretation, not computer validation.

In the world of JSON-LD there is an abundance of SHOULDs and barely any MUSTs. What we have to decide is whether to codify our own set of MUSTs for Psych-DS-specific JSON-LD, or to just keep valid JSON format as the only MUST and implement the variety of SHOULDs as warnings.
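To make the second option concrete, here is a minimal Python sketch in which invalid JSON is the only error and a few of the SHOULDs surface as warnings. The `Issue` class, the issue codes, and the choice of which SHOULDs to check are all hypothetical, not anything the validator currently defines.

```python
import json
from dataclasses import dataclass


@dataclass
class Issue:
    severity: str  # "error" for a MUST, "warning" for a SHOULD
    code: str      # hypothetical issue code, not an existing validator code
    message: str


def validate_metadata(text: str) -> list[Issue]:
    """Treat valid JSON as the only hard requirement; everything else warns."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [Issue("error", "INVALID_JSON",
                      f"not valid JSON: {exc.msg} (line {exc.lineno})")]

    if not isinstance(doc, dict):
        # Best Practice 2: use a top-level object.
        return [Issue("warning", "NO_TOP_LEVEL_OBJECT",
                      "metadata should be a single top-level object")]

    issues = []
    if "@context" not in doc:
        issues.append(Issue("warning", "MISSING_CONTEXT",
                            "no @context; keys may not link to any IRI"))
    if "@type" not in doc:
        # Best Practice 6: provide one or more types.
        issues.append(Issue("warning", "MISSING_TYPE",
                            "no @type on the top-level object"))
    return issues
```

Under the first option, the same checks would simply be reported with `severity="error"`, so the decision is mostly about policy rather than implementation.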

For instance, we allow users to include non-schema.org keys (or rather, string keys that don't link to any IRI) within their metadata, which is allowed according to strict JSON-LD rules, but recommended against. Here are some questions:

Do we want to require that schema.org context MUST be included, and that the required terms of our spec such as "name" and "variableMeasured" MUST expand to their full schema.org IRIs?
Do we want to allow for expanded, contextless JSON-LDs as valid metadata files?
If we do choose to implement the full gamut of JSON-LD SHOULDs, are we prepared to present those recommendations to the user, at risk of overwhelming them?
Do we want to allow for namespaces other than schema.org in the context?
Do we want the validator to check that JSON-LD IRIs actually point to real web pages? [This has implications for our eventual python version, for which offline functionality is a desideratum]
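As a rough illustration of the first and last questions, the following Python sketch checks, without any network access, whether "name" and "variableMeasured" end up at schema.org IRIs. It is a deliberately simplified stand-in for real JSON-LD expansion: it assumes the schema.org context URL can be treated as a vocabulary prefix instead of being fetched, it only resolves one level of prefixes, and `REQUIRED_TERMS` contains just the two terms mentioned above. All function names are hypothetical.

```python
SCHEMA_ORG_URLS = {"http://schema.org", "http://schema.org/",
                   "https://schema.org", "https://schema.org/"}
SCHEMA_ORG_PREFIXES = ("http://schema.org/", "https://schema.org/")
REQUIRED_TERMS = ("name", "variableMeasured")  # assumption: only these two


def _resolve_prefix(value, ctx):
    """Resolve a compact IRI such as 'schema:name' against one context dict."""
    if isinstance(value, str) and ":" in value and "://" not in value:
        prefix, _, suffix = value.partition(":")
        base = ctx.get(prefix)
        if isinstance(base, str):
            return base + suffix
    return value


def expand_term(term, context):
    """Simplified offline expansion; NOT the full JSON-LD algorithm."""
    if "://" in term:  # already an absolute IRI (expanded, contextless form)
        return term
    result = None
    for ctx in context if isinstance(context, list) else [context]:
        if isinstance(ctx, str) and ctx in SCHEMA_ORG_URLS and ":" not in term:
            # Assumption: treat the remote schema.org context as a vocab prefix.
            result = "http://schema.org/" + term
        elif isinstance(ctx, dict):
            if term in ctx:
                definition = ctx[term]
                if isinstance(definition, dict):
                    definition = definition.get("@id")
                result = _resolve_prefix(definition, ctx)
            elif ":" in term:
                resolved = _resolve_prefix(term, ctx)
                result = resolved if resolved != term else result
            elif isinstance(ctx.get("@vocab"), str):
                result = ctx["@vocab"] + term
    return result


def missing_schema_org_terms(doc):
    """Return the required terms that no top-level key expands to."""
    context = doc.get("@context")
    expanded = set()
    for key in doc:
        if key.startswith("@"):
            continue
        iri = expand_term(key, context)
        if isinstance(iri, str):
            expanded.add(iri)
    return [term for term in REQUIRED_TERMS
            if not any(p + term in expanded for p in SCHEMA_ORG_PREFIXES)]
```

Because the check looks at expanded IRIs rather than the literal keys, a fully expanded, contextless document with `http://schema.org/name` as a key would pass, while a file with a bare `"name"` key and no context would not.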

There are other questions, but this set covers the gist of it. Including some misc. references below, such as Best Practices and the official "JSON-LD grammar":

Here are some "best practices" put forth by W3C:

Best Practice 1: Publish data using developer friendly JSON
Best Practice 2: Use a top-level object
Best Practice 3: Use native values
Best Practice 4: Assume arrays are unordered
Best Practice 5: Use well-known identifiers when describing data
Best Practice 6: Provide one or more types for JSON objects
Best Practice 7: Identify objects with a unique identifier
Best Practice 8: Things not strings
Best Practice 9: Nest referenced inline objects
Best Practice 10: When describing an inverse relationship, use a referenced property
Best Practice 11: External references SHOULD use typed term
Best Practice 12: Ordering of array elements
Best Practice 13: Provide a representation of the entity related by URL
Best Practice 14: Cache JSON-LD Contexts

JSON-LD Grammar

(Interesting point from the above grammar: unlinked keys in the JSON-LD MUST be ignored when processed. We may want to remind users that adding unlinked keys to their metadata does not technically add to its richness, since it will be ignored during any official processing on the web)
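A reminder like that could be driven by a check along these lines. This is only a sketch under simplifying assumptions: it inspects just the top-level `@context`, it gives up (reports nothing) when the context is a remote URL or defines `@vocab`, since in both cases plain keys may well be linked, and it treats any key containing a colon as an IRI or compact IRI. The function names are hypothetical.

```python
def collect_terms(context):
    """Return (defined_terms, has_vocab, has_remote) for a top-level context."""
    terms, has_vocab, has_remote = set(), False, False
    for ctx in context if isinstance(context, list) else [context]:
        if isinstance(ctx, str):
            has_remote = True  # would have to be fetched to know its terms
        elif isinstance(ctx, dict):
            has_vocab = has_vocab or isinstance(ctx.get("@vocab"), str)
            terms.update(k for k in ctx if not k.startswith("@"))
    return terms, has_vocab, has_remote


def find_unlinked_keys(doc):
    """List dotted paths of keys that have no term definition to link them."""
    terms, has_vocab, has_remote = collect_terms(doc.get("@context"))
    if has_vocab or has_remote:
        return []  # cannot claim anything is unlinked without full processing

    found = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "@context":
                    continue
                here = f"{path}.{key}" if path else key
                if not key.startswith("@") and ":" not in key and key not in terms:
                    found.append(here)
                    continue  # the whole subtree goes with the dropped key
                walk(value, here)
        elif isinstance(node, list):
            for item in node:
                walk(item, path)

    walk(doc, "")
    return found
```

For a file whose context defines only `name`, a call such as `find_unlinked_keys(doc)` would list every other plain key, which is the set a user could be warned about.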

Additional MUSTs that we can glean from the grammar:

A JSON-LD document MUST be a single node object, a map consisting of only the entries @context and/or @graph, or an array of zero or more node objects.
The keys in objects MUST be unique.
A term MUST NOT equal any of the JSON-LD keywords, other than @type.
When used as the prefix in a Compact IRI, to avoid the potential ambiguity of a prefix being confused with an IRI scheme, terms SHOULD NOT come from the list of URI schemes as defined in [IANA-URI-SCHEMES]. Similarly, to avoid confusion between a Compact IRI and a term, terms SHOULD NOT include a colon (:) and SHOULD be restricted to the form of isegment-nz-nc as defined in [RFC3987].
To avoid forward-compatibility issues, a term SHOULD NOT start with an @ character followed exclusively by one or more ALPHA characters (see [RFC5234]) as future versions of JSON-LD may introduce additional keywords. Furthermore, the term MUST NOT be an empty string ("") as not all programming languages are able to handle empty JSON keys.
All of the aspects of context definitions are MUSTs.
All of the expanded term definition requirements apply as MUSTs.
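A few of the MUSTs in this list are mechanical enough to check with the standard library alone. The sketch below covers the top-level shape, unique keys, and the empty-string and keyword rules for terms in a context. The two keyword sets are written from my reading of JSON-LD 1.1 and should be verified against the grammar before being relied on; the function names and error strings are hypothetical.

```python
import json

# Assumption: JSON-LD 1.1 keyword list; verify against the spec.
KEYWORDS = {"@base", "@container", "@context", "@direction", "@graph", "@id",
            "@import", "@included", "@index", "@json", "@language", "@list",
            "@nest", "@none", "@prefix", "@propagate", "@protected",
            "@reverse", "@set", "@type", "@value", "@version", "@vocab"}
# Assumption: keywords that may legitimately appear as context entries.
CONTEXT_KEYWORDS = {"@base", "@direction", "@import", "@language",
                    "@propagate", "@protected", "@type", "@version", "@vocab"}


def load_with_duplicate_check(text):
    """Parse JSON while recording keys repeated within the same object."""
    duplicates = []

    def hook(pairs):
        seen = {}
        for key, value in pairs:
            if key in seen:
                duplicates.append(key)
            seen[key] = value
        return seen

    return json.loads(text, object_pairs_hook=hook), duplicates


def grammar_errors(text):
    """Check a handful of grammar MUSTs; raises if text is not valid JSON."""
    doc, duplicates = load_with_duplicate_check(text)
    errors = [f"duplicate key: {key!r}" for key in duplicates]

    if isinstance(doc, list):
        if not all(isinstance(item, dict) for item in doc):
            errors.append("top-level array must contain only node objects")
    elif not isinstance(doc, dict):
        errors.append("top level must be an object or an array of objects")

    context = doc.get("@context") if isinstance(doc, dict) else None
    for ctx in context if isinstance(context, list) else [context]:
        if not isinstance(ctx, dict):
            continue
        for term in ctx:
            if term == "":
                errors.append("term must not be the empty string")
            elif term in KEYWORDS and term not in CONTEXT_KEYWORDS:
                errors.append(f"term must not be a keyword: {term!r}")
    return errors
```

The duplicate-key check has to happen at parse time via `object_pairs_hook`, because an ordinary `json.loads` silently keeps only the last value for a repeated key.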

note:

This refers to the eventual deprecation of non-IRI keys in JSON-LD

bleonar5 · 2023-11-30T15:16:13Z

After doing a deeper dive into the jsonld.js package, I can see that it does produce error messages that correspond directly to a lot of the MUSTs from the JSON-LD Grammar. These mostly seem to revolve around restricted usages for the various "@" keywords.

This is great, because it means we can offload a lot of this fine-grained syntactic validation of JSON-LD objects to the official package itself, funneling its error messages into our app's validation "issues" that get presented to the user. One nice thing about these error cases is that they only really arise when you begin to use some of JSON-LD's more complex features, so there's not as much of a worry about these checks being prohibitive to beginners.

There's another category of JSON-LD MUSTs that result in ignored content rather than an error message. For instance, in the JSON-LD playground, using a key that resolves to a string instead of an IRI results in that key being dropped. We have to decide whether such violations ought to be errors or warnings.

bleonar5 added the SpecificationIssue Issue concerning the psych-DS specification and schema model label Nov 29, 2023

bleonar5 self-assigned this Nov 29, 2023

mekline mentioned this issue Dec 4, 2023

Scoping the Psych-DS Validator - MKS update/expire/migrate psych-ds/psychds-validator#18

Open
