Allow to circumvent identifying UcoThing's through UUID enforcement for digital resource data #606
Comments
A follow-on patch will regenerate Make-managed files. References: * #606 Signed-off-by: Alex Nelson <[email protected]>
…ts example A follow-on patch will regenerate Make-managed files. References: * #606 Signed-off-by: Alex Nelson <[email protected]>
This proposal was announced for committee discussion last Friday, slated for July 16. I forgot to note that the structure of this proposal changed slightly. Paul and I found it beneficial to move the Solution Suggestion up higher in the proposal before going through its implications.
After discussion with @plbt5, we concluded that, as we wrote this over several weeks, it evolved into two proposals of sufficiently different purposes that it should be split. I'm closing this issue so it can be split.
(Submitted by @plbt5 and @ajnelson-nist.)
Background
Currently, UCO must allow for working with data from many different organisations. In order to not enter into a conflict for data uniqueness, it has been decided that data that are to enter the digital CDO-realm must be uniquely identified by an IRI that ends with a UUID. This is being enforced by a SHACL rule applied on each instance of UcoThing to verify the presence of an ending UUID syntax in its URI.
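As a rough illustration of the rule described above, the following Python sketch checks whether an IRI ends with a UUID. It assumes "ends with a UUID" means a canonical 8-4-4-4-12 hexadecimal suffix; the actual SHACL pattern used in UCO may differ, and the `http://example.org/kb/` namespace is hypothetical.

```python
import re
import uuid

# Assumption: a canonical 8-4-4-4-12 hexadecimal UUID at the very end of the IRI.
# The real SHACL constraint in UCO may use a different pattern.
UUID_SUFFIX = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def ends_with_uuid(iri: str) -> bool:
    """Return True if the IRI's trailing characters form a UUID."""
    return UUID_SUFFIX.search(iri) is not None


# Hypothetical namespace, used only for illustration.
minted_iri = "http://example.org/kb/organization-" + str(uuid.uuid4())
web_page_iri = "https://caseontology.org/index.html"

print(ends_with_uuid(minted_iri))    # a minted node IRI passes the check
print(ends_with_uuid(web_page_iri))  # a plain web page URL does not
```

Under this sketch, a web page URL used directly as a node identifier would be flagged, which is the situation the rest of this proposal addresses.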
Unfortunately, this rule does not account for data that effectively represent a digital web resource, e.g., <https://caseontology.org/index.html>. For this category of data, adding a UUID to the IRI invalidates the identification of the web resource, whereas leaving the UUID out triggers a complaint. Both outcomes impede the data interoperability purpose of CDO/UCO/CASE. Moreover, adding a UUID to data that represent a web resource defeats the point of the rule, since web resources are already required to be unique in order to be resolvable to a web location. Hence this change request.
We recall the distinction, already identified in RDF, between information and non-information resources. This distinction ended up in RFC 9110, HTTP Semantics. We have depicted their application and distinctions in Figure 1 below:
Figure 1 - Information and non-information resources: their relationship and differences
(Note: For the purposes of this proposal, please consider URI and IRI as synonymous.)
The distinction between an Information Resource (IR) and a Non-Information Resource (NIR) cannot be determined from the URI itself, only from the response that one gets from the server. If the URI identifies an NIR, the server cannot respond with data, because there does not yet exist something like Elephant-Over-IP or Paul-Over-IP (a.k.a. "Beam me up, Scotty") among the protocols. Instead, the server will respond with an HTTP 303 status, redirecting to a URI that is an Information Resource. Visiting the NIR thus discloses information about the NIR as opposed to the real thing itself.
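The response-based distinction just described can be sketched as a small classification function. This is a simplification under stated assumptions: only a 200 response is treated as "the server served data" and only a 303 response as the NIR signal; every other status is left undetermined. The names `ResourceKind` and `classify_by_status` are illustrative, not part of UCO.

```python
from enum import Enum


class ResourceKind(Enum):
    INFORMATION = "IR"
    NON_INFORMATION = "NIR"
    UNDETERMINED = "undetermined"


def classify_by_status(status_code: int) -> ResourceKind:
    """Classify a requested URI by the HTTP status the server returned.

    Assumption: 200 means content was served from the URI itself (IR),
    303 means the server redirected to a description of the thing (NIR).
    """
    if status_code == 200:
        return ResourceKind.INFORMATION
    if status_code == 303:
        return ResourceKind.NON_INFORMATION
    return ResourceKind.UNDETERMINED


print(classify_by_status(200).value)
print(classify_by_status(303).value)
```

Note that the classification is a property of an observed response, not of the URI string, which is why the same URI can be classified differently by different observers.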
This distinction is instrumental for a lot of things that are built with RDF(s) and OWL, and it is something that UCO should at least recognize as current practice.
Requirements
Requirement 1
Allow self-identifying information resources to reside in the graph without any additions to, or changes in, their already unique identification.
Requirement 2
Allow asserting the distinction between NIRs and IRs, or allow that a resource can be both.
A resource can be both an IR and an NIR because it can be perceived as an IR or an NIR depending on constraints or business rules as implemented by the server, e.g., serving pages in different languages when requested from different geographical locations.
Requirement 3
A single web resource MUST be able to be represented as an IR or an NIR as appropriate at different times, when analysing a cyber incident such as a domain-hijacking.
Solution suggestion
One solution is to apply reification on the digital resources' identifiers as URLs in order to incorporate them into a UCO knowledge graph. And although this might suffice, it seems a bit wasteful to first obfuscate a valid URI for a digital resource into a UUID-ending URI, followed by a pattern to elucidate it again.
We take the distinction between the NIR and IR as the essence of the solution. The implementation would need to introduce the distinction between Non-Information and Information Resources. These would become two additional near-top-level classes, under core:UcoThing. This would be a nod to the concepts really being RDFS concepts, but not defined with RDFS IRIs. We should also avoid entailing the RDFS semantics of rdfs:Resource being the top-level class, because of the tension this would create with OWL and owl:Thing being the top-level class.
Introducing this distinction within UCO indeed makes it possible to enforce the uniqueness-by-UUID demand for non-information resources only, while still allowing information resources to be included by their original URLs. Unfortunately, this does not solve the problem, because UCO cannot assume, as RFC 9110 HTTP Semantics does, that core:NonInformationResource and core:InformationResource are always disjoint and remain so. Instead, UCO must follow the reality where an IR can change into an NIR, as explained in Competencies 1 and 2. In these particular cases, the rule "add a UUID for NIRs only" fails.
Therefore, the actual solution is to
To that end, we suggest introducing the following concepts in UCO:
We also introduce observable:WebResource as a parent to observable:WebPage, to acknowledge web resources that are not yet known to be an IR or NIR, as well as to show the cyber-domain hijacking event (see Benefits section):
Based on this distinction, the specifications as exemplified in the CQs become syntactically correct and semantically valid. Visually, this renders as follows:
Apart from the above additions to UCO, we suggest performing an initial alignment. The Risks section should make clear the benefit of such alignment, particularly pertaining to some existing practices (outside of UCO) of designating graph nodes with RDF types analogous to UCO's identity:Person and observable:WebPage.
Competencies demonstrated
Competency 1
Assume data that contain URLs as digital resources, e.g., <https://caseontology.org/index.html>, as well as data that contain non-information resources, e.g., identity:Organization.
Competency Question 1.1
Show that both can be included in the UCO digital realm, where the NIR must carry a UUID in its identification whereas the information resource does not have the same need.
Show all IRIs that identify a webpage:
Result 1.1
Competency Question 1.2
Show the distinction between the NIR that is requested and the IR that is served about the NIR.
For this distinction to be assessed, we introduce an additional relationship that expresses the HTTP 301 Return Code and allows constructing the following data graph:
We formulate the following SPARQL to find all information resources arrived at by redirection, which suggests an entailment of ?sourceObject being a core:NonInformationResource:
Result 1.2
<http://caseontology.org/>
<https://caseontology.org/index.html>
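The shape of this lookup can be sketched in plain Python over a list of triples. This is not the proposal's SPARQL or data graph: the predicate name `ex:redirectsTo` is a hypothetical stand-in for the redirect relationship, and the sketch assumes the first IRI in Result 1.2 redirects to the second.

```python
# Hypothetical predicate name; the proposal's actual property is not shown here.
REDIRECTS_TO = "ex:redirectsTo"
RDF_TYPE = "rdf:type"

# Illustrative triples, assuming the first IRI redirects to the second.
triples = [
    ("http://caseontology.org/", RDF_TYPE, "observable:WebPage"),
    ("https://caseontology.org/index.html", RDF_TYPE, "observable:WebPage"),
    ("http://caseontology.org/", REDIRECTS_TO, "https://caseontology.org/index.html"),
]


def redirected_pairs(graph):
    """Return (source, target) pairs linked by the redirect predicate."""
    return [(s, o) for (s, p, o) in graph if p == REDIRECTS_TO]


for source, target in redirected_pairs(triples):
    print(source, "->", target)
```

In this sketch the source of each pair plays the role of ?sourceObject, and the target is the information resource arrived at by redirection.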
By being rdf:type'd as an observable:WebPage, both IRIs in Result 1.2 are excused from the UUID review rule, even though ?sourceObject is in this case behaving as a core:NonInformationResource.
Competency 2
Say the webpage of a multilingual company (MC) is being accessed by two market analysts in a multinational organization, who routinely contribute to a shared knowledge base in the organization. Their offices are in different countries, Japan and France, that happen to use languages MC supports. MC's default language is Japanese.
The Japanese analyst visits the home page, https://mc.example.co.jp/, and is served content from that URL. The French analyst visits the same home page, https://mc.example.co.jp/, and is 303-redirected to https://mc.example.co.jp/lang-fr/ by server-side client-geolocation rules.
Neither analyst knows the other is trying to access https://mc.example.co.jp/.
Competency Question 2.1
What are the representations of the Japanese analyst and the French analyst, using InformationResource, NonInformationResource, NeverInformationResource, WebResource, and/or WebPage?
The Japanese analyst:
The French analyst:
Even if pooled in the shared knowledge base, this total knowledge view remains consistent (i.e. does not raise SHACL validation errors).
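The pooling argument can be illustrated with a small Python sketch. It rests on assumptions that go beyond what is shown here: the Japanese analyst's observation is modelled as typing the home page an InformationResource, the French analyst's as typing it a NonInformationResource, and the only disjoint pair is taken to be InformationResource and NeverInformationResource. The actual class axioms and SHACL shapes are those of the proposal, not this sketch.

```python
from collections import defaultdict

HOME = "https://mc.example.co.jp/"
HOME_FR = "https://mc.example.co.jp/lang-fr/"

# Assumed observations, expressed as (IRI, class) pairs.
japanese_view = {(HOME, "InformationResource")}
french_view = {(HOME, "NonInformationResource"), (HOME_FR, "InformationResource")}

# Assumption: the only disjointness is between these two classes.
DISJOINT_PAIRS = [frozenset({"InformationResource", "NeverInformationResource"})]


def pooled_types(*views):
    """Merge several views into a mapping from IRI to its set of classes."""
    types = defaultdict(set)
    for view in views:
        for iri, cls in view:
            types[iri].add(cls)
    return dict(types)


def disjointness_violations(types):
    """List IRIs typed with both members of any disjoint pair."""
    return [
        iri
        for iri, classes in types.items()
        if any(pair <= classes for pair in DISJOINT_PAIRS)
    ]


pooled = pooled_types(japanese_view, french_view)
print(sorted(pooled[HOME]))
print(disjointness_violations(pooled))
```

Under these assumptions the home page carries both types after pooling, and no disjointness violation is reported because the two classes are not declared disjoint with each other.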
This provides an example of a web resource that is, by differential service, contingently an InformationResource and/or a NonInformationResource.
Competency Question 2.2
Are the views consistent when pooled into one graph without any notes on time of observation (i.e., does not raise SHACL validation issues)?
Yes. The testing in PR 610 confirms no SHACL violations are raised. The visual display of the classes, and of how this example doesn't hit a class-disjointness issue, is as follows (using "⊂" for subclassing (rdfs:subClassOf), "⋂=∅" for class-disjointness (owl:disjointWith), and "∈" for instantiation (rdf:type)).
Risk / Benefit analysis
Benefits
.../index.html) tomorrow and encounter this:
It is likely worthwhile being able to model those two web pages as direct graph individuals, <https://caseontology.org/index.html> and <https://caseontology.org/something_really_wretched.html>, e.g. for describing the non-continuous time intervals in which they resolve.
Risks
Prior text revised to read, 2024-07-26: Some web services might choose to not distinguish between IRs and NIRs.
Suppose a personnel indexing service is deployed that uses home pages as person identifiers for an example organization:
Suppose also that http://example.org/~bob, when visited, is served as HTML in a browser.
This service cannot integrate into an environment where information resources and non-information resources are held disjoint. foaf:Person is one of the typical examples of a non-information resource. The home page for Bob is an information resource.
Integration of such a data source would need to split the (generic) resource http://example.org/~bob into independent entities, likely ones that follow the UCO UUID IRI naming scheme.
Coordination
After discussion in the 2024-07-16 meeting, this proposal will be split into two.