Adopt OWL duck typing #438
Comments
CASE and UCO lack a formal definition of "Duck typing," and I believe that is the source of much confusion among committee members. By my understanding: Informally, "Duck typing" has implied a combination of two methods of classifying objects, and optionally a third entailment:
In OWL, inferencing capabilities bring M1, M2, and E1 together. I'm not sure CASE and UCO's interpretation of "Duck typing" is more than only M2. I am aware that early drafts of UCO attempted OWL domains for properties, erroneously both in implementation and in purpose: domains were attempted for validation use, but OWL doesn't do data validation. Hence we went to SHACL. CASE and UCO need to formalize their interpretation of "Duck typing," and relate it to OWL's, for us to understand the merits of this proposal. For instance, we must understand what is, and is not, meant to be entailed by:
@sbarnum, my understanding of the facet has been fuzzy from the beginning. @ajnelson-nist has not been able to clarify it for me. You and I have had neither the time nor the incentive to discuss this. My understanding of the design principle, as described in the UCO design document section 5, is to provide the capability to separate the object from properties. Question: is my understanding correct? If so, please provide the capabilities that justify the principle. If not, please provide the essence of your interpretation of the principle.
In ontology engineering, the prime purpose is to define categories of things. More specifically, to commit to the existence of those categories. If we look at a cup of coffee, we all share the intuition that the cup is different from the coffee, and that the two show different behaviour. This is based on what in ontology-speak is termed the "Principle of Identity": what makes it possible to point at things in reality and collect things that are very similar to the cup, and other things that are very similar to the coffee. In short: what identifies something as a cup and what as coffee. The answer is probably along the lines of: the cup can hold coffee, whereas the coffee cannot hold something but requires something to hold it; on dividing the coffee in two, both parts remain coffee, whereas dividing the cup renders it dysfunctional. The significance of these answers is that they “[...] do not ask for what there is, but for what a given remark or doctrine [...] says there is” (Quine). The prime purpose of UCO and CASE is to make distinctions from the perspective of the Cyber Community; if we adopt a concept, then UCO acknowledges that such a particular thing exists in the Cyber Domain: UCO commits to its existence. This is magnified by the objective of UCO to become the standard in the Cyber Domain. In order to fulfill the objective and apply ontology to achieve its purpose, we need the principle of identity. And the one and only means to implement the principle of identity in ontology is to specify what it means to be of a certain kind, to be a member of a certain category. In other words, define a Class with a unique set of intentions that remain invariant over all its individual members (instances). (This does not imply, btw, that each and every class implements the principle of identity; there is also the principle of application.)
By introduction of the Facet, and by insisting on the potential to decouple between the class and its characterising properties as specified by a facet, the capability to evade the principle of identity has been provided. Because:
In conclusion, by ignoring ontological rigor in general and the principle of identity specifically, the model that is being created is not an ontology anymore. Whether it is modeled in OWL or not is irrelevant. (Note that each and every facet commits to a particular set of characteristics, which, by ontological definition, represents a category of things. I.e., it commits to the existence of such category. Consequently, not following ontological rigor does not imply "no ontology applies", but implies "this categorisation applies" by token of the definition of the category. In other words, the opposite of ontology is not "non ontology" but "bad ontology".)
I agree with @plbt5 's remark, and also have some engineering-inspired unsettlements with
They have proven difficult to evolve, and have harmed a non-zero number of change proposals. Issue 370 got stuck when we realized the issue enabled (with our understanding of
They are a significant discomfort to program, because at least in Python,
They are restricted in ways that make them seem like ...organelles of objects, is the best term I can think of. We've had misunderstandings and disagreements with whether it's ever appropriate to reference them with properties aside from
It would be my strong preference to remove the notion of
For all the modeling weaknesses that
We have an example in this Issue's description that, in some sci-fi contexts, would be a cyborg - a person with a 2TB storage capacity. If the Ontology Committees agree "Please let's not permit that for now," we can stage for UCO 2.0.0 a disjointedness definition between
I think this journey starts with making firmer commitments around what I'd named "M1", "M2", and "E1" in my prior comment.
When the subgroup meets to discuss this Issue, we should be aware of this demonstration in Oresteia:
I think it is necessary to call together the subgroup (@ajnelson-nist @sbarnum @eoghanscasey @plbt5), but only after @sbarnum has had the opportunity to describe his explanation on the Facet: Purpose and Approach.
In response to @ajnelson-nist comment above:
I understand the M1 definition to equal the "it's a duck" principle, and M2 as the actual "duck typing" concept. Please correct me if I've understood M1 or M2 incorrectly.
In general: This reads like a very good idea to me. I can follow the way this could work and how it can make things easier. Having said that, I am not an ontologist.
Problem 1. Clear example.
Problem 2. I do not see why it is weird to add LatLongCoordinationFacet to a picture having geo-location details in its EXIF.
Problem 3. My answer to the specific question would be: YES, the laptop, the iPod and the hard disk are instances of devices allowing data storage.
Problem 4. If OWL is the way to go with UCO, then UCO should follow the OWL approach (amongst others to benefit from the tools and techniques available for OWL).
Re: @plbt5
You understood me correctly.
Re: @Harm-van-Beek
Yes, this is under the 2.0.0 milestone.
Yes,
Why it's weird is that
kb:jpeg-1
  a observable:RasterPicture ;
  ex:latitude "1.234"^^xsd:double ;
  ex:longitude "2.345"^^xsd:double ;
  .
If
ex:LatLongCoordinates
  a owl:Class ;
  rdfs:subClassOf location:Location ;
  .
ex:latitude
  a owl:DatatypeProperty ;
  rdfs:range xsd:double ;
  .
# And sim. for ex:longitude
then, as written above, there would be no OWL or RDFS expansion from putting
ex:latitude
  rdfs:domain ex:LatLongCoordinates ;
  .
If that domain statement is used, then the presence of
kb:jpeg-1
  a
    ex:LatLongCoordinates ,
    location:Location
    ;
Would it make sense to you for a single graph node to be BOTH a
I believe the objective of associating a latitude and longitude with a picture is to say "This picture has a relationship with a location with lat Y and long X", not "This picture is a location with lat Y and long X." That is, however, a significant amount of top-level property design and class separation (using
Now, suppose I say to an analyst "Please image this desktop tower." A tower has characterizations of a storage device, so they say sure, thinking it's an easy overnight job for their one write blocker. They unscrew the case, find eight hard drives in it, and realize what was meant by this graph node handed to them as part of the chain of custody:
kb:tower-5b2188da-67a2-40e5-842c-bb582874ca2b
  a observable:Computer ;
  core:hasFacet [
    a ex:StorageMediumFacet ;
    rdfs:comment "Heads up - the OS reports having 7TB storage. Didn't know anyone made those. Box is kinda heavy, too."@en ;
    observable:storageCapacityInBytes 7696581394432 ;
  ] ;
  .
(Aside: Here, the
To continue the OWL conversation, UCO needs technology demonstration pipelines that could be integrated into unit testing. There have been some less-than-successful attempts at this, which was part of what led to self-building an OWL conformance suite in SHACL. We certainly welcome receiving guidance or demonstration on OWL mechanisms, but for now, UCO's adoption of OWL goes only so far as some of the more elementary features (e.g. ontology versioning, disjointedness semantics, some property-range expression beyond RDFS), and so not yet into OWL inferencing or RDFS
Thanks @Harm-van-Beek for your very valuable review and comments. Much appreciated.
Provide for a link to the design doc section that explains the differences between ontologies and schemata
Background
UCO has long implemented Duck Typing by applying the facet pattern. As indicated by the UCO Design Document, section 5:
The facet pattern brings about several drawbacks/problems, and we propose to implement Duck Typing by using standard OWL constructs only.
Problem 1 - inconsistent data
Facets, and particularly their subclassing, allow the following inconsistent construct to emerge:
This example shows that the subclass convenience accidentally creates a spot to record inconsistent data.
Problem 2 - strange, if not invalid, implicit commitment to reality
The intended application of the facet pattern is to allow Duck Typing, e.g., a query returns things that have storage capabilities without requiring them to be typed as storage devices:
Unfortunately, the absence of an explicit commitment to a type allows for flexibility in a way that can result in weird data, e.g.,
That example is a perfectly UCO-0.9.0-conformant manner of representing a person who has a 2TB hard drive in their pocket. However, as opposed to "a person carrying a device that has that storage capacity", what the triples actually assert is that "the person itself has storage capacity". This represents an invalid state of affairs, or is at least a rather inaccurate representation of the actual state of affairs: a human cannot be considered to be a storage device, or to have storage capabilities.
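To make the concern concrete, here is a minimal sketch in plain Python (no RDF library; triples are modelled as tuples). The node names, the class ex:Person, and the helper things_with_storage_capacity are hypothetical and exist only for illustration; the property and facet names follow those used elsewhere in this thread. The sketch mimics a facet-based duck-typing query and shows that the person is returned alongside the hard drive.

```python
# Triples as (subject, predicate, object) tuples; a sketch, not UCO tooling.
# All kb: node names and ex:Person are hypothetical.
TRIPLES = {
    ("kb:hard-drive-1", "rdf:type", "observable:Device"),
    ("kb:hard-drive-1", "core:hasFacet", "kb:facet-1"),
    ("kb:facet-1", "rdf:type", "ex:StorageMediumFacet"),
    ("kb:facet-1", "observable:storageCapacityInBytes", 2_000_000_000_000),
    ("kb:person-1", "rdf:type", "ex:Person"),
    ("kb:person-1", "core:hasFacet", "kb:facet-2"),
    ("kb:facet-2", "rdf:type", "ex:StorageMediumFacet"),
    ("kb:facet-2", "observable:storageCapacityInBytes", 2_000_000_000_000),
}


def things_with_storage_capacity(triples):
    """Return subjects that have a facet carrying a storage capacity."""
    facets = {
        s for (s, p, _o) in triples if p == "observable:storageCapacityInBytes"
    }
    return sorted(
        s for (s, p, o) in triples if p == "core:hasFacet" and o in facets
    )


result = things_with_storage_capacity(TRIPLES)
print(result)
```

Nothing in the data or the query distinguishes "is a storage device" from "carries a storage device", which is the ambiguity this problem describes.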
One could argue that stakeholders won't construct such weird semantics; however, a community member has said they would happily assign a location:LatLongCoordinatesFacet to an observable:RasterPicture (a subclass of observable:File) if that picture file was a JPEG with lat/long coordinates embedded in its EXIF.
Problem 3 - absence of explicit commitment
Despite the requirement to not enforce strict data typing, the facet defines a set of characteristics in order to represent something. This implies that each and every facet, by token of its specified characteristics, represents a certain typology implicitly: although the facet does not name the type of the typology, the typology de-facto applies.
In accordance with the above SPARQL example, the fact that there is no name attached to the category does not prevent us from concluding that the laptop (a computer) and the iPod (a music player) are the same kind of device as the hard disk, viz. a storage device. The question at the heart of the issue is: do we commit to the conclusion that the laptop, the iPod and the hard disk are instances of one type of thing?
Yes: commit to one type of thing
If we answer the question in the affirmative, then we accept that the facet commits to the existence of a particular type of thing that gathers computers, music players and storage devices, e.g., devices that allow storage of data. Since we commit to it, there is no reason not to attach a name to the type, e.g., ex:DeviceAllowingDataStorage. Consequently, the following three statements are considered valid:
which implies that we have successfully characterised a type of thing by means of its characteristics: indeed a proper implementation of Duck typing.
No: these are different types of thing
If we answer the question in the negative, then the characterisation of the facet can be considered incomplete or otherwise invalid. We either have to add more characteristics to differentiate between the distinct types of thing, or we have mistakenly conflated the semantics of is_a with those of has_a. In any case, we have not implemented duck typing correctly.
When we combine both answers, the conclusion is that facets either do NOT implement duck typing, or do properly implement duck typing but in an OWL-unfamiliar approach. Considering that it was the intention of facets to make the distinction based on characteristics, i.e., duck typing, I'm inclined to acknowledge the objective of the facet, viz. that duck typing is a necessary capability to support, but to consider the design pattern used to implement it incorrect due to the absence of an explicit commitment to the existence of the type.
Problem 4 - OWL-unfamiliar approach
Another consequence of the application of facets is that this is not how the Semantic Web, i.e., the OWL language, has been designed to work. The facet design pattern is not part of OWL in the sense that it is recognised as such and conclusions are drawn from it out-of-the-box: no code exists to process this design pattern. Consequently, none of the tools that are compliant with OWL will be able to process this design pattern and show the intended behaviour. If one requires the intended behaviour, it has to be implemented alongside the OWL technology by each and every stakeholder that has an interest in it.
This characteristic might be acceptable for a local solution to a local problem; for a worldwide standard, however, it is odd. Moreover, it is very problematic since it forces local additions to the technology, additions that might even be stakeholder dependent.
Problem 5 - Undefined relationship with core:UcoObject
Although "A facet is a grouping of characteristics unique to a particular aspect of an object" (definition of core:Facet), no definition exists of the relationship that applies between the facet and the object. Two related problems arise from the absence of such a definition:
Requirements
The requirements for Duck typing have been specified already in the UCO Design Document, section 5.1, as three separate requirements.
Requirement 1
CASE uses duck typing which allows data to be defined by its inherent characteristics rather than enforcing strict data typing.
Requirement 2
CASE objects can be assigned any rational combination of facets, such as a file that is an image and a thumbnail. When employing this approach, data types are evaluated with the duck test, allowing data to be represented more truly without imposing a rigid class structure. (...)
Requirement 3
For certain common combinations of facets, it is possible to assign them a higher-level class, such a PDF File or WhatsApp Message.
Risk / Benefit analysis
Benefits
Replacing the facet pattern with an OWL-familiar Duck Typing capability removes the need for stakeholders to provide additional code to support the intended behaviour of the facet, viz. Duck Typing. It also allows for consistency in the allowed data, and creates the ability to commit explicitly to the inferred typology.
The facet pattern has been used since the start of the development of UCO, and has raised questions and confusion ever since. Replacing it with an OWL-supported pattern will clarify how Duck Typing can be applied within UCO. We therefore recommend the CP's implementation in version 1.0.0 to consolidate this clear, supported and simple form of Duck Typing, as opposed to suggesting that the facet pattern is a necessary pattern for the UCO standard.
Risks
This CP can be considered a significant overhaul of the UCO design with the risk that community members might decide to turn away from the UCO initiative, given the effort required to implement the change.
Competencies demonstrated
Competency 1
Duck typing: When something has one or more properties, infer that it belongs to the category identified by those properties, e.g., assume that everything that can store data is a storage device.
Competency Question 1.1
Provided the following data:
kb:object-1 ex:hasStorageCapacityInBytes 2000000000000 .
What is the type of thing the individual kb:object-1 represents? In SPARQL:
Result 1.1
The following triple shall be inferred:
kb:object-1 a ex:StorageDevice .
Competency 2
In terms of the UCO DD:
Competency Question 2.1
Provided the following two sets of data on the same individual:
What is the type of thing the individual kb:raster-picture-f970b1a2-c6f1-4082-a2fb-3e8f4a7913b2 represents? In SPARQL:
Result 2.1
The following triples shall be inferred:
Competency 3
Infer that a datum is a member of a higher-order class, i.e., a superclass, based on the same Duck Typing properties.
Competency Question 3.1
Provided the following data:
ex:dns-server-address-1 observable:addressValue "1.1.1.1" .
What is the (super)type of thing the individual ex:dns-server-address-1 represents? In SPARQL:
Result 3.1
The following triples shall be inferred:
Solution suggestion
The examples apply namespace abbreviations to distinguish between definitions that form the reusable knowledge base, kb:, and exemplifying data that assert a certain state of affairs, ex:.
Solution part 1
Use rdfs:domain and rdfs:range statements to implement Duck Typing, as opposed to the facet pattern, for each facet that has been specified as rdfs:subClassOf core:Facet.
Solution implementation
This implies the following modifications:
observable:FileFacet --> observable:File a owl:Class .
sh:path observable:fileName ; --> observable:fileName rdfs:domain observable:File .
observable:fileName a owl:DatatypeProperty (or owl:ObjectProperty, dependent on the range of the characteristic).
Explanation: The use of rdfs:domain and rdfs:range statements.
Consider the following knowledge graph:
Note:
kb:D kb:p kb:R, only defines that p is used to relate D to R. This allows us to say that ex:Shakespeare kb:wrote ex:Hamlet, and subsequently to get an answer to the question of who wrote Hamlet (SELECT ?a WHERE { ?a kb:wrote ex:Hamlet } ==> ex:Shakespeare).
rdfs:domain and rdfs:range properties are NOT meant to validate data, i.e., to require that an instance of the specified class MUST HAVE the specified property. Instead, they are used the other way around, to establish the type of a datum. For instance, if a datum uses the property about storage capacity, then that datum is considered to belong to the category of storage device. In pseudo code:
Formalised in SPARQL, this results in:
Similarly, rdfs:range statements can be made to infer something to be of a certain type based on the range of a property:
In contrast to the facet, both CONSTRUCT rules are already part of the set of inference rules that belong to OWL and do not need to be specified; only the domain and range relations that are used as input to these rules are required to be specified.
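As an illustration of the two inference patterns just described (they correspond to the rdfs2 and rdfs3 entailment patterns of RDF Semantics), the following is a minimal sketch in plain Python, with triples modelled as tuples. It is not how an OWL or RDFS reasoner is implemented; the function name apply_domain_and_range and the classes kb:Author and kb:Work are assumptions introduced only for this example.

```python
def apply_domain_and_range(triples):
    """Return the input triples plus rdf:type triples entailed by
    rdfs:domain and rdfs:range statements.

    A single pass suffices here because the two rules only ever add
    rdf:type triples, and this sketch declares no domain or range for
    rdf:type itself.
    """
    domains = [(s, o) for (s, p, o) in triples if p == "rdfs:domain"]
    ranges = [(s, o) for (s, p, o) in triples if p == "rdfs:range"]
    entailed = set(triples)
    for (s, p, o) in triples:
        for (prop, cls) in domains:
            if p == prop:
                # Domain rule: the subject is typed by the domain class.
                entailed.add((s, "rdf:type", cls))
        for (prop, cls) in ranges:
            if p == prop:
                # Range rule: the object is typed by the range class.
                entailed.add((o, "rdf:type", cls))
    return entailed


GRAPH = {
    ("kb:wrote", "rdfs:domain", "kb:Author"),  # hypothetical class
    ("kb:wrote", "rdfs:range", "kb:Work"),     # hypothetical class
    ("ex:hasStorageCapacityInBytes", "rdfs:domain", "ex:StorageDevice"),
    ("ex:Shakespeare", "kb:wrote", "ex:Hamlet"),
    ("kb:object-1", "ex:hasStorageCapacityInBytes", 2000000000000),
}

closure = apply_domain_and_range(GRAPH)
new_triples = sorted(closure - GRAPH, key=str)
for triple in new_triples:
    print(triple)
```

Note that the rules never reject data: a node using ex:hasStorageCapacityInBytes is simply classified as ex:StorageDevice, whatever else it is.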
(CASE users may already have seen some of the impact of these CONSTRUCT queries. The RDFLib OWL-RL library provides "Graph expansion" features that perform some of this constructive inference. Users of case_validate can make use of the features via the --inference flag. RDFS inferencing runs those above CONSTRUCTs for rdfs:domain and rdfs:range statements that directly reference classes. OWL inferencing can function with more nuanced domains and ranges, involving anonymous classes and owl:unionOf / owl:intersectionOf.)
Conformance to competencies
CQ 1
For example:
Specify ex:hasStorageCapacityInBytes as a property of the class ex:StorageDevice.
.kb:hasStorageCapacityInBytes a owl:DatatypeProperty ; rdfs:domain kb:StorageDevice .
ex:object-1 kb:hasStorageCapacityInBytes 2000000000000 .
ex:object-1 a kb:StorageDevice .
This meets CQ 1.
CQ 2
Consider the knowledge that:
1: everything that has a fileName is a member of the type kb:File
kb:fileName a owl:DatatypeProperty ; rdfs:domain kb:File .
2: everything that has a pictureType is a member of the type kb:Picture
Then the following triples will be inferred:
This meets CQ 2.
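The two pieces of knowledge above can be combined in a small sketch, again in plain Python with triples as tuples. The node ex:picture-1, its literal values, and the helper inferred_types are hypothetical; kb:pictureType is assumed to be declared analogously to kb:fileName. The sketch also assumes each property has a single domain.

```python
# Knowledge: property domains, as in the two statements above.
KNOWLEDGE = {
    ("kb:fileName", "rdfs:domain", "kb:File"),
    ("kb:pictureType", "rdfs:domain", "kb:Picture"),  # assumed declaration
}

# Data: one individual described by both properties (values are made up).
DATA = {
    ("ex:picture-1", "kb:fileName", "example.jpg"),
    ("ex:picture-1", "kb:pictureType", "jpg"),
}


def inferred_types(knowledge, data, node):
    """Return the classes a node is inferred to belong to via rdfs:domain."""
    # Assumes at most one rdfs:domain statement per property.
    domain_of = {p: c for (p, q, c) in knowledge if q == "rdfs:domain"}
    return sorted(
        {domain_of[p] for (s, p, _o) in data if s == node and p in domain_of}
    )


types = inferred_types(KNOWLEDGE, DATA, "ex:picture-1")
print(types)
```

A single individual ends up as a member of both classes, which is the "rational combination" behaviour that Requirement 2 asks for, without any facet node in between.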
Solution part 2
Combine domain and range statements with rdfs:subClassOf in order to apply subclassing in the Duck Type pattern.
Solution implementation
This implies similar modifications as indicated in Part 1:
observable:DigitalAddressFacet --> observable:DigitalAddress a owl:Class .
sh:property [ sh:path observable:addressValue ] --> NIL
observable:IPAddressFacet rdfs:subClassOf observable:DigitalAddressFacet --> observable:IPAddress rdfs:subClassOf observable:DigitalAddress .
sh:path observable:addressValue ; --> observable:addressValue rdfs:domain observable:IPAddress .
observable:addressValue a owl:DatatypeProperty (or owl:ObjectProperty, dependent on the range of the characteristic).
Explanation: combination of inference patterns
The Type Propagation Rule
The basic subclassing inference is induced by kb:B rdfs:subClassOf kb:A. The meaning of rdfs:subClassOf is given by the statements that are inferred from it. In pseudo code:
This has been formalised (and included by default) as a knowledge rule in OWL:
Combination of Type Propagation with Domain and Range
The purpose of this combination is to infer that when it is asserted that the rdfs:domain of a property is a particular class, then it can be inferred that the property also has the superclass of the particular class in its domain. This also holds for rdfs:range properties.
Conformance to competencies
CQ 3
For example, specifying the knowledge graph:
and adding the datum triple
ex:dns-server-address-1 observable:addressValue "1.1.1.1" .
allows us to infer that ex:dns-server-address-1 rdf:type observable:DigitalAddress.
This meets CQ 3.
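The combination of the domain rule with type propagation can be sketched as a small fixed-point loop in plain Python. The knowledge triples below are an assumption reconstructed from the modifications listed under Solution part 2 (domain observable:IPAddress, subclass of observable:DigitalAddress); the function infer_types is hypothetical and only illustrates the two rules, not a complete RDFS or OWL reasoner.

```python
KNOWLEDGE = {
    ("observable:addressValue", "rdfs:domain", "observable:IPAddress"),
    ("observable:IPAddress", "rdfs:subClassOf", "observable:DigitalAddress"),
}
DATA = {
    ("ex:dns-server-address-1", "observable:addressValue", "1.1.1.1"),
}


def infer_types(knowledge, data):
    """Apply the domain rule and type propagation until nothing new appears."""
    triples = set(knowledge) | set(data)
    changed = True
    while changed:
        new = set()
        for (s, p, o) in triples:
            for (x, q, c) in triples:
                # Domain rule: using property x types the subject as c.
                if q == "rdfs:domain" and x == p:
                    new.add((s, "rdf:type", c))
                # Type propagation: a member of x is a member of superclass c.
                if q == "rdfs:subClassOf" and p == "rdf:type" and o == x:
                    new.add((s, "rdf:type", c))
        changed = not new <= triples
        triples |= new
    return {t for t in triples if t[1] == "rdf:type"}


type_triples = infer_types(KNOWLEDGE, DATA)
for triple in sorted(type_triples):
    print(triple)
```

The first pass types the datum as observable:IPAddress through the domain statement; the second pass propagates that membership to observable:DigitalAddress.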
Conclusion
In conclusion, we only need to specify:
in order to induce this particular behaviour of Duck Typing in regular OWL, as opposed to adopting the unclear and complicated facet pattern.