Stick with RDFS or switch to OWL #61

Open
escowles opened this issue Aug 23, 2016 · 59 comments

@escowles
Contributor

There is a proposal to switch from encoding the core ontology in RDFS to encoding it in OWL, because OWL is more expressive and can encode things like pcdm:hasMember and pcdm:memberOf being reciprocal properties.

Should we switch to encoding the core ontology in OWL?

See #53 for preliminary discussion.

@ruebot
Contributor

ruebot commented Aug 23, 2016

👍 OWL

@whikloj

whikloj commented Aug 23, 2016

I think that the goal from my perspective with PCDM is interoperability.

Because we are looking at multiple very different applications, I feel like the goal is less about RDFS vs OWL and is more about adding some description and (dare I say) restrictions to the ontology.

This would make it much easier for people implementing PCDM to follow the data model. Rather than checking an application's/group's/community's implementation documentation (which could change). But this is just my opinion.

@DiegoPino

@whikloj true, but OWL 2 (depending on which profile we choose) has the necessary expressiveness to add restrictions and other semantic constructs, and it opens up inference capabilities. RDFS simply does not (to clarify: domain and range are not restrictions).

@no-reply

I share the concerns expressed by @escowles in #53:

I am very skeptical of OWL: it is much more expressive, but also much more complex, doesn't help with validating that data conforms to the ontology, etc. IMHO, I think we should stick with simple RDFS, and rely on application profiles, good documentation, implementations, etc. to show how the classes and properties should be used together.

I'm interested in hearing OWL use cases, to better understand the motivation behind the suggested changes and how they might impact existing RDF(S) uses.

@azaroth42
Contributor

I also am skeptical of the benefits. I would be happy with RDFS + minimal basic OWL such as owl:inverseOf and other individually justified features.

@DiegoPino

DiegoPino commented Aug 23, 2016

About those concerns @no-reply

I am very skeptical of OWL: it is much more expressive, but also much more complex,

True, more complex, but not too complex. Being skeptical is good; it leads to further reading.

doesn't help with validating that data conforms to the ontology, etc.

Not true. It does help validate a lot, more than just 'range' and 'domain': it allows you to infer new classes based on rdf:type intersection, union, etc., and to restrict properties by the class membership of their subjects and objects and also by quantifiers. And lastly, the profiles we are interested in are computable in real time (finite time), which will never happen for RDFS.

Very quick example: (beware, i'm changing the original semantics of PCDM here, it's just an example, not the intention)

 <owl:Class rdf:about="http://pcdm.org/models#Object">
     <owl:hasKey rdf:parseType="Collection">
       <owl:ObjectProperty rdf:about="hasUUID"/>
       <owl:ObjectProperty rdf:about="originatingServerURL"/>
     </owl:hasKey>
     <owl:equivalentClass>
       <owl:Class>
         <!-- equivalentClass takes a single class expression, so the two
              restrictions are combined here as an intersection -->
         <owl:intersectionOf rdf:parseType="Collection">
           <owl:Restriction>
               <owl:onProperty rdf:resource="http://pcdm.org/models#hasFile"/>
               <owl:allValuesFrom rdf:resource="http://pcdm.org/models#File"/>
           </owl:Restriction>
           <owl:Restriction>
               <owl:onProperty rdf:resource="http://pcdm.org/models#isMemberOf"/>
               <owl:allValuesFrom rdf:resource="http://pcdm.org/models#Collection"/>
           </owl:Restriction>
         </owl:intersectionOf>
       </owl:Class>
     </owl:equivalentClass>
     <owl:disjointWith rdf:resource="http://pcdm.org/models#Collection"/>
     <rdfs:subClassOf rdf:resource="http://www.openarchives.org/ore/terms/Aggregation"/>
</owl:Class>

This basically says that a pcdm:Object can't be a pcdm:Collection at the same time; that when a triple has a pcdm:Object as subject and http://pcdm.org/models#isMemberOf as predicate, the object of the triple can only be a pcdm:Collection; etc. And lastly, that an individual of type pcdm:Object can be identified by the combination of a UUID and an originating server URL.

I'm not saying that my example is what PCDM should be, nor is this an OWL 2 primer (there are many on the internet; I can suggest some readings if you want). What I'm saying is that with a semantic language that is restrictive and permissive in the right places, the data models that could be constructed out of PCDM could be a lot more specific and more easily understood (and validated) by other systems, not only Hydra and Islandora, by the way.

Hey, you could even create a pcdm:range construct that does not need to be explicitly defined (nor used), just the class intersection of the IIIF definition (predicates or classes) of range and pcdm:Object. OWL 2 is powerful and in use.

Question: Does Hydra use Ontology reasoning somewhere? Or any semantic algorithms? maybe not?

http://www.w3.org/TR/owl2-primer/

@tpendragon

tpendragon commented Aug 23, 2016

So I don't know anything about OWL at all, and have a couple questions:

A) Is it hard to look at and explain? I assume the same documentation which pointed to RDFS before could point to OWL?

B) Is it hard to maintain? Is writing the first one so complex that it would take weeks or months of work and we wouldn't be able to work on implementation? We haven't changed PCDM in 11 months - if OWL would be so complex that a timeframe like that is unreasonable, then it's a good argument to stay with RDFS.

If the answer to those are both no, and @diego, @whikloj, and @ruebot are willing to put in the time to help develop an OWL ontology, I think we should - many of us seem to have a vested interest in it.

Question: Does Hydra use Ontology reasoning somewhere? Or any semantic algorithms? maybe not?

No. Does Islandora?

@no-reply

no-reply commented Aug 23, 2016

@DiegoPino:

True, more complex, but not too complex.

This seems to be the subject of the ticket. Can we be more concrete about the use cases so we can judge the relative complexity cost?

I appreciate that you're trying to be helpful, but it honestly reads as more than a little condescending to suggest that this is an opportunity for introductory reading. It's also dismissive of the concerns.

To be more specific, I'm concerned that:

  • the model theoretic overhead of OWL will make participation and implementation expensive;
  • OWL would increase the cost of producing relatively simple descriptive metadata. I already hear complaints that domain/range have this effect (e.g. in Dublin Core). Many simply ignore it;
  • in general, OWL (at least in certain profiles) is inappropriate for a low level structural model, on which higher level models (which may not be OWL, or even RDF) are built.

@azaroth42
Contributor

To put it a different way: In my opinion, people who want the ontology to change to OWL should describe the changes, and justify why they're individually useful in practice. Actual OWL examples rather than theoretical would be valuable.

@DiegoPino

@no-reply sorry if it reads as condescending; that was not the intention. But all I see are concerns about OWL and no justification I can research (something documented somewhere, or that helps me understand them).

I appreciate that you're trying to be helpful

reads as condescending on this side too. Semantics everywhere 😄

I do feel there is a misconception of what OWL is (specifically OWL 2; we haven't talked about profiles yet). It is simpler to reason over under certain profiles. RDFS, like OWL Full, may never reach a "NO" or a "YES" for a given semantic proposition: that is more flexible, but not good for machines.

I know you are all very involved in semantics, but since I hope other people are reading this too, I will paste this link (a simple one), which pretty much explains why OWL:

http://www.cambridgesemantics.com/semantic-university/rdfs-vs-owl

I'm not a PCDM committer, so clearly I can't ask for a revision of your no-OWL policy. But I'm a CLAW committer, and I have worked extensively with semantics and graph traversal algorithms since the Fedora 3 days. This OWL versus RDFS dilemma is not just a matter of what each group likes more; it is about what will better serve the Fedora 4 community, Hydra, Islandora and, I hope, others, and what will help make interoperability easier.

I see concerns but i don't understand the concerns, so here are my questions:

  • the model theoretic overhead of OWL will make participation and implementation expensive;
    Participation in building the Ontology?

How could the implementation of an RDFS ontology be less expensive than that of an OWL one?

  • OWL would increase the cost of producing relatively simple descriptive metadata. I already hear complaints that domain/range have this effect (e.g. in Dublin Core). Many simply ignore it;

How does OWL interfere with descriptive metadata? OWL works at a different semantic layer: you are still using rdf:type, and you can still use dc:, etc. As long as OWL does not impose a disjointness or a restriction that limits you to a fixed set, you can add descriptive metadata, and even extra rdf:types coming from RDFS classes, if needed. Not that I would infer over them, of course.

If domain and range already alienate people, then are we really ready to use any restricted model? Maybe we can just let anything happen.

  • in general, OWL (at least in certain profiles) is inappropriate for a low level structural model, on which higher level models (which may not be OWL, or even RDF) are built.

Why is that so? How? LDP is low level; PCDM is structural, but not as low level as you would wish. If we are already talking about "Works" and "membership in abstract concepts" like a collection, then it is not low level anymore.

Use cases here; I'm pretty sure @acoburn can add some more.

  • Allow our users to define their own valid constructs based on PCDM while combining them with high level semantics (PCDM can model a structure that serves for a book, but to say "this is a book" you need another ontology). This leads to being able to traverse the PCDM ontology, infer over it, ask whether certain axioms validate, use restrictions on properties, etc., in real time. The same goes for IIIF, etc.
  • To be able to read a structure built by Hydra on Islandora without having to create a PHP class that implements the same properties/methods that Hydra does. If PCDM allows any construct and the ontology itself does not impose restrictions, then mapping gets complex, even impossible. If PCDM defines a finite, simple set of models (that is what an ontology is: the domain of all models it is able to define, right now almost infinite), then we can do this.
  • To be able to make inferences, use inverse properties, and do graph traversal with precomputed paths
  • Graph validation: does the graph coming into my microservice (POST?) comply with PCDM?

I don't feel this is something we can push for much longer as the Islandora community without feeling that we need to justify ourselves constantly, when we do know, to a certain extent of course, why we are proposing and talking about this stuff. I understand that there are concerns, of course, and I'm trying to understand them.

@cmharlow

Heya! Sorry, complete outsider questions / thoughts here. I don't really know where I fall now. So, indulge me for a second, I'm writing up here how I understand the question at hand as well as the possible answers, and the pros/questions of each answer. Please correct me if I'm wrong - and I'm sorry if I misinterpret something!

q: Should PCDM use some flavor of OWL or stick with RDFS?

possible answers:

  1. Stick with RDFS.
    • pros: lower barrier to community engagement in understanding, expanding + adopting PCDM because RDFS is "simpler"; it is currently used; it lowers the bar of implementation bc it is "simpler", but 'easier' implementation is then scoped to the currently existing, limited expectations of RDFS; it is meant to be a low-level model to then build upon.
    • my own questions:
      • Do we have any documentation from the origins of PCDM that detail either the original community goals (including what I'm hearing as simplicity for the sake of broader community engagement or low level modeling) and maybe even earlier discussions on RDFS versus other (OWL?)?
      • I.e., do we have the discussions that led to RDFS to begin with? Being someone who wasn't there at the origins of PCDM (granted I'm rarely at the origin of anything), I'd hate to second guess at the reasoning why. These sort of notes/docs might help clarify and avoid any sort of chicken and egg issues around 'RDFS bc its all we need' 'all we need is RDFS' type understandings.
  2. Stick with RDFS, but include some OWL assertions as needed.
    • pros: includes all the pros of 1., but adds extra specificity for PCDM resources as needed.
    • my own questions:
      • does the use of OWL assertions as needed in RDFS support in any way the pros of OWL without declaring OWL? Say, can you perform inference on RDFS ontologies that then have OWL components?
      • Can you perform validation? Other sort of RDF Logic rules like SWRL? I genuinely don't know and would be curious to find out.
  3. Go with OWL (flavor TBD but likely OWL2 from the discussion so far - is this a safe assumption?)
    • pros: you gain all of the possibilities of OWL 2 for more granular control, understanding and manipulation of your data, such as inferencing and validation; it helps the community by supporting conformance to the understandings of PCDM (so one won't just read what an Object is, but have to follow some usage rules defined by the community as to how Objects behave).
    • my own questions:
      • I am curious what specifications we'd want to immediately use an OWL version of PCDM for - what is so granular that we want to capture it now in the core PCDM ontology, versus what is a data goal we are aiming for?
      • As a data person, I love the idea of some kind of validation, but I also wonder how much of that I'd want to be in some sort of validated application profile in the application layer. I'm torn on this. Is the validation being referred to SHACL? If so, is SHACL in any state to be used now if you're not a W3C working group member with an instance of TopBraid?
      • I also wonder about the distinct possibility that more complex definitions (even if not really that complex) will lead to more people ignoring those definitions, i.e., more bad data. And if we take bad data into consideration for these discussions.

I'm also wondering, rather generally, about any directions we're hoping to aim for with PCDM, as this discussion has hit quite a bit on the reasons for and why PCDM. Directions wouldn't just be immediate goals, but also long term directions. Also, do we have any PCDM users who are not using Fedora? Was NYPL in that camp? I'd be very curious to hear their thoughts.

Thank you everyone for your comments - I appreciate them! I always learn a lot from all of you, and I'm sorry if my above comments/questions miss any points, or have silly questions.

@scossu

scossu commented Aug 24, 2016

There are several threads on the API-X topic (here is one of them) about how using OWL to validate data is a bad idea. There are some specific vocabularies such as SHACL to express that.

I can appreciate the huge expressive potential of OWL but I am wary of the barrier it would set. Our Blazegraph instance is already starting to slow down on 15M triples with inference on, even without any OWL axioms loaded.

I second @no-reply about gathering use cases and I think that adopting RDFS (or even RDFS+ as @azaroth42 suggests, if needed) would leave room for plenty of expressiveness. Stepping up to OWL at a later time if needed, or even better, letting an individual adopter add their own OWL ontology on top of PCDM, is logically straightforward; the opposite is much harder.

Also, I recall PCDM being originally conceived to express basic structural relationships between resources and to be a "minimum common denominator" for richer ontologies that can be overlaid on top of it. I believe that a very expressive ontology may lead to maintenance headaches, development slow-downs and the temptation to use PCDM for what it is not meant to do.
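
One commonly cited reason for keeping validation out of OWL is its open-world reading: a restriction is treated as an axiom to infer from, not as a rule to reject data with. The following is a minimal Python sketch of that difference, not a summary of the API-X threads. It uses plain tuples instead of an RDF library, the function names are hypothetical, and the constraint on pcdm:hasFile is an assumption made for illustration.

```python
PCDM = "http://pcdm.org/models#"
RDF_TYPE = "rdf:type"

# Hypothetical constraint: every value of pcdm:hasFile must be a pcdm:File.
# (Assumes the subject is covered by the restriction.)
PROPERTY, REQUIRED_CLASS = PCDM + "hasFile", PCDM + "File"


def open_world_inferences(triples):
    """Reasoner-style reading: treat the constraint as an axiom and infer missing types."""
    typed = {(s, o) for s, p, o in triples if p == RDF_TYPE}
    return {(o, RDF_TYPE, REQUIRED_CLASS)
            for s, p, o in triples
            if p == PROPERTY and (o, REQUIRED_CLASS) not in typed}


def closed_world_violations(triples):
    """Validator-style reading: report values that are not explicitly typed as required."""
    typed = {(s, o) for s, p, o in triples if p == RDF_TYPE}
    return sorted(o for s, p, o in triples
                  if p == PROPERTY and (o, REQUIRED_CLASS) not in typed)


data = {
    ("ex:object1", RDF_TYPE, PCDM + "Object"),
    ("ex:object1", PCDM + "hasFile", "ex:thing1"),  # ex:thing1 has no declared type
}
inferred = open_world_inferences(data)
violations = closed_world_violations(data)
```

On the same data, the reasoner-style function quietly concludes that ex:thing1 is a pcdm:File, while the validator-style function flags it. The second behaviour is the kind of check a constraint vocabulary such as SHACL is aimed at.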

@escowles
Contributor Author

The main things I've heard as use cases for OWL are:

  • declaring pcdm:hasMember/pcdm:memberOf as inverse predicates
  • declaring pcdm:Object/pcdm:Collection/pcdm:File as disjoint classes

I hadn't heard of RDFS+ before, but it sounds like we could layer those kinds of OWL statements on top of the RDFS ontology. That seems really attractive to me, but I really don't know enough about RDFS+ to know if that's practical or useful.
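
To make these two use cases concrete, here is a minimal Python sketch of what a consumer could do with an inverse-property statement and with disjoint-class statements. It is an illustration under stated assumptions, not a reasoner: plain tuples stand in for RDF triples, no RDF library is involved, and the helper names and the `ex:` resources are hypothetical.

```python
# Hypothetical helpers; plain tuples stand in for RDF triples (no RDF library is used).
PCDM = "http://pcdm.org/models#"
RDF_TYPE = "rdf:type"

# The two kinds of OWL statement discussed above, written as Python data.
INVERSE_OF = {PCDM + "hasMember": PCDM + "memberOf"}
DISJOINT = [(PCDM + "Object", PCDM + "Collection"),
            (PCDM + "Object", PCDM + "File"),
            (PCDM + "Collection", PCDM + "File")]


def add_inverses(triples):
    """Return the triples plus the ones entailed by the inverse-property pairs."""
    pairs = dict(INVERSE_OF)
    pairs.update({v: k for k, v in INVERSE_OF.items()})  # inverseOf is symmetric
    inferred = set(triples)
    for s, p, o in triples:
        if p in pairs:
            inferred.add((o, pairs[p], s))
    return inferred


def disjointness_clashes(triples):
    """Return (resource, class_a, class_b) for resources typed with two disjoint classes."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return sorted((s, a, b) for s, classes in types.items()
                  for a, b in DISJOINT if a in classes and b in classes)


data = {
    ("ex:collection1", PCDM + "hasMember", "ex:object1"),
    ("ex:object1", RDF_TYPE, PCDM + "Object"),
    ("ex:object1", RDF_TYPE, PCDM + "Collection"),
}
closure = add_inverses(data)
clashes = disjointness_clashes(closure)
```

Here `closure` gains the pcdm:memberOf triple that was never asserted, and `clashes` reports ex:object1 because it is typed as both pcdm:Object and pcdm:Collection.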

@escowles
Contributor Author

@cmh2166: I think we went with RDFS mostly because it's simpler and covers the use cases we had in mind when we started drafting PCDM. I don't think there's any good documentation of that decision, but I have always thought of the second paragraph of the main PCDM document (now the wiki home page) to be a call to favor simplicity, and to justify complexity when it's needed:

To encourage adoption, this model must support the most complex use cases, which include rich hierarchies of inter-related collections and works, but also elegantly support the simplest use cases, such as a single user-contributed file with a few fields of metadata. It must provide a compact interface that tool developers can easily implement, but also be extensible enough for adopters to customize to their local needs.

That said, to @tpendragon 's question, I don't think OWL has to be any harder to read or understand than RDFS. Fedora's core ontology is an example of a mostly simple OWL ontology. However, notice that it does let you do some interesting things that wouldn't map easily to RDFS.

@cmharlow

cmharlow commented Aug 24, 2016

Thanks @escowles - I was sure the decision docs were there, I was just trying to lay this question out methodically for my own understanding. :-/

Thanks @scossu for the additional links and bringing up the issue of performance. Also I really appreciate/agree with this statement:

Stepping up to OWL at a later time if needed, or even better, letting an individual adopter add their own OWL ontology on top of PCDM, is logically straightforward; the opposite is much harder.

So the state of this question as I understand it:

Possible options:

Stick with RDFS.
pros:

  • lower barrier to community engagement in understanding, expanding + adopting PCDM because RDFS is simpler (fewer classes, predicates; less possible other logic to add);
  • it is currently used as it fit the original use cases for PCDM;
  • it lowers the bar of implementation bc it is simpler, meaning more efficient / possibly more performant with large datasets.
  • it is meant to be a low-level model to then build upon.

RDFS+ i.e. include some OWL assertions as needed
pros:

  • includes all the pros of 1 (I think? Would it still be as performant?)
  • adds extra specificity for PCDM resources as needed.
  • needs to be confirmed how helpful this is, i.e., what is added other than the ability to use some more granular statements (i.e. inverses, disjointness)

Go with OWL (flavor TBD but likely OWL2?)
pros:

  • you gain all of the possibilities of OWL 2 for more granular control, understanding and manipulation of your data, such as inferencing and validation;
  • it helps the community by supporting conformance to the understandings of PCDM including restrictions.

The given (so far) Use Cases (term used loosely) for something more than RDFS:

  • declare inverses of properties in PCDM
    • examples: pcdm:hasMember/pcdm:memberOf
    • could do this in options 2 or 3
  • declare disjointness of core Classes
    • examples: make pcdm:Collection, pcdm:Object, pcdm:File disjoint
    • could do this in options 3 only
  • allow users to define their own valid constructs based on PCDM but with other semantics
    • examples: Creating a profile for a book - do we have more about this example? It might help discuss whether validity checks to this degree happen in the ontology or elsewhere (SHACL, in the application, etc.)
    • could do this ... only in option 3? Becomes question of using OWL to validate or other logic languages, per @scossu's shared other discussion.
  • to capture community-shared restrictions/behaviors in the model instead of in the applications using the data
    • examples: Islandora won't need to recreate in PHP some of the Hydra Ruby code for implementing shared understandings of the data. That's how I'm reading this. I'm sure there are, but do we have examples of this, behaviors that apply to the model, are expected by anyone using PCDM, but are expressed in the Hydra code?
    • could be 2 or 3, depending on what exactly these are, which I think goes to the call for more examples/use cases.
  • better (?) inferencing
    • at a performance cost
    • example: what would we want to get specifically from OWL2 inferencing if we could implement OWL tomorrow?
    • only 3?

@cmharlow

I'm happy to break out that long last comment in a Google doc (or other) for the further collection, analysis, and discussion of reasons, use cases, etc. It can then be reported back to this thread, along with other action items/areas of work being surfaced.

@whikloj

whikloj commented Aug 24, 2016

I think my concern is that we are using a barebones ontology of 2 classes (I'm not counting the 5 sub-classes of pcdm:Object because they have no semantic difference) and leaving the actual data modeling to be done at the application layer.

It would seem that what I like about OWL is the ability to force a more full semantic data model including restrictions and inferences.

I apologize, because this is clearly something I should have been more involved/aware of earlier in the design of PCDM.

My concern is that implementors will stick the 2 lego blocks together based on their interpretation of the documentation.

@cmharlow

leaving the actual data modeling to be done at the application layer.

Yeah, I'm wondering about this too - not in a what did you mean kinda way, since I tend to agree with this, but more wondering how do we tease these modeling decisions out, figure out what modeling needs to be put back in the PCDM ontology, and from that decide if it warrants a flavor of OWL or RDFS+ or RDF. Kinda that second to last bullet point from my list above:

to capture community-shared restrictions/behaviors in the model instead of in the applications using the data

I'm still wrangling in my head what this work would look like, but it looks like an issue that has surfaced in this discussion that's worth pursuing/investigating/reporting back. It could be the type of 'use casing' needed to determine the new needs of the PCDM ontology vis-a-vis modeling languages, other logic languages + their use supporting by the community, etc.

If I understood you correctly - and I apologize if I didn't!

@escowles
Contributor Author

@whikloj : I think this is really unhelpful:

a barebones ontology of 2 classes (I'm not counting the 5 sub-classes of pcdm:Object because they have no semantic difference) and leaving the actual data modeling to be done at the application layer

  • you are, at a minimum, forgetting one of Collection/Object/File
  • the subclasses of pcdm:Object differ in their definitions — if types don't mean anything, what are we even talking about here?
  • as someone arguing against using and/or requiring FileSet over in Add a FileSet class #59, I find it a little rich that you're making this kind of swipe over here.

@whikloj

whikloj commented Aug 24, 2016

@escowles I did not mean to swipe, so I apologize.

What I mean is that pcdm:Collection and pcdm:Object are both sub-classes of ore:Aggregation (where pcdm:File is a new class) with just documentation to differentiate them.

The name of the class is only important if you have put all the data modelling into your application to know that the difference between a pcdm:Object and a pcdm:Work in practice is X.

I know that people can document how they will implement their objects in PCDM, but for interoperability's sake, I kinda thought it would be neat to have a common definition outlined in the ontology; that seems like the best place for it.

@escowles
Contributor Author

@whikloj: OK, but when I look at the ontology, and the changes proposed for adding FileSet and changing predicates, I see the structure outlined: Collections have member Collections and Objects. Objects have FileSets and member Objects. FileSets have Files. The ontology includes definitions of what all of those classes and predicates mean.

So I'm interested in adding more to the ontology (like inverse predicates, or disjoint classes, etc.), but I think the ontology already includes the essence of the model.

@whikloj

whikloj commented Aug 24, 2016

@escowles I agree that I can see it, but I'm not confident that what I am seeing is the same as what you are seeing. (Mostly 'cause I'm starting to see double).

So I think then if we could add some of these statements to the ontology I would be happier.

A lot of them could be expressed in RDFS, but some (I think) need OWL.

For pcdm:Collection, setting the range of pcdm:hasMember to both pcdm:Object and pcdm:Collection.

I think that requires slightly more than RDFS allows. But if I am wrong, then maybe we can look to add these common understandings to the ontology.
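
For background on why this needs more than RDFS: two separate rdfs:range statements on one property do not mean "either class". Under the RDFS entailment rule for ranges (rdfs3), every value of the property is inferred to be an instance of each range class, which amounts to an intersection. The minimal Python sketch below contrasts that with the union reading that owl:unionOf would express. The two range statements are an assumption made for illustration (they are not taken from the PCDM ontology), and the function names are hypothetical.

```python
PCDM = "http://pcdm.org/models#"

# Assumption for illustration: two separate rdfs:range statements on pcdm:hasMember.
RANGES = {PCDM + "hasMember": [PCDM + "Object", PCDM + "Collection"]}


def apply_rdfs_range(triples, ranges):
    """RDFS rule rdfs3: for every (s, p, o) where p has range C, infer (o, rdf:type, C)."""
    inferred = set()
    for s, p, o in triples:
        for cls in ranges.get(p, []):
            inferred.add((o, "rdf:type", cls))
    return inferred


def satisfies_union_range(declared_types, allowed):
    """Union reading (what owl:unionOf would express): one allowed class is enough."""
    return any(cls in allowed for cls in declared_types)


data = {("ex:collection1", PCDM + "hasMember", "ex:object1")}
inferred_types = apply_rdfs_range(data, RANGES)
union_ok = satisfies_union_range({PCDM + "Object"}, RANGES[PCDM + "hasMember"])
```

With the RDFS reading, ex:object1 ends up typed as both pcdm:Object and pcdm:Collection, which is not the intent here and would clash with any disjointness statement between the two classes. With the union reading, being a pcdm:Object alone is enough.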

@cmharlow

So do we want to start by reviewing what is in PCDM core already and discuss adding:

  • domains, ranges
  • inverses
  • disjointness

and see if that's a starting place for ontology change requests, testing the RDFS+ / OWL needs (or not)?

@cmharlow

@azaroth42 would asserting the range of hasMember to be a Union be something that could work in what you proposed: RDFS with some OWL added?

@DiegoPino

@whikloj you are not wrong. OWL is needed to define ranges as intersections, unions, disjoint classes, etc.
And that means owl:classes, object properties, etc.

@DiegoPino

DiegoPino commented Aug 24, 2016

@cmh2166

RDFS with some OWL added?

it's not possible:
it's in OWL space:

<owl:equivalentClass>
            <owl:Class>
                <owl:unionOf ....

@cmharlow

@DiegoPino yeah, just trying to work through the proposed options on the table.

@cmharlow

So this leads to this use case: do we want to define the range of hasMember? If so, do we agree that the range is the union of Collection and Object? And if so, is this important enough to warrant a switch to OWL (or do we need to gather more, similar requests leading to OWL)?

@cmharlow

Reading through this, I'm not opposed to some flavor of OWL. But I am someone who thinks we should clarify the model changes requested, then choose the appropriate modeling language based on those requests. I'm also a big fan of methodical reviews when dealing with questions like this where there is a lot of community input, multiple work narratives leading to this point, and really strongly held opinions.

To that end, I've written up my notes so far into an Options / Modeling Cases Not Currently Covered doc: http://bit.ly/2bRCly8 Feel free to read, edit, update, use, not use, ignore, whatever. I'm hoping it can crystallize (maybe just for me) some of the model change requests that have surfaced in this discussion. In particular, I do favor:

  1. Perhaps a staged move to OWL as we go through the modeling needs and requests;
  2. A review of any modeling assumptions that are lurking in application code or previous discussions that should be pushed to the model;
  3. Some discussion about what level of interoperability we're hoping to achieve with this model between systems (builds off 2).

I don't know if this is how PCDM tends to work through these questions though, and I apologize if this is counter the method/procedures for discussion leading to decisions/actions/etc.

@cmharlow

IMHO, I think the crux here is that "complex" is in the eye of the beholder.

That said, if we think the model needs restrictions, let's outline them, agree, and then see what complexity is needed to support that.

@acoburn
Contributor

acoburn commented Aug 25, 2016

Just to make this conversation more concrete, here is an example of PCDM-as-OWL that I wrote a few months ago for code I was working on. It doesn't have all of the bells and whistles that @DiegoPino describes, but it does use some simple OWL constructs: https://gitlab.amherst.edu/acdc/repository-extension-services/blob/master/acrepo-services-pcdm/src/main/resources/pcdm.owl

@cmharlow

Thanks, @acoburn ! I see the only restrictions your work adds (I think) that aren't in the RDFS ontology are property inverses (which seems to be something that most people support, but could possibly be handled with RDFS+).

This will be helpful for determining what seems to be the core case here of "community accessibility", as we've now got an almost exact copy of the model in two different modeling languages.

@DiegoPino

@acoburn nice job. I'm working on adding some restrictions, to avoid pcdm:Object hasMember pcdm:Collection or pcdm:Object hasMember pcdm:Object, which was always complex to follow!

Thanks really

@ruebot
Contributor

ruebot commented Aug 25, 2016

Summary of the rationale for OWL from the Islandora perspective:

  • Restrictions should not be in documentation. They should be in the ontology itself.
  • There should be as few restrictions as possible.
  • This means OWL.

@ajs6f

ajs6f commented Aug 25, 2016

Without taking sides over this, I want to note that the undecidable nature of RDFS is of debatable practical effect:

http://www.sciencedirect.com/science/article/pii/S1570826805000144
http://ceur-ws.org/Vol-1035/iswc2013_poster_14.pdf

As usual, blank nodes screw everything up. Stupid blank nodes.

@barmintor
Contributor

Maybe a tangent, but: If interoperability is a goal, it might be good to ask how successfully a client armed with an OWL ontology can interact with a repository containing only RDFS, and vice-versa. If RDFS is effectively a subset of OWL, it might be worth publishing both and remanding the cost/benefit analysis to the installation sites, right?

@ruebot
Contributor

ruebot commented Aug 25, 2016

@barmintor we need to define the level of interoperability. Could it just be publishing a "PCDM compliant" named graph?

@cmharlow

hey @ruebot -

I guess what my previous comments were getting towards is that restrictions probably mean OWL, but we should figure out what the restrictions are that we are proposing/in need of.

Armed with that, we can decide OWL versus other (and even if we go with OWL, we still need to do this work to figure out which flavor of OWL, which is still entirely open).

@DiegoPino

DiegoPino commented Aug 25, 2016

@barmintor, true, level of interoperability is the question. Just not sure about

RDFS is effectively a subset of OWL

RDFS and OWL are in different semantic layers. An RDFS ontology is compatible with, and as undecidable as, an OWL Full ontology (@ajs6f sorry, I'm still not convinced by that paper, but I have to read it completely); that is for sure true. And OWL 2 does define a cross domain profile.

@cmh2166 the flavor mostly depends on the complexity. I don't think we need too many fancy OWL constructs, just some of the basic ones maybe. I would propose OWL 2 DL, but I would really need to revisit this.

@cmharlow

cmharlow commented Aug 25, 2016

@cmh2166 the flavor mostly depends on the complexity. I don't think we need too many fancy OWL constructs, just some of the basic ones maybe. I would propose OWL 2 DL, but I would really need to revisit this.

Yes, I'm saying, let's start evaluating which ones*. This is what can help us move forward: looking at these restrictions + cases, then figuring out the modeling language (and flavor), not choosing a modeling language first.

*which constructs, which will be based on which restrictions + cases we are looking to add.

@no-reply

If RDFS is effectively a subset of OWL, it might be worth publishing both and remanding the cost/benefit analysis to the installation sites, right?

A big part of my concern is that RDFS is decidedly not a subset of OWL. In an important sense (in terms of the Resources in their respective universes) it's a superset.

OWL partitions its universe into individuals, properties and classes (the details of the partitioning depend on which OWL profile is being used). This means that many perfectly reasonable RDFS graphs are nonsense in OWL.

My question is whether this, and other ontological commitments of OWL, are acceptable. I believe we should take care to minimize their impact.

@DiegoPino

@ajs6f, i had a chance to read that: so what it says is:
"completeness and decidability of entailment for RDFS is possible if extended with datatypes and a property-related subset of OWL.", which is one of the reasons OWL 2 was formulated right?

@DiegoPino

A question here: when we are talking about clients, do we mean clients that interact with Fedora?

If so, they interact with RDF resources, not RDFS. The results of an OWL ontology (the individuals that result from the models it allows) are also RDF resources. The lack of S at the end is because right now, at least on our project, we don't have a client that is interacting with RDFS: the repo contains RDF (we are not storing the ontology in Fedora), and the same will be true for OWL. We do plan to have OWL-armed clients that will have no problem interacting with RDF repo data.
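To make the point above concrete, here is a minimal sketch (not anyone's actual client) of a client that keeps the ontology on its own side and applies it to plain RDF instance data returned by a repository. Triples are modeled as Python tuples; the `ex:` resources, the `infer_types` helper, and the domain/range values given for `pcdm:hasFile` are assumptions made for illustration.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"

# Ontology held by the client, not stored in the repository.
# The domain/range values are illustrative assumptions.
ONTOLOGY = {
    ("pcdm:hasFile", RDFS_DOMAIN, "pcdm:Object"),
    ("pcdm:hasFile", RDFS_RANGE, "pcdm:File"),
}

# Instance data as a repository might return it: plain RDF, no schema triples.
REPO_DATA = {
    ("ex:book1", "pcdm:hasFile", "ex:page1.tiff"),
}


def infer_types(ontology, data):
    """Apply the RDFS domain and range entailment rules once."""
    inferred = set()
    for prop, axiom, cls in ontology:
        for s, p, o in data:
            if p != prop:
                continue
            if axiom == RDFS_DOMAIN:
                # The subject of the property is an instance of the domain.
                inferred.add((s, RDF_TYPE, cls))
            elif axiom == RDFS_RANGE:
                # The object of the property is an instance of the range.
                inferred.add((o, RDF_TYPE, cls))
    return inferred


if __name__ == "__main__":
    for triple in sorted(infer_types(ONTOLOGY, REPO_DATA)):
        print(triple)
```

Note that the rules only add type statements; nothing in this sketch rejects data, which matches the earlier remark that domain and range are not restrictions.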

@whikloj

whikloj commented Aug 25, 2016

@no-reply can you explain that partitioning stuff. I'm not sure I understand how that might look. I'm still learning this semantic stuff. 😄

@no-reply

no-reply commented Aug 25, 2016

I haven't had time to keep up with this fast moving thread properly, but I want to address a couple of specific questions & points from @DiegoPino's earlier reply to me:

How could the implementation of an RDFS ontology be less expensive than that of an OWL one?

Put simply, OWL's model is more complex.

How does OWL interfere with descriptive metadata?

OWL's division of resources into disjoint groups (and further division of properties) affects the semantics of the resources described and the kinds of relations that are allowed between them. Introducing formal restrictions at the ontology layer, likewise.

I'm not against these commitments, but I'd like to justify them on a more or less case by case basis. I frequently field questions about whether a given graph is satisfiable or admits the intended interpretation (the questions are never phrased this way). When the answer is no, people generally want to know, concretely, what they gain by making a change.

These changes have costs: objects in their application code, nodes in their instance graphs, internal metadata work, discussion, and training. Which speaks to:

If domain and range already alienate people, then are we really ready to use any restricted model? Maybe we can just let anything happen.

I don't think this is remotely fair, and it reflects precisely my trepidation about jumping into OWL for this model without deep discussion. It's not about being ready: our adopters already use models of varying complexity and restrictiveness. What alienates them is seemingly arbitrary restrictions that make their work harder.


Why is that so? How? LDP is low level, PCDM is structural but not as low level as you would wish. If we are already talking about "Works" and "membership to abstract concepts" like a collection, then it is not low level anymore.

In calling PCDM "low level", I mean in the context of the domain. This is, as I understand it, the unifying goal of the project: to provide a base model appropriate for the repository/DAMS domain. I think it's possible and desirable to do this with fairly generic conceptual modeling patterns that can be implemented in many different ways.

(Note, also, that we went far out of our way to avoid talking about "Works", or anything as opinionated as that. Where "work" is used in comments, it is explicitly shorthand for "an intellectual entity".)

Bringing in OWL's more complex (than RDF/RDFS D semantics) model theoretic semantics has the potential to limit interoperability along this axis. (I would argue it necessarily limits it, and the question is more about tradeoffs associated with specific assertions.)

I'm not against the goals of a machine readable model---far from it---and I'm not trying to block specific uses of OWL. I just want to be thoughtful about the decisions we make in this base model and the ontological commitments they enforce on downstream users.

@ajs6f

ajs6f commented Aug 25, 2016

"completeness and decidability of entailment for RDFS is possible if extended with datatypes and a property-related subset of OWL.", which is one of the reasons OWL 2 was formulated right?

@DiegoPino : To my knowledge, OWL 2 was formulated to give folks the ability to choose precisely for the computational complexity of various tasks, so if that's what you mean (not really sure), then sure, I guess.

@azaroth42
Contributor

Instead of this emacs vs vim argument, can we instead create new issues for what is currently missing by using RDFS only?

@no-reply

@no-reply can you explain that partitioning stuff. I'm not sure I understand how that might look. I'm still learning this semantic stuff. 😄

Both RDF(S) and OWL have an underlying model theoretic semantics. The purpose of these semantics, loosely speaking, is to map expressions in their respective languages to "interpretations" which can be understood to be about real world objects & states of affairs. Again loosely, the interpretations can be true or false, and the expressions (graphs, in the case of RDF) can be "satisfied" or not by a given interpretation.

RDF's semantics posit the existence of one kind of object: a "resource". RDFS adds some facility for us to sort those objects into different classes and reason about them, but the rules introduced still leave us talking about one kind of generic real world thing.

OWL's semantics do not posit the existence of resources, instead talking about classes, individuals, datatype properties, and object properties. Each of these groups is non-overlapping. There's some magic to allow for "punning"---using one name to refer to different objects, in these different sets.

As an example of potential fallout: declaring a property to be an object property can mean that an OWL interpretation that satisfies a graph must assume the existence of two or more real world things where an RDF interpretation must assume only one.
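The partitioning and punning described above can be sketched roughly as follows. This is a deliberate simplification, not a faithful implementation of either semantics: it looks only at typing triples, the `ex:` names and the `roles` and `entity_counts` helpers are hypothetical, and it counts one resource per name for RDF versus one entity per role a name plays for OWL-style punning.

```python
RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"
OWL_OBJECT_PROPERTY = "owl:ObjectProperty"

# Hypothetical graph in which ex:Eagle is used both as a class and as an instance.
GRAPH = {
    ("ex:Eagle", RDF_TYPE, OWL_CLASS),
    ("ex:Species", RDF_TYPE, OWL_CLASS),
    ("ex:harry", RDF_TYPE, "ex:Eagle"),    # ex:Eagle used as a class
    ("ex:Eagle", RDF_TYPE, "ex:Species"),  # ex:Eagle used as an individual
}


def roles(graph):
    """Map each name to the set of roles it plays in typing triples."""
    found = {}
    for s, p, o in graph:
        if p != RDF_TYPE:
            continue  # this sketch only inspects typing triples
        if o == OWL_CLASS:
            found.setdefault(s, set()).add("class")
        elif o == OWL_OBJECT_PROPERTY:
            found.setdefault(s, set()).add("object property")
        else:
            # s is an instance of o, so o is being used as a class.
            found.setdefault(s, set()).add("individual")
            found.setdefault(o, set()).add("class")
    return found


def entity_counts(graph):
    """Return (one resource per name, one entity per name-and-role pair)."""
    by_name = roles(graph)
    return len(by_name), sum(len(r) for r in by_name.values())


if __name__ == "__main__":
    print(roles(GRAPH))
    print(entity_counts(GRAPH))
```

Under these assumptions the same graph mentions three names but four role-specific entities, because `ex:Eagle` is counted once as a class and once as an individual.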

@whikloj

whikloj commented Aug 25, 2016

@no-reply thanks, I'll probably have to read that about 20 more times to fully understand it, but I get your initial point about RDFS being sort of a superset. It is because everything is more... not sure the word... descriptive/well defined/granular. Yes?

@azaroth42 I think we can open a new ticket, I think @cmh2166 has been sort of starting that work.

But I also think this is not an emacs vs vim argument. There are things that you can do in OWL you can't do in RDFS.

So perhaps we can hold this ticket to see the outcome of the "what's missing in RDFS" work.

@no-reply

but I get your initial point about RDFS being sort of a superset. It is because everything is more... not sure the word... descriptive/well defined/granular. Yes?

It's sort of counter-intuitive. Because RDFS is less descriptive, its universe includes more things---specifically, all the things which OWL's universe excludes because they won't support the kind of reasoning desired (those things which are both instances and classes, etc...).

But I also think this is not an emacs vs vim argument. There are things that you can do in OWL you can't do in RDFS.

Agreed.

On the other hand, this is cutting a little close to the kind of holy war Rob refers to. Apologies for my part in that.

@DiegoPino:

reads on this side also condescending.

Sorry. My intention was to assume good faith. Instead I got snippy. Thanks for your continued efforts, here.

@cmharlow

Glad to see we are breaking this out into new tickets with specific restrictions / what we are not doing in rdfs now that we'll want to add (and using that to guide, or not, to OWL or other). Thanks all.

I'll keep my above doc up to date for those who care/want a cheat sheet of restrictions mentioned so far, but this is what I have seen brought up somewhere in discussions, notes, or elsewhere:

  • Property Inverses in PCDM - now a ticket
  • Disjointness of Core Classes in PCDM
  • Property Domain/Ranges in PCDM (specifically, using Unions of Classes to explicitly say PCDM:Collection u PCDM:Object instead of just ore:Aggregation)
  • Allow users to define their own valid constructs - gets into if one should use OWL anyway to validate or external logic languages, technologies, other
  • Capture Model Restrictions/Behaviors in the PCDM Ontology instead of the Applications, which I think is a higher level use case requiring further analysis of the PCDM docs, the Hydra docs, and applications to break out what is captured there that should be in the model
  • Improved inferencing generally (pulled from comments like 'To be able to make inferences, inverse properties, graph traversal with precomputed paths' which also covers a bit of the above)
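As a rough illustration of what the first two items could buy in practice, the sketch below materializes inverse statements for `pcdm:hasMember` / `pcdm:memberOf` and lists resources typed with two classes that are assumed to be disjoint. The axioms are written as plain Python data rather than OWL syntax; the disjointness of `pcdm:Object` and `pcdm:File`, the `ex:` resources, and both helper functions are assumptions for the example, not part of the published ontology.

```python
RDF_TYPE = "rdf:type"

# Assumed axioms, expressed as Python data instead of OWL.
INVERSES = {
    "pcdm:hasMember": "pcdm:memberOf",
    "pcdm:memberOf": "pcdm:hasMember",
}
DISJOINT = [("pcdm:Object", "pcdm:File")]  # hypothetical disjointness axiom

DATA = {
    ("ex:coll1", "pcdm:hasMember", "ex:obj1"),
    ("ex:obj1", RDF_TYPE, "pcdm:Object"),
    ("ex:obj1", RDF_TYPE, "pcdm:File"),  # conflicts with the assumed axiom
}


def add_inverses(triples):
    """Return the triples plus the inverse of every statement with a known inverse."""
    out = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            out.add((o, INVERSES[p], s))
    return out


def disjointness_violations(triples):
    """Return, sorted, the resources typed with both classes of a disjoint pair."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return sorted(
        s
        for s, classes in types.items()
        for a, b in DISJOINT
        if a in classes and b in classes
    )


if __name__ == "__main__":
    print(sorted(add_inverses(DATA)))
    print(disjointness_violations(DATA))
```

An OWL reasoner would report the doubly typed resource as an inconsistency rather than return a list; the list here is only a convenient way to show which resource triggers it.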

Happy to collect/discuss other proposed restrictions, action items, etc.

@ruebot
Contributor

ruebot commented Aug 25, 2016

@cmh2166 thanks for all this Christina!

@cmharlow

@ruebot it beats dealing with bibframe. ;-)

@azaroth42
Contributor

Propose closing the issue.

@ruebot
Contributor

ruebot commented Sep 1, 2016

@azaroth42 do you have a proposed conclusion to close it with? Or just close it for now, and re-open it later if need be?

@azaroth42
Contributor

I propose closing because it's not an issue that we can resolve without use cases and more explicit requirements, hence the holy war nature of the discussion. We've agreed above to file new issues with actual requirements and proposed modifications, and hence this issue is unnecessary.

@cmharlow

cmharlow commented Sep 2, 2016

I think we've gathered the cases we can from this and can proceed based off those (and other, more specific cases as they surface in other discussions). So I'm good with closing this issue.

@DiegoPino

@cmh2166 thank you very much for your work on this. I took some distance from this thread, mainly because I was told I was bringing too much passion into the discussion and that was not helping much. I highly appreciate your documentation and also the way you acted as a "sanitising middleware" human being here. 👍 thanks again
