Stick with RDFS or switch to OWL #61
Comments
I think that the goal from my perspective with PCDM is interoperability. Because we are looking at multiple very different applications, I feel like the goal is less about RDFS vs OWL and more about adding some description and (dare I say) restrictions to the ontology. This would make it much easier for people implementing PCDM to follow the data model, rather than checking an application's/group's/community's implementation documentation (which could change). But this is just my opinion. |
@whikloj true, but OWL 2 (also a matter of which profile we choose) has the necessary expressiveness to add restrictions and other semantic constructs, and it opens up inference capabilities. RDFS simply does not (domain and range are not restrictions, just to clarify). |
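The remark that domain and range are not restrictions can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the range given to `pcdm:hasMember` here is made up for the example, the `ex:` names are hypothetical, and the function is a toy stand-in for an RDFS reasoner, not any real library's API.

```python
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"

def infer_range_types(triples):
    """Return the rdf:type triples that rdfs:range entails but the graph does not state."""
    graph = set(triples)
    # Collect the declared range classes for each property.
    ranges = {}
    for s, p, o in graph:
        if p == RDFS_RANGE:
            ranges.setdefault(s, set()).add(o)
    # Every value of a property with a range is entailed to have that type.
    inferred = set()
    for s, p, o in graph:
        for cls in ranges.get(p, ()):
            candidate = (o, RDF_TYPE, cls)
            if candidate not in graph:
                inferred.add(candidate)
    return inferred

example_graph = [
    ("pcdm:hasMember", RDFS_RANGE, "pcdm:Object"),  # assumed range, for illustration only
    ("ex:collection1", "pcdm:hasMember", "ex:file1"),  # hypothetical instance data
    ("ex:file1", RDF_TYPE, "pcdm:File"),
]

print(infer_range_types(example_graph))
```

Nothing is rejected here: the file is simply inferred to be a `pcdm:Object` as well, which is why range on its own does not work as a validation rule.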
I share the concerns expressed by @escowles in #53:
I'm interested in hearing OWL use cases, to better understand the motivation behind the suggested changes and how they might impact existing RDF(S) uses. |
I also am skeptical of the benefits. I would be happy with RDFS + minimal basic OWL such as owl:inverseOf and other individually justified features. |
About those concerns @no-reply
True, more complex, but not too complex. skeptical is good, leads to further reading
Not true. Does help validate a lot, more than just 'range' and 'domain', allows you to infer new classes based on Very quick example: (beware, I'm changing the original semantics of PCDM here, it's just an example, not the intention)
basically saying that a pcdm:Object can't be a pcdm:Collection at the same time, that when using I'm not saying that my example is what PCDM should be, nor is this an OWL 2 primer (there are many on the internet, I can suggest some readings if you want), but what I'm saying is that with the correct restrictive, permissive semantic language the data models that could be constructed out of PCDM could be a lot more specific and easily understood (and validated) by other systems, not only Hydra and Islandora by the way. Hey, you could even create a Question: Does Hydra use Ontology reasoning somewhere? Or any semantic algorithms? maybe not? |
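For readers following along, the disjointness idea in the comment above (a pcdm:Object can't be a pcdm:Collection at the same time) can be sketched roughly as follows. This is a hypothetical, plain-Python illustration, not the example from the original comment: the disjointness axiom is assumed rather than part of PCDM, the `ex:` resources are invented, and a real OWL reasoner would report an inconsistency rather than return a list.

```python
RDF_TYPE = "rdf:type"
DISJOINT_WITH = "owl:disjointWith"

def disjoint_violations(triples):
    """Return sorted (resource, class_a, class_b) tuples that break a disjointness axiom."""
    # Pairs of classes declared disjoint (order does not matter).
    disjoint_pairs = {
        frozenset((s, o)) for s, p, o in triples if p == DISJOINT_WITH and s != o
    }
    # All asserted types per resource.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    violations = []
    for resource, classes in types.items():
        for pair in disjoint_pairs:
            if pair <= classes:
                class_a, class_b = sorted(pair)
                violations.append((resource, class_a, class_b))
    return sorted(violations)

example_graph = [
    ("pcdm:Object", DISJOINT_WITH, "pcdm:Collection"),  # assumed axiom, not in PCDM
    ("ex:book1", RDF_TYPE, "pcdm:Object"),
    ("ex:book1", RDF_TYPE, "pcdm:Collection"),
    ("ex:shelf1", RDF_TYPE, "pcdm:Collection"),
]

print(disjoint_violations(example_graph))
```

Only `ex:book1` is flagged, since it is the only resource typed with both classes.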
So I don't know anything about OWL at all, and have a couple questions: A) Is it hard to look at and explain? I assume the same documentation which pointed to RDFS before could point to OWL? B) Is it hard to maintain? Is writing the first one so complex that it would take weeks or months of work and we wouldn't be able to work on implementation? We haven't changed PCDM in 11 months - if OWL would be so complex that a timeframe like that is unreasonable, then it's a good argument to stay with RDFS. If the answer to those are both no, and @diego, @whikloj, and @ruebot are willing to put in the time to help develop an OWL ontology, I think we should - many of us seem to have a vested interest in it.
No. Does Islandora? |
This seems to be the subject of the ticket. Can we be more concrete about the use cases so we can judge the relative complexity cost? I appreciate that you're trying to be helpful, but it honestly reads as more than a little condescending to suggest that this is an opportunity for introductory reading. It's also dismissive of the concerns. To be more specific, I'm concerned that:
|
To put it a different way: In my opinion, people who want the ontology to change to OWL should describe the changes, and justify why they're individually useful in practice. Actual OWL examples rather than theoretical would be valuable. |
@no-reply sorry if it reads condescending, was not the intention, but all i see are concerns about OWL and no justification i can research on(documented somewhere or that helps me understand them)
reads on this side also condescending. Semantics everywhere 😄 I do feel there is a misconception of what OWL is (specifically OWL 2; we haven't talked about profiles yet). It is simpler to reason over under certain profiles. RDFS === OWL Full == it is possible to never reach a "NO" or "YES" for a given semantic proposition; that is more flexible but not good for machines. I know you are all very involved in semantics, but since I hope other people are reading this too, I will paste this link (a simple one) which pretty much explains why OWL: http://www.cambridgesemantics.com/semantic-university/rdfs-vs-owl I'm not a PCDM committer, so clearly I can't ask for a revision of your no-OWL policy, but I'm a CLAW committer and I have worked extensively with semantics and graph traversal algorithms since the Fedora 3 days. This OWL versus RDFS dilemma is not just a matter of what each group likes more; it is about what will better serve the Fedora 4 community, Hydra, Islandora, and I hope others, and what will help make interoperability easier. I see concerns but I don't understand the concerns, so here are my questions:
How does the Implementation of an
How does OWL interfere with descriptive metadata? OWL works at a different semantic layer, you are still using rdf:type, still can use dc:, etc. If OWL does not impose a disjointness or a restriction that limits things to just a set, then descriptive metadata, and even extra rdf:types coming from RDFS classes, can still be used if needed. Not that I would infer on them, of course. If domain and range already alienate people, then are we really ready to use any restricted model? Maybe we can just let anything happen.
Why is that so? How? LDP is low level, PCDM is structural but not as low level as you would wish. If we are already talking about "Works" and "membership to abstract concepts" like a collection, then it is not low level anymore. Use case here, I'm pretty sure @acoburn can add some more.
I don't feel this is something we can push much longer as the Islandora Community without feeling that we need to justify ourselves constantly when we do know, to a certain extent of course, why we are proposing and talking about this stuff. I understand that there are concerns, of course, and I'm trying to understand them. |
Heya! Sorry, complete outsider questions / thoughts here. I don't really know where I fall now. So, indulge me for a second, I'm writing up here how I understand the question at hand as well as the possible answers, and the pros/questions of each answer. Please correct me if I'm wrong - and I'm sorry if I misinterpret something! q: Should PCDM use some flavor of OWL or stick with RDFS? possible answers:
I'm also wondering, rather generally, about any directions we're hoping to aim for with PCDM, as this discussion has hit quite a bit on the reasons for and why PCDM. Directions wouldn't just be immediate goals, but also long term directions. Also, do we have any PCDM users who are not using Fedora? Was NYPL in that camp? I'd be very curious to hear their thoughts. Thank you everyone for your comments - I appreciate them! I always learn a lot from all of you, and I'm sorry if my above comments/questions miss any points, or have silly questions. |
There are several threads on the API-X topic (here is one of them) about how using OWL to validate data is a bad idea. There are some specific vocabularies such as SHACL to express that. I can appreciate the huge expressive potential of OWL but I am wary of the barrier it would set. Our Blazegraph instance is already starting to slow down on 15M triples with inference on, even without any OWL axioms loaded. I second @no-reply about gathering use cases and I think that adopting RDFS (or even RDFS+ as @azaroth42 suggests, if needed) would leave room for plenty of expressiveness. Stepping up to OWL at a later time if needed, or even better, letting an individual adopter add their own OWL ontology on top of PCDM, is logically straightforward; the opposite is much harder. Also, I recall PCDM being originally conceived to express basic structural relationships between resources and to be a "minimum common denominator" for richer ontologies that can be overlaid on top of it. I believe that a very expressive ontology may lead to maintenance headaches, development slow-downs and the temptation to use PCDM for what it is not meant to do. |
The main things I've heard as use cases for OWL are:
I hadn't heard of RDFS+ before, but it sounds like we could layer those kinds of OWL statements on top of the RDFS ontology. That seems really attractive to me, but I really don't know enough about RDFS+ to know if that's practical or useful. |
@cmh2166: I think we went with RDFS mostly because it's simpler and covers the use cases we had in mind when we started drafting PCDM. I don't think there's any good documentation of that decision, but I have always thought of the second paragraph of the main PCDM document (now the wiki home page) to be a call to favor simplicity, and to justify complexity when it's needed:
That said, to @tpendragon 's question, I don't think OWL has to be any harder to read or understand than RDFS. Fedora's core ontology is an example of a mostly simple OWL ontology. However, notice that it does let you do some interesting things that wouldn't map easily to RDFS. |
Thanks @escowles - I was sure the decision docs were there, I was just trying to lay this question out methodically for my own understanding. :-/ Thanks @scossu for the additional links and bringing up the issue of performance. Also I really appreciate/agree with this statement:
So the state of this question as I understand it: Possible options: Stick with RDFS.
RDFS+ i.e. include some OWL assertions as needed
Go with OWL (flavor TBD but likely OWL2?)
The given (so far) Use Cases (term used loosely) for something more than RDFS:
|
I'm happy to break out that long last comment in a Google doc (or other) for the further collection, analysis, and discussion of reasons, use cases, etc. It can then be reported back to this thread, along with other action items/areas of work being surfaced. |
I think my concern is that we are using a barebones ontology of 2 classes (I'm not counting the 5 sub-classes of pcdm:Object because they have no semantic difference) and leaving the actual data modeling to be done at the application layer. It would seem that what I like about OWL is the ability to force a fuller semantic data model including restrictions and inferences. I apologize, because this is clearly something I should have been more involved in/aware of earlier in the design of PCDM. My concern is that implementors will stick the 2 lego blocks together based on their interpretation of the documentation. |
Yeah, I'm wondering about this too - not in a what did you mean kinda way, since I tend to agree with this, but more wondering how do we tease these modeling decisions out, figure out what modeling needs to be put back in the PCDM ontology, and from that decide if it warrants a flavor of OWL or RDFS+ or RDF. Kinda that second to last bullet point from my list above: to capture community-shared restrictions/behaviors in the model instead of in the applications using the data I'm still wrangling in my head what this work would look like, but it looks like an issue that has surfaced in this discussion that's worth pursuing/investigating/reporting back. It could be the type of 'use casing' needed to determine the new needs of the PCDM ontology vis-a-vis modeling languages, other logic languages + their use supporting by the community, etc. If I understood you correctly - and I apologize if I didn't! |
@whikloj : I think this is really unhelpful:
|
@escowles I did not mean to swipe, so I apologize. What I mean is that pcdm:Collection and pcdm:Object are both sub-classes of ore:Aggregation (where pcdm:File is a new class) with just documentation to differentiate them. The name of the class is only important if you have put all the data modelling into your application to know that the difference between a pcdm:Object and a pcdm:Work in practice is X. I know that people can document how they will implement their objects in PCDM, but for interoperability's sake I kinda thought it would be neat to have a common definition outlined in the ontology; that seems like the best place for it. |
@whikloj: OK, but when I look at the ontology, and the changes proposed for adding FileSet and changing predicates, I see the structure outlined: Collections have member Collections and Objects. Objects have FileSets and member Objects. FileSets have Files. The ontology includes definitions of what all of those classes and predicates mean. So I'm interested in adding more to the ontology (like inverse predicates, or disjoint classes, etc.), but I think the ontology already includes the essence of the model. |
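The structure outlined in the comment above (Collections have member Collections and Objects, Objects have FileSets and member Objects, FileSets have Files) can be written down as a small containment table. A minimal sketch, assuming a closed-world check done in application code; the class labels drop their prefixes, the `ex:` resources are hypothetical, and the specific predicates are deliberately left out because they were still under discussion.

```python
# Which child classes each parent class may contain, per the outline above.
ALLOWED_CHILDREN = {
    "Collection": {"Collection", "Object"},
    "Object": {"Object", "FileSet"},
    "FileSet": {"File"},
}

def invalid_links(resource_types, links):
    """Return the (parent, child) links whose classes fall outside the outline."""
    bad = []
    for parent, child in links:
        allowed = ALLOWED_CHILDREN.get(resource_types[parent], set())
        if resource_types[child] not in allowed:
            bad.append((parent, child))
    return bad

# Hypothetical instance data for illustration.
resource_types = {
    "ex:col": "Collection",
    "ex:obj": "Object",
    "ex:fs": "FileSet",
    "ex:file": "File",
}
links = [
    ("ex:col", "ex:obj"),   # fine: Collection -> Object
    ("ex:obj", "ex:fs"),    # fine: Object -> FileSet
    ("ex:fs", "ex:file"),   # fine: FileSet -> File
    ("ex:col", "ex:file"),  # not in the outline: Collection -> File
]

print(invalid_links(resource_types, links))
```

Whether rules like these live in application code, in the ontology, or in a separate shapes vocabulary is exactly the open question in this thread.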
@escowles I agree that I can see it, but I'm not confident that what I am seeing is the same as what you are seeing. (Mostly 'cause I'm starting to see double). So I think then if we could add some of these statements to the ontology I would be happier. A lot of them could be expressed in RDFS, but some (I think) need OWL. For pcdm:Collection, setting the range of pcdm:hasMember to both pcdm:Object and pcdm:Collection. I think that requires slightly more than RDFS allows. But if I am wrong, then maybe we can look to add these common understandings to the ontology. |
So do we want to start by reviewing what is in PCDM core already and discuss adding:
and see if that's a starting place for ontology change requests, testing the RDFS+ / OWL needs (or not)? |
@azaroth42 would asserting the range of hasMember to be a Union be something that could work in what you proposed: RDFS with some OWL added? |
@whikloj you are not wrong. OWL is needed to define ranges as intersections, unions, disjoint classes, etc. |
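A rough sketch of why a union range is the sticking point: as RDFS semantics are generally read, each rdfs:range statement applies on its own, so listing two ranges means a value belongs to both classes, whereas "Object or Collection" needs something like owl:unionOf. The function names below are hypothetical, and the second one is a closed-world check for illustration only (an OWL reasoner would not reject data this way).

```python
def rdfs_entailed_types(declared_ranges):
    """Under RDFS, every rdfs:range statement applies, so a value gets all listed types."""
    return set(declared_ranges)

def union_range_accepts(value_types, union_members):
    """With a range of owl:unionOf(members), one matching type is enough."""
    return not set(value_types).isdisjoint(union_members)

members = ["pcdm:Object", "pcdm:Collection"]

# Two separate rdfs:range statements: the member is inferred to be both.
print(rdfs_entailed_types(members))

# A union range: being an Object alone is fine, being only a File is not.
print(union_range_accepts({"pcdm:Object"}, members))
print(union_range_accepts({"pcdm:File"}, members))
```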
@cmh2166
it's not possible:
|
@DiegoPino yeah, just trying to work through the proposed options on the table. |
so this leads to this use case: do we want to define the range of hasMember, if so, do we agree the range is the union of Collection and Object, and if so, is this important enough to warrant a switch to OWL (or do we need to gather more, similar requests leading to OWL)? |
Reading through this, I'm not opposed to some flavor of OWL. But I am someone who thinks we should clarify the model changes requested, then choose the appropriate modeling language based on those requests. I'm also a big fan of methodical reviews when dealing with questions like this where there is a lot of community input, multiple work narratives leading to this point, and really strongly held opinions. To that end, I've written up my notes so far into an Options / Modeling Cases Not Currently Covered doc: http://bit.ly/2bRCly8 Feel free to read, edit, update, use, not use, ignore, whatever. I'm hoping it can crystallize (maybe just for me) some of the model change requests that have surfaced in this discussion. In particular, I do favor:
I don't know if this is how PCDM tends to work through these questions though, and I apologize if this is counter the method/procedures for discussion leading to decisions/actions/etc. |
IMHO, I think the crux here is that complexity is in the eye of the beholder. That said, if we think the model needs restrictions, let's outline them, agree, and then see what complexity is needed to support that. |
Just to make this conversation more concrete, here is an example of PCDM-as-OWL that I wrote a few months ago for code I was working on. It doesn't have all of the bells and whistles that @DiegoPino describes, but it does use some simple OWL constructs: https://gitlab.amherst.edu/acdc/repository-extension-services/blob/master/acrepo-services-pcdm/src/main/resources/pcdm.owl |
Thanks, @acoburn ! I see the only restrictions your work adds (I think) that aren't in the RDFS ontology are property inverses (which seems to be something that most people support, but could possibly be handled with RDFS+). This will be helpful for determining what seems to be the core case here of "community accessibility", as we've now got an almost exact copy of the models in two different modeling languages. |
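For anyone unsure what property inverses buy in practice, here is a minimal plain-Python sketch of the inference an owl:inverseOf statement between pcdm:hasMember and pcdm:memberOf would license. The function is a toy stand-in for a reasoner, and the `ex:` resources are hypothetical.

```python
def with_inverses(triples, inverse_pairs):
    """Return the graph plus the triples implied by the given inverse property pairs."""
    # Inverses work in both directions.
    inverse = {}
    for prop_a, prop_b in inverse_pairs:
        inverse[prop_a] = prop_b
        inverse[prop_b] = prop_a
    result = set(triples)
    for s, p, o in triples:
        if p in inverse:
            result.add((o, inverse[p], s))
    return result

stated = [("ex:collection1", "pcdm:hasMember", "ex:object1")]  # hypothetical data
expanded = with_inverses(stated, [("pcdm:hasMember", "pcdm:memberOf")])

print(sorted(expanded))
```

A client that only asks for `pcdm:memberOf` would then find the relationship even though only `pcdm:hasMember` was stated.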
@acoburn nice job. I'm working on adding some restrictions, to avoid pcdm:Object hasMember pcdm:Collection or pcdm:Object hasMember pcdm:Object, which was always complex to follow! Thanks really |
Summary of the rationale for OWL from the Islandora perspective:
|
Without taking sides over this, I want to note that the undecidable nature of RDFS is of debatable practical effect: http://www.sciencedirect.com/science/article/pii/S1570826805000144 As usual, blank nodes screw everything up. Stupid blank nodes. |
Maybe a tangent, but: If interoperability is a goal, it might be good to ask how successfully a client armed with an OWL ontology can interact with a repository containing only RDFS, and vice-versa. If RDFS is effectively a subset of OWL, it might be worth publishing both and remanding the cost/benefit analysis to the installation sites, right? |
@barmintor we need to define the level of interoperability. Could it just be publishing a "PCDM compliant" named graph? |
hey @ruebot - I guess what my previous comments were getting towards is that restrictions probably mean OWL, but we should figure out what the restrictions are that we are proposing/in need of. Armed with that, we can decide OWL versus other (and even if we go with OWL, we still need to do this work to figure out which flavor of OWL, which is still entirely open). |
@barmintor, true, level of interoperability is the question. Just not sure about
and RDFS and OWL are in different semantic layers. An RDFS ontology is compatible with/is as undecidable as (@ajs6f sorry, still not convinced by that paper but I have to read it completely) an OWL Full ontology, that is for sure true. And OWL 2 does define a cross domain profile. @cmh2166 the flavor mostly depends on the complexity. I don't think we need too many fancy OWL constructs, just some of the basic ones maybe. I would propose OWL 2 DL, but I would really need to revisit this. |
Yes, I'm saying, let's start evaluating which ones*. This is what can help us move forward, looking at these restrictions + cases then figuring out the modeling language (and flavor), not choosing a modeling language first. *which constructs, which will be based on which restrictions + cases we are looking to add. |
A big part of my concern is that RDFS is decidedly not a subset of OWL. In an important sense (in terms of the Resources in their respective universes) it's a superset. OWL partitions its universe into individuals, properties and classes (the details of the partitioning depend on which OWL profile is being used). This means that many perfectly reasonable RDFS graphs are nonsense in OWL. My question is whether this, and other ontological commitments of OWL, are acceptable. I believe we should take care to minimize their impact. |
@ajs6f, i had a chance to read that: so what it says is: |
A question here: when we are talking about clients, do we mean clients that interact with fedora? If so they interact with RDF resources, not RDFS. The result of an OWL Ontology (individuals that result from the models it allows) are also RDF resources. The lack of |
@no-reply can you explain that partitioning stuff. I'm not sure I understand how that might look. I'm still learning this semantic stuff. 😄 |
I haven't had time to keep up with this fast moving thread properly, but I want to address a couple of specific questions & points from @DiegoPino's earlier reply to me:
In a bare way, OWL's model is more complex.
OWL's division of resources into disjoint groups (and further division of properties) affects the semantics of the resources described and the kinds of relations that are allowed between them. Introducing formal restrictions at the ontology layer, likewise. I'm not against these commitments, but I'd like to justify them on a more or less case by case basis. I frequently field questions about whether a given graph is satisfiable or admits the intended interpretation (the questions are never phrased this way). When the answer is no, people generally want to know, concretely, what they gain by making a change. These changes have costs: objects in their application code, nodes in their instance graphs, internal metadata work, discussion, and training. Which speaks to:
I don't think this is remotely fair, and it reflects precisely my trepidation about jumping into OWL for this model without deep discussion. It's not about being ready: our adopters already use models of varying complexity and restrictiveness. What alienates them is seemingly arbitrary restrictions that make their work harder.
In calling PCDM "low level", I mean in the context of the domain. This is, as I understand it, the unifying goal of the project: to provide a base model appropriate for the repository/DAMS domain. I think it's possible and desirable to do this with fairly generic conceptual modeling patterns that can be implemented in many different ways. (Note, also, that we went far out of our way to avoid talking about "Works", or anything as opinionated as that. Where "work" is used in comments, it is explicitly shorthand for "an intellectual entity".) Bringing in OWL's more complex (than RDF/RDFS D semantics) model theoretic semantics, has the potential to limit interoperability along this axis. (I would argue it necessarily limits it, and the question is more about tradeoffs associated with specific assertions). I'm not against the goals of a machine readable model---far from it---and I'm not trying to block specific uses of OWL. I just want to be thoughtful about the decisions we make in this base model and the ontological commitments they enforce on downstream users. |
@DiegoPino : To my knowledge, OWL 2 was formulated to give folks the ability to choose precisely for the computational complexity of various tasks, so if that's what you mean (not really sure), then sure, I guess. |
Instead of this emacs vs vim argument, can we instead create new issues for what is currently missing by using RDFS only? |
Both RDF(S) and OWL have an underlying model theoretic semantics. The purpose of these semantics, loosely speaking, is to map expressions in their respective languages to "interpretations" which can be understood to be about real world objects & states of affairs. Again loosely, the interpretations can be true or false, and the expressions (graphs, in the case of RDF) can be "satisfied" or not by a given interpretation. RDF's semantics posit the existence of one kind of object: a "resource". RDFS adds some facility for us to sort those objects up into different classes and reason about them, but the rules introduced still leave us talking about one kind of generic real world thing. OWL's semantics do not posit the existence of resources, instead talking about classes, individuals, datatype properties, and object properties. Each of these groups is non-overlapping. There's some magic to allow for "punning"---using one name to refer to different objects, in these different sets. As an example of potential fallout: declaring a property to be an object property can mean that an OWL interpretation that satisfies a graph must assume the existence of two or more real world things where an RDF interpretation must assume only one. |
@no-reply thanks, I'll probably have to read that about 20 more times to fully understand it, but I get your initial point about RDFS being sort of a superset. It is because everything is more... not sure the word... descriptive/well defined/granular. Yes? @azaroth42 I think we can open a new ticket, I think @cmh2166 has been sort of starting that work. But I also think this is not an So perhaps we can hold this ticket to see what the outcome of the "whats missing in RDFS". |
It's sort of counter-intuitive. Because RDFS is less descriptive, its universe includes more things---specifically, all the things which OWL's universe excludes because they won't support the kind of reasoning desired (those things which are both instances and classes, etc...).
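To make "things which are both instances and classes" a bit more tangible, here is a hedged plain-Python sketch that lists names used in both roles in a small graph. The `ex:` vocabulary is entirely hypothetical and unrelated to PCDM, and the check is only a rough approximation of the partitioning discussed above (it ignores punning and properties altogether).

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
META_CLASSES = {"rdfs:Class", "owl:Class"}  # typing as a class does not count as "instance"

def class_and_instance(triples):
    """Return the sorted names used both as a class and as an instance of another class."""
    used_as_class = set()
    used_as_instance = set()
    for s, p, o in triples:
        if p == RDF_TYPE and o not in META_CLASSES:
            used_as_instance.add(s)
            used_as_class.add(o)
        elif p == SUBCLASS_OF:
            used_as_class.update((s, o))
    return sorted(used_as_class & used_as_instance)

example_graph = [
    ("ex:Eagle", RDF_TYPE, "rdfs:Class"),
    ("ex:Eagle", RDF_TYPE, "ex:Species"),  # ex:Eagle treated as an instance
    ("ex:harry", RDF_TYPE, "ex:Eagle"),    # ex:Eagle treated as a class
]

print(class_and_instance(example_graph))
```

This graph is unremarkable in RDFS, while a partitioned OWL reading has to treat the class `ex:Eagle` and the individual `ex:Eagle` as separate things.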
Agreed. On the other hand, this is cutting a little close to the kind of holy war Rob refers to. Apologies for my part in that.
Sorry. My intention was to assume good faith. Instead I got snippy. Thanks for your continued efforts, here. |
Glad to see we are breaking this out into new tickets with specific restrictions / what we are not doing in rdfs now that we'll want to add (and using that to guide, or not, to OWL or other). Thanks all. I'll keep my above doc up to date for those who care/want a cheat sheet of restrictions mentioned so far, but this is what I have somewhere brought up in discussions, notes or other:
Happy to collect/discuss others proposed restrictions, action items, etc. |
@cmh2166 thanks for all this Christina! |
@ruebot it beats dealing with bibframe. ;-) |
Propose closing the issue. |
@azaroth42 do you have a proposed conclusion to close it with? Or just close it for now, and re-open it later if need be? |
I propose closing because it's not an issue that we can resolve without use cases and more explicit requirements, hence the holy war nature of the discussion. We've agreed above to file new issues with actual requirements and proposed modifications, and hence this issue is unnecessary. |
I think we've gathered the cases we can from this and can proceed based off those (and other, more specific cases as they surface in other discussions). So I'm good with closing this issue. |
@cmh2166 thank you very much for your work on this. I took some distance from this thread, mainly because I was told I was bringing too much passion into the discussion and that was not helping much. I highly appreciate your documentation and also the way you acted as a "sanitising middleware" human being here. 👍 thanks again |
There is a proposal to switch from encoding the core ontology in RDFS and to use OWL instead because it is more expressive and can encode things like pcdm:hasMember and pcdm:memberOf being reciprocal properties.
Should we switch to encoding the core ontology in OWL?
See #53 for preliminary discussion.