Add a FileSet class #59

Closed
escowles opened this issue Aug 23, 2016 · 69 comments

Comments

@escowles
Contributor

The works extension includes a FileSet class that represents an original file and other files derived from it. The Hydra implementation has found this to be a very useful structure, and the key to separating Objects that represent component parts of other Objects from groupings of Files.

Should we add a FileSet class to the core ontology?

See #53 for preliminary discussion.

@ruebot
Contributor

ruebot commented Aug 23, 2016

👎 If it is required.

Further discussion in this thread: https://groups.google.com/forum/#!msg/pcdm/Ep1Cty2JDx4/Hdw8I5pbDwAJ

@whikloj

whikloj commented Aug 23, 2016

I've said it before, but I think enforcing an additional resource for the (I have no numbers to back this up...but) small percentage of use cases that will need multiple file sets is not to the general benefit.

I understand the opinion around consistency for development, but I don't find this particularly persuasive. It seems that we can find some way to allow for the need of multiple FileSets without requiring them for all PCDM based resources.

@mjgiarlo

@whikloj Can you say more about your use cases? What kinds of content do you need to support?

@escowles escowles mentioned this issue Aug 23, 2016
@ruebot
Contributor

ruebot commented Aug 23, 2016

@mjgiarlo at bare minimum, the content we support now in Islandora 1.x; which, I am near 100% sure, does not have any public use cases from the Islandora Community on the need for multiple FileSets as described. Should we need FileSets, we'd be perfectly happy using the FileSets extension.

👎 forcing FileSets

@acoburn
Contributor

acoburn commented Aug 23, 2016

If the PCDM vocabulary did not use predicates such as pcdm:hasMember, pcdm:hasFileSet and pcdm:hasFile (and encouraged people to use ore:aggregates), then the model would support both the simple and complex use case. (With or without FileSets).

@whikloj

whikloj commented Aug 23, 2016

@mjgiarlo I think @ruebot and @acoburn have covered what I would say (and better than I could). Essentially, we could use pcdm:FileSets, but for every object in our repository it would be a single lonely pcdm:FileSet and therefore extraneous data with no benefit. So could we not make it optional?

@tpendragon

@acoburn

If the PCDM vocabulary did not use predicates such as pcdm:hasMember, pcdm:hasFileSet and pcdm:hasFile (and encouraged people to use ore:aggregates), then the model would support both the simple and complex use case. (With or without FileSets).

#63

@whikloj

whikloj commented Aug 23, 2016

@mjgiarlo sorry our data is 90% newspapers. But the question is more about the possibility of multiple FileSets. We scan and, for better or worse, never scan again. So a second FileSet is extremely unlikely in most all of our use cases.

But perhaps I am misunderstanding the use cases that drove this need.

@tpendragon

@whikloj @ruebot @awoods The majority of the objects in your repository don't store derivatives? The goal of FileSets is "here's a grouping of binaries and a description about why they're grouped."

If FileSets aren't required, then I assume you just never have multiple FileSets, and your object representing "Page 1" just has three files hooked to it and there's no reason to group the master with the derivatives because only one of those is a master and you can identify it?

So, is optional filesets something we can imagine an ingest routine for? "If it uses hasFileSet, go there for files, if it just hasFile, go there, there's only one grouping, if it has both...I dunno, count it as three filesets?"

@tpendragon

Ah, forgot to mention, the reason multiple filesets became a thing was there were institutions who had multiple masters for a single "page".

@DiegoPino

@tpendragon If I understand correctly, then the use case that motivated this new RDFS class is having multiple Masters, each with multiple derivatives, for the same real-world entity (page 1 of book 1, for example).

By the way, Islandora (and @whikloj) do store derivatives, and a lot of them. I don't feel our use cases are that different, but our approaches are.

Grouping multiple binary resources together under a FileSet Class can be solved (without FileSet) by linking (via a specific predicate, not necessarily in PCDM space) a given Master binary to its derivatives. You could even make another construct, via proxies that point to the binary you want to consider your canonical Master. Just one of many ways. How do you solve the "which of the many masters" problem with FileSet?

I am also starting to understand that some of these needs are based on programming paradigms, which is maybe the misunderstanding we have here, and probably has something to do with how Hydra does data modelling, hardcoding structure (based on rdf class to ruby class matching, just guessing), versus what we want to do (trying to extract structure, constraints and requirements from ontologies and the triple store).

If it uses hasFileSet, go there for files, if it just hasFile, go there, there's only one grouping, if it has both...I dunno, count it as three filesets?

That is what I mean by hardcoded. The ontology itself (the semantic definition of the class + object properties allowed and their target classes) should give us that info instead of the code. I feel that is the idea of using Linked Data. I'm not saying one approach is better than the other. I'm saying we probably differ on how we deal with the logic over the structure.

@tpendragon

based on rdf class to ruby class matching, just guessing

Nah, it's totally arbitrary. We store a model statement on the resource to map to the ruby class.

The ontology itself (the semantic definition of the class + object properties allowed and their target classes) should give us that info instead of the code.

I suppose my hypothesis is that this specification is only as good as the tools we can build which utilize it. Arbitrary ingest was just an example. Someone somewhere will have to build that logic; if we can encode that logic into the ontology via constraints that the developer will have to follow, then great.

Grouping multiple binary resources together under a FileSet Class can be solved (without FileSet) by linking (via a specific predicate, not necessarily in PCDM space) a given Master binary to its derivatives.

So I think in order to do this we'd have to redefine what a pcdm:File is, because right now it's a binary file. Bits don't have RDF statements - they can have nodes which describe them, but that's not part of the ontology now (except in the case of FileSets).

@tpendragon

Ah, I may be letting Fedora leak into my previous argument. To be clear, you're proposing:

GET x
<x> <hasFile> <y>
<x> <hasFile> <z>
<y> <type> <master>
<z> <type> <derivative>
<z> <derivedFrom> <y>

Which I think solves that case, yeah.
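
As a rough illustration of how a client might read that structure, here is a minimal Python sketch. It assumes the triples are held as plain tuples and reuses the placeholder names from the listing (`hasFile`, `type`, `derivedFrom` are shorthand from this thread, not actual PCDM terms); `group_by_master` is a hypothetical helper, not part of any existing tool.

```python
# Triples from the listing above, as (subject, predicate, object) tuples.
# All names are the placeholders used in this thread, not real vocabulary terms.
TRIPLES = [
    ("x", "hasFile", "y"),
    ("x", "hasFile", "z"),
    ("y", "type", "master"),
    ("z", "type", "derivative"),
    ("z", "derivedFrom", "y"),
]


def group_by_master(triples, obj):
    """Map each master file of `obj` to the sorted list of files derived from it."""
    # Files attached directly to the object.
    files = {o for s, p, o in triples if s == obj and p == "hasFile"}
    # Of those files, the ones typed as masters.
    masters = {s for s, p, o in triples
               if p == "type" and o == "master" and s in files}
    groups = {m: [] for m in masters}
    # Attach each derivative to the master it was derived from.
    for s, p, o in triples:
        if p == "derivedFrom" and s in files and o in groups:
            groups[o].append(s)
    return {m: sorted(d) for m, d in groups.items()}


print(group_by_master(TRIPLES, "x"))
```

The grouping here falls out of the `derivedFrom` links alone, with no separate FileSet node, which is the point of the proposal as I read it.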

@tpendragon

We've always said Files only have access and technical metadata. What's derivedFrom?

@tpendragon

The other issue is one of practicality: FileSets are implementable in LDP, and specifically Fedora.

If we say <z> <derivedFrom> <y> is the structure we want, it might not be.

@cmharlow

cmharlow commented Aug 23, 2016

Just a comment, not pro or con filesets at this point: < derivedFrom > to my mind would count as technical metadata. It is expressing the technical process (via the relationship) used to generate that binary. There's an ebucore property that could possibly be used there, if we go that route.

@mjgiarlo

@ruebot What does "the content [you] support now in Islandora 1.x" look like? If you've already got this jotted down somewhere, I'd be happy to review existing documentation rather than expect you to type it all out. :) I'm struggling to imagine that Islandora doesn't already have robust support for multi-file works.

@acoburn
Contributor

acoburn commented Aug 24, 2016

w/r/t FileSets, they are, IMO a very natural way to describe book-like objects. They are, effectively, how we describe thousands of such objects in our current repo: Manuscript -> Page -> (Set of files w/ color targets) and (set of files w/o color target). And I would expect to model them the same way going forward w/ F4.

There are other objects in our repo (hundreds of thousands of them). For these resources, the FileSet abstraction is unnecessary. For these, I could live with that additional layer (FileSets) if necessary, even if not ideal.

Looking into the future, one of the big areas of growth in our repository will be faculty research data. This data does not look anything like books. An example from last semester: perturbations of protein data observed over a period of time (as in several million observations over hundreds of specific protein chains). For this data, not only would FileSets be not quite to the point, the entire PCDM structure would likely get in the way. And no, I do not see "put all the data into a big zip file and be done with it" as an option.

So I am left with a choice: do I model some objects (e.g. book-like things) using PCDM and some objects in ORE or do I attempt to have consistency across the repository in terms of how structural metadata is expressed. Personally, I opt for the latter.

@scossu

scossu commented Aug 24, 2016

In this parallel discussion FileSets are regarded as specific aggregations representing "digital content" in an abstract sense. The Files that they aggregate are different manifestations (i.e. different file formats, encodings, derivations, subsets, etc.) of the same digital source, so they have specific common traits.

This gives FileSets a defined role (e.g. a scan of a page) distinguished from other pcdm:Objects which represent higher-level aggregations (e.g. pages, books, collections etc.).

Even with a FileSet with one File (which is quite unlikely because you will almost surely have a thumbnail, an access image, a preservation copy, an OCR or metadata extract, etc.) you would still benefit from having an independent FileSet to put descriptive metadata about the digitized content. The dc:creator of a pcdm:Object may be the monk who wrote a book page, the dc:creator of the pcdm:File the photographer who reproduced it, and so on.

👍 to FileSets.

@scossu

scossu commented Aug 24, 2016

Sorry, wrong link for the discussion mentioned. I mean this one: https://groups.google.com/forum/#!topic/hydra-tech/u181eBfgJcU

@escowles
Contributor Author

What if we made FileSet not a subclass of Object (the current proposal), but allowed a single resource to be a FileSet and an Object at the same time? This would allow users to skip the extra node if they had no use for it. For example, if you had an Object with both image and text representations, you could have separate FileSets to separate them out:

<o1> a pcdm:Object ;
  rdfs:label "p. 1" ;
  pcdm:hasFileSet <fs1>, <fs2> .

<fs1> a pcdm:FileSet ;
  rdfs:label "page image" ;
  pcdm:hasFile <f1>, <f2> .

<fs2> a pcdm:FileSet ;
  rdfs:label "transcription" ;
  pcdm:hasFile <f3> .

<f1> a pcdm:File ;
  ebucore:filename "page1.tiff" .

<f2> a pcdm:File ;
  ebucore:filename "page1.jp2" .

<f3> a pcdm:File ;
  ebucore:filename "page1.tei" .

But you could also just have a single Object/FileSet combo resource:

<o2> a pcdm:Object, pcdm:FileSet ;
  rdfs:label "p. 1" ;
  pcdm:hasFile <f1>, <f2>, <f3> .

@scossu

scossu commented Aug 24, 2016

This again would conflate the concepts of real world object and digital content. I find having a dedicated class for the latter very useful.

I don't see a problem with a FileSet hanging out by itself or potentially having multiple relationships with other Objects. At AIC we have Assets (wannabe FileSets), such as a photographic portrait, that can be representations of both an artwork and a person (Objects).

@scossu

scossu commented Aug 24, 2016

In the above for FileSet I meant "a digital reproduction of a photographic portrait".

Also, in the scenario you lay out:

<o2> a pcdm:Object, pcdm:FileSet ;
  rdfs:label "p. 1" ;
  pcdm:hasFile <f1>, <f2>, <f3> .

you may think you are content with a simple book page that has only one image. But if in 5 years you make a better reproduction of that page, you will have a hard time separating the old reproduction from the new one.

This discussion seems very similar to the one about the ordering ontology and about why we build a complex structure even for a simple scenario: the reason is to be interoperable and allow for expansion.

@escowles
Contributor Author

@scossu , I agree that it complicates things if you re-scan a page. But what I'm hearing from @whikloj is that they just don't re-scan things. So separating the page Object from the FileSet would just add an additional node, without providing them any benefit.

@whikloj

whikloj commented Aug 24, 2016

@escowles exactly, I'd love to re-scan but the funding is generally always for new digitization, so...

We currently have only 675,000 newspaper pages in Fedora 3.

Every page has a single master (TIFF) and derivatives, and as I said we never re-scan unless the original scan is unusable, in which case we don't add the unusable scan.

So a set of pcdm:File(s) attached to a pcdm:Object with (perhaps) pcdm use or ebucore predicates is perfectly workable.

Heck I don't even really want my derivatives in Fedora (but that is a Claw discussion).

So while I accept that some people want/need the ability to have multiple FileSets, to me it is just an extra layer to traverse.

@scossu

scossu commented Aug 24, 2016

The benefit would be in separating the metadata about the newspaper page and the one about its scan.

they just don't re-scan things

@whikloj is this correct? I wonder how one would not even leave room for an option.

@whikloj

whikloj commented Aug 24, 2016

The benefit would be in separating the metadata about the newspaper page and the one about its scan.

So see I would put the metadata about the newspaper page on the pcdm:Object (RdfSource) and the metadata about the scan on the pcdm:File (NonRdfSource).

@scossu, I'm not saying we don't leave room for the ability. But as it has not yet happened, I would love it if it were an option. But why force the extra layer for everything?

@scossu

scossu commented Aug 24, 2016

So see I would put the metadata about the newspaper page on the pcdm:Object (RdfSource) and the metadata about the scan on the pcdm:File (NonRdfSource).

You would put the descriptive metadata about the page (e.g. the author of the article(s), date of publication) in the Object; descriptive metadata about the digitized content (author and date of the scan) in the FileSet and technical metadata about the file itself (characterization, file timestamp, etc.) in the File.

why force the extra layer for everything?

To have one single model to predictably store and find information instead of two different ones depending on whether you plan on having one or more files.

@whikloj

whikloj commented Aug 24, 2016

@scossu Why would you not put descriptive metadata about the digitized content on the File itself?

@ruebot
Contributor

ruebot commented Aug 24, 2016

If I'm understanding everything right, this is what pcdm:FileSet would look like implemented on the existing Islandora CLAW PCDM diagrams:

[Image: islandora-pcdm-large-image]

@scossu

scossu commented Aug 24, 2016

@ruebot yes, if you intend the pcdm:hasFile relationship to apply to JP2, JPG, and TN as well.

@dannylamb

reason for not having the fileset be the direct container itself? not a criticism. just curious.

@escowles
Contributor Author

@dannylamb Maybe we could have the FileSet be a DirectContainer itself. Would it be more palatable if the LDP projection was basically the same, and we were just calling out the existing files DirectContainer and saying it's an appropriate place to attach metadata about the files as a group?

@tpendragon

Huh. Do multiple direct containers work? The problem would be you'd have to manage a link that isn't ldp:contains to find the "FileSet"

@scossu

scossu commented Aug 25, 2016

I am not quite following the discussion about LDP here and maybe I am missing an important part of the PCDM fundamentals, so bear with me.

How is PCDM related to LDP, and most important, is PCDM membership related to LDP containment? My understanding is that PCDM defines the role of resources and their semantic relationships, while LDP focuses on structure and traversal. If we are talking about implementation examples around @ruebot's graph, I understand. If we are introducing LDP concepts in PCDM I would be OK as well, I just would like to know if this has always been a common understanding.

To this point, I would actually rephrase @ruebot's statement to "Files MUST be members of exactly one FileSet".

@escowles
Contributor Author

@scossu There's always been some tension about how to treat LDP: on one hand, PCDM is an abstract model that could be implemented in any number of systems. But on the other hand, most of the people involved in PCDM are planning to implement it with Fedora 4, so how PCDM maps to LDP is an important consideration.

So I would say that LDP is definitely not a part of PCDM or required to use it. But many people who use PCDM are also interested in LDP, so it makes sense to also agree on the mapping (though separately from the modeling discussions).

In this particular case, I think the LDP mapping is relevant to the modeling discussion, because it changes whether adding an extra FileSet node results in adding an extra LDP container or not (with implications for scalability, etc.). If adding a FileSet only results in slightly redefining an existing container in the LDP projection, then maybe that lessens the objection to requiring it.

@escowles
Contributor Author

@tpendragon I believe that we could have a pcdm:Object as a BasicContainer and it could have multiple DirectContainers which were FileSets. This would result in the Object having direct hasFile links to each of the Files, and you could also add hasFileSet linking to the FileSets, which would link to their containing Files with ldp:contains. The triples would look like:

<http://example.org/obj1> a pcdm:Object, ldp:BasicContainer ;
  ldp:contains <http://example.org/obj1/files>, <http://example.org/obj1/files2> ;
  pcdm:hasFile <http://example.org/obj1/files/f1>, <http://example.org/obj1/files/f2>,
               <http://example.org/obj1/files2/f3>, <http://example.org/obj1/files2/f4> ;
  pcdm:hasFileSet <http://example.org/obj1/files>, <http://example.org/obj1/files2> .

<http://example.org/obj1/files> a pcdm:FileSet, ldp:DirectContainer ;
  ldp:membershipResource <http://example.org/obj1> ;
  ldp:hasMemberRelation pcdm:hasFile ;
  ldp:contains <http://example.org/obj1/files/f1>, <http://example.org/obj1/files/f2> .

<http://example.org/obj1/files/f1> a pcdm:File ;
  rdfs:label "page1.tiff" .

<http://example.org/obj1/files/f2> a pcdm:File ;
  rdfs:label "page1.jp2" .

<http://example.org/obj1/files2> a pcdm:FileSet, ldp:DirectContainer ;
  ldp:membershipResource <http://example.org/obj1> ;
  ldp:hasMemberRelation pcdm:hasFile ;
  ldp:contains <http://example.org/obj1/files2/f3>, <http://example.org/obj1/files2/f4> .

<http://example.org/obj1/files2/f3> a pcdm:File ;
  rdfs:label "page1.tiff" .

<http://example.org/obj1/files2/f4> a pcdm:File ;
  rdfs:label "page1.jp2" .
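
The direct `pcdm:hasFile` links in that listing come from standard LDP DirectContainer behaviour: creating a resource in the container yields a containment triple on the container plus a membership triple on the `ldp:membershipResource`. A rough Python sketch of that rule follows, with the server simulated by a hypothetical `membership_triples` function and a plain dict standing in for the container; it is not a Fedora or LDP client API.

```python
def membership_triples(container, contained):
    """Triples an LDP server adds when `contained` is created in a DirectContainer."""
    return [
        # Containment: the container lists the new resource.
        (container["uri"], "ldp:contains", contained),
        # Membership: the membership resource gains the configured relation.
        (container["membershipResource"], container["hasMemberRelation"], contained),
    ]


# The first FileSet/DirectContainer from the listing above.
FILES = {
    "uri": "http://example.org/obj1/files",
    "membershipResource": "http://example.org/obj1",
    "hasMemberRelation": "pcdm:hasFile",
}

for triple in membership_triples(FILES, "http://example.org/obj1/files/f1"):
    print(triple)
```

Under this reading, only the `pcdm:hasFileSet` links need to be written by the client; the `pcdm:hasFile` links are a side effect of where the Files are created.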

@awead

awead commented Aug 25, 2016

Can we clarify that we're using LDP or not? see #56

I'm mildly +1 on using LDP, but I think we need a definitive answer on the topic before proceeding with further discussion about FileSets.

@ruebot
Contributor

ruebot commented Aug 25, 2016

Here is a summary of issues and questions with pcdm:FileSet from the Islandora perspective:

  1. Can pcdm:Objects no longer contain files? Meaning, only pcdm:FileSets can contain files.
  2. Is the domain of pcdm:hasFile pcdm:FileSet?
  3. Is there a forced use of pcdm:hasMember, pcdm:hasFileSet, and pcdm:hasFile?
  4. How do you determine which pcdm:FileSet you are using, if you have multiple pcdm:FileSets?
  5. pcdm:FileSet is an aggregate for files, extends ore:aggregate, No forced use of pcdm:FileSet.
  6. We'd like pcdm:Objects to have just one file if they want, without the use of pcdm:FileSet.
  7. Don't force IIIF presentation structure for things that will never be IIIF.

@whikloj

whikloj commented Aug 25, 2016

Mea culpa, question 2 should be
"Is the domain of pcdm:hasFile pcdm:FileSet?"
which is essentially the same as question 1.

@ruebot
Contributor

ruebot commented Aug 25, 2016

@whikloj updated.

@tpendragon

tpendragon commented Aug 25, 2016

There's some confusion, because FileSet's been refined multiple times due to the previous large ticket. I think the status of those answers now are these:

  1. Yes.
  2. Yes.
  3. Yes.
  4. Same answer as "how do you determine which file you're using".
  5. Yes-ish. pcdm:FileSet extends ore:aggregate, but -not- pcdm:Object, which means no hasMember.
  6. I'll talk about this in a second.
  7. We're not doing that? Being able to crosswalk to IIIF was a happy accident long after we were using hydra-works. They both have about the same level of structural definition.

So my only point is this: Let's say we don't do FileSets as a required construct - there's no node describing file grouping. We, at Hydra, obviously have use cases and have come down on it as a necessary construct. So let's say we keep it in the extension, and don't violate anything PCDM. You don't do that. So let's say we each represent a postcard.

Hydra:

<a> <type> <postcard>
<a> <hasMember> <front>
<a> <hasMember> <back>

<front> <hasMember> <frontFiles>
<frontFiles> <type> <FileSet>
<frontFiles> <hasFile> <front.jpg>

<back> <hasMember> <backFiles>
<backFiles> <type> <FileSet>
<backFiles> <hasFile> <back.jpg>

Islandora

<a> <type> <Postcard>
<a> <hasMember> <front>
<a> <hasMember> <back>

<front> <hasFile> <front.jpg>

<back> <hasFile> <back.jpg>

Is there any sort of useful interop we can have here? Are there any tools we can build off of PCDM 1.0 to generically work with both these models and do something useful? If we're just an extension, and we have to stick to PCDM 1, then FileSet has to be a pcdm:Object. That means the graphs for Islandora's <front> and our <frontFiles> are the same. There's no way to tell if we're talking about a group of files or "The Front".

If the answer is no, then I think we need FileSet in some form in the ontology. If it's NOT a required construct, then the rules get more complex, and I would love to see examples of how we can have it be non-required (with graphs and constraints on the predicates defined here, in this ticket) and still talk about one another's models. I think we can all be happy here.

@escowles
Contributor Author

What if the Hydra representation was:

<a> <type> <postcard>
<a> <hasMember> <front>
<a> <hasMember> <back>

<front> <hasFile> <front.jpg>
<front> <hasFileSet> <frontFiles>
<frontFiles> <type> <FileSet>
<frontFiles> <hasFileSetMember> <front.jpg>

<back> <hasFile> <back.jpg>
<back> <hasFile> <back.tei>
<back> <hasFileSet> <backFiles>
<backFiles> <type> <FileSet>
<backFiles> <hasFileSetMember> <back.jpg>
<back> <hasFileSet> <backFiles2>
<backFiles2> <type> <FileSet>
<backFiles2> <hasFileSetMember> <back.tei>

I think this lets the Object use hasFile to link to the File, so Islandora and Hydra (and everyone else!) can use the existing pattern. But there is an optional overlay on top of that which groups the files, and which maps neatly to LDP containers, for purposes of having multiple sets of files, such as both an image and a transcription, or a new digitization, etc., etc.
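
To make the "optional overlay" concrete, here is a small Python sketch of how a consumer could read either style. It is only an illustration under assumptions: triples are plain tuples, the predicate names are the placeholders from the listing above, `file_groupings` is a hypothetical helper, and treating an Object without FileSets as one implicit grouping is my assumption rather than anything agreed in this thread.

```python
# An Islandora-style <front> (no FileSets) next to the <back> from the listing above.
TRIPLES = [
    ("front", "hasFile", "front.jpg"),
    ("back", "hasFile", "back.jpg"),
    ("back", "hasFile", "back.tei"),
    ("back", "hasFileSet", "backFiles"),
    ("backFiles", "hasFileSetMember", "back.jpg"),
    ("back", "hasFileSet", "backFiles2"),
    ("backFiles2", "hasFileSetMember", "back.tei"),
]


def file_groupings(triples, obj):
    """Return {grouping: [files]} for `obj`, with or without the FileSet overlay."""
    files = sorted(o for s, p, o in triples if s == obj and p == "hasFile")
    file_sets = sorted(o for s, p, o in triples if s == obj and p == "hasFileSet")
    if not file_sets:
        # Assumption: no overlay means one implicit grouping, keyed by the Object.
        return {obj: files} if files else {}
    # Overlay present: read each FileSet's members.
    return {
        fs: sorted(o for s, p, o in triples if s == fs and p == "hasFileSetMember")
        for fs in file_sets
    }


print(file_groupings(TRIPLES, "front"))
print(file_groupings(TRIPLES, "back"))
```

A tool that only cares about "which files belong to this Object" can ignore the overlay entirely and follow `hasFile`, which is the interoperability argument for this shape.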

@whikloj

whikloj commented Aug 25, 2016

@escowles Could you use pcdm:hasFile for the pcdm:FileSet to pcdm:File relationship too?

@escowles
Contributor Author

@whikloj I think you could use pcdm:hasFile to link to Files both from Objects and from FileSets. In both cases, the File is a representation of the Object/FileSet, so I think it's the same.

@awead

awead commented Aug 25, 2016

Could hasFileSetMember be a subproperty of hasFile? I'm pondering @escowles' model. Would this also allow any Object to link to any FileSet and create many-to-many relationships?

@escowles
Contributor Author

@awead That's the other option: making hasFileSetMember a subproperty of hasFile instead of just using hasFile. I don't have a strong opinion about which one is better.

Though I'm not sure about linking to FileSets from more than one Object. If Files are part of a single Object, and FileSets serve to group those Files, wouldn't the FileSets also be limited to that Object?

@whikloj

whikloj commented Aug 25, 2016

I don't really have a use case for Files/FileSets being attached to more than one Object, but I remember that @scossu made the comment above

I don't see a problem with a FileSet hanging out by itself or potentially having multiple relationships with other Objects.

Just in case he has a use case he'd like to mention.

@tpendragon

If Files are part of a single Object, and FileSets serve to group those Files, wouldn't the FileSets also be limited to that Object?

I thought this was the case.

@ruebot
Contributor

ruebot commented Aug 25, 2016

@whikloj @dannylamb @DiegoPino @bryjbrown One use case we should think about, in terms of how we would handle it without FileSets, is the good old ETD (Electronic Thesis/Dissertation): a PDF and associated datasets.

@azaroth42
Contributor

I understood the point of a FileSet to be to group together files from the same source bitstream? So PDF plus Datasets is a pcdm:Object, not a FileSet.

@ruebot
Contributor

ruebot commented Aug 25, 2016

@azaroth42 Cool. That's exactly what I was thinking, but wanted to make sure. I'm just trying to think of other use cases for FileSets from our perspective.

@awead

awead commented Aug 25, 2016

@ruebot more broadly, the FileSet could contain derivatives from the original source, whether auto-generated or not, such as thumbnails, but also derived technical information such as fits xml, or other derivative-like things: TEI representations, full-text extraction, etc.

@bryjbrown

bryjbrown commented Aug 25, 2016

@azaroth42 @ruebot A lot of the research data sets that I've worked with are different "views" (for lack of a better term) of the same raw data. Think different tabs on the same spreadsheet, or a chart image file representing the data in a separate CSV file. Not derivatives in the technical sense, but thematically derivative. Would this be a use case for FileSet, or does it just confuse things?

@escowles
Contributor Author

@bryjbrown That seems like a reasonable use of a FileSet to me — including, for example, a data file and graphs/visualizations of it.

@cmharlow

cmharlow commented Aug 26, 2016

A few questions/thoughts based on these last few comments + ideas:

  • So we're defining the proposed pcdm:FileSet as "A group of related Files, typically a single master File and its derivatives (from the same binary), but also thematic derivatives, such as a master Dataset and a chart image file representing the data". (I like + share @bryjbrown's example)
  • pcdm:hasFile:
    • always has range pcdm:File
    • when it has domain pcdm:Object -> it relates an Object directly to a preferred representation File of some sort (it would be preferred if there is only 1 file total, and this could be helpful if there are FileSets?)
    • when it has domain pcdm:FileSet -> this relates a FileSet to its requisite Files*.
    • Perhaps want to make these cases 2 separate properties with explicit domains/ranges (hasFile / hasFileSetMember)
  • Would <front> <hasFile> <front.jpg> always be required once, for interoperability's sake + modeling consistency, when there is 1 or more FileSets?
  • But <front> <hasFile> <front.jpg> does not then require a related or used FileSet (hence the whole start of this).
  • And are we reusing <front.jpg> for the Object hasFile and the FileSet hasFileSetMember? But not across different Objects.

Sorry, I'm coming from a "consistency in modeling is key" ideal here, as I'll have to do quite a bit of batch metadata updates in a few PCDM implementations.

*edited to avoid presumption of inverses to these properties.

@escowles
Contributor Author

escowles commented Jan 5, 2017

Discussion of FileSets has moved on — closing this issue. There is still work going on in the Hydra community about how FileSets should work, and what they represent, and making that compatible with the core ontology.

@escowles escowles closed this as completed Jan 5, 2017