This repository was archived by the owner on Jan 5, 2021. It is now read-only.
File Specifications
Nathan Tallman edited this page Nov 4, 2019 · 39 revisions
- These file specifications extend Penn State's local digital object guidelines.
- These file specifications apply only to batches that are imported or exported. CHO will natively store files using a machine-efficient layout and naming convention. For preservation storage, CHO will use the Oxford Common File Layout to store individual work and collection packages (1.x feature).
Batches
- Batches can contain one or more works. A batch can directly contain only works, not entire collections, standalone file sets, or loose files.
- Each work must be in its own directory inside the bag's data directory.
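The batch rules above can be checked mechanically before a bag is zipped. The following is a minimal sketch, not CHO's own validation: the function name `inspect_batch` is hypothetical, and it only assumes that an unzipped bag has a `data` directory whose direct children should all be work directories.

```python
from pathlib import Path


def inspect_batch(bag_dir):
    """Split the direct children of <bag_dir>/data into works and strays.

    Returns (works, misplaced): the names of the work directories, and the
    names of anything that is not a directory (for example a loose file
    sitting directly in data/).
    """
    data_dir = Path(bag_dir) / "data"
    works, misplaced = [], []
    for entry in sorted(data_dir.iterdir()):
        # Each work must be its own directory, so anything else is flagged.
        (works if entry.is_dir() else misplaced).append(entry.name)
    return works, misplaced
```

A batch would pass this check when `works` is non-empty and `misplaced` is empty.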
Works
- Simple works contain all their file sets in one directory.
- Nested works contain two or more sub-directories, which include file sets. [Subject to change, 1.x feature.]
File Sets
- Normal file sets require a _preservation or _service file to be present. CHO derives normal file sets from the unique characters in the filenames, minus the extension and base. This is usually an incremental counter such as 0000_000.
- Representative file sets require an _access file to be present. CHO derives representative file sets from filenames with identifiers, suffixes, and no incremental counters. Representative file sets do not need metadata because they are already described by the work-level metadata.
- File Suffixes (Classes map to PCDM Use [RDF])
  - _preservation = preservation master (master preservation file, e.g. tiff, wav, avi)
  - _preservation-redacted = redacted preservation master
  - _service = service master, access master (master service file for viewers, e.g. jp2, dv)
  - _access = access derivative for user download (e.g. pdf, jpg)
  - _thumb = display thumbnail
  - _text = plain text, transcription, OCR output
  - _caption = time-coded captions for a media file (e.g. vtt, srt)
  - _media = images of the original physical media or housing
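The file set rules and suffixes above can be illustrated with a short sketch. This is one reading of the rules, not CHO's implementation: the function names are hypothetical, the "base" is assumed to be the work identifier, and a group is assumed to be representative only when it has no counter and includes an `_access` file.

```python
import os
from collections import defaultdict

# Suffixes listed in this specification (without the leading underscore).
SUFFIXES = ["preservation", "preservation-redacted", "service", "access",
            "thumb", "text", "caption", "media"]


def split_filename(work_id, filename):
    """Return (counter, suffix); counter is "" when the name has none.

    Raises ValueError when the name does not start with the work identifier
    or does not end with a known suffix.
    """
    stem, _ext = os.path.splitext(filename)
    if not stem.startswith(work_id + "_"):
        raise ValueError(f"{filename} does not start with {work_id}_")
    rest = stem[len(work_id) + 1:]  # e.g. "00001_01_preservation"
    for suffix in SUFFIXES:
        if rest == suffix:
            return "", suffix
        if rest.endswith("_" + suffix):
            return rest[:-len(suffix) - 1], suffix
    raise ValueError(f"{filename} has no recognized suffix")


def group_file_sets(work_id, filenames):
    """Group suffixes by counter; also return names that could not be parsed."""
    groups, unrecognized = defaultdict(set), []
    for name in filenames:
        try:
            counter, suffix = split_filename(work_id, name)
        except ValueError:
            unrecognized.append(name)
            continue
        groups[counter].add(suffix)
    return dict(groups), unrecognized


def classify(counter, suffixes):
    """Label a group as representative, normal, or incomplete (assumed rules)."""
    if not counter and "access" in suffixes:
        return "representative"
    if suffixes & {"preservation", "service"}:
        return "normal"
    return "incomplete"
```

Names that use a suffix outside the list above are returned in the unrecognized list rather than rejected, so a caller can decide how to treat them.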
The following are file listings from example CHO batches, which consist of a ZIPped bag and a CSV of metadata for one or more works and file sets.
|-- ingest/
    |-- pines_2018-09-03.zip/
        |-- bag-info.txt
        |-- bagit.txt
        |-- manifest-md5.txt
        |-- tagmanifest-md5.txt
        |-- data/
            |-- pst_9999999999/
                |-- pst_9999999999_preservation.tif
                |-- pst_9999999999_preservation-redacted.tif
                |-- pst_9999999999_service.jp2
                |-- pst_9999999999_text.txt
                |-- pst_9999999999_thumb.jpg
batch_id* | alternate_ids*+ | title*+ | work_type* | home_collection* |
---|---|---|---|---|
pines_2018-09-03 | pst_9999999999 | Edna Bergleton Thorpe | Still_Image | pst_99999 |
* Required for work metadata.
+ Required for file set metadata.
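Each bag in these listings carries a manifest-md5.txt. In BagIt, a payload manifest has one line per payload file: a checksum followed by the file path relative to the bag root. The sketch below shows how such lines could be produced with the standard library. The function name `payload_manifest_lines` is hypothetical, and this is not a description of how CHO or any particular bagging tool builds its manifests.

```python
import hashlib
from pathlib import Path


def payload_manifest_lines(bag_dir):
    """Return 'md5  data/relative/path' lines for every file under data/."""
    bag_dir = Path(bag_dir)
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            # Manifest paths are relative to the bag root and use "/".
            rel = path.relative_to(bag_dir).as_posix()
            lines.append(f"{digest}  {rel}")
    return lines
```

Comparing these lines with the shipped manifest-md5.txt is one way to spot files that were added, renamed, or altered after bagging.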
|-- ingest/
    |-- pstsc_99999_ntt7_2018-10-17.zip/
        |-- bag-info.txt
        |-- bagit.txt
        |-- manifest-md5.txt
        |-- tagmanifest-md5.txt
        |-- data/
            |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c/
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00001_01_preservation.tif
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00001_01_preservation-redacted.tif
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00001_01_service.jp2
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00001_01_text.txt
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00001_01_thumb.jpg
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00001_02_preservation.tif
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00001_02_service.jp2
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00001_02_text.txt
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00001_02_thumb.jpg
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00002_01_preservation.tif
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00002_01_service.jp2
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00002_01_text.txt
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00002_01_thumb.jpg
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00002_02_preservation.tif
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00002_02_service.jp2
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00002_02_text.txt
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00002_02_thumb.jpg
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_access.pdf
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_text.txt
                |-- pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_thumb.jpg
| batch_id* | alternate_ids*+ | title*+ | work_type* | home_collection* |
|---|---|---|---|---|
| pstsc_99999_ntt7_2018-10-17 | pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c | Percival Horace Johnson correspondence to Allen Anderson | Document | pstsc_99999 |
| | pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00001_01 | Page 1 | | |
| | pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00001_02 | Page 2 | | |
| | pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00002_01 | Page 3 | | |
| | pstsc_99999_1e172d6ff9h8d032c7e7a8241a56793c_00002_02 | Page 4 | | |
* Required for work metadata.
+ Required for file set metadata.
|-- ingest/
    |-- birdsong_ntt7_2017-12-18/
        |-- bag-info.txt
        |-- bagit.txt
        |-- manifest-md5.txt
        |-- tagmanifest-md5.txt
        |-- data/
            |-- pstalt_birds02/
                |-- pstalt_birds02_00001_01_preservation.wav
                |-- pstalt_birds02_00001_01_front.jpg
                |-- pstalt_birds02_00001_02_preservation.wav
                |-- pstalt_birds02_00001_02_back.jpg
                |-- pstalt_birds02_service.flac
                |-- pstalt_birds02_access.mp3
                |-- pstalt_birds02_text.txt
                |-- pstalt_birds02_thumb.jpg
| batch_id* | alternate_ids*+ | title*+ | work_type* | home_collection* |
|---|---|---|---|---|
| birdsong_ntt7_2017-12-18 | pstalt_birds02 | Altoona area birdsong recording by Wally Walton | Audio | pstal_birds |
| | pstalt_birds02_00001_01 | Cassette 1, Side 1 | | |
| | pstalt_birds02_00001_02 | Cassette 1, Side 2 | | |
* Required for work metadata.
+ Required for file set metadata.
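The required-field markers in the tables above can be turned into a simple pre-ingest check. The following is a sketch under stated assumptions: the CSV header is assumed to use the column names without the `*` and `+` markers, a row with a non-empty `batch_id` is assumed to describe a work, and any other row is treated as a file set. The function name `find_missing_fields` is hypothetical.

```python
import csv
import io

# Columns marked * (work metadata) and + (file set metadata) in the tables.
WORK_REQUIRED = ["batch_id", "alternate_ids", "title", "work_type", "home_collection"]
FILE_SET_REQUIRED = ["alternate_ids", "title"]


def find_missing_fields(csv_text):
    """Return (row_number, missing_columns) for every incomplete row."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        # Assumption: only work rows carry a batch_id.
        is_work = bool((row.get("batch_id") or "").strip())
        required = WORK_REQUIRED if is_work else FILE_SET_REQUIRED
        missing = [col for col in required if not (row.get(col) or "").strip()]
        if missing:
            problems.append((row_number, missing))
    return problems
```

Because of the `batch_id` assumption, a work row that omits its `batch_id` would be checked as a file set row, so a stricter check would need another way to tell the two row types apart.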
NESTED WORKS ARE NOT PART OF MVP, AND THE SPECS BELOW MAY NOT HAVE BEEN UPDATED SINCE INITIAL DRAFTING. THEY WILL NEED UPDATING BEFORE 1.X SPRINTING.
|-- choStaging/
    |-- batchID/
        |-- bag-info.txt
        |-- bagit.txt
        |-- manifest-md5.txt
        |-- tagmanifest-md5.txt
        |-- data/
            |-- workID/
                |-- workID_00001/
                    |-- workID_00001_01_preservation.tif
                    |-- workID_00001_01_preservation-redacted.tif
                    |-- workID_00001_01_service.jp2
                    |-- workID_00001_01_thumb.jpg
                    |-- workID_00001_02_preservation.tif
                    |-- workID_00001_02_service.jp2
                    |-- workID_00001_02_thumb.jpg
                |-- workID_00002/
                    |-- workID_00002_01_preservation.tif
                    |-- workID_00002_01_service.jp2
                    |-- workID_00002_01_thumb.jpg
                    |-- workID_00002_02_preservation.tif
                    |-- workID_00002_02_service.jp2
                    |-- workID_00002_02_thumb.jpg
                |-- workID_service.pdf
                |-- workID_text.txt
                |-- workID_thumb.jpg
alternate_ids | home_collection | batch_id | work_type | title |
---|---|---|---|---|
workID | collectionID | batchID | Document | Simple Work |
workID_00001 | collectionID\|workID | batchID | Document | Nested Work 1 |
workID_00002 | collectionID\|workID | batchID | Document | Nested Work 2 |
|-- choStaging/
    |-- batchID/
        |-- bag-info.txt
        |-- bagit.txt
        |-- manifest-md5.txt
        |-- tagmanifest-md5.txt
        |-- data/
            |-- workID/
                |-- workID_00001/
                    |-- workID_00001_front.jpg
                    |-- workID_00001_preservation.wav
                    |-- workID_00001_service.flac
                |-- workID_00002/
                    |-- workID_00002_back.jpg
                    |-- workID_00002_preservation.wav
                    |-- workID_00002_service.flac
                |-- workID_service.flac
                |-- workID_access.mp3
                |-- workID_text.txt
                |-- workID_thumb.jpg
alternate_ids | home_collection | batch_id | work_type | title |
---|---|---|---|---|
workID | collectionID | batchID | Audio | Simple Work |
workID_00001 | collectionID\|workID | batchID | Audio | Nested Work 1 |
workID_00002 | collectionID\|workID | batchID | Audio | Nested Work 2 |