Test and tweak strapi import for different types (RPB-58) #37

Merged Sep 21, 2023 (35 commits)

Commits
8111944
Test and tweak strapi import for `articles` (RPB-58)
fsteeg Jun 26, 2023
e7d4666
Merge remote-tracking branch 'origin/rpb-28-hbzIds' into rpb-58-types
fsteeg Jun 27, 2023
835cc15
Update README for new unified content type (RPB-58)
fsteeg Aug 8, 2023
70902b6
Merge remote-tracking branch 'origin/main' into rpb-58-types
fsteeg Aug 9, 2023
f9ee6f6
Create title (f20_) and dates (f76a, f76b) for volumes (RPB-58)
fsteeg Aug 9, 2023
f84d5ff
Add test data for Strapi import of monographs (RPB-58)
fsteeg Aug 10, 2023
d6b8728
Tweak rpb-titel-to-strapi.fix for URL and Signatur (RPB-58)
fsteeg Aug 10, 2023
0c8857d
Tweak transformations for full data processing (RPB-58)
fsteeg Aug 10, 2023
5c735bb
Set up workflow for full Strapi import (RPB-58)
fsteeg Aug 10, 2023
0addbaa
Transform `f18_=x` to boolean for Strapi (RPB-58)
fsteeg Aug 11, 2023
a1fa272
Validate URLs, extract macro, add list with invalid URLs (RPB-58)
fsteeg Aug 11, 2023
6173d3f
Add list of monographs (f36_=s) with missing 76a / 76b (RPB-58)
fsteeg Aug 14, 2023
564140d
Tweak handling of required `f70s` pages field (RPB-58)
fsteeg Aug 15, 2023
a4010ff
Handle both upper and lower case 'x' for `f18_` (RPB-58)
fsteeg Aug 15, 2023
521c02f
Use `log-object` in import workflows for improved logging (RPB-58)
fsteeg Aug 16, 2023
2489480
Remove custom `wait` command (RPB-58)
fsteeg Aug 16, 2023
69849e0
Restrict 70 transformation to 701 etc., not 70b etc. (RPB-58)
fsteeg Aug 17, 2023
326b752
Remove printing of records with missing page numbers (RPB-58)
fsteeg Aug 17, 2023
94f494c
Set required `f70` if missing, point to specific fields (RPB-58)
fsteeg Aug 17, 2023
447bc81
Always set `f18_` to true if the field exists (RPB-58)
fsteeg Aug 18, 2023
2b16d02
Set required f70 field only for articles (RPB-58)
fsteeg Aug 18, 2023
92e1952
Don't try to fill f76[ab] from f01_, point to it instead (RPB-58)
fsteeg Aug 22, 2023
a17c861
Add `sm` / `sbd` examples, update README on Strapi import (RPB-58)
fsteeg Aug 23, 2023
45a8386
Update SW lookup table for current data
fsteeg Aug 30, 2023
78033ab
Create URIs for Sach- (30) and Ortsnotation (31) in Strapi (RPB-58)
fsteeg Aug 30, 2023
bd0da0a
Move complex subject splitting and URI creation to Strapi (RPB-58)
fsteeg Aug 30, 2023
bbefc0b
Move URI creation for contrib. agents and roles to Strapi (RPB-58)
fsteeg Aug 31, 2023
9c968b3
Group numbering with superordinate link in `f01` object (RPB-58)
fsteeg Sep 1, 2023
bfe3eb2
Rename `ü` subfields to `u` for Strapi compatibility (RPB-58)
fsteeg Sep 1, 2023
65c4e08
Remove handling of `f36t` from to-lobid transformation (RPB-58)
fsteeg Sep 4, 2023
27abd36
Handle unsupported repetition of `f01_` by picking first (RPB-58)
fsteeg Sep 4, 2023
b1934aa
Rename `id` subfield for Strapi compatibility (RPB-58)
fsteeg Sep 4, 2023
8250a0d
Remove `f01.f01s` if it's not a year, i.e. 4 digits (RPB-58)
fsteeg Sep 5, 2023
d948077
Add monograph without publication dates to sample data (RPB-58)
fsteeg Sep 5, 2023
e71b261
Add manually corrected 036t0115179 to test data (RPB-58, RPB-69)
fsteeg Sep 6, 2023

21 changes: 16 additions & 5 deletions README.md
@@ -35,20 +35,31 @@ This writes a `.tsv` file to `output/`, to be used for lookups in the transforma
sbt "runMain rpb.ETL conf/rpb-test-titel-to-strapi.flux"
```

-This writes a single `.json` files to `output/` (it's actually JSON lines, but the suffix makes it work with JSON tools, e.g. for syntax coloring and formatting).
+This writes a single `.json` file to `output/` (it's actually JSON lines, but the suffix makes it work with JSON tools, e.g. for syntax coloring and formatting).

### Import strapi data

```bash
sbt "runMain rpb.ETL conf/rpb-test-titel-import.flux PICK=all_equal('f36_','u') PATH=articles"
-sbt "runMain rpb.ETL conf/rpb-test-titel-import.flux PICK=all_equal('f36_','s') PATH=books"
-sbt "runMain rpb.ETL conf/rpb-test-titel-import.flux PICK=all_equal('f36_','sbd') PATH=books"
-sbt "runMain rpb.ETL conf/rpb-test-titel-import.flux PICK=exists('f36t') PATH=multi-volume-books"
-sbt "runMain rpb.ETL conf/rpb-test-titel-import.flux PICK=all_equal('f36_','sm') PATH=periodicals"
+sbt "runMain rpb.ETL conf/rpb-test-titel-import.flux PICK=all_equal('f36_','s') PATH=independent-works"
+sbt "runMain rpb.ETL conf/rpb-test-titel-import.flux PICK=all_equal('f36_','sbd') PATH=independent-works"
+sbt "runMain rpb.ETL conf/rpb-test-titel-import.flux PICK=all_equal('f36t','MultiVolumeBook') PATH=independent-works"
```

This attempts to import all data selected with the `PICK` variable to the API endpoint in `PATH`, and prints the server response.

+To reimport existing entries, these may need to be deleted first, e.g. for `articles/1` to `articles/5`:

+```
+curl --request DELETE http://test-metadaten-nrw.hbz-nrw.de:1339/api/articles/[1-5]
+```
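
Note on the `curl` call above: curl's URL globbing expands `[1-5]` into five separate requests, one per ID. If the cleanup were scripted from the JVM instead, the same expansion could look roughly like the sketch below. It assumes Java 11+ (`java.net.http`); `DeleteEntries` and `deleteRequests` are hypothetical names that are not part of this PR, and the sketch only builds the requests without sending them.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper, not part of the repository: mirrors curl's `[1-5]` range expansion.
final class DeleteEntries {

    // Base URL taken from the README example.
    static final String API = "http://test-metadaten-nrw.hbz-nrw.de:1339/api";

    // Builds one DELETE request per ID in the inclusive range, without sending anything.
    static List<HttpRequest> deleteRequests(String path, int firstId, int lastId) {
        return IntStream.rangeClosed(firstId, lastId)
                .mapToObj(id -> URI.create(API + "/" + path + "/" + id))
                .map(uri -> HttpRequest.newBuilder(uri).DELETE().build())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints the method and URI of each request for articles/1 to articles/5.
        deleteRequests("articles", 1, 5)
                .forEach(r -> System.out.println(r.method() + " " + r.uri()));
    }
}
```

Actually sending the requests would go through `java.net.http.HttpClient`; that step is left out here so the sketch cannot delete anything by accident.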

+After import they are available at e.g. http://test-metadaten-nrw.hbz-nrw.de:1339/api/articles?populate=*

+Entries using the same path can be filtered, e.g. to get only volumes (`f36_=sbd`):

+http://test-metadaten-nrw.hbz-nrw.de:1339/api/independent-works?filters[f36_][$eq]=sbd&populate=*
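
Note on the filter URL above: the square brackets and `$` in the query are fine in a browser, but some tools treat them specially (curl, for instance, reads `[...]` as a glob range). A minimal sketch of building the same filter with a percent-encoded key, assuming Java 10+ and assuming the server decodes encoded keys as usual; `StrapiFilterUrl` and `equalsFilter` are hypothetical names, not part of this PR.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the repository.
final class StrapiFilterUrl {

    // Base URL taken from the README example.
    static final String API = "http://test-metadaten-nrw.hbz-nrw.de:1339/api";

    // Percent-encodes a query key or value (brackets and `$` become %5B, %5D, %24).
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Builds <API>/<path>?filters[<field>][$eq]=<value>&populate=* with an encoded key and value.
    static String equalsFilter(String path, String field, String value) {
        String key = "filters[" + field + "][$eq]";
        return API + "/" + path + "?" + enc(key) + "=" + enc(value) + "&populate=*";
    }

    public static void main(String[] args) {
        // Volumes only (f36_=sbd), as in the README example.
        System.out.println(equalsFilter("independent-works", "f36_", "sbd"));
    }
}
```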

### Run transformation to lobid data

```bash
4 changes: 3 additions & 1 deletion app/rpb/Decode.java
@@ -59,7 +59,9 @@ private void processFields(final String[] vals) {
final String fullRecordId = recordId + "b" + volumeCounter;
getReceiver().startRecord(fullRecordId);
getReceiver().literal(fieldName("#00 "), fullRecordId);
-getReceiver().literal(fieldName("#20ü"), recordTitle);
+getReceiver().literal(fieldName("#20u"), recordTitle);
+String volumeTitle = recordTitle + " : " + v;
+getReceiver().literal(fieldName("#20 "), volumeTitle);
}
getReceiver().literal(fieldName(k), v);
}
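
Note on the hunk above: each volume becomes its own record whose ID is the parent record ID plus `b` plus a running counter, and whose `#20 ` title is the parent title joined to the field value `v` with ` : `. A standalone sketch of just that naming scheme follows; `VolumeNaming`, its method names, the `int` counter type, and the sample values are assumptions for illustration, since the real code emits these values through `getReceiver().literal(...)`.

```java
// Hypothetical standalone illustration of the naming scheme; not part of the repository.
final class VolumeNaming {

    // Record ID of a volume: parent record ID + "b" + running volume counter.
    static String volumeId(String recordId, int volumeCounter) {
        return recordId + "b" + volumeCounter;
    }

    // Title of a volume: parent record title + " : " + the volume field's value.
    static String volumeTitle(String recordTitle, String value) {
        return recordTitle + " : " + value;
    }

    public static void main(String[] args) {
        // Sample values are made up for the example.
        System.out.println(volumeId("r123", 2));
        System.out.println(volumeTitle("Parent title", "Volume 2"));
    }
}
```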
5 changes: 5 additions & 0 deletions build.sbt
@@ -17,6 +17,7 @@ libraryDependencies ++= Seq(
"org.metafacture" % "metafacture-flux" % "5.7.0-rc1",
"org.metafacture" % "metafacture-triples" % "5.7.0-rc1",
"org.metafacture" % "metafacture-formatting" % "5.7.0-rc1",
+"org.metafacture" % "metafacture-monitoring" % "5.7.0-rc1",
"org.metafacture" % "metafix" % "0.6.0-rc3",
"org.elasticsearch" % "elasticsearch" % "1.7.5" withSources(),
"com.github.jsonld-java" % "jsonld-java" % "0.5.0",
@@ -29,6 +30,10 @@ libraryDependencies ++= Seq(
"org.mockito" % "mockito-junit-jupiter" % "2.27.0" % "test"
)

+excludeDependencies ++= Seq(
+SbtExclusionRule("org.slf4j", "slf4j-simple")
+)

dependencyOverrides ++= Set(
"org.antlr" % "antlr-runtime" % "3.2",
"org.eclipse.emf" % "org.eclipse.emf.common" % "2.24.0"