diff --git a/.gitignore b/.gitignore index 8c46ffe..8ca0265 100644 --- a/.gitignore +++ b/.gitignore @@ -5,3 +5,5 @@ public/ data/config.json out/* out.xml +*.sqlite +!data/example.sqlite \ No newline at end of file diff --git a/README.md b/README.md index e3ba374..b895b92 100644 --- a/README.md +++ b/README.md @@ -70,6 +70,16 @@ editioncrafter process -i -o -u ``` This will create all of the artifacts that EditionCrafter needs in order to display your document on the web, and place them in the specified `` folder. The `` parameter should be the URL at which you intend to host these artifacts. +### `database` + +Process the TEI document into a SQLite file containing a directory of categories and tags. This can be used with the Record List component from the EditionCrafter viewer package, or it can be browsed directly with a SQLite viewer. + +Usage: `editioncrafter database [-i tei_file] [-o output_path]` + +Required parameters: +* -i tei_file +* -o output_path (must end in .sqlite) + ## Command line options ### `-c` or `--config` diff --git a/data/database-example.xml b/data/database-example.xml new file mode 100644 index 0000000..4d0f4e5 --- /dev/null +++ b/data/database-example.xml @@ -0,0 +1,1628 @@ + + + + + msfr40 (Taxonomy Example) + + +

+ + +

+ + + + + + Taxonomy 1 + + animal + + + body part + + + currency + + + definition + + + environment + + + material + + + medical + + + measurement + + + music + + + plant + + + place + + + personal name + + + profession + + + sensory + + + tool + + + temporal + + + arms and armor + + + + Taxonomy 2 + + casting + + + painting + + + metal process + + + varnish + + + arms and armor + + + medicine + + + household and daily life + + + cultivation + + + stones + + + wood and its coloring + + + tool + + + tricks and sleight of hand + + + decorative + + + animal husbandry + + + glass process + + + corrosives + + + dyeing + + + preserving + + + wax process + + + practical optics + + + lists + + + merchants + + + printing + + + la boutique + + + alchemy + + + manuscript structure + + + + + + + + +

+ Counterfeit coral (In this period, counterfeit does not necessarily connote a deceptive practice of imitation. See Lores-Chavez, “Imitating Raw Nature.”) +

+ + +

+

One needs to first make the branches of wood or take a bizarre thorn branch, then melt a lb of the most beautiful clear pitch resin and put in one ounce of subtly ground vermilion with walnut oil, and if you add in a little Venice laque platte, the color will be more vivid, and stir everything in the resin melted over a charcoal fire and not of flame, for fear that it catches fire. Next dip in your branches while turning, & if any filaments should remain on it, turn the branch over the heat of the charcoal.

+

Colophony is nothing other than recooked resin. To do it well, take a leaded pot & melt the resin, & boil it over the brazier a good hour, & until it appears not to be thick, but clear & liquid like water, & easily runs & flows from the tip of a stick with which you grind it, & test it. Then pour it through a coarse canvas or a very light tammy cloth, such that when pouring it falls into the strongest vinegar that you can find, for the vinegar gives it strength & prevents it from being so fragile. Reiterate this two or three times & it will be beautiful & well purified. For counterfeiting your coral, you can mix a quarter part of mastic into your purified resin to render it more firm and more beautiful, & if you were to take a single tear of mastic, it would be all the better, but it would be too long.

+

Sulfur & vermilion makes the same effect.

+

The coral made of gules red enamel endures the file and polishing.

+

It is made like cement that is stronger mixed with pestled <-than of-> glass rather than with brick. Thus, here one mixes well pestled gules red enamel, which is red in body, with the vermilion. Thus with all colors of enamels.

+
+
+ Varnish for panels +

Take a lb of Venice turpentine & heat it in a pot until it simmers, and put in half a lb of the turpentine oil of the whitest you can find, and stir it together well on a charcoal fire and take it off immediately. And <-elle-> it is done. But if it seems too thick to you, add in a little more oil. Similarly if it is too clear, you can thicken it by putting in a little turpentine. Thus you will give it whatever body you want. It could be made well without fire, but, when heated, it is more desiccative. It is appropriate for panel paintings and other painted things without corrupting the colors or yellowing. And it dries both in the shade and in the sun, and overnight, and during the winter as well as in the summer. It is commonly sold 15 sous a lb.

+

A little more turpentine than turpentine oil is needed in order to give body to the varnish, which needs to be applied with the finger in order to spread it thinner and less thick, for when it is thick, it turns yellow and sticks. One does not varnish to make paintings shine, for it just takes the light out of them.

+
+
+ Thick varnish for planks +

There is a varnish that takes a long time to dry & drips more than two months after it has been applied to the planks. But this one does not drip like that of times past, which was made of linseed oil, garlic boiled in it to extinguish it & rid it of grease, & with wheat. And this one yellowed & rendered greenish the blue color of paintings. This one is made like the other one except that one puts coarse common turpentine

+
+
+

But it is used to heighten colors which have soaked in and to keep them from dust. Mastic varnish does not resist rain, whereas that of oil and rosin does.

+
+ +
+

instead of fine turpentine. And you can put into two lb of <-tou-> common turpentine one lb of fine turpentine oil & do everything as with the other one. This one will cost you no more than five or six sous per lb & is sold for 40 sous per lb.

+
+ + + +
+

This vessel is for making large quantities of turpentine oil, that is to say a bucket an hour, and no matter which turpentine it may be, whether fine or crude. One needs to give, as you know, a little fire at the beginning and always keep cold water in the cooler on the top. The lb is sold at xii sous, & at the bottom of the vessel remains the colophony, or pix græca (Latin: "Greek pitch"). In this vessel, eau-de-vie is also made well, and there is no need to distill it again. You do not need an oven for this copper vessel, but only charcoal around it if it has a flat bottom, but if it is round, you will place it on a trivet.

+

It is better to heat the varnish a little bit, rather than to put it out in the sun, because this makes the panel warp.

+

Some say it is not good to distil in this copper vessel because it makes things green. However, when tinned, it is good.

+
+
+ For varnishing +

Turpentine varnish does not need any glue because it is fatty & viscous & it is not absorbed in the wood like that of spike lavender & sandarac. Also, that of spike lavender does not require any glue on iron & similar things that do not absorb. But on wood & on colors which <-have-> do not have gum or distemper glue, it is necessary to lay one coat of the said hide glue & to let it dry & to varnish.

+

diff --git a/data/example.sqlite b/data/example.sqlite new file mode 100644 index 0000000..5886a0a Binary files /dev/null and b/data/example.sqlite differ diff --git a/data/images-example.xml b/data/images-example.xml index 6e98ed1..0668c66 100644 --- a/data/images-example.xml +++ b/data/images-example.xml @@ -286,24 +286,24 @@ Flavius Vopiscus - - + + - + [List of books] Aquatilium animalium historiæ, Hypolito Salviano Typhernate authore, Romæ 1554 - + - - + + - + Les Annales de Normandie @@ -429,11 +429,11 @@ Counterfeit coral - +                       + - + One needs to first make the branches of wood or take a @@ -442,7 +442,7 @@ subtly ground vermilion with walnut oil, and if you add in a little Venice laque platte, the color will be more vivid, and stir everything in the resin melted -over a charcoal fire and not of flame, for fear that it catches fire. +over a charcoal fire and not of flame, for fear that it catches fire. Next dip in your branches while turning, & if any filaments should remain on it, turn the branch over the heat of the charcoal. @@ -513,20 +513,20 @@ drips more than two months after it has been applied to the planks. But this one does not drip like that of times past, which was made of linseed oil, garlic boiled in it -to extinguish it & rid it of grease, +to extinguish it & rid it of grease, & with wheat. And this one yellowed & rendered greenish the blue color of paintings. This one is made like the other one except that one puts coarse common turpentine - - + + But it is used to heighten colors which have soaked in and to keep them from dust. Mastic varnish does not resist rain, -whereas that of oil and rosin does. +whereas that of oil and rosin does. - + @@ -575,7 +575,7 @@ of spike lavender & sandarac. Also, that of spike lavender does not require any glue on iron & similar things that do not absorb. 
But on wood & on colors -which <-have-> do not have gum or +which <-have-> do not have gum or distemper glue, it is necessary to lay one coat of the said hide glue & to let it dry & to varnish. @@ -684,7 +684,7 @@ open space. - + In five or six lb of oil, one must put one lb @@ -730,4 +730,4 @@ - \ No newline at end of file + diff --git a/docs.md b/docs.md index 5845e48..2d37e38 100644 --- a/docs.md +++ b/docs.md @@ -48,6 +48,16 @@ Optional parameters: * -u base_url * -c: Config file +### `database` + +Process the TEI document into a SQLite file containing a directory of categories and tags. This can be used with the Record List component from the EditionCrafter viewer package, or it can be browsed directly with a SQLite viewer. + +Usage: `editioncrafter database [-i tei_file] [-o output_path]` + +Required parameters: +* -i tei_file +* -o output_path (must end in .sqlite) + ### help Displays this help. diff --git a/package-lock.json b/package-lock.json index a089ec4..0e22fb8 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,16 +1,17 @@ { "name": "@cu-mkp/editioncrafter-cli", - "version": "1.1.0", + "version": "1.2.0", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "@cu-mkp/editioncrafter-cli", - "version": "1.1.0", + "version": "1.2.0", "license": "MIT", "dependencies": { "@ungap/structured-clone": "^1.2.0", "axios": "^1.4.0", + "better-sqlite3": "^11.6.0", "csv-parse": "^5.5.6", "genversion": "^3.2.0", "jsdom": "^21.1.2", @@ -1143,6 +1144,57 @@ "resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-1.0.2.tgz", "integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw==" }, + "node_modules/base64-js": { + "version": "1.5.1", + "resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz", + "integrity": "sha512-AKpaYlHn8t4SVbOHCy+b5+KKgvR4vrsD8vbvrbiQJps7fKDTkjkDry6ji0rUJjC0kzbNePLwzxq8iypo41qeWA==", + "funding": [ + { + "type": "github", + "url": 
"https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT" + }, + "node_modules/better-sqlite3": { + "version": "11.6.0", + "resolved": "https://registry.npmjs.org/better-sqlite3/-/better-sqlite3-11.6.0.tgz", + "integrity": "sha512-2J6k/eVxcFYY2SsTxsXrj6XylzHWPxveCn4fKPKZFv/Vqn/Cd7lOuX4d7rGQXT5zL+97MkNL3nSbCrIoe3LkgA==", + "hasInstallScript": true, + "license": "MIT", + "dependencies": { + "bindings": "^1.5.0", + "prebuild-install": "^7.1.1" + } + }, + "node_modules/bindings": { + "version": "1.5.0", + "resolved": "https://registry.npmjs.org/bindings/-/bindings-1.5.0.tgz", + "integrity": "sha512-p2q/t/mhvuOj/UeLlV6566GD/guowlr0hHxClI0W9m7MWYkL1F0hLo+0Aexs9HSPCtR1SXQ0TD3MMKrXZajbiQ==", + "license": "MIT", + "dependencies": { + "file-uri-to-path": "1.0.0" + } + }, + "node_modules/bl": { + "version": "4.1.0", + "resolved": "https://registry.npmjs.org/bl/-/bl-4.1.0.tgz", + "integrity": "sha512-1W07cM9gS6DcLperZfFSj+bWLtaPGSOHWhPiGzXmvVJbRLdG82sH/Kn8EtW1VqWVA54AKf2h5k5BbnIbwF3h6w==", + "license": "MIT", + "dependencies": { + "buffer": "^5.5.0", + "inherits": "^2.0.4", + "readable-stream": "^3.4.0" + } + }, "node_modules/boolbase": { "version": "1.0.0", "resolved": "https://registry.npmjs.org/boolbase/-/boolbase-1.0.0.tgz", @@ -1201,6 +1253,30 @@ "node": "^6 || ^7 || ^8 || ^9 || ^10 || ^11 || ^12 || >=13.7" } }, + "node_modules/buffer": { + "version": "5.7.1", + "resolved": "https://registry.npmjs.org/buffer/-/buffer-5.7.1.tgz", + "integrity": "sha512-EHcyIPBQ4BSGlvjB16k5KgAJ27CIsHY/2JBmCRReo48y9rQ3MaUzWX3KVlBa4U7MyX02HdVj0K7C3WaB3ju7FQ==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT", + "dependencies": { + 
"base64-js": "^1.3.1", + "ieee754": "^1.1.13" + } + }, "node_modules/builtin-modules": { "version": "3.3.0", "resolved": "https://registry.npmjs.org/builtin-modules/-/builtin-modules-3.3.0.tgz", @@ -1570,6 +1646,30 @@ "url": "https://github.com/sponsors/wooorm" } }, + "node_modules/decompress-response": { + "version": "6.0.0", + "resolved": "https://registry.npmjs.org/decompress-response/-/decompress-response-6.0.0.tgz", + "integrity": "sha512-aW35yZM6Bb/4oJlZncMH2LCoZtJXTRxES17vE3hoRiowU2kWHaJKFkSBDnDR+cm9J+9QhXmREyIfv0pji9ejCQ==", + "license": "MIT", + "dependencies": { + "mimic-response": "^3.1.0" + }, + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/deep-extend": { + "version": "0.6.0", + "resolved": "https://registry.npmjs.org/deep-extend/-/deep-extend-0.6.0.tgz", + "integrity": "sha512-LOHxIOaPYdHlJRtCQfDIVZtfw/ufM8+rVj649RIHzcm/vGwQRXFt6OPqIFWsm2XEMrNIEtWR64sY1LEKD2vAOA==", + "license": "MIT", + "engines": { + "node": ">=4.0.0" + } + }, "node_modules/deep-is": { "version": "0.1.4", "resolved": "https://registry.npmjs.org/deep-is/-/deep-is-0.1.4.tgz", @@ -1595,6 +1695,15 @@ "node": ">=6" } }, + "node_modules/detect-libc": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/detect-libc/-/detect-libc-2.0.3.tgz", + "integrity": "sha512-bwy0MGW55bG41VqxxypOsdSdGqLwXPI/focwgTYCFMbdUiBAxLg9CFzG08sz2aqzknwiX7Hkl0bQENjg8iLByw==", + "license": "Apache-2.0", + "engines": { + "node": ">=8" + } + }, "node_modules/devlop": { "version": "1.1.0", "resolved": "https://registry.npmjs.org/devlop/-/devlop-1.1.0.tgz", @@ -1662,6 +1771,15 @@ "resolved": "https://registry.npmjs.org/emojilib/-/emojilib-2.4.0.tgz", "integrity": "sha512-5U0rVMU5Y2n2+ykNLQqMoqklN9ICBT/KsvC1Gz6vqHbz2AXXGkG+Pm5rMWk/8Vjrr/mY9985Hi8DYzn1F09Nyw==" }, + "node_modules/end-of-stream": { + "version": "1.4.4", + "resolved": "https://registry.npmjs.org/end-of-stream/-/end-of-stream-1.4.4.tgz", + "integrity": 
"sha512-+uw1inIHVPQoaVuHzRyXd21icM+cnt4CzD5rW+NC1wjOUSTOs+Te7FOv7AhN7vS9x/oIyhLP5PR1H+phQAHu5Q==", + "license": "MIT", + "dependencies": { + "once": "^1.4.0" + } + }, "node_modules/enhanced-resolve": { "version": "5.17.1", "resolved": "https://registry.npmjs.org/enhanced-resolve/-/enhanced-resolve-5.17.1.tgz", @@ -2539,6 +2657,15 @@ "node": ">=0.10.0" } }, + "node_modules/expand-template": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/expand-template/-/expand-template-2.0.3.tgz", + "integrity": "sha512-XYfuKMvj4O35f/pOXLObndIRvyQ+/+6AhODh+OKWj9S9498pHHn/IMszH+gt0fBCRWMNfk1ZSp5x3AifmnI2vg==", + "license": "(MIT OR WTFPL)", + "engines": { + "node": ">=6" + } + }, "node_modules/fast-deep-equal": { "version": "3.1.3", "resolved": "https://registry.npmjs.org/fast-deep-equal/-/fast-deep-equal-3.1.3.tgz", @@ -2614,6 +2741,12 @@ "node": ">=16.0.0" } }, + "node_modules/file-uri-to-path": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/file-uri-to-path/-/file-uri-to-path-1.0.0.tgz", + "integrity": "sha512-0Zt+s3L7Vf1biwWZ29aARiVYLx7iMGnEUl9x33fbB/j3jR81u/O2LbqK+Bm1CDSNDKVtJ/YjwY7TUd5SkeLQLw==", + "license": "MIT" + }, "node_modules/filelist": { "version": "1.0.4", "resolved": "https://registry.npmjs.org/filelist/-/filelist-1.0.4.tgz", @@ -2738,6 +2871,12 @@ "node": ">= 6" } }, + "node_modules/fs-constants": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/fs-constants/-/fs-constants-1.0.0.tgz", + "integrity": "sha512-y6OAwoSIf7FyjMIv94u+b5rdheZEjzR63GTyZJm5qh4Bi+2YgwLCcI/fPFZkL5PSixOt6ZNKm+w+Hfp/Bciwow==", + "license": "MIT" + }, "node_modules/function-bind": { "version": "1.1.2", "resolved": "https://registry.npmjs.org/function-bind/-/function-bind-1.1.2.tgz", @@ -2783,6 +2922,12 @@ "url": "https://github.com/privatenumber/get-tsconfig?sponsor=1" } }, + "node_modules/github-from-package": { + "version": "0.0.0", + "resolved": "https://registry.npmjs.org/github-from-package/-/github-from-package-0.0.0.tgz", + 
"integrity": "sha512-SyHy3T1v2NUXn29OsWdxmK6RwHD+vkj3v8en8AOBZ1wBQ/hCAQ5bAQTD02kW4W9tUp/3Qh6J8r9EvntiyCmOOw==", + "license": "MIT" + }, "node_modules/glob-parent": { "version": "6.0.2", "resolved": "https://registry.npmjs.org/glob-parent/-/glob-parent-6.0.2.tgz", @@ -2902,6 +3047,26 @@ "node": ">=0.10.0" } }, + "node_modules/ieee754": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/ieee754/-/ieee754-1.2.1.tgz", + "integrity": "sha512-dcyqhDvX1C46lXZcVqCpK+FtMRQVdIMN6/Df5js2zouUsqG7I6sFxitIC+7KYK29KdXOLHdu9zL4sFnoVQnqaA==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "BSD-3-Clause" + }, "node_modules/ignore": { "version": "5.3.2", "resolved": "https://registry.npmjs.org/ignore/-/ignore-5.3.2.tgz", @@ -2949,6 +3114,18 @@ "node": ">=8" } }, + "node_modules/inherits": { + "version": "2.0.4", + "resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz", + "integrity": "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==", + "license": "ISC" + }, + "node_modules/ini": { + "version": "1.3.8", + "resolved": "https://registry.npmjs.org/ini/-/ini-1.3.8.tgz", + "integrity": "sha512-JV/yugV2uzW5iMRSiZAyDtQd+nxtUnjeLt0acNdw98kKLrvuRVyB80tsREOE7yvGVgalhZ6RNXCmEHkUKBKxew==", + "license": "ISC" + }, "node_modules/is-arrayish": { "version": "0.2.1", "resolved": "https://registry.npmjs.org/is-arrayish/-/is-arrayish-0.2.1.tgz", @@ -4190,6 +4367,18 @@ "node": ">= 0.6" } }, + "node_modules/mimic-response": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/mimic-response/-/mimic-response-3.1.0.tgz", + "integrity": "sha512-z0yWI+4FDrrweS8Zmt4Ej5HdJmky15+L2e6Wgn3+iK5fWzb6T3fhNFq2+MeTRb064c6Wr4N/wv0DzQTjNzHNGQ==", + "license": "MIT", + "engines": { + "node": ">=10" + }, + "funding": { + "url": 
"https://github.com/sponsors/sindresorhus" + } + }, "node_modules/min-indent": { "version": "1.0.1", "resolved": "https://registry.npmjs.org/min-indent/-/min-indent-1.0.1.tgz", @@ -4219,6 +4408,21 @@ "concat-map": "0.0.1" } }, + "node_modules/minimist": { + "version": "1.2.8", + "resolved": "https://registry.npmjs.org/minimist/-/minimist-1.2.8.tgz", + "integrity": "sha512-2yyAR8qBkN3YuheJanUpWC5U3bb5osDywNB8RzDVlDwDHbocAJveqqj1u8+SVD7jkWT4yvsHCpWqqWqAxb0zCA==", + "license": "MIT", + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/mkdirp-classic": { + "version": "0.5.3", + "resolved": "https://registry.npmjs.org/mkdirp-classic/-/mkdirp-classic-0.5.3.tgz", + "integrity": "sha512-gKLcREMhtuZRwRAfqP3RFW+TK4JqApVBtOIftVgjuABpAtpxhPGaDcfvbhNvD0B8iD1oUr/txX35NjcaY6Ns/A==", + "license": "MIT" + }, "node_modules/mlly": { "version": "1.7.3", "resolved": "https://registry.npmjs.org/mlly/-/mlly-1.7.3.tgz", @@ -4266,6 +4470,12 @@ "node": "^10 || ^12 || ^13.7 || ^14 || >=15.0.1" } }, + "node_modules/napi-build-utils": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/napi-build-utils/-/napi-build-utils-1.0.2.tgz", + "integrity": "sha512-ONmRUqK7zj7DWX0D9ADe03wbwOBZxNAfF20PlGfCWQcD3+/MakShIHrMqx9YwPTfxDdF1zLeL+RGZiR9kGMLdg==", + "license": "MIT" + }, "node_modules/natural-compare": { "version": "1.4.0", "resolved": "https://registry.npmjs.org/natural-compare/-/natural-compare-1.4.0.tgz", @@ -4313,6 +4523,18 @@ "node": ">=0.10.0" } }, + "node_modules/node-abi": { + "version": "3.71.0", + "resolved": "https://registry.npmjs.org/node-abi/-/node-abi-3.71.0.tgz", + "integrity": "sha512-SZ40vRiy/+wRTf21hxkkEjPJZpARzUMVcJoQse2EF8qkUWbbO2z7vd5oA/H6bVH6SZQ5STGcu0KRDS7biNRfxw==", + "license": "MIT", + "dependencies": { + "semver": "^7.3.5" + }, + "engines": { + "node": ">=10" + } + }, "node_modules/node-emoji": { "version": "2.1.3", "resolved": "https://registry.npmjs.org/node-emoji/-/node-emoji-2.1.3.tgz", @@ -4379,6 +4601,15 @@ "node": 
">=0.10.0" } }, + "node_modules/once": { + "version": "1.4.0", + "resolved": "https://registry.npmjs.org/once/-/once-1.4.0.tgz", + "integrity": "sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w==", + "license": "ISC", + "dependencies": { + "wrappy": "1" + } + }, "node_modules/optionator": { "version": "0.9.4", "resolved": "https://registry.npmjs.org/optionator/-/optionator-0.9.4.tgz", @@ -4654,6 +4885,32 @@ "node": ">=4" } }, + "node_modules/prebuild-install": { + "version": "7.1.2", + "resolved": "https://registry.npmjs.org/prebuild-install/-/prebuild-install-7.1.2.tgz", + "integrity": "sha512-UnNke3IQb6sgarcZIDU3gbMeTp/9SSU1DAIkil7PrqG1vZlBtY5msYccSKSHDqa3hNg436IXK+SNImReuA1wEQ==", + "license": "MIT", + "dependencies": { + "detect-libc": "^2.0.0", + "expand-template": "^2.0.3", + "github-from-package": "0.0.0", + "minimist": "^1.2.3", + "mkdirp-classic": "^0.5.3", + "napi-build-utils": "^1.0.1", + "node-abi": "^3.3.0", + "pump": "^3.0.0", + "rc": "^1.2.7", + "simple-get": "^4.0.0", + "tar-fs": "^2.0.0", + "tunnel-agent": "^0.6.0" + }, + "bin": { + "prebuild-install": "bin.js" + }, + "engines": { + "node": ">=10" + } + }, "node_modules/prelude-ls": { "version": "1.2.1", "resolved": "https://registry.npmjs.org/prelude-ls/-/prelude-ls-1.2.1.tgz", @@ -4686,6 +4943,16 @@ "punycode": "^2.3.1" } }, + "node_modules/pump": { + "version": "3.0.2", + "resolved": "https://registry.npmjs.org/pump/-/pump-3.0.2.tgz", + "integrity": "sha512-tUPXtzlGM8FE3P0ZL6DVs/3P58k9nk8/jZeQCurTJylQA8qFYzHFfhBJkuqyE0FifOsQ0uKWekiZ5g8wtr28cw==", + "license": "MIT", + "dependencies": { + "end-of-stream": "^1.1.0", + "once": "^1.3.1" + } + }, "node_modules/punycode": { "version": "2.3.1", "resolved": "https://registry.npmjs.org/punycode/-/punycode-2.3.1.tgz", @@ -4719,6 +4986,30 @@ } ] }, + "node_modules/rc": { + "version": "1.2.8", + "resolved": "https://registry.npmjs.org/rc/-/rc-1.2.8.tgz", + "integrity": 
"sha512-y3bGgqKj3QBdxLbLkomlohkvsA8gdAiUQlSBJnBhfn+BPxg4bc62d8TcBW15wavDfgexCgccckhcZvywyQYPOw==", + "license": "(BSD-2-Clause OR MIT OR Apache-2.0)", + "dependencies": { + "deep-extend": "^0.6.0", + "ini": "~1.3.0", + "minimist": "^1.2.0", + "strip-json-comments": "~2.0.1" + }, + "bin": { + "rc": "cli.js" + } + }, + "node_modules/rc/node_modules/strip-json-comments": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/strip-json-comments/-/strip-json-comments-2.0.1.tgz", + "integrity": "sha512-4gB8na07fecVVkOI6Rs4e7T6NOTki5EmL7TUduTs6bu3EdnSycntVJ4re8kgZA+wx9IueI2Y11bfbgwtzuE0KQ==", + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, "node_modules/read-pkg": { "version": "5.2.0", "resolved": "https://registry.npmjs.org/read-pkg/-/read-pkg-5.2.0.tgz", @@ -4812,6 +5103,20 @@ "node": ">=8" } }, + "node_modules/readable-stream": { + "version": "3.6.2", + "resolved": "https://registry.npmjs.org/readable-stream/-/readable-stream-3.6.2.tgz", + "integrity": "sha512-9u/sniCrY3D5WdsERHzHE4G2YCXqoG5FTHUiCC4SIbr6XcLZBY05ya9EKjYek9O5xOAwjGq+1JdGBAS7Q9ScoA==", + "license": "MIT", + "dependencies": { + "inherits": "^2.0.3", + "string_decoder": "^1.1.1", + "util-deprecate": "^1.0.1" + }, + "engines": { + "node": ">= 6" + } + }, "node_modules/refa": { "version": "0.12.1", "resolved": "https://registry.npmjs.org/refa/-/refa-0.12.1.tgz", @@ -4955,6 +5260,26 @@ "queue-microtask": "^1.2.2" } }, + "node_modules/safe-buffer": { + "version": "5.2.1", + "resolved": "https://registry.npmjs.org/safe-buffer/-/safe-buffer-5.2.1.tgz", + "integrity": "sha512-rp3So07KcdmmKbGvgaNxQSJr7bGVSVk5S9Eq1F+ppbRo70+YeaDxkw5Dd8NPN+GD6bjnYm2VuPuCXmpuYvmCXQ==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT" + }, "node_modules/safer-buffer": { "version": "2.1.2", "resolved": 
"https://registry.npmjs.org/safer-buffer/-/safer-buffer-2.1.2.tgz", @@ -4994,7 +5319,6 @@ "version": "7.6.3", "resolved": "https://registry.npmjs.org/semver/-/semver-7.6.3.tgz", "integrity": "sha512-oVekP1cKtI+CTDvHWYFUcMtsK/00wmAEfyqKfNdARm8u1wNVhSgaX7A8d4UuIlUI5e84iEwOhs7ZPYRmzU9U6A==", - "dev": true, "bin": { "semver": "bin/semver.js" }, @@ -5027,6 +5351,51 @@ "node": ">=8" } }, + "node_modules/simple-concat": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/simple-concat/-/simple-concat-1.0.1.tgz", + "integrity": "sha512-cSFtAPtRhljv69IK0hTVZQ+OfE9nePi/rtJmw5UjHeVyVroEqJXP1sFztKUy1qU+xvz3u/sfYJLa947b7nAN2Q==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT" + }, + "node_modules/simple-get": { + "version": "4.0.1", + "resolved": "https://registry.npmjs.org/simple-get/-/simple-get-4.0.1.tgz", + "integrity": "sha512-brv7p5WgH0jmQJr1ZDDfKDOSeWWg+OVypG99A/5vYGPqJ6pxiaHLy8nxtFjBA7oMa01ebA9gfh1uMCFqOuXxvA==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT", + "dependencies": { + "decompress-response": "^6.0.0", + "once": "^1.3.1", + "simple-concat": "^1.0.0" + } + }, "node_modules/sisteransi": { "version": "1.0.5", "resolved": "https://registry.npmjs.org/sisteransi/-/sisteransi-1.0.5.tgz", @@ -5139,6 +5508,15 @@ "resolved": "https://registry.npmjs.org/ms/-/ms-2.0.0.tgz", "integrity": "sha512-Tpp60P6IUJDTuOq/5Z8cdskzJujfwqfOTkrwIwj7IRISpnkJnT6SyJ4PCPnGMoFjC9ddhal5KVIYtAt97ix05A==" }, + "node_modules/string_decoder": { + "version": "1.3.0", + "resolved": "https://registry.npmjs.org/string_decoder/-/string_decoder-1.3.0.tgz", + "integrity": 
"sha512-hkRX8U1WjJFd8LsDJ2yQ/wWWxaopEsABU1XfkM8A+j0+85JAGppt16cr1Whg6KIbb4okU6Mql6BOj+uup/wKeA==", + "license": "MIT", + "dependencies": { + "safe-buffer": "~5.2.0" + } + }, "node_modules/string-width": { "version": "4.2.3", "resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz", @@ -5257,6 +5635,40 @@ "node": ">=6" } }, + "node_modules/tar-fs": { + "version": "2.1.1", + "resolved": "https://registry.npmjs.org/tar-fs/-/tar-fs-2.1.1.tgz", + "integrity": "sha512-V0r2Y9scmbDRLCNex/+hYzvp/zyYjvFbHPNgVTKfQvVrb6guiE/fxP+XblDNR011utopbkex2nM4dHNV6GDsng==", + "license": "MIT", + "dependencies": { + "chownr": "^1.1.1", + "mkdirp-classic": "^0.5.2", + "pump": "^3.0.0", + "tar-stream": "^2.1.4" + } + }, + "node_modules/tar-fs/node_modules/chownr": { + "version": "1.1.4", + "resolved": "https://registry.npmjs.org/chownr/-/chownr-1.1.4.tgz", + "integrity": "sha512-jJ0bqzaylmJtVnNgzTeSOs8DPavpbYgEr/b0YL8/2GO3xJEhInFmhKMUnEJQjZumK7KXGFhUy89PrsJWlakBVg==", + "license": "ISC" + }, + "node_modules/tar-stream": { + "version": "2.2.0", + "resolved": "https://registry.npmjs.org/tar-stream/-/tar-stream-2.2.0.tgz", + "integrity": "sha512-ujeqbceABgwMZxEJnk2HDY2DlnUZ+9oEcb1KzTVfYHio0UE6dG71n60d8D2I4qNvleWrrXpmjpt7vZeF1LnMZQ==", + "license": "MIT", + "dependencies": { + "bl": "^4.0.3", + "end-of-stream": "^1.4.1", + "fs-constants": "^1.0.0", + "inherits": "^2.0.3", + "readable-stream": "^3.1.1" + }, + "engines": { + "node": ">=6" + } + }, "node_modules/thenify": { "version": "3.3.1", "resolved": "https://registry.npmjs.org/thenify/-/thenify-3.3.1.tgz", @@ -5364,6 +5776,18 @@ "integrity": "sha512-oJFu94HQb+KVduSUQL7wnpmqnfmLsOA/nAh6b6EH0wCEoK0/mPeXU6c3wKDV83MkOuHPRHtSXKKU99IBazS/2w==", "dev": true }, + "node_modules/tunnel-agent": { + "version": "0.6.0", + "resolved": "https://registry.npmjs.org/tunnel-agent/-/tunnel-agent-0.6.0.tgz", + "integrity": "sha512-McnNiV1l8RYeY8tBgEpuodCC1mLUdbSN+CYBL7kJsJNInOP8UjDDEwdk6Mw60vdLLrr5NHKZhMAOSrR2NZuQ+w==", + "license": 
"Apache-2.0", + "dependencies": { + "safe-buffer": "^5.0.1" + }, + "engines": { + "node": "*" + } + }, "node_modules/type-check": { "version": "0.4.0", "resolved": "https://registry.npmjs.org/type-check/-/type-check-0.4.0.tgz", @@ -5530,8 +5954,7 @@ "node_modules/util-deprecate": { "version": "1.0.2", "resolved": "https://registry.npmjs.org/util-deprecate/-/util-deprecate-1.0.2.tgz", - "integrity": "sha512-EPD5q1uXyFxJpCrLnCc1nHnq3gOa6DZBocAIiI2TaSCA7VCJ1UJDMagCzIkXNsUYfD1daK//LTEQ8xiIbrHtcw==", - "dev": true + "integrity": "sha512-EPD5q1uXyFxJpCrLnCc1nHnq3gOa6DZBocAIiI2TaSCA7VCJ1UJDMagCzIkXNsUYfD1daK//LTEQ8xiIbrHtcw==" }, "node_modules/validate-npm-package-license": { "version": "3.0.4", @@ -5716,6 +6139,12 @@ "url": "https://github.com/chalk/wrap-ansi?sponsor=1" } }, + "node_modules/wrappy": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/wrappy/-/wrappy-1.0.2.tgz", + "integrity": "sha512-l4Sp/DRseor9wL6EvV2+TuQn63dMkPjZ/sp9XkghTEbV9KlPS1xUsZ3u7/IQO4wxtcFB4bgpQPRcR3QCvezPcQ==", + "license": "ISC" + }, "node_modules/ws": { "version": "8.18.0", "resolved": "https://registry.npmjs.org/ws/-/ws-8.18.0.tgz", diff --git a/package.json b/package.json index c31a8df..57626c2 100644 --- a/package.json +++ b/package.json @@ -1,7 +1,7 @@ { "name": "@cu-mkp/editioncrafter-cli", "type": "module", - "version": "1.1.0", + "version": "1.2.0", "description": "This is the command line tool to take a TEI XML file and turn it into a IIIF Manifest and the necessary Web Annotations to display the text in EditionCrafter.", "author": "Nick Laiacona ", "license": "MIT", @@ -20,6 +20,7 @@ "dependencies": { "@ungap/structured-clone": "^1.2.0", "axios": "^1.4.0", + "better-sqlite3": "^11.6.0", "csv-parse": "^5.5.6", "genversion": "^3.2.0", "jsdom": "^21.1.2", diff --git a/src/db.js b/src/db.js new file mode 100644 index 0000000..3289fe9 --- /dev/null +++ b/src/db.js @@ -0,0 +1,345 @@ +// the database script creates a SQLite database containing the keywords +// and other info 
from the document + +import fs from 'node:fs' +import process from 'node:process' +import jsdom from 'jsdom' + +// Note: Node 22 includes built-in SQLite support. +// If we ever drop support folder older Node versions, +// we should refactor this to move away from the +// third-party better-sqlite3 package. +import Database from 'better-sqlite3' +import { scrubTree } from './render.js' + +const { JSDOM } = jsdom + +function populateTables(db) { + db.exec(` + CREATE TABLE documents ( + id INTEGER PRIMARY KEY, + name STRING + ); + CREATE TABLE surfaces ( + id INTEGER PRIMARY KEY, + xml_id STRING, + name STRING, + position INTEGER, + document_id INTEGER REFERENCES documents(id) + ); + CREATE TABLE layers ( + id INTEGER PRIMARY KEY, + xml_id STRING, + document_id INTEGER REFERENCES documents(id) + ); + CREATE TABLE taxonomies ( + id INTEGER PRIMARY KEY, + name STRING, + xml_id STRING + ); + CREATE TABLE tags ( + id INTEGER PRIMARY KEY, + name STRING, + xml_id STRING, + taxonomy_id INTEGER REFERENCES taxonomies(id) + ); + CREATE TABLE elements ( + id INTEGER PRIMARY KEY, + name STRING NULL, + type STRING, + layer_id INTEGER REFERENCES layers(id), + surface_id INTEGER REFERENCES surfaces(id), + parent_id INTEGER REFERENCES elements(id) + ); + CREATE TABLE taggings ( + id INTEGER PRIMARY KEY, + element_id INTEGER REFERENCES elements(id), + tag_id INTEGER REFERENCES tags(id) + );`, + ) +} + +async function createDatabase(options) { + if (fs.existsSync(options.outputPath)) { + fs.rmSync(options.outputPath) + } + + const db = new Database(options.outputPath) + + // the better-sqlite3 docs suggest this line for better performance + db.pragma('journal_mode = WAL') + + populateTables(db) + + await parseXml(db, options.inputPath) + + process.on('exit', () => db.close()) +} + +async function parseXml(db, path) { + const xmlFile = fs.readFileSync(path).toString() + + const xml = new JSDOM(xmlFile, { contentType: 'text/xml' }).window.document + + const taxonomies = 
xml.querySelectorAll('taxonomy') + + const documentId = parseDocument(db, xml) + + for (const tax of taxonomies) { + const xmlId = tax.getAttribute('xml:id') + const biblEl = tax.querySelector('bibl') + + if (!biblEl) { + console.error(`Taxonomy ${xmlId} does not have a name (a <bibl> element) and will be skipped.`) + continue + } + + const name = biblEl.textContent + + const { lastInsertRowid } = db + .prepare(`INSERT INTO taxonomies (name, xml_id) VALUES (?, ?)`) + .run(name, xmlId) + + parseTaxonomy(db, tax, lastInsertRowid) + } + + parseSurfaces(db, xml, documentId) + parseLayers(db, xml, documentId) +} + +function parseDocument(db, xml) { + const titleEl = xml.querySelector('teiHeader > fileDesc > titleStmt > title') + const name = titleEl?.textContent + ? titleEl.textContent.trim() + : undefined + + if (!name) { + console.error('Document has no title. Please add one.') + process.exit(1) + } + + const { lastInsertRowid } = db + .prepare('INSERT INTO documents (name) VALUES (?)') + .run(name) + + return lastInsertRowid +} + +function parseTaxonomy(db, el, taxonomyId) { + const categories = el.querySelectorAll(':scope > category') + + for (const cat of categories) { + const xmlId = cat.getAttribute('xml:id') + const desc = cat.querySelector('catDesc') + + if (!desc) { + console.error(`Category ${xmlId} does not have a name (which should be contained in a <catDesc> element) and will be skipped.`) + continue + } + + const name = desc.textContent + + db + .prepare('INSERT INTO tags (name, xml_id, taxonomy_id) VALUES (?, ?, ?)') + .run(name, xmlId, taxonomyId) + + const childCategories = cat.querySelectorAll(':scope > category') + + if (childCategories.length > 0) { + console.warn(`Nested category found under ${name}. 
EditionCrafter does not support nested categories, so this will be skipped.`) + } + } +} + +function parseLayers(db, doc, documentId) { + const layers = doc.querySelectorAll('text, sourceDoc') + + for (const layer of layers) { + const xmlId = layer.getAttribute('xml:id') + + const { lastInsertRowid } = db + .prepare('INSERT INTO layers (xml_id, document_id) VALUES (?, ?)') + .run(xmlId, documentId) + + const pbEls = layer.querySelectorAll('pb') + + parsePbs(db, pbEls, layer, lastInsertRowid) + } +} + +function parsePbs(db, pbEls, layerEl, layerDbId) { + for (const pb of pbEls) { + const surfaceXmlId = pb.getAttribute('facs') + + if (!surfaceXmlId) { + continue + } + + const surfaceLookup = db + .prepare('SELECT id FROM surfaces WHERE surfaces.xml_id = ?') + .get(surfaceXmlId.slice(1)) + + const surfaceDbId = surfaceLookup?.id + + if (!surfaceDbId) { + console.log(`<pb> element refers to ${surfaceXmlId}, which does not exist.`) + continue + } + + const contents = extractPb(layerEl, surfaceXmlId) + parseTaggedDivs(db, contents, layerDbId, surfaceDbId) + } +} + +function extractPb(layerEl, surfaceID) { + const pbElCount = layerEl.querySelectorAll('pb').length + + for (let i = 0; i < pbElCount; i++) { + // since this function mutates the XML, we need to clone the + // layer element each time + const layerClone = new JSDOM(layerEl.outerHTML, { contentType: 'text/xml' }).window.document + + const pbEls = layerClone.querySelectorAll('pb') + const pbEl = pbEls[i] + const pbSurfaceID = pbEl.getAttribute('facs') + + if (pbSurfaceID && pbSurfaceID === surfaceID) { + const nextPbEl = pbEls[i + 1] + scrubTree(pbEl, 'prev') + if (nextPbEl) { + scrubTree(nextPbEl, 'next') + nextPbEl.parentNode.removeChild(nextPbEl) + } + return layerClone + } + } + return null +} + +function ingestTaggedElement(db, el, type, layerId, surfaceId, parentId) { + const name = getElementName(el) + + let elementDbId + + if (parentId) { + const { lastInsertRowid } = db + .prepare('INSERT INTO elements (name, 
type, layer_id, surface_id, parent_id) VALUES (?, ?, ?, ?, ?)') + .run(name, type, layerId, surfaceId, parentId) + + elementDbId = lastInsertRowid + } + else { + const { lastInsertRowid } = db + .prepare('INSERT INTO elements (name, type, layer_id, surface_id) VALUES (?, ?, ?, ?)') + .run(name, type, layerId, surfaceId) + + elementDbId = lastInsertRowid + } + + const tagXmlIds = el + .getAttribute('ana') + .split(' ') + // remove the # before each ID + .map(str => str.slice(1)) + + for (const tagXmlId of tagXmlIds) { + const tagLookup = db + .prepare('SELECT id FROM tags WHERE tags.xml_id = ?') + .get(tagXmlId) + + const tagDbId = tagLookup?.id + + if (!tagDbId) { + console.log(`Tag #${tagXmlId} not found in taxonomy element.`) + continue + } + + db + .prepare('INSERT INTO taggings (element_id, tag_id) VALUES (?, ?)') + .run(elementDbId, tagDbId) + } + + return elementDbId +} + +function parseTaggedSegs(db, div, divId, layerId, surfaceId) { + const taggedSegs = div.querySelectorAll('seg[ana]') + + for (const seg of taggedSegs) { + ingestTaggedElement(db, seg, 'seg', layerId, surfaceId, divId) + } +} + +function parseTaggedDivs(db, surfaceContents, layerId, surfaceId) { + const taggedDivs = surfaceContents.querySelectorAll('div[ana]') + + for (const div of taggedDivs) { + const divId = ingestTaggedElement(db, div, 'div', layerId, surfaceId) + + parseTaggedSegs(db, div, divId, layerId, surfaceId) + } +} + +function getElementName(el) { + if (el.nodeName === 'seg') { + return el.textContent + } + + if (el.nodeName === 'zone') { + return el.getAttribute('xml:id') + } + + const headEl = el.querySelector(':scope > head') + + if (headEl) { + return headEl.textContent + } + + return null +} + +function parseZones(db, surface, layerId, surfaceId) { + const taggedZones = surface.querySelectorAll('zone[ana]') + + for (const zone of taggedZones) { + ingestTaggedElement(db, zone, 'zone', layerId, surfaceId) + } +} + +function parseSurfaces(db, xml, documentId) { + const 
facsimiles = xml.querySelectorAll('facsimile') + + for (const facsimile of facsimiles) { + const facsXmlId = facsimile.getAttribute('xml:id') + + if (facsXmlId) { + const layerResult = db + .prepare('INSERT INTO layers (xml_id, document_id) VALUES (?, ?)') + .run(facsXmlId, documentId) + + const surfaces = xml.querySelectorAll('surface') + + for (let i = 0; i < surfaces.length; i++) { + const surface = surfaces[i] + const xmlId = surface.getAttribute('xml:id') + + const labelEl = surface.querySelector('label') + + if (!labelEl) { + console.error(`Surface ${xmlId} does not have a name (a