This is a collection of scripts and other files to ease handling my huge IBM Documents collection. The collection was once available via my IBMDocs-Website. See this page for details on why this sentence is in the past tense.
Central to this are the database and some TUI maintenance applications meant to run on OS/400 (IBM i).
All of this is really, really special, and I'm not sure it is of real-world value to anyone but me. Over time, the helper scripts have grown into a tangle, and putting them into a repository will certainly help to consolidate "script here, script there, step-by-step guide somewhere else" into a single directory.
The same goes for the separate documents on special-purpose topics. They started as checklists or reminders of sorts. I have reworked them to be general-purpose rather than system-specific, and expanded them with more information that is hopefully of additional value to the reader.
This document is part of the IBM Documentation Utilities, to be found on GitHub - see there for further details. Its content is subject to the CC BY-SA 4.0 license, also known as Attribution-ShareAlike 4.0 International.
All documents (in the IBM proprietary BOOK format and in PDF format) eventually end up in one directory, named by the document number IBM usually assigns to them. There is a fair number of PDF documents without document numbers that are not yet part of the collection. I have not yet decided how to deal with those.
The documents collection itself is referenced by an automatically generated, huge, single-page HTML table. Its content is meant to be searched with the web browser's built-in text search function. The table references an external stylesheet, doctable.css, which must be made accessible accordingly.
The HTML table features these fields:
- Document: Number, format, date added
- Title
- Released (year)
- Subtitle
The Document column is somewhat special. The document number is always output. Depending on the available formats, additional lines are generated:
- For BOOKs, the string "BOOK" is output as a link to an assumed local install of Library Manager, followed by a more or less random 8-character string and, optionally, the date when this particular format of the document was added.
- For PDFs, the string "PDF" is output as a link to the PDF file, optionally followed by the date when this particular format of the document was added.
Documents without a date designation have been part of the collection since its beginning. The "added" date was solely meant to help see which new documents have become available, and since when. Since the table is static, I understand this is of limited use.
It is perfectly valid for some documents to be available in both formats (IBM proprietary BOOK and PDF).
The first column of the table links directly to available PDF documents, and to a locally installed IBM BookManager BookServer 2.3. Unfortunately, the BookServer often opens the wrong file. I'm not sure how to deal with that; so far, this issue remains unsolved.
BOOK files usually have a DOS-compatible file name (8-character string) to make them compatible with PC-DOS, MVS, and the OS/400 DLS. This string is also output in the first column, for quick access via other means of viewing BOOK files. Options of particular interest are the AS/400 InfoSeeker and the OS/390-based BookManager READ/MVS.
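To illustrate how the Document column is assembled, here is a minimal Python sketch that renders one table cell. It is not the code of the C CGI program; the base URLs BOOKSERVER_BASE and PDF_BASE, the link targets, and the function name document_cell are hypothetical placeholders for whatever the local setup uses.

```python
from html import escape
from typing import Optional

# Hypothetical link targets; adjust to the local BookServer and web server setup.
BOOKSERVER_BASE = "http://localhost/bookmgr/"
PDF_BASE = "/ibmdocs/"


def document_cell(docnbr: str,
                  dos_name: Optional[str] = None,
                  book_added: Optional[str] = None,
                  has_pdf: bool = False,
                  pdf_added: Optional[str] = None) -> str:
    """Render the Document column for one document as an HTML table cell."""
    lines = [escape(docnbr)]  # the document number is always output
    if dos_name is not None:  # a BOOK exists: link, 8-character name, optional date
        book = f'<a href="{escape(BOOKSERVER_BASE + dos_name)}">BOOK</a> {escape(dos_name)}'
        if book_added:
            book += " " + escape(book_added)
        lines.append(book)
    if has_pdf:  # a PDF exists: link to the file, optional date
        pdf = f'<a href="{escape(PDF_BASE + docnbr + ".pdf")}">PDF</a>'
        if pdf_added:
            pdf += " " + escape(pdf_added)
        lines.append(pdf)
    return "<td>" + "<br>".join(lines) + "</td>"
```

Leaving dos_name unset omits the BOOK line, so a PDF-only document renders as its number plus the PDF link; a document without any date simply gets no date appended.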
The HTML page is generated by a CGI program (written in C) designed to run on OS/400 under the control of the IBM HTTP server for OS/400. On my model 150, this application runs for nearly two minutes for 13,540 documents. Hence, ibmdoc-generate-index.sh is provided to
- save the long-running CGI's output to a local file,
- fix owner/permissions for documents in the documents collection directory.
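A rough Python equivalent of those two steps might look like the following sketch. It is not a translation of ibmdoc-generate-index.sh; the CGI URL, the output path, and the default file mode 0644 are assumptions, and changing the owner via os.chown usually requires root privileges, so it is optional here. Writing to a temporary file first and renaming it afterwards is a design choice of this sketch: the web server never serves a half-written page.

```python
import os
import shutil
import urllib.request
from pathlib import Path


def save_cgi_output(url: str, dest: str, timeout: float = 600.0) -> int:
    """Fetch the (slow) CGI output and store it in dest; return the number of bytes."""
    dest_path = Path(dest)
    tmp_path = dest_path.with_name(dest_path.name + ".tmp")
    # Generous timeout: the CGI may run for minutes on older hardware.
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    size = tmp_path.stat().st_size
    os.replace(tmp_path, dest_path)  # atomic rename: no half-written page is ever served
    return size


def fix_permissions(directory: str, mode: int = 0o644, uid: int = -1, gid: int = -1) -> int:
    """Set mode (and optionally owner) on every regular file in directory."""
    count = 0
    for entry in Path(directory).iterdir():
        if entry.is_file():
            os.chmod(entry, mode)
            if uid != -1 or gid != -1:
                os.chown(entry, uid, gid)  # changing the owner usually requires root
            count += 1
    return count
```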
Multiple tables hold the data from which the index page is eventually created:
- newdocspf temporarily holds metadata for BOOK files, derived manually by following the directions described in books-prepare.md.
- ibmdocpf contains metadata that is agnostic to the document's type, and can thus be shared among all formats of a single document.
- ibmdoctypf contains metadata that depends on the document's type. It includes the DOS-compatible file name.
All tables are primarily related through the document number, although some of the manual check SQLs for preparing BOOKs use the DOS-compatible file name for cross-referencing.
Note that the scripts make use of commitment control. This means you must set up journalling for these three files.
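The relationship between the type-agnostic metadata and the per-type records can be illustrated with an in-memory SQLite database in Python. The column names (docnbr, title, released, subtitle, doctype, dosname, added) and the helper functions are hypothetical simplifications; the real record formats are the DDS descriptions in the as400 subdirectory. Wrapping related inserts in a "with conn:" block commits them together or rolls them back on error, which is only a loose analogy to OS/400 commitment control.

```python
import sqlite3

# Hypothetical, simplified columns; the real record formats are the DDS in as400/.
SCHEMA = """
CREATE TABLE ibmdocpf (
    docnbr   TEXT PRIMARY KEY,   -- document number
    title    TEXT,
    released INTEGER,
    subtitle TEXT
);
CREATE TABLE ibmdoctypf (
    docnbr  TEXT NOT NULL,       -- relates to ibmdocpf.docnbr
    doctype TEXT NOT NULL,       -- 'BOOK' or 'PDF'
    dosname TEXT,                -- DOS-compatible 8-character name (BOOKs)
    added   TEXT,                -- date this format was added, may be NULL
    PRIMARY KEY (docnbr, doctype)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def documents_with_formats(conn: sqlite3.Connection) -> list:
    """Join format-agnostic metadata with per-format rows via the document number."""
    return conn.execute(
        """SELECT m.docnbr, m.title, t.doctype, t.dosname
             FROM ibmdocpf AS m
             JOIN ibmdoctypf AS t ON t.docnbr = m.docnbr
            ORDER BY m.title, t.doctype"""
    ).fetchall()


def docnbr_for_dosname(conn: sqlite3.Connection, dosname: str):
    """Cross-reference in the other direction: DOS file name to document number."""
    row = conn.execute(
        "SELECT docnbr FROM ibmdoctypf WHERE dosname = ?", (dosname,)
    ).fetchone()
    return row[0] if row else None
```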
Along with the data-holding files, there are some logical files that depend on them and provide different indexes (sort orders) and views (subsets of fields).
- docnbrlf supports code that duplicates document metadata into an already existing (autogenerated) dummy record. This helps to reduce typing effort when handling different revisions of the same document.
- ibmdocpos1..3 as well as ibmdoctyl1..3 are required by the scrolling and position-to logic in the maintenance applications.
- listdocslf supports the listdocs CGI and outputs only "valid" records, ordered by document title.
"Valid" means: records featuring 1960 as their year of release are assumed to have been inserted automatically by ibmdoc-db-lint.pl and not (yet) complemented manually with metadata; they are thus treated as invalid.
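The selection rule for valid records can be expressed in a few lines of Python. DocRecord and valid_records are hypothetical names for illustration only; on the AS/400, this selection and ordering is provided by the logical file listdocslf itself.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Release year marking autogenerated dummy records.
DUMMY_RELEASE_YEAR = 1960


@dataclass(frozen=True)
class DocRecord:
    docnbr: str
    title: str
    released: int


def valid_records(records: Iterable[DocRecord]) -> List[DocRecord]:
    """Drop dummy records and order the rest by title, like listdocslf."""
    return sorted(
        (r for r in records if r.released != DUMMY_RELEASE_YEAR),
        key=lambda r: r.title,
    )
```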
DDS descriptions for the database files are readily found in the as400 subdirectory.
The maintenance application is a text-based full-screen application derived from parts of my AS/400 Subfile Template. It was initially intended as the primary means to manually enter metadata for documents.
For further details refer to the accompanying README.
What follows is a short explanation of the files (and directories) contained in this repository, ordered by type and then alphabetically.
books-prepare.md
shows procedures to handle new BOOK files, especially large quantities of them.
os390-bookmgr-notes.md
shows procedures for making a large number of BOOKs available to BookManager READ/MVS.
pdf-prepare.md
shows procedures to handle new PDF files, especially large quantities of them.
format-bks.md
contains a preliminary description of the textual Book Shelf format.
- doctable.css is included in the main HTML table page (documents list) and needs to be accessible to the browser when loading the page.
Those scripts that access the AS/400 database assume a working ODBC connection to the AS/400 machine.
Also make sure to update the variables inside the scripts to reflect your local environment, such as usernames, passwords, and paths.
ibmdoc-copy-unhandled-pdfs.pl
copies PDF files with an empty description to a directory, e.g. one within your home directory, for inspection, for example over a Samba share with the excellent Apple Preview.app or the Finder's built-in preview feature.
ibmdoc-create-8char-links.pl
creates hard links from BOOK files (with the document number as file name) to another directory with DLS (DOS-compatible) file names. The association between both names is looked up in the doctype database table. The script makes no changes to the database, only to the destination directory in the file system. Part of ibmdoc-merge-docs.pl does the reverse. Note that source and destination directories must be on the same file system to satisfy the requirement for creating hard links.
ibmdoc-db-lint.pl
This script verifies the mutual consistency of the database content and the files available in the file system. It
- creates new (dummy) records for every file in the file system not listed in the database,
- deletes database entries not backed by a file in the file system,
- checks for and deletes orphaned metadata entries not backed by any document in the document types table,
- and creates new (dummy) records for entries in the document types table that have no metadata entry.
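The reconciliation performed by ibmdoc-db-lint.pl boils down to set differences. The following Python sketch only computes what would be inserted or deleted, without touching any database. All names are hypothetical, the mapping of file extensions to document types (EXT_TO_TYPE) is an assumption, and it assumes the dummy records of the last step go into the metadata table; the actual Perl script works directly against the AS/400 tables.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical mapping of file name extensions to document types.
EXT_TO_TYPE = {".pdf": "PDF", ".boo": "BOOK"}


def files_in_directory(directory: str) -> set:
    """Return {(document number, type)} for all recognized files in directory."""
    found = set()
    for p in Path(directory).iterdir():
        doctype = EXT_TO_TYPE.get(p.suffix.lower())
        if doctype and p.is_file():
            found.add((p.stem, doctype))
    return found


@dataclass
class LintPlan:
    insert_types: set  # files without a document types record: add dummy records
    delete_types: set  # document types records without a file
    delete_meta: set   # orphaned metadata (no document of any type left)
    insert_meta: set   # document numbers needing a dummy metadata record


def lint_plan(files: set, types: set, meta: set) -> LintPlan:
    """files/types: {(docnbr, doctype)}, meta: {docnbr}. Nothing is modified."""
    insert_types = files - types
    delete_types = types - files
    # After applying the two steps above, the types table matches the files exactly.
    docnbrs = {docnbr for docnbr, _ in files}
    delete_meta = meta - docnbrs
    insert_meta = docnbrs - meta
    return LintPlan(insert_types, delete_types, delete_meta, insert_meta)
```

Dummy metadata records created this way would carry 1960 as their release year, which is exactly what the listdocslf selection filters out.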
ibmdoc-generate-index.sh
saves the long-running CGI's output to a local file, and fixes owner/permissions for documents in the documents directory.
ibmdoc-list-exceptions.sh
is a helper script to quickly identify "wrong" files that have erroneously ended up in the documents directory. This mainly applies to files not adhering to the IBM document number "standard", plus some hard-coded exceptions.
ibmdoc-merge-docs.pl
must be run as part of the procedures described in books-prepare.md. It copies database records from a table with separately collected metadata for new BOOK files, and creates hard links from the "incoming" directory to the documents directory. Filesystem-wise, ibmdoc-create-8char-links.pl does the reverse. Note that source and destination directories must be on the same file system to satisfy the requirement for creating hard links.
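Both ibmdoc-merge-docs.pl and ibmdoc-create-8char-links.pl create hard links. A minimal Python sketch of that step, including an explicit same-file-system check, could look like this. The function name hard_link_files is hypothetical, and the name mapping, which would come from the database in practice, is passed in as a plain dictionary here.

```python
import os
from pathlib import Path
from typing import Dict, List


def hard_link_files(src_dir: str, dst_dir: str, names: Dict[str, str]) -> List[str]:
    """Hard-link src_dir/<key> to dst_dir/<value> for each entry in names.

    Existing destination files are left alone; returns the names that were created.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    # Hard links cannot span file systems; fail early with a clear message.
    if os.stat(src).st_dev != os.stat(dst).st_dev:
        raise OSError(f"{src} and {dst} are not on the same file system")
    created = []
    for src_name, dst_name in names.items():
        target = dst / dst_name
        if target.exists():
            continue
        os.link(src / src_name, target)
        created.append(dst_name)
    return created
```

For the merge step, the source is the incoming directory and the destination the documents directory; for the 8-character links, the documents directory is the source and the DLS directory the destination, with the mapping swapped accordingly.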
- as400 contains the AS/400-specific database and screen definitions, along with the program code for the maintenance application. There's a README.md specifically dealing with the files in there.
The following describes how to handle incoming documents with the facilities provided through this repository.
For PDF files, the process is modestly complex, and leaves substantial manual labor in copying metadata from the PDFs to the database. See pdf-prepare.md.
For BOOK files, the process is highly complex, but leaves no manual labor regarding database metadata updates. See books-prepare.md.
2024-05-30