Building a Bookworm is handled centrally by a Makefile.
The easiest way to create a Bookworm is to make sure you have stored the files as described above at:
- files/texts/input.txt (or a number of files at files/texts/raw)
- files/metadata/jsoncatalog.txt
- files/metadata/field_descriptions.json
Once this is done, simply run make. The first time you do this, it will prompt you for the name of the Bookworm and for the username and password to build with; it will then run through all the data and build the Bookworm in place.
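As a minimal sketch of a pre-flight check, the shell function below reports which of the input files listed above are missing before make is run. The function name check_bookworm_inputs is hypothetical and not part of Bookworm; it only assumes the paths given in this section.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bookworm): report which expected input
# files are missing under a given Bookworm root directory.
check_bookworm_inputs() {
    root="${1:-.}"
    missing=0
    # Texts may be supplied either as one input.txt or as files under raw/.
    if [ ! -f "$root/files/texts/input.txt" ] && [ ! -d "$root/files/texts/raw" ]; then
        echo "missing: files/texts/input.txt (or files/texts/raw/)"
        missing=1
    fi
    for f in files/metadata/jsoncatalog.txt files/metadata/field_descriptions.json; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Exit status 0 means every expected input was found.
    return "$missing"
}

# Usage: run the check, and only invoke make when everything is in place.
# check_bookworm_inputs . && make
```

The check is deliberately shallow: it confirms that the files exist, not that their contents are well-formed.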
The Bookworm name is also the name of the database (and of the website), so it can't include spaces.
The username and password are not your personal username and password. They are, instead, a default username and password that the web site will use. This is for an extra layer of security; all Bookworm queries are performed by a user with no privileges to change the database. In general, the right place to store these is your system-level my.cnf file. (It may be located somewhere like /etc/my.cnf on OS X, or /etc/mysql/my.cnf on Ubuntu.) These must be the same as the username and password that Apache will use when executing a CGI script.
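This section does not say which option group in my.cnf should hold the credentials. The sketch below assumes the [client] group, which standard MySQL client programs read from option files. The function name and the example user name and password are placeholders, not values Bookworm prescribes. It writes the fragment to a scratch file so you can review it and merge it into my.cnf by hand.

```shell
# Hypothetical helper (not part of Bookworm): write a sample option-file
# fragment for the read-only web user to a scratch file for manual review.
# Assumption: the credentials belong in the [client] group.
write_bookworm_cnf_sample() {
    user="$1"
    password="$2"
    outfile="$3"
    cat > "$outfile" <<EOF
[client]
user = $user
password = $password
EOF
}

# Example (placeholder values; merge the result into my.cnf by hand):
# write_bookworm_cnf_sample bookworm_reader 'change-me' ./my.cnf.sample
```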
All jobs are dispatched through the Makefile--if you can read through the dependency chain to see how it's put together, you'll understand all the elements.
For reference, the general workflow of the Makefile is the following:
- Build the directory structure in files/texts/.
- Derive files/metadata/field_descriptions_derived.json from files/metadata/field_descriptions.json.
- Derive files/metadata/jsoncatalog_derived.txt from files/metadata/jsoncatalog.txt.
- Create metadata catalog files in files/metadata/.
- Create, if not pre-defined, a file at files/wordlist/wordlist.txt that defines the tokens that will be counted (the million most common tokens).
- Encode unigrams and bigrams from the binaries into files/encoded.
- Load wordcounts into MySQL database.
- Load metadata into MySQL database.
- Create temporary MySQL table and .json file that will be used by the web app.
At any point, you can backtrack part of the way by clearing out files from files/targets.
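Backtracking can be sketched as a small shell function that removes one file from files/targets so that the next make run redoes that step. The function name reset_bookworm_target is hypothetical, and the names of the files in files/targets are not listed here, so inspect the directory first (for example with ls files/targets) to see what is actually there.

```shell
# Hypothetical helper (not part of Bookworm): remove one file from
# files/targets under the given root. The file name comes from the caller;
# no particular target names are assumed.
reset_bookworm_target() {
    root="$1"
    name="$2"
    if [ -z "$name" ]; then
        echo "usage: reset_bookworm_target ROOT TARGET_NAME" >&2
        return 2
    fi
    target="$root/files/targets/$name"
    if [ -f "$target" ]; then
        rm -f -- "$target"
        echo "cleared: files/targets/$name"
    else
        echo "no such target file: files/targets/$name" >&2
        return 1
    fi
}

# Example (TARGET_NAME is a placeholder for a file you found in files/targets):
# reset_bookworm_target . TARGET_NAME && make
```

Which later steps are rebuilt after a given file is cleared depends on the dependency chain in the Makefile itself.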