The virtual workspace of the WhatEvery1Says (WE1S) project.
The workspace is implemented using a notebook web interface (Jupyter notebooks) and supports topic modeling (with MALLET) and browsing / visualization (with DFR-browser).
The WE1S project uses this to track the word "humanities" and related concepts as they appear in public discourse.
The workspace is based on a Jupyter notebooks environment using a mix of Python and R notebooks.
- Projects are created using a template copying system.
- Import, cleaning, and deduplication are handled by custom scripts.
- Topic models are generated by MALLET.
- Visualizations are produced as interactive browser websites generated by DFR-browser.
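The project's own deduplication scripts are not shown here. As an illustration only, one common approach to removing exact duplicates is to hash a normalized copy of each document and keep the first occurrence of each hash. The functions `normalize` and `deduplicate` are hypothetical names, and this is a sketch of the general technique, not the WE1S implementation:

```python
import hashlib
import re
from typing import Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def deduplicate(docs: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each document whose normalized text is identical."""
    seen = set()
    unique = []
    for doc in docs:
        # Hypothetical choice: SHA-1 of the normalized text as the duplicate key.
        digest = hashlib.sha1(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


docs = [
    "The humanities matter.",
    "the  humanities   MATTER.",
    "A different article.",
]
print(deduplicate(docs))  # ['The humanities matter.', 'A different article.']
```

This only catches exact duplicates after normalization; near-duplicate detection (for example, syndicated articles with small edits) would need a fuzzier technique.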
Work proceeds by:
- naming and creating a new project
- customizing the import data
- cleaning the data
- creating a topic model
- making an interactive topic browser
- browsing online or downloading
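For the topic-modeling step, MALLET is run from its command line in two stages: importing a folder of text files into MALLET's binary format, then training topics on that file. The sketch below only assembles those two command lines (the MALLET binary path, folder names, output file names, and topic count are hypothetical), and is not necessarily how 3_make_topic_model.ipynb invokes MALLET:

```python
import os
import subprocess
from typing import List


def mallet_commands(mallet_bin: str, text_dir: str, out_dir: str, num_topics: int = 50) -> List[List[str]]:
    """Build the import-dir and train-topics command lines for one MALLET run."""
    corpus = os.path.join(out_dir, "corpus.mallet")  # hypothetical file name
    import_cmd = [
        mallet_bin, "import-dir",
        "--input", text_dir,
        "--output", corpus,
        "--keep-sequence",     # topic modeling needs token sequences, not bags of features
        "--remove-stopwords",  # apply MALLET's built-in English stoplist
    ]
    train_cmd = [
        mallet_bin, "train-topics",
        "--input", corpus,
        "--num-topics", str(num_topics),
        "--output-state", os.path.join(out_dir, "topic-state.gz"),
        "--output-topic-keys", os.path.join(out_dir, "keys.txt"),
        "--output-doc-topics", os.path.join(out_dir, "composition.txt"),
    ]
    return [import_cmd, train_cmd]


def run_mallet(commands: List[List[str]]) -> None:
    """Run each command in order; check=True raises CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to print or log the exact MALLET invocation as part of the project record before running it.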
NOTE: Depending on Jupyter configuration, the website may be protected by a password.
new_topic_browser.ipynb: Name and create a new project folder from a template, which includes a series of project generation notebooks, a project directory structure, and a collection of utility scripts and configuration files.
Subsequent steps occur inside the project folder at /projects/[NEWPROJECTNAME]/, where the notebooks can be modified and their settings saved for each project.
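The template-copying step amounts to copying a template directory to a new, uniquely named project folder. A minimal sketch, assuming the template is an ordinary directory on disk (the function name `create_project` and the template location are hypothetical, not taken from new_topic_browser.ipynb):

```python
import shutil
from pathlib import Path


def create_project(template_dir: Path, projects_root: Path, name: str) -> Path:
    """Copy the project template into projects_root/name and return the new folder."""
    # Reject names that would escape projects_root or are empty.
    if name in ("", ".", "..") or Path(name).name != name:
        raise ValueError(f"Invalid project name: {name!r}")
    target = projects_root / name
    # Never overwrite an existing project: each project is a record of an experiment.
    if target.exists():
        raise FileExistsError(f"Project {name!r} already exists at {target}")
    # copytree creates target (and any missing parents) and copies notebooks, scripts, configs.
    shutil.copytree(template_dir, target)
    return target
```

Refusing to overwrite an existing folder preserves earlier projects as records, and starting a variation means choosing a new name.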
Each project comes with a series of project notebooks in a numbered workflow. Customize and then run the notebooks in order, since each one often depends on the output of earlier steps. When a notebook finishes running, it generates a link to the next notebook in the series.
- 1_import_data.ipynb
- 2_clean_data.ipynb
- 3_make_topic_model.ipynb
- 4_make_topic_browser.ipynb
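The "link to the next notebook" can be derived from the numeric prefix in each file name. The sketch below is a hypothetical illustration (the helpers `workflow_order` and `next_notebook_link` are not part of the template) that returns an HTML link string, or None after the last step:

```python
import html
import re
from typing import List, Optional


def workflow_order(filenames: List[str]) -> List[str]:
    """Return the numbered workflow notebooks (e.g. 1_import_data.ipynb) sorted by step number."""
    numbered = [f for f in filenames if re.match(r"^\d+_.*\.ipynb$", f)]
    # Sort numerically so that 10_... would come after 9_..., not after 1_...
    return sorted(numbered, key=lambda f: int(f.split("_", 1)[0]))


def next_notebook_link(current: str, filenames: List[str]) -> Optional[str]:
    """Return an HTML link to the notebook after `current`, or None if it is the last step."""
    order = workflow_order(filenames)
    i = order.index(current)
    if i + 1 == len(order):
        return None
    nxt = html.escape(order[i + 1])
    return f'<a href="{nxt}" target="_blank">Next: {nxt}</a>'
```

Inside a notebook, such a string could be rendered with `from IPython.display import HTML, display` followed by `display(HTML(link))`; `target="_blank"` opens the next notebook in a new tab.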
With each notebook:
- Open the notebook.
- Customize settings.
- Run with the menu: Cell > Run All. Progress is indicated by an asterisk (*) in the prompt box beside each running cell.
- Click the link to launch the next notebook in a new tab.
- When done, select File > Close and Halt.
Notebooks can be customized and saved: they act as a record of project settings and can also be expanded to incorporate project-specific code. Notebook resources (such as stopword lists, scrubbing configuration files, and utility scripts) can also be customized per project.
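As an example of a per-project resource, a stopword list is typically a plain-text file with one word per line. The sketch below assumes that format (the file name, the `#` comment convention, and both helper functions are hypothetical; the template's actual stopword and scrubbing formats may differ):

```python
import re
from pathlib import Path
from typing import List, Set


def load_stopwords(path: Path) -> Set[str]:
    """Read one stopword per line, lowercased, ignoring blank lines and '#' comment lines."""
    words = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip().lower()
        if line and not line.startswith("#"):
            words.add(line)
    return words


def clean_tokens(text: str, stopwords: Set[str]) -> List[str]:
    """Lowercase, extract alphabetic tokens (keeping internal apostrophes), and drop stopwords."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [t for t in tokens if t not in stopwords]
```

Because the list lives in the project folder, editing it changes only that project's cleaning step, and the file itself documents which words were excluded.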
Each project also contains a batch notebook, scripts/run_all.ipynb, which can be used to run all project notebooks at once after they have been configured, or to re-run all of them.
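One way to express "run every notebook in order" outside the browser is to execute each numbered notebook with `jupyter nbconvert --to notebook --execute --inplace`. This is a sketch under the assumption that the numbered notebooks sit in the project folder root; it is not the contents of scripts/run_all.ipynb, and `ordered_notebooks` and `run_all` are hypothetical names:

```python
import re
import subprocess
from pathlib import Path
from typing import List


def ordered_notebooks(project_dir: Path) -> List[Path]:
    """Numbered workflow notebooks in the project folder, sorted by step number."""
    paths = [p for p in project_dir.glob("*.ipynb") if re.match(r"^\d+_", p.name)]
    return sorted(paths, key=lambda p: int(p.name.split("_", 1)[0]))


def run_all(project_dir: Path) -> None:
    """Execute each notebook in place, in order, stopping if one fails."""
    for nb in ordered_notebooks(project_dir):
        # --inplace overwrites the notebook with its executed outputs;
        # check=True raises CalledProcessError so later steps never run on bad input.
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb.name],
            check=True,
            cwd=project_dir,
        )
```

Stopping on the first failure mirrors the manual workflow: later notebooks depend on earlier outputs, so continuing past an error would only produce misleading results.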
The 4_make_topic_browser.ipynb notebook generates a new “DFR Browser” interactive visualization website for exploring the topic model built in the previous step (3_make_topic_model.ipynb). Once created, the browser is automatically published to a live website and simultaneously zipped up and linked as a downloadable package for offline viewing.
The notebook builds the DFR-Browser from the topic model files and generates two links: one to download the site and host it yourself, and one to a front-facing copy hosted on a virtual machine at mirrormask.english.ucsb.edu:10001/[NEWPROJECTNAME]/browser/.
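Packaging and linking can be sketched with the standard library: zip the generated browser folder for download, and fill the project name into the hosted-address pattern given above. The folder layout, the archive name, and both helper names are hypothetical, and the sketch omits the URL scheme because the text does not state one:

```python
import shutil
from pathlib import Path
from urllib.parse import quote

# Host and path pattern taken from the text above; the scheme (http/https) is not stated there.
BROWSER_HOST = "mirrormask.english.ucsb.edu:10001"


def package_browser(browser_dir: Path, out_dir: Path, name: str = "browser") -> Path:
    """Zip the contents of browser_dir into out_dir/<name>.zip and return the archive path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # make_archive appends ".zip" itself and returns the full archive path as a string.
    archive = shutil.make_archive(str(out_dir / name), "zip", root_dir=str(browser_dir))
    return Path(archive)


def browser_address(project_name: str, host: str = BROWSER_HOST) -> str:
    """Build the hosted-browser address: <host>/<project name>/browser/."""
    return f"{host}/{quote(project_name)}/browser/"
```

Keep out_dir outside browser_dir; otherwise the archive would try to include itself.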
Browse the site at your leisure!
If results from a particular project are significant, let the project stand as a record of that experiment. Then start a new related project with different settings and try something new!
Project directories are full workspaces -- data and scripts can be modified or annotated, and projects can add additional Jupyter notebooks for running data analyses and/or visualizations using Python, R, or any of the many other languages that Jupyter supports.
Currently, the "New DFR Project" is the only project template this environment supports. However, the workflow could be extended with other templates, e.g. "New Gephi Project", "New D3 Project", etc.