evidence
-- a doc2vec-based assisted close reading tool with support for abstract concept-based search and context-based search.
Research in the humanities has benefited from the digitization of text corpora and the development of computer-based text analysis tools. However, the interfaces that current systems offer are incompatible with the proven method of scholarly close reading of texts, which is key in many research scenarios that pursue complex research questions.
In practice, it is often restrictive and difficult, if not impossible, to formulate adequate selection criteria within a keyword-based search, which is the standard entry point to digitized text collections. This holds in particular for more complex or abstract concepts.
evidence provides an alternative, intuitive entry point into collections by leveraging the doc2vec framework. Using doc2vec, evidence learns abstract representations of the theme and content of the elements of the user's corpus. Instead of translating the scientific query into keywords, the user first compiles a set of relevant elements as starting points, i.e. examples of the concept they are interested in, and then queries the corpus based on these examples. Specifically, evidence retrieves elements with similar abstract representations and presents them to the user, using the user's feedback to refine its retrieval.
Furthermore, this concept-based query mode is complemented by the ability to perform additional retrieval using the more-like-this context-based retrieval function provided by Elasticsearch.
Together, this enables a user to combine the power of a close-reading approach with that of a large digitized corpus, selecting elements from the entire corpus which are likely to be of interest, but leaving the decision up to the user as to what evidence they deem useful.
The repository contains a demonstration including a corpus and a model. The demonstration allows you to explore the features of this software without supplying your own corpus.
Prerequisites:
First test that the Docker installation is working. Depending on your system, use either PowerShell (on Windows) or a terminal (on Linux or macOS).
- For Windows: open a PowerShell prompt (press Windows+S and type PowerShell) and run:
    docker run hello-world
- For Linux/macOS: open a terminal and run:
    docker run hello-world
This should show a message that your Docker installation is working correctly. If so, proceed to the installation of evidence; otherwise, check the Docker troubleshooting page.
Download a copy of the evidence archive and extract its contents on your machine.
Alternatively, if you have git installed, you can also clone the repository.
git clone https://github.com/ADAH-EviDENce/evidence.git
- For Windows:
  - Open a PowerShell prompt.
  - Change your current working directory to where you extracted the files. For instance:
        cd C:\Users\JohnDoe\Downloads\evidence-master\evidence-master
- For Linux/macOS:
  - Open a terminal.
  - Change your current working directory to where you extracted the files. For instance:
        cd /home/JohnDoe/Downloads/evidence
The demo can be started with the commands below. Keep this PowerShell/Terminal window open and running during the demo.
- Set the experiment name.
  For Windows:
      $Env:EXPERIMENT="getuigenverhalen"
  For Linux/macOS:
      export EXPERIMENT="getuigenverhalen"
- Start the demo.
      docker-compose up --build
The command above downloads the necessary Docker images, builds the images for the demo, and starts it.
The command prints many log messages. If all goes well, the last lines of the output should be:
...
indexer_1 | Indexing done.
evidence-master_indexer_1 exited with code 0
Check troubleshooting if you have any issues with this step.
Go to the following URL in your web browser: http://localhost:8080/.
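Because the startup prints many log messages, it can be handy to poll the URL from a second terminal instead of watching the log. The following is a minimal sketch, not part of evidence: the function name `wait_for_frontend`, its arguments and its defaults are illustrative assumptions, and it assumes `curl` is installed and that the frontend answers at `/` with a successful HTTP status.

```shell
# Poll a URL until it answers with a successful HTTP status.
# Hypothetical helper, not shipped with evidence.
# $1: URL (default: the demo frontend)
# $2: maximum number of attempts (default: 30)
# $3: seconds to wait between attempts (default: 2)
wait_for_frontend() {
  url=${1:-http://localhost:8080/}
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP errors (status >= 400)
    if curl --silent --fail --output /dev/null "$url"; then
      echo "frontend is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "frontend did not respond at $url" >&2
  return 1
}

# Usage (in a second terminal while docker-compose is starting):
# wait_for_frontend && echo "open http://localhost:8080/ in your browser"
```

With the defaults this gives up after roughly a minute; pass a larger attempt count if the indexing step takes longer on your machine.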
Once you are done exploring the demo, you can stop it by selecting the PowerShell/Terminal window that is still running the demo and pressing Ctrl+C.
Verify that your docker-compose version is at least 1.25.4 (earlier versions may work).
docker-compose --version
Verify that your docker version is at least 19.03.12 (earlier versions may work).
docker --version
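The two version checks can also be done by a small script instead of by eye. This is a sketch under assumptions: the helper names `extract_version` and `version_ge` are made up for illustration, the output of `--version` is assumed to contain the word `version` followed by a dotted number (optionally prefixed with `v`), and `sort -V` is assumed to be available (GNU coreutils and recent BSD/macOS `sort` provide it).

```shell
# Read a "--version" line on stdin and print only the dotted version number.
# Assumes output such as "docker-compose version 1.25.4, build ...".
extract_version() {
  sed -n 's/.*version v\{0,1\}\([0-9][0-9.]*\).*/\1/p'
}

# Succeed (exit status 0) when version $1 is greater than or equal to $2.
# Relies on "sort -V" (version sort): the smaller version sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage (requires Docker to be installed, so it is left commented out):
# version_ge "$(docker-compose --version | extract_version)" 1.25.4 \
#   || echo "docker-compose is older than 1.25.4" >&2
# version_ge "$(docker --version | extract_version)" 19.03.12 \
#   || echo "docker is older than 19.03.12" >&2
```

Since earlier versions may work, a failed check is better treated as a warning than as a reason to stop.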
If you want to use your own corpus, refer to ./experiments/README.md for notes on the required format and directory layout.
Define the name of the dataset/experiment. Here we choose 'getuigenverhalen'. The corpus files should reside under /experiments/<EXPERIMENT>/corpus (see the sample corpora).
export EXPERIMENT=getuigenverhalen
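Before starting a build that takes minutes, it may be worth confirming that the corpus is where it is expected. The sketch below assumes that the `/experiments/<EXPERIMENT>/corpus` path is relative to the repository root; `check_corpus` is a hypothetical helper, not part of evidence, and it only checks that the directory exists and is not empty, not that the files have the required format.

```shell
# Check that the corpus directory for an experiment exists and is not empty.
# Hypothetical helper, not shipped with evidence.
# $1: repository root directory
# $2: experiment name (the value of EXPERIMENT)
check_corpus() {
  corpus_dir="$1/experiments/$2/corpus"
  if [ ! -d "$corpus_dir" ]; then
    echo "missing corpus directory: $corpus_dir" >&2
    return 1
  fi
  # ls -A lists all entries except "." and "..", so empty output means an empty directory
  if [ -z "$(ls -A "$corpus_dir")" ]; then
    echo "corpus directory is empty: $corpus_dir" >&2
    return 1
  fi
  echo "corpus found: $corpus_dir"
}

# Usage (from the repo root directory, after exporting EXPERIMENT):
# check_corpus . "$EXPERIMENT"
```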
Be aware that building can take a couple of minutes.
# (starting from the repo root directory)
docker-compose --file generate-model.yml build generate-model
# (starting from the repo root directory)
docker-compose --file generate-model.yml run --user $(id -u):$(id -g) generate-model
# (starting from the repo root directory)
export EXPERIMENT=getuigenverhalen
docker-compose build
docker-compose up
The frontend should now be usable at http://localhost:8080.
We strongly suggest not making the frontend publicly available, as there is no authentication: anyone with the URL will have access to the frontend. Running it on a local network, for example a university network, should protect it from most evil-doers.
Besides interacting through a web browser, you can also interact with the frontend from the command line; see here and here for examples.
The first page of the frontend asks you to select a user ('gebruiker' in Dutch).
A user called demo exists and can be selected.
The initial user named demo can be renamed by setting the FRONTEND_USER environment variable before running docker-compose up.
For example, to have myinitialusername as a user name, do the following:
# (starting from the repo root directory)
export EXPERIMENT=getuigenverhalen
export FRONTEND_USER=myinitialusername
docker-compose up
If the existing user is not enough, you can add a new user to the frontend with the following command
(you can choose your own username by replacing the mynewusername value in the command below):
export EXPERIMENT=getuigenverhalen
export FRONTEND_USER=mynewusername
docker-compose run usercreator
To add more users, repeat the command with different values for FRONTEND_USER.
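Repeating the command for several users can be wrapped in a loop. The sketch below is one possible way to do that, not something evidence provides: the function name `add_users` and the `COMPOSE_CMD` override are assumptions introduced here (the override only exists so the loop can be tried without Docker), while `FRONTEND_USER`, `EXPERIMENT` and `docker-compose run usercreator` are taken from the command above.

```shell
# Create one frontend user per argument by repeating the usercreator command.
# Hypothetical helper, not shipped with evidence.
# COMPOSE_CMD is an illustrative override; it defaults to docker-compose.
add_users() {
  if [ -z "${EXPERIMENT:-}" ]; then
    echo "EXPERIMENT is not set; export it first" >&2
    return 1
  fi
  for user in "$@"; do
    # Set FRONTEND_USER only for this single invocation
    FRONTEND_USER="$user" "${COMPOSE_CMD:-docker-compose}" run usercreator || return 1
  done
}

# Usage (from the repo root directory; the user names are examples):
# export EXPERIMENT=getuigenverhalen
# add_users alice bob
```

The loop stops at the first failing invocation, so a typo in the setup does not produce a long series of identical errors.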
When updating the documentation, you can check if the links are all working by running:
npm install
npm run mlc
https://github.com/ADAH-EviDENce/EviDENce_doc2vec_docker_framework