This is a rewrite of TALEN: a Tool for Annotation of Low-resource ENtities, built with a React.js frontend and a Python backend.
This software was designed for Named Entity Recognition (NER) annotation, but it can be used for any token-level sequence annotation task.
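Concretely, token-level sequence annotation pairs each token with exactly one tag. The sketch below illustrates the idea with BIO tags; the sentence, labels, and helper function are illustrative only, not TALEN's internal representation:

```python
# Illustrative token-level annotation with BIO tags (not TALEN's internal format).
tokens = ["Dan", "Roth", "works", "at", "Penn", "."]
tags   = ["B-PER", "I-PER", "O", "O", "B-ORG", "O"]

# Every token gets exactly one tag, so the two sequences stay aligned.
assert len(tokens) == len(tags)

def bio_spans(tokens, tags):
    """Recover labeled entity spans from aligned BIO tags."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):          # a new entity begins
            if start is not None:
                spans.append((label, tokens[start:i]))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is not None:
            continue                      # current entity continues
        else:                             # "O" (or stray "I-") ends any open entity
            if start is not None:
                spans.append((label, tokens[start:i]))
            start, label = None, None
    if start is not None:
        spans.append((label, tokens[start:]))
    return spans

print(bio_spans(tokens, tags))  # [('PER', ['Dan', 'Roth']), ('ORG', ['Penn'])]
```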
Check out a demo here: annotate.universalner.org.
- npm
- python 3.6+
The code is separated into two folders: client/, which holds the frontend, and server/, which holds the backend.
Each folder has its own README file with more details (probably too many).
Installation and running will be done separately for each folder.
To install MongoDB:
$ brew update
$ brew install mongodb
To install the backend:
$ cd server
$ python -m venv cool-environment-name # virtual env optional but strongly recommended
$ source cool-environment-name/bin/activate
$ pip install -r requirements.txt
$ cd ..
To install the frontend:
$ cd client
$ npm install
$ cd ..
First, make sure that MongoDB is running locally.
$ bash start_mongo.sh
$ cd server
$ python -m scripts.mongo_stats -e dev # check that it worked.
You can also check this with check_mongo.sh, and stop it with stop_mongo.sh. (These commands assume that you are on a Mac and have Homebrew installed; instructions will differ on Windows and Linux.)
$ cd server
$ export ENV=dev && python app.py
This will default to port 8080, but you can change this by setting the $PORT variable (e.g. export PORT=9090 before starting).
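On the backend side, honoring a $PORT environment variable usually amounts to a lookup like the following; this is a sketch of the pattern, not necessarily the exact code in app.py:

```python
import os

def get_port(default=8080):
    """Read the serving port from the $PORT environment variable,
    falling back to the default (8080, as described above)."""
    return int(os.environ.get("PORT", default))

print(get_port())
```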
There are two options for viewing the frontend. If you want to modify it and have it reload automatically, start the node server (in a new terminal):
$ cd client
$ export REACT_APP_URL="http://localhost:8080" && npm run start
If you are ready to start annotating in earnest, compile the React code into static files and serve them alongside the Flask app. To do this, run (in client/):
$ cd client
$ npm run build
This will create a folder called client/build containing static files.
Then, with the backend server running, visit localhost:8080/.
The primary method for storing data is in MongoDB.
This repo also contains some example datasets in server/data/, as well as corresponding dataset config files in config/datasets/.
Every .yml file in config/datasets/ will be loaded as a config file. Each config file must contain:
- name: some string identifier
- path: path to the dataset
- reader: the Python class that will read this data. See server/data_readers for examples.
You may optionally include a list of labels and their colors, but by default each config file inherits the labelset from config/base.yml.
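For reference, a config file with the required fields above might look like the following. The dataset name, path, reader class, and label layout here are made up for illustration; check the files shipped in config/datasets/ for the real values and exact schema:

```yaml
# Hypothetical example config; all field values are illustrative only.
name: my-dataset                 # string identifier
path: server/data/my-dataset     # path to the dataset
reader: MyDatasetReader          # Python class, see server/data_readers
# Optionally override the labelset inherited from config/base.yml
# (exact structure may differ; see the shipped configs):
# labels:
#   - name: PER
#     color: "#ff0000"
```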
One of the motivators for writing this software was to annotate Universal Dependencies with NER tags.
To get going with annotation, see this file.
To build:
$ ./run_docker.sh build
To run:
$ export ENV=prod # currently, you have to run on prod
$ ./run_docker.sh run
Then visit http://localhost:1337 in a browser.
Run:
$ python -m scripts.download_data_to_bio --environment prod --dataset-name sv_pud-ud-test
Replace sv_pud-ud-test with any dataset you choose. This will download to a single file, and needs only read-only privileges on the MongoDB instance.
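The script name suggests CoNLL-style BIO output. In that standard layout, each line holds one token and its tag, with a blank line between sentences; the tokens below are generic and the script's actual column layout may differ:

```text
John    B-PER
lives   O
in      O
Malmö   B-LOC
.       O
```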
When running the app locally, it will add a default user with username "a" and password "a". When running in production, use the manage_users.py script to add, update, or delete users.
Run:
$ python -m scripts.get_interannotator_agreement
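As a point of reference, pairwise agreement on token labels is often summarized with Cohen's kappa; below is a self-contained sketch of that metric (the script above may compute something different, such as span-level agreement):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' aligned token-level labels."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of tokens with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label marginals.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[l] * counts_b[l] for l in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

a = ["B-PER", "I-PER", "O", "O", "B-ORG", "O"]
b = ["B-PER", "O",     "O", "O", "B-ORG", "O"]
print(round(cohens_kappa(a, b), 3))  # 0.727
```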
The repo has an Action defined in /.github/workflows/cloud-run.yml that deploys to Google Cloud Run when merging to master. Note that the MONGO_USERNAME and MONGO_PASSWORD variables are stored as secrets in Google Cloud.
If you use this in your research paper, please cite us!
@inproceedings{talen2018,
author = {Stephen Mayhew and Dan Roth},
title = {TALEN: Tool for Annotation of Low-resource ENtities},
booktitle = {ACL System Demonstrations},
year = {2018},
}
You can read the paper here: http://cogcomp.org/papers/MayhewRo18.pdf