
Getting started

John J Czaplewski edited this page Aug 1, 2016 · 4 revisions

First, fork this repository into your personal GitHub account (create an account first if you do not have one). Next, clone your fork to your local machine by entering the following command in your terminal:

git clone https://github.com/<your_username>/app-template.git

Navigate into the app-template folder:

cd app-template

Download the example data and put it in the input folder:

curl -o example_input.zip https://geodeepdive.org/dev_subsets/example_input.zip
unzip -j example_input.zip -d ./input
rm example_input.zip

Using the various data products in the input folder, begin to develop your application. For a summary of the available data products, please see the data products page. There is also a very simple example application in the example_app directory of the app-template. What your application does and what kind of output it creates are entirely up to you!

There are two primary ways to use the Natural Language Processing (NLP) products: read the TSV files directly into your application, or first import them into PostgreSQL and query them from your application. The latter is recommended because it makes it easier to write scalable, memory-efficient applications.
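As a rough illustration of the first approach, the sketch below streams a TSV file with awk and counts rows per value of the first column, so the whole file is never held in memory. The function name `count_rows_per_key`, the file path in the usage line, and the idea that the first column identifies a document are all assumptions made for illustration; check the data products page for the actual file names and column layout.

```shell
# Hypothetical helper: count rows per value of the first tab-separated column.
# awk reads the file one line at a time, so memory use grows with the number
# of distinct keys rather than with the size of the file.
# Usage: count_rows_per_key <path-to-tsv>
count_rows_per_key() {
  awk -F'\t' '
    { counts[$1]++ }                      # tally each first-column value
    END {
      for (key in counts) {
        printf "%s\t%d\n", key, counts[key]
      }
    }
  ' "$1" | sort                           # awk iteration order is unspecified
}
```

It could be called as `count_rows_per_key ./input/some_file.tsv`, where the path is a placeholder for one of the files in your input folder.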

If you are using OS X, it is recommended that you use Postgres.app. Download the most recent version, and be sure to follow the instructions for setting up the command line tools. Next, edit the file credentials.yml with your local Postgres credentials. If you are using Postgres.app, your username will be your computer's username. For the database name, choose a name for which a database does not currently exist. By default, Postgres runs on port 5432.

Finally, to create the database, run the following command, which imports the TSV files in the input directory into the new Postgres database:

make local_setup

Once you have developed an application that does roughly what you intend it to do, edit the file config.yml with your information. For a description of the fields in that file, please see the config page.

Send an email to [email protected] with a description of your application and a link to the GitHub repository that contains your fork of app-template. We will then use your configuration file to cull the corpus into an application-specific testing set that contains the same data products found in the example data testing set. Once it has been generated, we will provide you with a link to this data.

Replace the contents of the input directory with your application-specific testing set.
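One way to carry out this step from the terminal is sketched below. It assumes the testing set arrives as a zip archive, as the example data does; the helper name `replace_input` and the archive name in the usage line are hypothetical.

```shell
# Hypothetical helper: empty a directory, then unpack a zip archive into it.
# Usage: replace_input <archive.zip> <input_dir>
replace_input() {
  archive="$1"
  input_dir="$2"
  # Refuse to delete anything if the archive is missing.
  [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
  mkdir -p "$input_dir"
  # Delete everything inside the directory, but keep the directory itself.
  find "$input_dir" -mindepth 1 -delete
  # -j drops the archive's internal folder structure, as in the example-data step.
  unzip -q -j "$archive" -d "$input_dir"
}
```

For instance, `replace_input my_testing_set.zip ./input` would clear the input folder and fill it from an archive named `my_testing_set.zip` (a placeholder name).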

Continue to develop your application. When it produces your desired result, send an email to [email protected] letting us know that your application is ready to be run on our infrastructure. We will then clone your repository to a sandboxed virtual machine on our infrastructure, install any dependencies that your application requires, and run your application against the full corpus subset targeted by your configuration file.

When the application has finished running, we will zip the contents of the output folder and send you a link to download it.

You can then use these results to continue to refine your application, or do science!
