Getting started
First, fork this repository into your personal GitHub account (create an account first if you do not already have one). Next, clone your fork to your local machine by entering the following command in your terminal:
git clone https://github.com/<your_username>/app-template.git
Navigate into the app-template folder:
cd app-template
Download the example data and put it in the input folder:
curl -o example_input.zip https://geodeepdive.org/dev_subsets/example_input.zip
unzip -j example_input.zip -d ./input
rm example_input.zip
Using the data products in the input folder, begin developing your application. For a summary of the available data products, see the data products page. There is also a very simple example application in the example_app directory of the app-template. What your own application does and what kind of output it creates is entirely up to you!
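Before reading the data products page, it can help to see which files the example data actually contains and how large they are. The following is a minimal Bash sketch; the function name summarize_input is hypothetical (it is not part of app-template), and it makes no assumption about the file names inside the input folder.

```shell
#!/usr/bin/env bash
# summarize_input [DIR]: print each regular file in DIR (default ./input)
# followed by its line count, tab-separated. For a TSV file, each line is one row.
# Hypothetical helper for orientation only; not part of app-template.
summarize_input() {
  local dir="${1:-./input}"
  local f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
    # wc -l may pad its output with spaces on some systems, so strip them.
    printf '%s\t%s\n' "$(basename "$f")" "$(wc -l < "$f" | tr -d ' ')"
  done
}
```

Source the snippet in your shell, then run summarize_input from inside the app-template folder.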
There are two primary ways to use the Natural Language Processing (NLP) products: read the TSV files directly into your application, or first import them into PostgreSQL and query them from your application. The latter is recommended because it makes it easier to write scalable, memory-efficient applications.
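If you do read the TSV files directly, the key to staying memory-efficient is to stream each file one row at a time rather than loading it whole. The Bash sketch below does that with awk. The column layout it assumes (column 1 a document ID, column 2 a sentence ID, column 3 the sentence's words) and the function name find_term are hypothetical, for illustration only; check the data products page for the real columns of each NLP file.

```shell
#!/usr/bin/env bash
# find_term TERM FILE: stream FILE one row at a time and print
# "docid<TAB>sentid" for every row whose third column contains TERM
# (case-insensitive substring match). Memory use does not grow with file size.
# ASSUMPTION (hypothetical layout): $1 = document ID, $2 = sentence ID, $3 = words.
find_term() {
  awk -F '\t' -v term="$1" '
    BEGIN { term = tolower(term) }                  # normalize the search term once
    index(tolower($3), term) > 0 { print $1 "\t" $2 }
  ' "$2"
}
```

With the PostgreSQL route, the same filter would instead be a single SELECT ... WHERE query run through psql against the tables that make local_setup creates; the exact table and column names depend on what that target imports.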
If you are using OS X, we recommend Postgres.app. Download the most recent version, and be sure to follow its instructions for setting up the command-line tools. Next, edit the file credentials.yml with your local Postgres credentials. If you are using Postgres.app, your username is your computer's username. For the database name, choose a name that is not already used by an existing database. By default, Postgres runs on port 5432.
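Because the database name in credentials.yml must not already exist, it is worth checking before running the setup. The Bash sketch below assumes the psql command-line tools are on your PATH; db_name_taken and my_app_db are hypothetical names. It relies on psql -lqt, which lists databases without headers or footers, one per line, with the database name as the first |-separated field (padded with spaces).

```shell
#!/usr/bin/env bash
# db_name_taken NAME: read `psql -lqt` output on stdin and succeed (exit 0)
# if a database called exactly NAME already exists; fail otherwise.
db_name_taken() {
  # Keep the first '|'-separated field, trim surrounding spaces,
  # then look for an exact, fixed-string match of the whole line.
  cut -d '|' -f 1 | sed 's/^ *//; s/ *$//' | grep -qxF -- "$1"
}

# Example (requires a running Postgres server; my_app_db is a placeholder):
# if psql -h localhost -p 5432 -lqt | db_name_taken my_app_db; then
#   echo "my_app_db already exists; choose another name for credentials.yml"
# fi
```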
Finally, to create the database, run make local_setup. This will import the TSV files in the input directory into the new Postgres database.
Once you have developed an application that does roughly what you intend it to do, edit the file config.yml with your information. For a description of the fields in that file, please see the config page.
Send an email to [email protected] with a description of your application and a link to the GitHub repository that contains your fork of app-template. We will then use your configuration file to cull the corpus and create an application-specific testing set that contains all the same data products found in the example testing set. When it has been generated, we will send you a link to this data.
Replace the contents of the input directory with your application-specific testing set.
Continue to develop your application. When it produces your desired result, send an email to [email protected] letting us know that your application is ready to be run on our infrastructure. We will then clone your repository to a sandboxed virtual machine on our infrastructure, install any dependencies that your application requires, and run your application against the full corpus subset targeted by your configuration file.
When the application has finished running, we will zip the contents of the output folder and send you a link to download it.
You can then use these results to continue to refine your application, or do science!