0 parents
commit 06c4508
Showing 24 changed files with 2,729 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
.DS_Store | ||
|
||
credentials | ||
*.swp |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,91 @@ | ||
# GeoDeepDive Application Template | ||
A template for building applications for [GeoDeepDive](https://geodeepdive.org) | ||
|
||
## Getting started | ||
Dependencies: | ||
+ [GNU Make](https://www.gnu.org/software/make/) | ||
+ [git](https://git-scm.com/) | ||
+ [pip](https://pypi.python.org/pypi/pip) | ||
+ [PostgreSQL](http://www.postgresql.org/) | ||
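A quick way to confirm that the dependencies above are on your `PATH` before running `make` is sketched below. It is an illustration only and not part of the template: the name `missing_tools` and the choice of `psql` as the PostgreSQL executable to look for are assumptions, and `shutil.which` requires Python 3.3 or later, so run it separately from the Python 2.7 application code.

```python
import shutil

# Executables assumed to correspond to the dependency list above;
# `psql` stands in for the PostgreSQL command line tools.
REQUIRED_TOOLS = ["make", "git", "pip", "psql"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

If `psql` is reported as missing after installing Postgres.app, the `PATH` line for the command line tools has probably not been picked up by your shell yet.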
|
||
### OS X | ||
OS X ships with GNU Make, `git`, and Python, but you will need to install `pip` and PostgreSQL. | ||
|
||
To install `pip`: | ||
```` | ||
sudo easy_install pip | ||
```` | ||
|
||
To install PostgreSQL, it is recommended that you use [Postgres.app](http://postgresapp.com/). Download | ||
the most recent version, and be sure to follow [the instructions](http://postgresapp.com/documentation/cli-tools.html) | ||
for setting up the command line tools, primarily adding the following line to your `~/.bash_profile`: | ||
|
||
```` | ||
export PATH=$PATH:/Applications/Postgres.app/Contents/Versions/latest/bin | ||
```` | ||
|
||
|
||
### Setting up the project | ||
First, clone this repository and run the setup script: | ||
|
||
```` | ||
git clone https://github.com/UW-DeepDiveInfrastructure/app-template | ||
cd app-template | ||
make | ||
```` | ||
|
||
Edit `credentials` with the connection credentials for your local Postgres database. | ||
|
||
To create a database with the data included in `/setup/usgs_example`: | ||
|
||
```` | ||
make local_setup | ||
```` | ||
|
||
To run an example, run `python run.py`. | ||
|
||
## Running on GeoDeepDive Infrastructure | ||
All applications are required to have the same structure as this repository, namely an empty folder named `output`, a valid | ||
`config` file, an updated `requirements.txt` describing any Python dependencies, and `run.py` which runs the application | ||
and outputs results. The `credentials` file will be ignored and substituted with a unique version at run time. | ||
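The required layout can be checked locally before pushing. The sketch below is a hypothetical helper, not something the GeoDeepDive infrastructure provides: `check_structure` is an assumed name, and it only tests that the listed entries exist. It does not verify that `output` is empty or that `config` is valid.

```python
import os

# Entries the README lists as required for every application.
REQUIRED_FILES = ["config", "requirements.txt", "run.py"]
REQUIRED_DIRS = ["output"]


def check_structure(app_dir="."):
    """Return a list of human-readable problems found in app_dir."""
    problems = []
    for name in REQUIRED_FILES:
        if not os.path.isfile(os.path.join(app_dir, name)):
            problems.append("missing file: " + name)
    for name in REQUIRED_DIRS:
        if not os.path.isdir(os.path.join(app_dir, name)):
            problems.append("missing folder: " + name)
    return problems
```

Calling `check_structure(".")` from the repository root returns an empty list when nothing is missing.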
|
||
The GeoDeepDive infrastructure will have the following software available: | ||
+ Python 2.7+ (Python 3.x not supported at this time) | ||
+ PostgreSQL 9.4+, including command line tools and PostGIS | ||
|
||
#### Submitting a config file | ||
The `config` file lists the terms or stored dictionaries used to subset the corpus. Once you have | ||
updated this file, a private repository will be created for you under the UW-DeepDiveInfrastructure GitHub organization, and you can | ||
push the code from this repository to it. Your `config` file will be used to generate a custom testing subset of documents that | ||
you can use to develop your application. | ||
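Before submitting, it can help to sanity-check the parsed `config`. The sketch below works on a plain dictionary (for example, the result of parsing the YAML file) and is an assumption about what a reasonable check looks like, not the validation the infrastructure performs. `validate_config` is a hypothetical name, and treating `app_name` as mandatory is also an assumption.

```python
def validate_config(config):
    """Return a list of problems found in a parsed `config` mapping."""
    problems = []
    app_name = config.get("app_name")
    if not app_name:
        problems.append("app_name is required")
    elif " " in str(app_name):
        problems.append("app_name must not contain spaces")
    # The README asks for terms OR a dictionary; accept either one.
    if not config.get("terms") and not config.get("dictionary"):
        problems.append("provide terms or a dictionary")
    return problems


example = {
    "app_name": "strom",
    "terms": ["stromatolite", "stromatolitic", "thrombolite", "thrombolitic"],
}
print(validate_config(example))
```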
|
||
#### Running the application | ||
Once you have developed your application and tested it against the corpus subset, simply push your application to the | ||
private repository created in the previous step. The application will then be run according to the parameters set in the | ||
`config` file. | ||
|
||
#### Getting results | ||
After the application is run, the contents of the `output` folder will be gzipped and made available for download. If | ||
your application did not run successfully, any errors thrown will be logged to the file | ||
`errors.txt`, which is included in the gzipped results package. | ||
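Once the results package is downloaded, `errors.txt` can be read without unpacking everything. The sketch below assumes the package is a gzip-compressed tar archive, which this README does not actually specify; `read_errors` and the archive file name are hypothetical.

```python
import tarfile


def read_errors(archive_path, member_name="errors.txt"):
    """Return the text of errors.txt from the archive, or None if absent."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Match on the base name so a leading folder does not matter.
            if member.isfile() and member.name.split("/")[-1] == member_name:
                return archive.extractfile(member).read().decode("utf-8")
    return None
```

For example, `read_errors("results.tar.gz")` returns the logged errors as a string, or `None` when the archive contains no `errors.txt`.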
|
||
## File Summary | ||
|
||
#### config | ||
A YAML file that contains project settings. | ||
|
||
|
||
#### credentials | ||
A YAML file that contains local postgres credentials for testing and generating examples. | ||
|
||
|
||
#### requirements.txt | ||
A list of Python dependencies to be installed by `pip`. | ||
|
||
|
||
#### run.py | ||
Python script that runs the entire application, including any setup tasks and exporting of results to the folder `/output`. | ||
|
||
|
||
## License | ||
CC-BY 4.0 International |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# The name of the application (no spaces) | ||
app_name: strom | ||
|
||
# First and last name of the user | ||
user: Jon Husson | ||
|
||
# The NLP product to run the application against | ||
product: NLP352 | ||
|
||
# How often the application should be run | ||
frequency: monthly | ||
|
||
# A list of terms used to subset the corpus | ||
terms: [stromatolite, stromatolitic, thrombolite, thrombolitic] | ||
|
||
# Stored dictionary of terms, to be set by GDD infrastructure admins | ||
dictionary: strom |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
postgres: | ||
    user: postgres_username | ||
    port: 5432 | ||
    host: localhost | ||
    database: deepdive_app | ||
    password: password123 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
#============================================================================== | ||
# PG DUMP FOR RESULTS | ||
#============================================================================== | ||
|
||
pg_dump -t results -t strat_target -t strat_target_distant -t age_check -t bib -t target_adjectives DBNAME > ./output/output.sql | ||
|
||
#============================================================================== | ||
# CREATE (ALREADY PRESENT) DATABASE FROM DUMP | ||
#============================================================================== | ||
|
||
psql -d DBNAME -f ../output/output.sql | ||
|
||
#============================================================================== | ||
# USEFUL SQL QUERIES FOR SUMMARY RESULTS | ||
#============================================================================== | ||
|
||
COPY(SELECT strat_phrase_root,strat_name_id, COUNT(strat_name_id) | ||
FROM results | ||
WHERE (strat_name_id<>'0' AND target_word ILIKE '%stromato%') | ||
GROUP BY strat_phrase_root, strat_name_id) | ||
TO '/Users/jhusson/Box Sync/postdoc/deepdive/stroms/V2/test.csv' DELIMITER ',' CSV HEADER; | ||
|
||
#============================================================================== | ||
# INTERESTING STROMATOLITE ADJECTIVES | ||
#============================================================================== | ||
|
||
SELECT * FROM target_adjectives WHERE target_adjective ILIKE 'domal' OR | ||
target_adjective ILIKE 'columnar' OR | ||
target_adjective ILIKE 'conical' OR | ||
target_adjective ILIKE 'domical' OR | ||
target_adjective ILIKE 'domed'; |
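The adjective query above can also be issued from Python with the adjective list passed as a parameter. The sketch below only builds the statement and its parameters. It relies on psycopg2 (listed in `requirements.txt`) adapting a Python list to a PostgreSQL array; `adjective_query` is a hypothetical helper, and the cursor named in the final comment is assumed to exist.

```python
ADJECTIVES = ["domal", "columnar", "conical", "domical", "domed"]


def adjective_query(adjectives):
    """Return (sql, params) matching any adjective, case-insensitively."""
    # ILIKE ANY(array) is equivalent to chaining ILIKE ... OR ILIKE ...
    sql = "SELECT * FROM target_adjectives WHERE target_adjective ILIKE ANY(%s);"
    return sql, (list(adjectives),)


sql, params = adjective_query(ADJECTIVES)
# With an open psycopg2 cursor `cur` (not created here): cur.execute(sql, params)
```

Keeping the adjectives in a list makes it easy to extend the set without editing the SQL text.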
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
deepdivesubmit.chtc.wisc.edu/static/strom_nlp_27Jan2016.zip |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
all: | ||
	cp credentials.example credentials; | ||
	pip install -r requirements.txt; | ||
|
||
|
||
|
||
local_setup: | ||
	./setup/setup.sh |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
* | ||
!.gitignore |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
psycopg2>=2.6.1 | ||
pyyaml>=3.11 | ||
tqdm>=1.0 | ||
stop-words>=2015.2.23.1 | ||
docopt>=0.6.1 | ||
numpy>=1.9.2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
#============================================================================== | ||
#RUN ALL - STROMATOLITES | ||
#============================================================================== | ||
|
||
#path: /Users/jhusson/local/bin/deepdive-0.7.1/deepdive-apps/stromatolites | ||
|
||
#============================================================================== | ||
|
||
import os, time, subprocess, yaml | ||
|
||
#tic | ||
start_time = time.time() | ||
|
||
#load configuration file (safe_load avoids constructing arbitrary Python objects) | ||
with open('./config', 'r') as config_yaml: | ||
    config = yaml.safe_load(config_yaml) | ||
|
||
#load credentials file | ||
with open('./credentials', 'r') as credential_yaml: | ||
    credentials = yaml.safe_load(credential_yaml) | ||
|
||
|
||
#ensure working directory is proper | ||
#os.chdir("/Users/jhusson/local/bin/deepdive-0.7.1/deepdive-apps/stromatolites") | ||
|
||
#INITIALIZE THE POSTGRES TABLES | ||
print 'Step 1: Initialize the PSQL tables ...' | ||
subprocess.call('./setup/setup.sh', shell=True) | ||
os.system('python ./udf/initdb.py') | ||
|
||
#BUILD THE BIBLIOGRAPHY | ||
print 'Step 2: Build the bibliography ...' | ||
os.system('python ./udf/buildbib.py') | ||
|
||
#FIND TARGET INSTANCES | ||
print 'Step 3: Find stromatolite instances ...' | ||
os.system('python ./udf/ext_target.py') | ||
|
||
#FIND STRATIGRAPHIC ENTITIES | ||
print 'Step 4: Find stratigraphic entities ...' | ||
os.system('python ./udf/ext_strat_phrases.py') | ||
|
||
#FIND STRATIGRAPHIC MENTIONS | ||
print 'Step 5: Find stratigraphic mentions ...' | ||
os.system('python ./udf/ext_strat_mentions.py') | ||
|
||
#CHECK AGE - UNIT MATCH AGREEMENT | ||
print 'Step 6: Check age - unit match agreement ...' | ||
os.system('python ./udf/ext_age_check.py') | ||
|
||
#DEFINE RELATIONSHIPS BETWEEN TARGET AND STRATIGRAPHIC NAMES | ||
print 'Step 7: Define the relationships between stromatolite phrases and stratigraphic entities/mentions ...' | ||
os.system('python ./udf/ext_strat_target.py') | ||
|
||
#DEFINE RELATIONSHIPS BETWEEN TARGET AND DISTANT STRATIGRAPHIC NAMES | ||
print 'Step 8: Define the relationships between stromatolite phrases and distant stratigraphic entities/mentions ...' | ||
os.system('python ./udf/ext_strat_target_distant.py') | ||
|
||
#DELINEATE REFERENCE SECTION FROM MAIN BODY EXTRACTIONS | ||
print 'Step 9: Delineate reference section from main body extractions ...' | ||
os.system('python ./udf/ext_references.py') | ||
|
||
#BUILD A BEST RESULTS TABLE OF STROM-STRAT_NAME TUPLES | ||
print 'Step 10: Build a best results table of strom-strat_name tuples ...' | ||
os.system('python ./udf/ext_results.py') | ||
|
||
#FIND ADJECTIVES DESCRIBING STROM | ||
print 'Step 11: Find adjectives describing strom target words ...' | ||
os.system('python ./udf/ext_target_adjective.py') | ||
|
||
#POSTGRES DUMP | ||
print 'Step 12: Dump select results from PSQL ...' | ||
output = 'pg_dump -U '+ credentials['postgres']['user'] + ' -t results -t strat_target -t strat_target_distant -t age_check -t refs_location -t bib -t target_adjectives -d ' + credentials['postgres']['database'] + ' > ./output/output.sql' | ||
subprocess.call(output, shell=True) | ||
|
||
#summary of performance time | ||
elapsed_time = time.time() - start_time | ||
print '\n ###########\n\n elapsed time: %d seconds\n\n ###########\n\n' %(elapsed_time) |
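The script above runs every step with `os.system` and does not check return codes, so a failed step does not stop the pipeline. One possible alternative is sketched below, as an illustration rather than the author's method: the steps are run from a list and stop at the first non-zero exit status, and the `pg_dump` command is built as an argument list instead of a concatenated shell string. `run_steps` and `pg_dump_args` are hypothetical names, and the code is written to run under both Python 2.7 and Python 3.

```python
import subprocess
import sys


def run_steps(scripts, runner=subprocess.call):
    """Run each script in order; stop and report the first one that fails."""
    for number, script in enumerate(scripts, start=1):
        print("Step %d: %s" % (number, script))
        status = runner([sys.executable, script])
        if status != 0:
            return script  # first failing script
    return None


def pg_dump_args(user, database, tables):
    """Build the pg_dump command as a list, avoiding shell string concatenation."""
    args = ["pg_dump", "-U", user]
    for table in tables:
        args += ["-t", table]
    args += ["-d", database]
    return args
```

With an argument list, the redirect to `./output/output.sql` is done by opening that file in Python and passing it as `stdout` to `subprocess.call`, so no `shell=True` is needed.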
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
#!/bin/bash | ||
|
||
# via http://stackoverflow.com/a/21189044/1956065 | ||
function parse_yaml { | ||
local prefix=$2 | ||
local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034') | ||
sed -ne "s|^\($s\):|\1|" \ | ||
-e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \ | ||
-e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" $1 | | ||
awk -F$fs '{ | ||
indent = length($1)/2; | ||
vname[indent] = $2; | ||
for (i in vname) {if (i > indent) {delete vname[i]}} | ||
if (length($3) > 0) { | ||
vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")} | ||
printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3); | ||
} | ||
}' | ||
} | ||
|
||
eval $(parse_yaml credentials) | ||
eval $(parse_yaml config) | ||
|
||
export PGPASSWORD=$postgres__password | ||
|
||
pwd=$(pwd) | ||
|
||
# Create the database - if it exists an error will be thrown which can be ignored | ||
createdb $postgres__database -h $postgres__host -U $postgres__user -p $postgres__port | ||
|
||
# Vanilla NLP | ||
echo "DROP TABLE IF EXISTS ${app_name}_sentences_nlp; CREATE TABLE ${app_name}_sentences_nlp (docid text, sentid integer, wordidx integer[], words text[], poses text[], ners text[], lemmas text[], dep_paths text[], dep_parents integer[], font text[], layout text[]);" | psql -U $postgres__user -h $postgres__host -p $postgres__port $postgres__database | ||
|
||
echo "CREATE INDEX ON ${app_name}_sentences_nlp (docid);" | psql -U $postgres__user -h $postgres__host -p $postgres__port $postgres__database | ||
echo "CREATE INDEX ON ${app_name}_sentences_nlp (sentid);" | psql -U $postgres__user -h $postgres__host -p $postgres__port $postgres__database | ||
|
||
echo "COPY ${app_name}_sentences_nlp FROM '$pwd/input/sentences_nlp'" | psql -U $postgres__user -h $postgres__host -p $postgres__port $postgres__database | ||
|
||
# NLP352 | ||
echo "DROP TABLE IF EXISTS ${app_name}_sentences_nlp352; CREATE TABLE ${app_name}_sentences_nlp352 (docid text, sentid integer, wordidx integer[], words text[], poses text[], ners text[], lemmas text[], dep_paths text[], dep_parents integer[]);" | psql -U $postgres__user -h $postgres__host -p $postgres__port $postgres__database | ||
|
||
echo "CREATE INDEX ON ${app_name}_sentences_nlp352 (docid);" | psql -U $postgres__user -h $postgres__host -p $postgres__port $postgres__database | ||
echo "CREATE INDEX ON ${app_name}_sentences_nlp352 (sentid);" | psql -U $postgres__user -h $postgres__host -p $postgres__port $postgres__database | ||
|
||
echo "COPY ${app_name}_sentences_nlp352 FROM '$pwd/input/sentences_nlp352'" | psql -U $postgres__user -h $postgres__host -p $postgres__port $postgres__database | ||
|
||
# NLP352 Bazaar | ||
echo "DROP TABLE IF EXISTS ${app_name}_sentences_nlp352_bazaar; CREATE TABLE ${app_name}_sentences_nlp352_bazaar (docid text, sentid integer, sentence text, words text[], lemmas text[], poses text[], ners text[], character_position integer[], dep_paths text[], dep_parents integer[]);" | psql -U $postgres__user -h $postgres__host -p $postgres__port $postgres__database | ||
|
||
echo "CREATE INDEX ON ${app_name}_sentences_nlp352_bazaar (docid);" | psql -U $postgres__user -h $postgres__host -p $postgres__port $postgres__database | ||
echo "CREATE INDEX ON ${app_name}_sentences_nlp352_bazaar (sentid);" | psql -U $postgres__user -h $postgres__host -p $postgres__port $postgres__database | ||
|
||
echo "COPY ${app_name}_sentences_nlp352_bazaar FROM '$pwd/input/sentences_nlp352_bazaar'" | psql -U $postgres__user -h $postgres__host -p $postgres__port $postgres__database |
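The double underscore in names such as `$postgres__user` follows from how `parse_yaml` flattens nested keys: the nesting level is the number of leading characters divided by two, and each level contributes one underscore. The Python sketch below mirrors that logic for illustration only. It assumes the `credentials` file indents nested keys by four spaces (indentation is not visible in this view). `flatten_yaml` is a hypothetical name, and the sketch uses integer division and a single regular expression, so it does not reproduce every edge case of the `sed`/`awk` version.

```python
import re

# Leading whitespace, key, colon, then the value with outer whitespace trimmed.
LINE = re.compile(r'^(\s*)([A-Za-z0-9_]*)\s*:\s*(.*?)\s*$')


def flatten_yaml(text, prefix=""):
    """Flatten simple nested YAML into shell-style variable names."""
    names = {}   # nesting level -> most recent key seen at that level
    result = {}
    for raw in text.splitlines():
        match = LINE.match(raw)
        if not match:
            continue  # comments and blank lines are skipped
        lead, key, value = match.groups()
        if len(value) >= 2 and value[0] in "\"'" and value[-1] in "\"'":
            value = value[1:-1]  # drop surrounding quotes
        indent = len(lead) // 2
        names[indent] = key
        for level in [l for l in names if l > indent]:
            del names[level]
        if value:
            # One underscore per level, even for levels that have no key.
            parents = "".join(names.get(i, "") + "_" for i in range(indent))
            result[prefix + parents + key] = value
    return result
```

With four-space indentation, `flatten_yaml("postgres:\n    user: alice\n")` yields the key `postgres__user`; with two-space indentation the same input would yield `postgres_user`, and the `$postgres__...` variables used in this script would be empty.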