WIP: Adding SRL #215

Open
wants to merge 127 commits into
base: master
Changes from 116 commits
Commits
53c4ee1
Initial commit
christos-c Dec 12, 2014
d31108b
Added pom.xml back to git
christos-c Dec 12, 2014
ddef851
nlp-pipeline integration fully functional
christos-c Dec 22, 2014
7e0add5
Fixed an error in the WSJ train/dev sections split
christos-c Dec 31, 2014
e94d688
Minor changes
christos-c Jan 27, 2015
794bd04
Added JUnit tests
christos-c May 13, 2015
6eeded1
Declared gurobi optional
christos-c May 13, 2015
9bee145
Reverted back to nlp-pipeline-0.1.1
christos-c May 21, 2015
93352eb
Fixed bug in TextPreProcessor
christos-c May 21, 2015
80fd99e
Fixed dash ('-') bug in Nom SRL
christos-c Jul 7, 2015
e766eb4
Added an option for max inference rounds
christos-c Jul 8, 2015
09216e6
Added sense description ("name") to FrameData
christos-c Jul 21, 2015
0bd5517
Merge remote-tracking branch 'origin/master'
christos-c Jul 21, 2015
4a15867
moving to new deps 1
shyamupa Sep 29, 2015
80a30f0
moving to new deps 2
shyamupa Sep 29, 2015
b011ba6
moving to new deps 3
shyamupa Sep 29, 2015
da09783
final moving all to new deps 4
shyamupa Sep 29, 2015
39b1a5e
adding column writer from old edison
shyamupa Sep 29, 2015
527ac81
should compile. v5.1.3
shyamupa Sep 29, 2015
e9a68aa
WIP for moving to new dep.
shyamupa Sep 30, 2015
f322db1
which inference to use?
shyamupa Sep 30, 2015
400d6fa
compiles
shyamupa Sep 30, 2015
8a088e3
config madness. Need one config? or many?
shyamupa Sep 30, 2015
39993e1
New config for pipeline
christos-c Sep 30, 2015
dd07d30
this version goes upto feature extraction smoothly
shyamupa Oct 1, 2015
8dbadf3
can train sense still training identifier
shyamupa Oct 2, 2015
3722c3e
also try SP
shyamupa Oct 6, 2015
2afb706
confiugrable learner
shyamupa Oct 6, 2015
2386ec7
adding learner config
shyamupa Oct 7, 2015
e744814
all models trained!
shyamupa Oct 11, 2015
760b95f
working version with illinois-sl. v.5.1.4
shyamupa Oct 11, 2015
b8d34db
config now points to my models
shyamupa Oct 11, 2015
75db17e
tested interactively
shyamupa Oct 12, 2015
e46916b
turned off -ea for assertions.
shyamupa Oct 13, 2015
f881cff
Cleaning up
christos-c Oct 14, 2015
ae4e3ce
no nom model now
shyamupa Oct 15, 2015
7e2c3e7
Massive cleaning up
christos-c Oct 21, 2015
ae89da4
Completed README
christos-c Oct 21, 2015
9c03a57
Merge branch 'shyam' into 'master'
christos-c Oct 21, 2015
ee4fa4b
Added license
christos-c Oct 22, 2015
a645b1b
modifying srllabler init method, also adding getDescription in framem…
shyamupa Oct 22, 2015
a12aaf6
updating pom to 5.1.5
shyamupa Oct 22, 2015
c79f40f
Merge branch 'shyam'
shyamupa Oct 22, 2015
d1ed497
adding arg to factory client. mvn test passes now
shyamupa Oct 22, 2015
d693602
reverting change from last push
shyamupa Oct 22, 2015
550d228
all should be fixed now
shyamupa Oct 22, 2015
5ea7da9
Updated to illinois-cogcomp-nlp dependencies
christos-c Oct 23, 2015
c60c57b
numFeat threads as config param
shyamupa Nov 2, 2015
803bb24
moves to 5.1.6 still using models from 5.1.5
shyamupa Nov 2, 2015
4c98921
Merge branch 'threads' into 'master'
christos-c Nov 2, 2015
7f4d3b6
two config madness handled
shyamupa Nov 2, 2015
091cfde
Merge branch 'newthreads' into 'master'
christos-c Nov 2, 2015
99e7875
Updated Edison dependency
christos-c Nov 3, 2015
1260b53
new models
shyamupa Nov 3, 2015
6bfb3fa
Merge branch 'sshyam' into 'master'
shyamupa Nov 3, 2015
79de23f
Removed jwnl config file dependency
christos-c Nov 19, 2015
6827469
Updated models version
christos-c Nov 19, 2015
2923071
Updated dependencies versions
christos-c Nov 19, 2015
122767c
Merge branch 'christos-pipeline-fix' into 'master'
christos-c Nov 23, 2015
e191bdf
Created a bypass for getting top-k predictions before the ILP
christos-c Dec 8, 2015
298fa23
Removed pred-arg evaluator
christos-c Dec 9, 2015
a17d815
updated to follow changes to Annotator API
Dec 21, 2015
5c1306b
added Configurators for SRL to replace previous configuration code, b…
Dec 21, 2015
1db66f9
added minimal documentation to new classes
Dec 21, 2015
6890b3d
changed SRLProperties to use Configurators; if a file name is given, …
Dec 22, 2015
9053fbb
updated TextPreProcessor configuration to match SRL config
Dec 22, 2015
a8e3f30
updated SRL with constructor with no config file
Dec 22, 2015
4a3f0e1
updated SRL constructors to use ResourceManagers
Dec 22, 2015
db4f56d
updated SRL constructors to use ResourceManagers *correctly*
Dec 22, 2015
d94ab6b
Trying to make SRL conform to expected Annotator behavior, using View…
Dec 22, 2015
8c5b106
Minor changes for pipeline support
christos-c Dec 23, 2015
b518ebf
Fixed minor logging error
christos-c Dec 23, 2015
4d9070a
Merge remote-tracking branch 'remotes/origin/christos-ilp-updates' in…
Dec 25, 2015
48e310d
Merge branch 'mssammon' into 'master'
Dec 25, 2015
a2c9f40
Updated to latest dependencies
christos-c Jan 17, 2016
d9c1eec
Merge branch 'christos-update' into 'master'
Jan 17, 2016
9a58721
Fixed NOMLEX reader bug
christos-c Jan 31, 2016
c07c1a1
Updated models usage
christos-c Feb 1, 2016
fbd5602
updated version
Feb 1, 2016
62fe36c
fixed bug with incorrect package name
Feb 1, 2016
a31d438
Small changes
christos-c Feb 20, 2016
1e0b6f6
Removed prop/nombank dependence during inference
christos-c Mar 28, 2016
5bca95d
Merge branch 'christos-prop-nom-fix' into 'master'
Mar 28, 2016
0459b97
Fixed Windows-specific JWNL problem
christos-c Apr 12, 2016
e2ce6a5
Added Windows script
christos-c Apr 12, 2016
52cc455
Added ILP Solver as a property
christos-c Apr 12, 2016
025e7df
Added GitLab CI script
christos-c Apr 12, 2016
82040ec
Fixed unit tests
christos-c Apr 12, 2016
75de089
Added CI badge
christos-c Apr 12, 2016
5c6c99c
Merge remote-tracking branch 'origin/master' into christos-win
christos-c Apr 12, 2016
396de0d
Updated README and linux script
christos-c Apr 12, 2016
5722539
Merge branch 'christos-win' into 'master'
christos-c Apr 12, 2016
2a04ebc
Removed CI badge
christos-c Apr 12, 2016
976f288
Update README.md
christos-c May 2, 2016
d444a13
upgrade SL dependency
May 4, 2016
3998c95
Merge branch 'daniel_sl_dependency_upgrade' into 'master'
christos-c May 4, 2016
15cd5fd
Fixed typo in feature definition
christos-c May 4, 2016
13e2746
Increment version
christos-c May 4, 2016
691aac5
Fixed interactive Bash script
christos-c Jun 10, 2016
ae71b69
Updated cogcomp-nlp & pipeline deps
christos-c Aug 1, 2016
1161b78
Fixed bug in PropBank legal args list creation
christos-c Aug 1, 2016
d6f68c2
Incremented version
christos-c Aug 1, 2016
b016cc6
updated to use updated Annotator API, which accommodates lazy initial…
Aug 12, 2016
4a60b49
added test to try to monitor memory use with repeated instantiation o…
Aug 13, 2016
662afdd
updated to check for/add CLAUSE_STANFORD view before trying to apply …
Aug 14, 2016
2a98e89
updated icu dependency to non-snapshot
Aug 14, 2016
357fd39
updated pipeline dependency
Aug 15, 2016
f5b0df0
Merge branch 'icuupdate' into 'master'
Aug 16, 2016
d23369b
updated version number
Aug 16, 2016
a008507
Merge branch 'icuupdate' into 'master'
Aug 16, 2016
1f6beab
moving the files into the srl folder
Sep 20, 2016
2ab580c
Merge remote-tracking branch 'cogcomp/master'
Sep 20, 2016
458b844
adding srl to the main pom file.
Sep 20, 2016
6813089
update readme file
Sep 20, 2016
de4dab6
remove gitlab-ci file.
Sep 20, 2016
901998f
adding license header to the srl files.
Sep 20, 2016
2653065
Merge remote-tracking branch 'cogcomp/master'
Sep 22, 2016
960e5c1
applying comments: updated pom file and removed some redundant files.
Sep 22, 2016
90f920e
applying formatter on SRL.
Sep 22, 2016
8907a86
dropping two old fex files.
Sep 22, 2016
f196fe4
adding a memory condition to srl tests.
Sep 22, 2016
fd97201
increase memory requirement for srl tests to 8gb.
Sep 22, 2016
f292cde
drop curator release components.
Sep 22, 2016
084d0ba
some clean up for pom.xml.
Oct 13, 2016
7c40a69
Merge remote-tracking branch 'upstream/master' into adding-srl
christos-c Oct 14, 2016
24ba53e
Changed SRL's TA cache mechanism
christos-c Oct 14, 2016
8252cb9
do not initialize multiple caches.
Oct 14, 2016
3 changes: 2 additions & 1 deletion README.md
@@ -32,7 +32,8 @@ An application that identifies the part of speech (e.g. verb + tense, noun + num
in plain text.
- [illinois-ner](ner/README.md)
An application that identifies named entities in plain text according to two different sets of categories.

- [illinois-srl](srl/README.md)
An application to annotate natural language sentences with semantic roles.

## Using each library programmatically

1 change: 1 addition & 0 deletions pom.xml
@@ -18,6 +18,7 @@
<module>pos</module>
<module>chunker</module>
<module>ner</module>
<module>srl</module>
<module>corpusreaders</module>
<module>lbjava-nlp-tools</module>
</modules>
55 changes: 55 additions & 0 deletions srl/CHANGELOG
@@ -0,0 +1,55 @@
Version 3.0.73
Moved to the super-project and changed the versioning to the super-project versioning

Version 5.1.12
Added Windows support (including access to non-Gurobi solver)

Version 5.1.4
Switched entirely to illinois-sl for structured prediction (removed JLIS traces)
Using the latest AnnotatorService from illinois-core-utilities for both Curator & pipeline annotation
Major cleaning up

Version 5.1
Added JUnit tests
Removed unnecessary dependencies
Switched to illinois-nlp-pipeline-0.1.2
Minor fixes

Version 5.0
Standalone SRL using illinois-nlp-pipeline

Version 4.1.1
Switched to edison-0.7.1 and LBJava-1.0
Added dependency to illinois-common-resources

Version 4.1
Various bugfixes

Version 4.0.2
Updated inference dependency to latest version and modified inference
code accordingly.

Version 4.0.1
Removed duplicate code from JLIS-core and moved to IllinoisSL. Minor edits.

Version 4.0
A complete rewrite of the SRL. Includes predicate and sense detectors,
new constraints and a memory footprint of only 3GB.

Version 3.0.3
Minor bugfixes. Uses edison v0.2.9

Version 3.0.2
Added an option to trim leading prepositions from arguments.

Revamped the training mechanism to train using LBJ's BatchTrainer in
the code. This allows manual lexicon handling, which reduces the
memory requirements by nearly 40 percent.

Version 3.0.1
Minor bugfix

Version 3.0
A complete Java-based re-implementation of the Illinois SRL from
Punyakanok 2008. This version uses LBJ to train classifiers and
for performing inference with a home-brewed beam search.
49 changes: 49 additions & 0 deletions srl/README.md
@@ -0,0 +1,49 @@
# illinois-srl: Semantic Role Labeler

### Running
You can use the **illinois-srl** system in either *interactive* or *annotator* mode.
#### Interactive mode
In *interactive mode* the user can input a single piece of text and get back the output of both
the **Nom**inal and **Verb**al SRL systems in plain text.

To run the system in *interactive mode* see the class `edu.illinois.cs.cogcomp.srl.SemanticRoleLabeler`
or simply execute the `run-interactive` script:

For Linux:
```
scripts/run-interactive.sh
```

For Windows:
```
cd scripts
run-interactive-win.bat
```

#### As an `Annotator` component
**illinois-srl** can also be used programmatically through the `SemanticRoleLabeler` class which implements CogComp's
[Annotator interface](http://cogcomp.cs.illinois.edu/software/doc/illinois-core-utilities/apidocs/edu/illinois/cs/cogcomp/core/datastructures/textannotation/Annotator.html).

The main method is `getView(TextAnnotation)` inside `SemanticRoleLabeler`. This will add a new
[`PredicateArgumentView`](http://cogcomp.cs.illinois.edu/software/doc/illinois-core-utilities/apidocs/edu/illinois/cs/cogcomp/core/datastructures/textannotation/PredicateArgumentView.html)
for either **Nom**inal or **Verb**al SRL.

### Training
To train the SRL system you will require access to the [Propbank](https://verbs.colorado.edu/~mpalmer/projects/ace.html)
or [Nombank](http://nlp.cs.nyu.edu/meyers/NomBank.html) corpora. You need to set pointers to these in the
`config/srl-config.properties` file.
(To train the system with a non-Prop/Nombank corpus, you need to extend
[`AbstractSRLAnnotationReader`](http://cogcomp.cs.illinois.edu/software/doc/illinois-core-utilities/apidocs/edu/illinois/cs/cogcomp/nlp/corpusreaders/AbstractSRLAnnotationReader.html).)

To perform the whole training/testing suite, run the `Main` class with parameters `<config-file> expt Verb|Nom true`.
This will:

1. Read and cache the datasets (train/test)
2. Annotate each `TextAnnotation` with the required views
(here you can set the `useCurator` flag to false to use CogComp's standalone NLP pipeline)
3. Pre-extract and cache the features for the classifiers
4. Train the classifiers
5. Evaluate on the (cached) test corpus

**IMPORTANT:** After training, make sure you comment out the pre-trained SRL model dependencies inside
`pom.xml` (lines 27-38).
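
The parameter list `<config-file> expt Verb|Nom true` from the training section can be sanity-checked before a long run is started. The following is a minimal sketch of such a check, not the project's `Main` class: the class name `ExptArgs`, its fields, and the treatment of the trailing `true` as an unnamed boolean flag are assumptions made for illustration, since the README does not say what that flag controls.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: validates the four parameters listed in the README
// before they are handed to the real Main class.
class ExptArgs {
    static final List<String> SRL_TYPES = Arrays.asList("Verb", "Nom");

    final String configFile;
    final String srlType;
    final boolean flag; // the trailing "true"; its meaning is not stated in the README

    ExptArgs(String configFile, String srlType, boolean flag) {
        this.configFile = configFile;
        this.srlType = srlType;
        this.flag = flag;
    }

    // Expects: <config-file> expt Verb|Nom true|false
    static ExptArgs parse(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException("expected 4 parameters, got " + args.length);
        }
        if (!"expt".equals(args[1])) {
            throw new IllegalArgumentException("second parameter must be 'expt'");
        }
        if (!SRL_TYPES.contains(args[2])) {
            throw new IllegalArgumentException("SRL type must be Verb or Nom, got " + args[2]);
        }
        if (!"true".equals(args[3]) && !"false".equals(args[3])) {
            throw new IllegalArgumentException("last parameter must be true or false");
        }
        return new ExptArgs(args[0], args[2], Boolean.parseBoolean(args[3]));
    }
}
```

Failing fast on a misspelled SRL type is cheaper than discovering it after the datasets have been read and cached.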
20 changes: 20 additions & 0 deletions srl/config/learner.properties
@@ -0,0 +1,20 @@
# Available learning models: {L2LossSSVM, StructuredPerceptron}
LEARNING_MODEL = L2LossSSVM

# Available solver types: {DCDSolver, ParallelDCDSolver, DEMIParallelDCDSolver}
L2_LOSS_SSVM_SOLVER_TYPE = ParallelDCDSolver

NUMBER_OF_THREADS = 8

# Regularization parameter
C_FOR_STRUCTURE = 1.0

# Mini-batch for 'warm' start
TRAINMINI = true
TRAINMINI_SIZE = 10000

# Suppress optimality check
CHECK_INFERENCE_OPT = false

# Number of training rounds
MAX_NUM_ITER = 100
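
The keys in `learner.properties` are plain `java.util.Properties` entries, so they can be read and type-checked with the standard library alone. The sketch below is one possible reader, assuming that every key shown is mandatory; the class `LearnerConfigSketch` is hypothetical and is not how illinois-sl itself consumes the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical reader for srl/config/learner.properties (not part of illinois-sl).
class LearnerConfigSketch {
    // Allowed values are copied from the comments in the properties file.
    static final List<String> MODELS = Arrays.asList("L2LossSSVM", "StructuredPerceptron");
    static final List<String> SOLVERS =
            Arrays.asList("DCDSolver", "ParallelDCDSolver", "DEMIParallelDCDSolver");

    final String learningModel;
    final String solverType;
    final int threads;
    final double c;
    final int maxIterations;

    LearnerConfigSketch(Properties p) {
        learningModel = oneOf(p, "LEARNING_MODEL", MODELS);
        solverType = oneOf(p, "L2_LOSS_SSVM_SOLVER_TYPE", SOLVERS);
        threads = Integer.parseInt(required(p, "NUMBER_OF_THREADS"));
        c = Double.parseDouble(required(p, "C_FOR_STRUCTURE"));
        maxIterations = Integer.parseInt(required(p, "MAX_NUM_ITER"));
    }

    // Parses properties text; "KEY = value" with spaces around '=' is accepted.
    static LearnerConfigSketch parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new LearnerConfigSketch(p);
    }

    private static String required(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing key: " + key);
        }
        return value.trim();
    }

    private static String oneOf(Properties p, String key, List<String> allowed) {
        String value = required(p, key);
        if (!allowed.contains(value)) {
            throw new IllegalArgumentException(key + " must be one of " + allowed + ", got " + value);
        }
        return value;
    }
}
```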
17 changes: 17 additions & 0 deletions srl/config/pipeline.properties
@@ -0,0 +1,17 @@
## Flags for whether to use different annotators
usePos = true
useLemma = true
useShallowParse = true
useNerConll = true
useNerOntonotes = false
useStanfordParse = true
useStanfordDep = true
useSrlVerb = false
useSrlNom = false

## Flags for the Stanford parser (for pre-processing)
# Max time per sentence (in milliseconds)
stanfordMaxTimePerSentence = 1000

# Max sentence length (will throw exception if larger)
stanfordParseMaxSentenceLength = 80
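
To make the two groups of settings in `pipeline.properties` concrete, here is a small sketch that lists the annotator flags set to `true` and applies the sentence-length limit. It is illustrative only: `PipelineFlagsSketch` is a hypothetical class, and measuring sentence length in tokens is an assumption, because the file does not state the unit.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper around srl/config/pipeline.properties; not the pipeline's own code.
class PipelineFlagsSketch {
    final Properties props = new Properties();

    PipelineFlagsSketch(String text) {
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Names of all "use*" flags set to true, in alphabetical order.
    List<String> enabledAnnotators() {
        List<String> enabled = new ArrayList<>();
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            if (key.startsWith("use") && Boolean.parseBoolean(props.getProperty(key).trim())) {
                enabled.add(key);
            }
        }
        return enabled;
    }

    // Mirrors the documented behaviour of stanfordParseMaxSentenceLength:
    // a sentence longer than the limit is rejected with an exception.
    // Assumption: length is counted in tokens.
    void checkSentenceLength(String[] tokens) {
        int max = Integer.parseInt(props.getProperty("stanfordParseMaxSentenceLength").trim());
        if (tokens.length > max) {
            throw new IllegalArgumentException(
                    "sentence has " + tokens.length + " tokens, limit is " + max);
        }
    }
}
```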
39 changes: 39 additions & 0 deletions srl/config/srl-config.properties
@@ -0,0 +1,39 @@
## Illinois SRL Configuration ##

# Whether to use the Illinois Curator to get the required annotations for training/testing
# If set to false, Illinois NLP pipeline will be used
UseCurator = false

# The configuration of the Illinois NLP pipeline
PipelineConfig = config/pipeline.properties

# The parser used to extract constituents and syntactic features
# Options are: Charniak, Berkeley, Stanford
# NB: Only Stanford can be used in standalone mode.
DefaultParser = Stanford

# The configuration for the Structured learner
LearnerConfig = config/learner.properties

# Number of threads for feature extraction
NumFeatExtThreads = 10

# The ILP solver to use for the joint inference
# Options are: Gurobi, OJAlgo
ILPSolver = OJAlgo

### Training corpora directories ###
# This is the directory of the merged (mrg) WSJ files
PennTreebankHome = /shared/corpora/corporaWeb/treebanks/eng/pennTreebank/treebank-3/parsed/mrg/wsj/
PropbankHome = /shared/corpora/corporaWeb/treebanks/eng/propbank_1/data
NombankHome = /shared/corpora/corporaWeb/treebanks/eng/nombank/

# The directory of the sentence and pre-extracted features database (~5G of space required)
# Not used during test/working with pre-trained models
CacheDirectory = cache

ModelsDirectory = models

# Directory to output gold and predicted files for manual comparison
# Comment out for no output
OutputDirectory = srl-out
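
The comments in `srl-config.properties` imply a few constraints between keys: `DefaultParser` and `ILPSolver` each accept a fixed set of values, and only the Stanford parser works in standalone mode. A hedged sketch of a check for those constraints follows. The class `SrlConfigCheckSketch` is hypothetical, and it assumes that "standalone mode" means `UseCurator = false` and that a missing `UseCurator` key defaults to `false`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical consistency check for srl/config/srl-config.properties.
class SrlConfigCheckSketch {
    // Option lists are copied from the comments in the properties file.
    static final List<String> PARSERS = Arrays.asList("Charniak", "Berkeley", "Stanford");
    static final List<String> SOLVERS = Arrays.asList("Gurobi", "OJAlgo");

    // Returns a list of human-readable problems; an empty list means the checks passed.
    static List<String> problems(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String parser = p.getProperty("DefaultParser", "").trim();
        String solver = p.getProperty("ILPSolver", "").trim();
        // Assumption: an absent UseCurator key is treated as false (standalone).
        boolean useCurator = Boolean.parseBoolean(p.getProperty("UseCurator", "false").trim());

        List<String> problems = new ArrayList<>();
        if (!PARSERS.contains(parser)) {
            problems.add("DefaultParser must be one of " + PARSERS);
        } else if (!useCurator && !"Stanford".equals(parser)) {
            problems.add("only the Stanford parser can be used in standalone mode");
        }
        if (!SOLVERS.contains(solver)) {
            problems.add("ILPSolver must be one of " + SOLVERS);
        }
        return problems;
    }
}
```

Returning every problem at once, rather than throwing on the first one, lets a user fix the whole file in a single pass.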
20 changes: 20 additions & 0 deletions srl/curator-release/README.txt
@@ -0,0 +1,20 @@
1. Run prepare-curator-release.sh in the root directory

2. Run 'ant dist' in both the verb and nom directories here.

3. Copy all the jars from the lib directory into the lib directory
within $CURATOR_HOME/dist.

4. Copy illinois-verb-srl/dist/illinois-verb-srl-server.jar and
illinois-nom-srl/dist/illinois-nom-srl-server.jar into
$CURATOR_HOME/dist/components.

5. Copy config/srl-config.properties into
$CURATOR_HOME/dist/configs.

6. Copy illinois-verb-srl/illinois-verb-srl-server.sh and
illinois-nom-srl/illinois-nom-srl-server.sh into
$CURATOR_HOME/dist/bin.


Now the SRL servers are ready to be launched.
45 changes: 45 additions & 0 deletions srl/curator-release/config/jwnl_properties.xml
@@ -0,0 +1,45 @@
<?xml version="1.0" encoding="UTF-8"?>
<jwnl_properties language="en">
<version publisher="Princeton" number="3.0" language="en"/>
<dictionary class="net.didion.jwnl.dictionary.FileBackedDictionary">
<param name="morphological_processor" value="net.didion.jwnl.dictionary.morph.DefaultMorphologicalProcessor">
<param name="operations">
<param value="net.didion.jwnl.dictionary.morph.LookupExceptionsOperation"/>
<param value="net.didion.jwnl.dictionary.morph.DetachSuffixesOperation">
<param name="noun" value="|s=|ses=s|xes=x|zes=z|ches=ch|shes=sh|men=man|ies=y|"/>
<param name="verb" value="|s=|ies=y|es=e|es=|ed=e|ed=|ing=e|ing=|"/>
<param name="adjective" value="|er=|est=|er=e|est=e|"/>
<param name="operations">
<param value="net.didion.jwnl.dictionary.morph.LookupIndexWordOperation"/>
<param value="net.didion.jwnl.dictionary.morph.LookupExceptionsOperation"/>
</param>
</param>
<param value="net.didion.jwnl.dictionary.morph.TokenizerOperation">
<param name="delimiters">
<param value=" "/>
<param value="-"/>
</param>
<param name="token_operations">
<param value="net.didion.jwnl.dictionary.morph.LookupIndexWordOperation"/>
<param value="net.didion.jwnl.dictionary.morph.LookupExceptionsOperation"/>
<param value="net.didion.jwnl.dictionary.morph.DetachSuffixesOperation">
<param name="noun" value="|s=|ses=s|xes=x|zes=z|ches=ch|shes=sh|men=man|ies=y|"/>
<param name="verb" value="|s=|ies=y|es=e|es=|ed=e|ed=|ing=e|ing=|"/>
<param name="adjective" value="|er=|est=|er=e|est=e|"/>
<param name="operations">
<param value="net.didion.jwnl.dictionary.morph.LookupIndexWordOperation"/>
<param value="net.didion.jwnl.dictionary.morph.LookupExceptionsOperation"/>
</param>
</param>
</param>
</param>
</param>
</param>
<param name="dictionary_element_factory" value="net.didion.jwnl.princeton.data.PrincetonWN17FileDictionaryElementFactory"/>
<param name="file_manager" value="net.didion.jwnl.dictionary.file_manager.FileManagerImpl">
<param name="file_type" value="net.didion.jwnl.princeton.file.PrincetonRandomAccessDictionaryFile"></param>
<param name="dictionary_path" value="/shared/grandpa/opt/dict"/>
</param>
</dictionary>
<resource class="JWNLResource"></resource>
</jwnl_properties>
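
The `noun`, `verb` and `adjective` values passed to `DetachSuffixesOperation` above are `|`-separated `suffix=replacement` rules. The sketch below shows one way to read such a string and generate candidate base forms. It is a simplified illustration and not JWNL's implementation: `SuffixRulesSketch` is a hypothetical class, and it only produces candidates, whereas the configuration above chains further lookup operations to decide which candidate is a real dictionary entry.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of suffix-detachment rules such as "|s=|ses=s|ies=y|".
class SuffixRulesSketch {
    // Each rule is a {suffix, replacement} pair. A list (not a map) is used
    // because the verb rules repeat a suffix, e.g. "es=e" and "es=".
    static List<String[]> parseRules(String spec) {
        List<String[]> rules = new ArrayList<>();
        for (String part : spec.split("\\|")) {
            int eq = part.indexOf('=');
            if (eq <= 0) {
                continue; // skip the empty piece before the leading '|'
            }
            rules.add(new String[] {part.substring(0, eq), part.substring(eq + 1)});
        }
        return rules;
    }

    // Applies every rule whose suffix matches and returns the resulting candidates,
    // in rule order. No dictionary lookup is performed here.
    static List<String> candidates(String word, String spec) {
        List<String> out = new ArrayList<>();
        for (String[] rule : parseRules(spec)) {
            if (word.endsWith(rule[0])) {
                String stem = word.substring(0, word.length() - rule[0].length());
                out.add(stem + rule[1]);
            }
        }
        return out;
    }
}
```

With the noun rules from the file, `candidates("churches", ...)` yields `churche` (from `s=`) and `church` (from `ches=ch`); a subsequent dictionary lookup would be what keeps only `church`.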
10 changes: 10 additions & 0 deletions srl/curator-release/config/srl-config.properties
@@ -0,0 +1,10 @@
# For a description of the arguments, see the documentation
DefaultParser = Charniak

WordNetConfig=jwnl_properties.xml


StandardIdentifierClassifierPipeline = true



48 changes: 48 additions & 0 deletions srl/curator-release/illinois-nom-srl/README.dist
@@ -0,0 +1,48 @@
Illinois Nominal SRL Server
===========================

Description
-----------

This package wraps the Illinois Nominal Semantic Role Labeler as a
Thrift server using the Curator's Parser interface. It is designed to
be integrated into the Curator system. Parsers return a forests; in
this case, consisting of one tree per predicate, with the predicate as
the root and its arguments as the children.


Semantics
----------

Semantic Roles are represented using a forest consisting of two level
trees. Each predicate in the input is associated with a tree whose
root corresponds to the predicate. The children of this node denote
the labeled arguments of the predicate.

Usage
-----

> $ bin/illinois-nom-srl-server.sh --help
> usage: java edu.illinois.cs.cogcomp.annotation.server.IllinoisNomSRLServer
> [-c <CONFIG>] [-h] [-p <PORT>] [-t <THREADS>]
> -c,--config <CONFIG> configuration file
> -h,--help print this message
> -p,--port <PORT> port to open server on
> -t,--threads <THREADS> number of threads to run

Installation
------------

The Illinois Nominal SRL Server depends on:

* Java >= 1.5
* The Illinois SRL system
* Illinois SRL model jar
* Curator interfaces (`curator-interfaces.jar`)


Configuration
-------------

See the documentation for the Illinois SRL system for configuration
options.
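
The forest representation described in the Semantics section of README.dist (one two-level tree per predicate, with labeled arguments as children) can be pictured with a small data structure. The following is a sketch only: `SrlForestSketch` is a hypothetical class, not the Curator's Thrift `Forest` type, and the sentence, the `A0`/`A1` labels, and the argument spans are hand-written for illustration rather than produced by the SRL system.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the "forest of two-level trees" described in README.dist.
class SrlForestSketch {
    // One tree: the predicate is the root, each labeled argument is a child.
    static class PredicateTree {
        final String predicate;
        final List<String[]> arguments = new ArrayList<>(); // {label, text}

        PredicateTree(String predicate) {
            this.predicate = predicate;
        }

        PredicateTree arg(String label, String text) {
            arguments.add(new String[] {label, text});
            return this;
        }

        // Root on the first line, one indented line per argument.
        String render() {
            StringBuilder sb = new StringBuilder(predicate).append('\n');
            for (String[] a : arguments) {
                sb.append("  ").append(a[0]).append(": ").append(a[1]).append('\n');
            }
            return sb.toString();
        }
    }

    // Hand-written example for: "The army's destruction of the city prompted an investigation."
    static List<PredicateTree> example() {
        List<PredicateTree> forest = new ArrayList<>();
        forest.add(new PredicateTree("destruction")
                .arg("A0", "The army's")
                .arg("A1", "of the city"));
        forest.add(new PredicateTree("investigation")); // a predicate with no labeled arguments
        return forest;
    }
}
```

Because every tree has exactly two levels, a flat list of label and text pairs per predicate is enough; no recursive node type is needed.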