In this tutorial, we will provide a brief tour of SEMPRE. This tutorial is very much about the mechanics of the system, not about the linguistics or semantic parsing from a research point of view (for those, see the recommended readings at the end of this document).
We will construct a semantic parser to understand a toy subset of natural language. Concretely, the system will have the following behavior:
- Input: What is three plus four?
- Output: 7
In semantic parsing, natural language utterances are mapped into logical forms (think programs), which are executed to produce some denotation (think return value).
To download all the dependencies for the system, run:
./download-dependencies core
To compile the system, run:
make
To run the system, run:
java -Xmx3g -cp classes:lib/* edu.stanford.nlp.sempre.Main -executor JavaExecutor -interactive
This will put you in an interactive prompt where you can develop a system and
parse utterances into tiny Java programs (hence JavaExecutor
). Note: you
might find it convenient to use rlwrap
to get readline support.
A logical form (Formula
) is a hierarchical expression. For example, here are
some primitive logical forms:
(boolean true)
(number 3)
(string hello)
(date 2013 7 28)
Logical forms can be constructed recursively using call
, which takes a
function name followed by arguments:
(call + (number 3) (number 4))
(call java.lang.Math.cos (number 0))
(call .indexOf (string "what is this?") (string is))
(call .substring (string "what is this?") (number 5) (number 7))
(call if (call < (number 3) (number 4)) (string yes) (string no))
Note that each logical form is painfully explicit about types. You would probably not want to program directly in this language, but that is not the point; we will generate these logical forms automatically from natural language.
We can execute these logical forms by typing into the interactive prompt:
(execute (call + (number 3) (number 4)))
This should print out (number 7)
, which we refer to as the denotation of
the logical form. Try to execute the other logical forms and see what you get.
So far, we have been representing monolithic logical forms. When we start doing parsing, it will be convenient to be able to refer to logical forms with some of its parts abstracted out. For example, the following logical form represents a function that takes a number and returns its square:
(lambda x (call * (var x) (var x)))
We apply a lambda expression in the usual Lispy way:
((lambda x (call * (var x) (var x))) (number 3))
Executing this expression should yield (number 9)
.
Technical note: these lambda expressions are just doing macro substitution, not actually representing higher-order functions; since there are no side effects here, there is no difference.
This concludes the section on logical forms and denotations. We have presented
one system of logical forms, which are executed using JavaExecutor
. The
system supports other types of logical forms, for example, those which encode
SPARQL database queries for question answering (we will get to that later).
Note that there is no mention of natural language yet...
Having established the nature of logical forms and their denotations, let us turn to the problem of mapping a natural language utterance into a logical form. We will proceed by defining a grammar, which is a set of rules which specify how to perform this mapping piece by piece. This is a grammar in the computer science sense, not in the linguistic sense (again, SEMPRE just provides a framework; you can use whatever grammar you would like).
We will run through some examples to give you a feel for how things work, and then go into the details. First, let us a rule to the grammar by entering the following into the prompt:
(rule $ROOT (three) (ConstantFn (number 3)))
Now to parse an utterance, just type it in:
three
Note: The first time you parse an utterance will be slower since Stanford CoreNLP components need to be loaded into memory (e.g., the part-of-speech tagger).
The parser should print out (among other information) a line that shows that the utterance was parsed sucessfully into a Derivation:
(derivation (formula (number 3.0)) (value (number 3.0)) (type fb:type.number))
Now type in the utterance:
four
You should get no results because no rule will match four
. To fix that, let
us create a more general rule:
(rule $ROOT ($PHRASE) (NumberFn))
This rule says match any phrase (sequence of tokens), pass it to a special
function called NumberFn
, which will transform the string into a new
derivation.
Now the system can interpret general numbers! Try typing in:
twenty five million
The result shold be (number 2.5E7)
.
Note: if you now type in three
, you should get two derivations that yield the
same answer, one coming from each rule.
So far, we have only parsed utterances using one rule, but the true power of these grammars come from combining multiple rules. Copy and paste in the following rules:
(rule $Expr ($PHRASE) (NumberFn))
(rule $Operator (plus) (ConstantFn (lambda y (lambda x (call + (var x) (var y)))) (-> fb:type.number (-> fb:type.number fb:type.number))))
(rule $Operator (times) (ConstantFn (lambda y (lambda x (call * (var x) (var y)))) (-> fb:type.number (-> fb:type.number fb:type.number))))
(rule $Partial ($Operator $Expr) (JoinFn forward))
(rule $Expr ($Expr $Partial) (JoinFn backward))
(rule $ROOT ((what optional) (is optional) $Expr (? optional)) (IdentityFn))
Now try typing in:
What is three plus four?
The output should be
(number 7)
We can parse longer sentences. Type in:
What is three plus four times two?
There should be two derivations, yielding (number 14)
and (number 11)
,
corresponding to either combining three plus four first or four times two
first. Note that this is expected because we have not encoded any order of
operations anywhere.
Hopefully that should give you a sense of what parsing looks like. Let us now take a closer look. At the end of the day, a grammar declaratively specifies a mapping from utterances to a set of candidate Derivations (which are to be scored and ranked later). A parser is an actual algorithm that takes the grammar and generates those derivations. Recall that a derivation looks like this:
(derivation (formula (number 3.0)) (value (number 3.0)) (type fb:type.number))
Formally, each Derivation produced by the parser has the following properties:
- Span i:j (e.g., 0:1): specifies the continguous portion of the input utterance (tokens i to j-1) that the Derivation is constructed from.
- Category (e.g.,
$ROOT
): categories place hard constraints on what Derivations can be combined. - Type (e.g.,
(fb:type.number)
): typically more fine-grained than the category, and is generated dynamically. - Logical form (e.g.,
(call + (number 3) (number 4))
): what we normally think of as the output of semantic parsing. - Value (e.g.,
(number 7)
): the result of executing the logical form.
There are some special categories:
$TOKEN
: matches a single token of the utterance. Formally, the parser builds a Derivation with category$TOKEN
and logical form corresponding to the token (e.g.,(string three)
) for each token in the utterance.$PHRASE
: matches any contiguous subsequence of tokens. The logical form created is the concatenation of those tokens (e.g.,(string "twenty-five million")
).$LEMMA_TOKEN
: like$TOKEN
, but the logical form produced is a lemmatized version of the token (for example $TOKEN would yield cows, while $LEMMA_TOKEN would yield cow).$LEMMA_PHRASE
: the lemmatized version of$PHRASE
.$ROOT
Derivations that have category$ROOT
and span the entire utterance are executed, scored, and sent back to the user.
Now let us see how a grammar specifies the set of derivations. A grammar is a set of rules, and each rule has the following form:
(rule |target| (|source_1| ... |source_k|) |semantic function|)
- Target category (e.g.,
$ROOT
): any derivation produced by this rule is labeled with this category.$ROOT
is the designated top-level category. Derivations of type$ROOT
than span the entire utterance are returned to the user. - Source sequence (e.g.,
three
): in general, this is a sequence of tokens and categories (all categories start with$
by convention). Tokens (e.g.,three
) are matched verbatim, and categories (e.g.,$PHRASE
) match any derivation that is labeled with that category and has a span at that position. - Semantic function (
SemanticFn
): a semantic function takes a sequence of derivations corresponding to the categories in the children and produces a set of new derivations which are to be labeled with the target category. Semantic functions are arbitrary Java code, and allow the parser to integrate custom logic in a flexible modular way. In the example aboveConstantFn
is an example of a semantic function which always returns oneDerivation
with the given logical form (e.g.,(number 3)
).
Derivations are built recursively: for each category and span, we construct a
set of Derivations. We can apply a rule if there is some segmentation of the
span into sub-spans
The first rule is a familiar one that just parses numbers such as three million
into the category $Expr
:
(rule $Expr ($PHRASE) (NumberFn))
Specifically, one derivation with logical form (string three)
is created
with category $PHRASE
and span 0:1. This derivation is passed into
NumberFn
, which returns one derivation with logical form (number 3)
and
category $Expr
and span 0:1. The same goes for four on span 2:3.
The next two rules map the tokens plus and times to a static logical form
(returned by ConstantFn
):
(rule $Operator (plus) (ConstantFn (lambda y (lambda x (call + (var x) (var y)))) (-> fb:type.number (-> fb:type.number fb:type.number))))
(rule $Operator (times) (ConstantFn (lambda y (lambda x (call * (var x) (var y)))) (-> fb:type.number (-> fb:type.number fb:type.number))))
Note that the ConstantFn
specifies both the logical form and the type (note
that the type is a property of the derivation, not of the logical form). Here,
the logical form is
(lambda y (lambda x (call + (var x) (var y))))
which represents a curried function that takes y
and x
and returns calls
+
on the values. The type of this entry is:
(-> fb:type.number (-> fb:type.number fb:type.number))
The type is used in parsing to prevent bad derivations from being combined spuriously.
The next two rules are the main composition rules:
(rule $Partial ($Operator $Expr) (JoinFn forward))
(rule $Expr ($Expr $Partial) (JoinFn backward))
The semantic function (JoinFn forward)
takes two a lambda term $Operator
and an argument $Expr
and returns a new derivation by forward application:
Input $Operator: (lambda y (lambda x (call + (var x) (var y))))
Input $Expr: (number 4)
Output $Partial: (lambda x (call + (var x) (number 4)))
The semantic function (Join backward)
takes an argument $Expr
and a lambda
term $Partial
and returns a new derivation by backward application:
Input $Expr: (number 3)
Input $Partial: (lambda x (call + (var x) (number 4)))
Output $Expr: (call + (number 3) (number 4))
(rule $ROOT ((what optional) (is optional) $Expr (? optional)) (IdentityFn))
We allow some RHS elements to be optional, so that we could have typed in
three plus four
or three plus four?
. IdentityFn
simply takes the logical
form corresponding to $Expr
and passes it up.
So far, we have used the grammar to generate a set of derivations given an utterance. In general, this set is huge and we need a way to rank the derivations. Our strategy is to provide the system with training examples, from which the system can learn a model that places a distribution over derivations given utterances.
To enable learning in the interactive prompt, do the following (we could have
also passed these in as command-line arguments, e.g.,
-Master.onlineLearnExamples true -FeatureExtractor.featureDomains rule
):
(set Master.onlineLearnExamples true)
(set FeatureExtractor.featureDomains rule)
The first statement turns on online learning updates (this is only necessary for the interactive prompt). The second statement adds features that keep track of which rules we are using.
As a simple example, imagine that a priori, we do not know what the word and means: it could be either plus or times. Let us add two rules to capture the two possibilities:
(rule $Operator (and) (ConstantFn (lambda y (lambda x (call + (var x) (var y)))) (-> fb:type.number (-> fb:type.number fb:type.number))))
(rule $Operator (and) (ConstantFn (lambda y (lambda x (call * (var x) (var y)))) (-> fb:type.number (-> fb:type.number fb:type.number))))
Now type in
three and four
There should be two derivations each with probability 0.5:
(derivation (formula (((lambda y (lambda x (call + (var x) (var y)))) (number 4.0)) (number 3.0))) (value (number 7.0)) (type fb:type.number)) [score=0, prob=0.500]
(derivation (formula (((lambda y (lambda x (call * (var x) (var y)))) (number 4.0)) (number 3.0))) (value (number 12.0)) (type fb:type.number)) [score=0, prob=0.500]
You will also see the features that are active for the first derivation. For example, the following feature represents the fact that one of the rules we just added is active:
[ rule :: $Operator -> and (ConstantFn (lambda y (lambda x (call + (var x) (var y)))) (-> fb:type.number (-> fb:type.number fb:type.number))) ] 0 = 1 * 0
So far all the features have weight zero, so all the scores (dot products between feature vector and weight vector) are also zero.
We can add a training example which says that the utterance should map to the first derivation:
(accept 0)
This should perform a stochastic gradient update of the parameters. Now type in:
three and four
The first derivation should have much higher probability (around 0.88). To see the new parameters, type in:
(params)
This concludes the basics of logical forms, denotations, grammars, features,
and learning. In practice, much of these operations can be done offline by
putting the grammar rules in a file (data/tutorial.grammar
) and the examples
in another file (`data/tutorial.examples.json). Then you can one run command
to perform what we just did:
java -cp classes:lib/* edu.stanford.nlp.sempre.Main -executor JavaExecutor -Grammar.inPaths data/tutorial.grammar -Dataset.inPaths train:data/tutorial.examples.json -readLispTreeFormat false -trainFrac 0.5 -devFrac 0.5 -maxTrainIters 2 -featureDomains rule -execDir tutorial.out
This will create a directory tutorial.out
which records all the information
associated with this run.
So far, we have worked with JavaExecutor
as the engine that maps logical
forms to denotations by executing Java code. A major application of semantic
parsing is where the logical forms are database queries. In this section, we
will look at querying graph databases.
A graph database (e.g., Freebase) stores information about entities
and their properties; concretely, it is just a set of triples
fb:en.barack_obama fb:place_of_birth fb:en.honolulu
is one triple. If we think of the entities as nodes in a directed graph, the each triple is a directed edge between two nodes labeled with the property.
See data/tutorial.ttl
for an example of a tiny subset of the Freebase graph
pertaining to geography about California.
We will use Virtuoso to provide the backend for querying this data. Make sure you have Virtuoso downloaded and compiled (see QUICKSTART.md for instructions).
Start the server:
scripts/virtuoso start tutorial.vdb 3001
Add the small graph to the database:
scripts/virtuoso add data/tutorial.ttl 3001
Aside: if you want to work with a larger database:
./download-dependencies geofreebase_ttl # Subset of Freebase for geography in .ttl format (not needed to run the server)
./download-dependencies geofreebase_vdb # Virtuoso index for subset of Freebase for geography
./download-dependencies fullfreebase_ttl # All of Freebase in .ttl format (not needed to run the server) [BIG FILE]
./download-dependencies fullfreebase_vdb # Virtuoso index for all of Freebase [BIG FILE]
Then, just replace tutorial.vdb
with the appropriate lib/freebase/??.exec/vdb
path.
Now we are ready to make some queries. Start up the interactive prompt, now with the SparqlExecutor
:
java -cp classes:lib/* edu.stanford.nlp.sempre.Main -executor SparqlExecutor -endpointUrl http://localhost:3001/sparql -interactive
SPARQL is a language for querying these graphs, but it will be convenient to use a language more tailored for semantic parsing which is based on a mix between lambda calculus, description logic, and dependency-based compositional semantics. The simplest formula is a single entity:
fb:en.california
To execute this query, simply type the following into the interactive prompt:
(execute fb:en.california)
This should return:
(list (name fb:en.california California))
The result is a list containing the single entity. Here, fb:en.california
is
the canonical Freebase ID (always beginning with the prefix fb:
) and
California
is the name (look at data/tutorial.ttl
to see where this comes
from).
Let us try a more complex query which will fetch all the cities (in the database);
(execute (fb:type.object.type fb:location.citytown))
This should return the three cities, Seattle, San Francisco, and Los Angeles. We can restrict to cities in California:
(execute (and (fb:type.object.type fb:location.citytown) (fb:location.location.containedby fb:en.california)))
This should return the two cities satisfying the restriction: San Francisco and Los Angeles.
We can count the number of cities (should return 3):
(execute (count (fb:type.object.type fb:location.citytown)))
We can also get the city with the largest area:
(execute (argmax 1 1 (fb:type.object.type fb:location.citytown) fb:location.location.area))
Now let us take a closer look at what is going on with these logical forms under the hood. We are using a logical language called lambda DCS.
Here are the following types of logical forms:
- Primitive (e.g.,
fb:en.seattle
): denotes a set containing that single entity. - Intersection
(and |u1| |u2|)
: denotes the intersection of the sets denoted by unary logical formsu1
andu2
. - Join
(|b| |u|)
: denotes the set of$x$ which are connected to some$y$ via a binary$b$ and$y$ is in the set denoted by unary$u$ . - Count
(count |u|)
: denotes the set containing the cardinality of the set denoted byu
. - Superlative
(argmax |rank| |count| |u| |b|)
: sort the elements ofz
by decreasingb
and returncount
elements starting at offsetrank
(1-based). - Mu abstraction
(mark (var |v|) |u|)
: same as the unary |u| denoting entities |x|, with the exception that |x| must be equal to all occurrences of the variable |v| in |u|. - Lambda abstraction
(lambda (var |v|) |u|)
: produces a binary (x,y) wherex
is in the set denoted byu
andy
is the value taken on by variablev
.
See SparqlExecutorTest.java
for examples.
So far, we have described the denotations of logical forms for querying a graph database. Now we focus on parsing natural language utterances into these logical forms.
The core challenge is at the lexical level: mapping natural language phrases
(e.g., born in) to logical predicates (e.g.,
fb:people.person.place_of_birth
). First, we need to download the files that
specify this mapping (roughly, the mappings were created by aligning Freebase
and ClueWeb, as described the EMNLP 2013 paper, but you can just take these
mappings as given for now):
./download-dependencies emnlp2013
The following commands will require you having a Virtuoso instance that has all of Freebase indexed. If you do not have this set up, run the following two commands:
./download-dependencies fullfreebase_vdb # Virtuoso index for all of Freebase [BIG FILE]
scripts/virtuoso start lib/freebase/93.exec/vdb 3093 # Starts server on localhost at port 3093
Now, restart the SEMPRE shell, pointing the SparqlExecutor to the Virtuoso instance:
java -cp classes:lib/* edu.stanford.nlp.sempre.Main -executor SparqlExecutor -endpointUrl http://localhost:3093/sparql -interactive
Now, we can add rules with semantic function LexiconFn
:
(rule $Entity ($PHRASE) (LexiconFn entity allowInexact)) # e.g., Ulm
(rule $Unary ($LEMMA_PHRASE) (LexiconFn unary)) # e.g., physicists
(rule $Binary ($PHRASE) (LexiconFn binary)) # e.g., born in
(rule $ROOT ($Entity) (IdentityFn))
(rule $ROOT ($Unary) (IdentityFn))
LexiconFn takes logical forms which corresponds to strings (e.g., (string physicist)
)
and converts them into Freebase logical forms (e.g.,
(fb:people.person.profession fb:en.physicist)
).
Type the following utterances into the prompt:
Ulm
physicists
Each utterance should return a list of candidate entities.
Now for compositionality, add the following rules, which will take two logical forms and either perform an intersection or a join:
(rule $Set ($Unary $PartialSet) (MergeFn and))
(rule $PartialSet ($Binary $Entity) (JoinFn binary,unary unaryCanBeArg1))
(rule $ROOT ($Set) (IdentityFn))
This allows us to type in:
physicists born in Ulm
You should get many candidates, but the correct answer should appear in the list:
(derivation (formula (and (fb:people.person.profession fb:en.physicist) (!fb:location.location.people_born_here fb:en.ulm))) (value (list (name fb:en.albert_einstein "Albert Einstein"))) (type fb:people.person))
For an example of a more complex grammar, look at data/emnlp2013.grammar
.
-
Convert the following natural language utterances into lambda-DCS logical forms:
city with the largest area
top 5 cities by area
countries whose capitals have area at least 500 squared kilometers
states bordering Oregon and Washington
second tallest mountain in France
country with the most number of rivers
You should familiarize yourself with the Freebase schema to see which predicates to use.
Execute these logical forms on the geofreebase
subset to verify your answers.
- Write a grammar that can parse the above utterances into a set of candidates containing the true logical form you annotated above. Train a model (remember to add features) so that the correct logical forms appear at the top of the candidate list.
So far this tutorial has provided a very operational view of semantic parsing based on SEMPRE. The following references provide a much broader look at the area of semantic parsing and the linguistic and statistical foundations.
-
Natural language semantics: The question of how to represent natural language utterances using logical forms has been well-studied in linguistics under formal (or compositional) semantics. Start with the CS224U course notes from Stanford and an introduction by Lappin. You should take away from this an appreciation for the various phenomena in natural language.
-
Log-linear models: Our semantic parser is based on log-linear models, which is a very important tool in machine learning and statistical natural language processing. Start with a tutorial by Michael Collins, which is geared towards applications in NLP.
-
Semantic parsing: finally, putting the linguistic insights from formal semantics and the computational and statistical tools from machine learning, we get semantic parsing. There has been a lot of work on semantic parsing. Check out the ACL 2013 tutorial by Yoav Artzi and Luke Zettlemoyer, which focuses on how to build semantic parsers using Combinatory Categorical Grammar (CCG). Our EMNLP 2013 paper is the first paper based on SEMPRE.