Language Model

OQA uses a language model for scoring during inference. It uses the KenLM software for language modeling. This sub-project has code for downloading KenLM, building KenLM, and using KenLM to construct a language model from WikiAnswers. The OQA code accesses the KenLM language model by querying an HTTP server started via a python script.

To build KenLM, you will need Boost installed and the corresponding environment variables set to the Boost lib and include paths. Using Homebrew on a Mac, I had to set these environment variables:

export LDFLAGS=-L/usr/local/Cellar/boost/1.57.0/lib
export LD_LIBRARY_PATH=/usr/local/Cellar/boost/1.57.0/lib
export CPLUS_INCLUDE_PATH=/usr/local/Cellar/boost/1.57.0/include

Below are the steps needed to get the language model component of OQA started. Please look at the script sources if you are interested in what is going on in more detail. Each step below depends on the previous step.

Installing KenLM

Run ./src/main/scripts/install-kenlm.sh to download and build KenLM.

Gotcha: If you receive an error like this after KenLM builds:

error: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/kenlm.so'

then you need to run sudo to install the KenLM python bindings. To do this, run these steps:

cd kenlm
sudo python setup.py install

Building the Language Model

You will need to have downloaded the OQA data in order to complete this step. Run ./src/main/scripts/build-lm.sh to build the language model from the WikiAnswers data.

Starting the Language Model Server

Run ./src/main/scripts/start.sh to start the server. This will start an HTTP server on localhost:9090.

Check the log in lm.err for any errors. You may need to install the web.py python module.

Stopping the Language Model Server

Run ./src/main/scripts/stop.sh to stop the server.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Language Model

Installing KenLM

Building the Language Model

Starting the Language Model Server

Stopping the Language Model Server

Files

README.md

Latest commit

History

README.md

File metadata and controls

Language Model

Installing KenLM

Building the Language Model

Starting the Language Model Server

Stopping the Language Model Server