This is an example project of the SIGIR 2016 tutorial Succinct Data Structures in Information Retrieval: Theory and Practice presented by Simon Gog and Rossano Venturini.

The example shows how the Succinct Data Structure Library (SDSL) can be used to implement a space-efficient top-k query completion system. The final result is an almost state-of-the-art system implemented in fewer than 300 lines of code.

Here is an example of our final system. The index is built over titles and click counts of Wikipedia pages.

Searching Wikipedia titles
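
To make the idea of top-k query completion concrete, here is a minimal sketch in plain C++ using only the standard library. It is not the SDSL-based implementation from the tutorial and it is not space-efficient; it only illustrates the query semantics: given a prefix, return the k heaviest strings that start with it. The names `Entry` and `SimpleTopK` are hypothetical and do not appear in the project.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry type: a completion string and its weight
// (for example a click count or a number of daily stops).
struct Entry {
    std::string text;
    std::uint64_t weight;
};

// Illustrative index: entries are kept sorted by text, so all strings
// sharing a prefix form one contiguous range.
class SimpleTopK {
public:
    explicit SimpleTopK(std::vector<Entry> entries) : entries_(std::move(entries)) {
        std::sort(entries_.begin(), entries_.end(),
                  [](const Entry& a, const Entry& b) { return a.text < b.text; });
    }

    // Returns up to k entries whose text starts with `prefix`, heaviest first.
    std::vector<Entry> top_k(const std::string& prefix, std::size_t k) const {
        // First entry whose text is not lexicographically smaller than the prefix.
        auto first = std::lower_bound(
            entries_.begin(), entries_.end(), prefix,
            [](const Entry& e, const std::string& p) { return e.text < p; });
        // From there on, matching entries come first; find where they end.
        auto last = std::partition_point(
            first, entries_.end(), [&prefix](const Entry& e) {
                return e.text.compare(0, prefix.size(), prefix) == 0;
            });

        // Copy the matching range and keep only the k heaviest entries.
        std::vector<Entry> result(first, last);
        k = std::min(k, result.size());
        auto mid = result.begin() + static_cast<std::ptrdiff_t>(k);
        std::partial_sort(result.begin(), mid, result.end(),
                          [](const Entry& a, const Entry& b) { return a.weight > b.weight; });
        result.erase(mid, result.end());
        return result;
    }

private:
    std::vector<Entry> entries_;
};
```

This sketch copies the whole matching range before selecting the top k, so a query costs time proportional to the number of matches. Avoiding that cost while keeping the index small is the kind of problem the succinct structures in the tutorial address.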

Installation

    ./install.sh

Building the project

    cd build
    cmake ..
    make

CMake parses the index.config file and generates binaries for each index. Each index name becomes the prefix of the corresponding executables (for example, index1-main and index1-webserver).

Running the command line version

    ./index1-main ../data/stops_nl.txt

The binary generates an index and then answers queries interactively, reading one query per line. The index is stored in ../data/stops_nl.txt.index1.sdsl and a visualization of its memory consumption is available at stops_nl.txt.index1.html. In general, each executable IDX-* stores the generated index at file.IDX.sdsl and its space visualization at file.IDX.html.
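
The naming rule for the output files can be written out as two small helpers. This is only an illustration of the rule stated above; the function names `index_file` and `space_report_file` are hypothetical and are not taken from the project's sources.

```cpp
#include <string>

// Hypothetical helper: path of the serialized index for input `file`
// built by an executable with prefix `idx` (for example "index1").
std::string index_file(const std::string& file, const std::string& idx) {
    return file + "." + idx + ".sdsl";
}

// Hypothetical helper: path of the HTML space visualization.
std::string space_report_file(const std::string& file, const std::string& idx) {
    return file + "." + idx + ".html";
}
```

For the input ../data/stops_nl.txt and the prefix index1, `index_file` yields ../data/stops_nl.txt.index1.sdsl, which matches the path given above.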

Running the webserver version

    ./index1-webserver ../data/stops_nl.txt 8000

The binary generates an index and starts a webserver that listens on the specified port.

Running the demo application

  1. Change into the build directory
  2. Download the Wikipedia titles by calling make download
  3. Build the executable by calling make index4ci-webserver
  4. Generate the index and start the webserver by calling ./index4ci-webserver ../data/enwiki-20160601-all-titles
  5. You can access the demo at http://127.0.0.1:8000

Credits

  • Thanks to Sascha Witt for preparing the example input file, which contains pairs of Dutch train stations and their number of daily train stops.

  • Thanks to all contributors to the SDSL project.