Dismember[1]: Retrieval Algorithms for Decomposing Large-Scale Item Set

This repository implements a series of algorithms that aim to tackle the large-scale retrieval problem.

Modern industrial applications such as online advertising, search, and recommendation usually contain a large number of items, so complex deep learning models are often impractical to use directly: scoring requires traversing every item, which leads to linear complexity w.r.t. the whole item set. To alleviate this issue, most real-world systems convert the task into an approximate similarity search problem, but the downside is that model expressiveness is limited.

Thus, the most intriguing feature of the algorithms implemented here is that they provide logarithmic or sub-linear complexity w.r.t. the whole item set during model serving, while still allowing arbitrary deep models to be used. By introducing an index structure, the large-scale item set is decomposed into pieces, and relevant items are retrieved through beam search.
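
To make the idea of beam search over an index concrete, here is a minimal sketch in Scala. It is not the repository's implementation: the `Node`, `Leaf`, and `Inner` types, the `build` and `beamSearch` functions, and the scalar `center` summary are all hypothetical names chosen for illustration, and the toy distance-based scorer stands in for an arbitrary deep model.

```scala
import scala.annotation.tailrec

object BeamSearchSketch {
  // Hypothetical tree index: each leaf holds one item, each inner node
  // summarizes the items below it with a single number (`center`).
  sealed trait Node { def center: Double }
  final case class Leaf(itemId: Int, center: Double) extends Node
  final case class Inner(center: Double, children: List[Node]) extends Node

  // Build a balanced binary tree over (itemId, value) pairs.
  // Assumes a non-empty input that is already sorted by value.
  def build(items: Vector[(Int, Double)]): Node =
    if (items.length == 1) Leaf(items.head._1, items.head._2)
    else {
      val (left, right) = items.splitAt(items.length / 2)
      val mean = items.map(_._2).sum / items.length
      Inner(mean, List(build(left), build(right)))
    }

  // Level-wise beam search: at every level only the `beamSize` best-scoring
  // nodes are kept, so a balanced binary tree over n items needs roughly
  // 2 * beamSize * log2(n) calls to `score` instead of n.
  def beamSearch(root: Node, beamSize: Int, score: Node => Double): List[Int] = {
    def topK[A <: Node](nodes: List[A]): List[A] =
      nodes.map(n => (n, score(n))).sortWith(_._2 > _._2).take(beamSize).map(_._1)

    @tailrec
    def loop(beam: List[Node], found: List[Leaf]): List[Leaf] = {
      // Expand the current beam by one level; leaves have nothing to expand.
      val candidates = beam.flatMap {
        case Inner(_, children) => children
        case _: Leaf            => Nil
      }
      if (candidates.isEmpty) found
      else {
        val kept = topK(candidates)
        loop(kept, found ++ kept.collect { case l: Leaf => l })
      }
    }

    val initial = root match {
      case l: Leaf => List(l)
      case _       => Nil
    }
    topK(loop(List(root), initial)).map(_.itemId)
  }

  def main(args: Array[String]): Unit = {
    // Eight toy items whose value equals their id.
    val root = build(Vector.tabulate(8)(i => (i, i.toDouble)))
    // Toy scorer: the closer a node's center is to the query 5.2, the better.
    val result = beamSearch(root, beamSize = 2, n => -math.abs(n.center - 5.2))
    println(result) // List(5, 6)
  }
}
```

Because `score` is just a function from a node to a number, any model can be plugged in without changing the search itself. The result is approximate: an item is missed whenever one of its ancestors falls out of the beam.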

Currently, the repository contains the following modules:

[1] The name comes from the Swedish death metal band Dismember.

Requirements

  • Linux platform
  • Scala 2.13
  • sbt ~= 1.8.2

Data

An example dataset, derived from the MovieLens-1M dataset, is included in the data folder.

Build & Usage

$ git clone https://github.com/massquantity/dismember.git

For a quick run, one can import the project directly into an IDE, e.g. IntelliJ IDEA, as described in the JetBrains doc. Alternatively, use sbt and sbt-native-packager to build packages and generate runnable scripts:

$ cd dismember
$ sbt Universal/packageZipTarball

The command creates a package called examples-x.x.x.tgz in the dismember/examples/target/universal/ folder. Next, extract it into another folder, e.g. tasks/:

$ mkdir tasks
$ tar -zxf examples/target/universal/examples-0.1.0.tgz -C tasks --strip-components 1

At this point the dismember/tasks/ folder should look like this:

bin/
  <bash script>
lib/
  <jar files>

Before running those tasks, set up the parameters in the xxx.conf files, which are located in the dismember/configs folder. The parameters are described in the Configuration doc.

Then head to the corresponding docs to see how to run these scripts:

License

BSD-3-Clause