To compile any of the tasks (haha, who in their sane mind is gonna need this very repository anyway):
- install dub
- `cd` to a task directory, say `ir01`, and run `dub build`
- You may also try executing `dub build :ir01` and the like from the root directory, but that doesn't always work (screw me if I know why)
no. | task | additional info |
---|---|---|
ir01 | inverted index | |
ir02 | boolean search | AND/OR/NOT support; queries are parsed using the shunting-yard algorithm |
ir03 | external stemmer | Oleander stemming library (Porter stemmer, basically) |
ir04 | index w/ skip pointers | |
ir05 | bigram index | |
ir06 | coordinate index | enables searching for exact quotations (phrases) |
ir07 | wildcard (metasymbol) search | trie |
ir08 | indexing using MapReduce | Hadoop Streaming |
ir09 | index compression | variable byte encoding |
lp01 | Zipf's law coefficients | |
lp02 | Mandelbrot's law coefficients | |
lp03 | collocations | |
lp04 | language model w/ smoothing | Lidstone smoothing |
lp05 | spellchecking | Viterbi algorithm + Lidstone smoothing |
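
For a rough idea of what ir01 and ir02 are about, here is a minimal sketch of an inverted index plus boolean search with shunting-yard parsing. It is written in Python for brevity and is not the code from this repository; the names `build_index`, `to_postfix` and `search` are made up for illustration, the tokenizer is a plain `\w+` regex (an assumption), and malformed queries (e.g. unbalanced parentheses) are not validated.

```python
import re

# Operator precedence: NOT binds tightest, then AND, then OR.
PRECEDENCE = {"not": 3, "and": 2, "or": 1}


def build_index(docs):
    """Map each lowercase word to the set of ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(doc_id)
    return index


def to_postfix(query):
    """Convert an infix boolean query to postfix using shunting-yard."""
    output, stack = [], []
    for token in re.findall(r"\(|\)|\w+", query.lower()):
        if token in PRECEDENCE:
            # Pop operators that bind at least as tightly as this one.
            # NOT is a prefix operator, so it never pops anything.
            while (token != "not" and stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[token]):
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        else:
            output.append(token)  # a search term
    while stack:
        output.append(stack.pop())
    return output


def search(index, all_ids, query):
    """Evaluate a boolean query and return the set of matching doc ids."""
    stack = []
    for token in to_postfix(query):
        if token == "not":
            stack.append(all_ids - stack.pop())
        elif token == "and":
            right, left = stack.pop(), stack.pop()
            stack.append(left & right)
        elif token == "or":
            right, left = stack.pop(), stack.pop()
            stack.append(left | right)
        else:
            stack.append(index.get(token, set()))
    return stack.pop()


# Toy corpus, purely illustrative.
docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick dog"}
index = build_index(docs)
print(sorted(search(index, set(docs), "quick and not dog")))  # prints [1]
```

Note that in this sketch the words `and`, `or` and `not` are always treated as operators, so they cannot be searched for as terms.
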
- texts used for IR tasks (may require preprocessing due to invalid HTML markup):
- Oleander stemming library
- stop words list obtained using NLTK
- text corpus for lp03~05 obtained from the Leipzig Corpora Collection