Showing 10 changed files with 399 additions and 18 deletions.
@@ -17,6 +17,9 @@
# Cpp
nxc

# Docs
docs/*/*

# Experiments
/data
/models*
@@ -0,0 +1,13 @@
# Documentation

Documentation for napkinXC is generated using [Sphinx](https://www.sphinx-doc.org/).
After each commit on `master`, the documentation is updated and published to [Read the Docs](https://napkinxc.readthedocs.io).

You can build the documentation locally by running the following commands in the `docs` directory:

```
pip install -r requirements.txt
make html
```

The documentation will be created in the `docs/_build` directory.
@@ -0,0 +1,92 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.

import os
import sys
sys.path.insert(0, os.path.abspath('../python'))


# -- Project information -----------------------------------------------------

project = 'napkinXC'
copyright = '2020, Marek Wydmuch'
author = 'Marek Wydmuch'

# The full version, including alpha/beta/rc tags
release = '0.4.1'


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.todo',
    'sphinx.ext.viewcode',
    'sphinx.ext.autodoc',
    'sphinx.ext.autosummary',
    'sphinx.ext.mathjax'
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']


# -- Autodoc configuration ---------------------------------------------------
autodoc_mock_imports = [
    "napkinxc._napkinxc",
    "numpy",
    "scipy",
    "scipy.sparse",
    "sklearn"
]
#autoclass_content = 'both'
autodoc_default_flags = ['members', 'inherited-members', 'show-inheritance']
autodoc_default_options = {
    "members": True,
    "inherited-members": True,
    "show-inheritance": True,
}

# Generate autosummary pages. Output should be set with: `:toctree: pythonapi/`
autosummary_generate = ['python_api.rst']

# Only the class' docstring is inserted.
autoclass_content = 'class'

# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False

# The master toctree document.
master_doc = 'index'


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.

import sphinx_rtd_theme
html_theme = 'sphinx_rtd_theme'

html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
@@ -0,0 +1,114 @@
Executable
==========

napkinXC can also be built as an executable that can be used to train and evaluate models and to make predictions.


Building
--------

To build napkinXC, first clone the project repository and run the following commands in the root directory of the project:

.. code:: sh

    cmake .
    make

The ``-B`` option can be passed to the CMake command to specify a different build directory.
After successful compilation, the ``nxc`` executable should appear in the root or the specified build directory.


Data Format
-----------

napkinXC supports the multi-label svmlight/libsvm format
and the format of datasets from `The Extreme Classification Repository <https://manikvarma.github.io/downloads/XC/XMLRepository.html>`_,
which has an additional header line with the number of data points, features, and labels.

.. code:: sh

    label,label,... feature(:value) feature(:value) ...

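The following Python sketch shows one way to read this format. It is not napkinXC's own loader: `parse_line` and `read_dataset` are hypothetical helpers, and the sketch assumes integer labels and feature indices, that a feature written without `:value` has value 1.0, and that a line beginning with whitespace carries no labels. Because a header such as `2 10 8` looks exactly like a data line with omitted feature values, the header is signalled explicitly with `has_header` rather than guessed.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[List[int], Dict[int, float]]


def parse_line(line: str) -> Row:
    """Parse 'label,label,... feature(:value) feature(:value) ...' (hypothetical helper)."""
    line = line.rstrip("\n")
    labels: List[int] = []
    feature_part = line
    # Assumption: a line that starts with whitespace carries no labels.
    if line[:1] and not line[:1].isspace():
        parts = line.split(None, 1)
        # A first token without ':' is the comma-separated label list.
        if ":" not in parts[0]:
            labels = [int(label) for label in parts[0].split(",") if label]
            feature_part = parts[1] if len(parts) > 1 else ""
    features: Dict[int, float] = {}
    for token in feature_part.split():
        index, sep, value = token.partition(":")
        # Assumption: a feature given without ':value' has value 1.0.
        features[int(index)] = float(value) if sep else 1.0
    return labels, features


def read_dataset(lines: Iterable[str], has_header: bool = False
                 ) -> Tuple[Optional[Tuple[int, int, int]], List[Row]]:
    """Read rows; optionally consume a 'points features labels' header first."""
    it = iter(lines)
    header: Optional[Tuple[int, int, int]] = None
    if has_header:
        n_points, n_features, n_labels = (int(x) for x in next(it).split())
        header = (n_points, n_features, n_labels)
    # Blank lines are skipped.
    rows = [parse_line(line) for line in it if line.strip()]
    return header, rows
```

For example, `read_dataset(["2 10 8", "0,5 1:1.0 3:0.5", "7 3:0.25"], has_header=True)` returns the header `(2, 10, 8)` and, as its first row, `([0, 5], {1: 1.0, 3: 0.5})`.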
Command line options
--------------------

.. code::

    Usage: nxc <command> <args>

    Commands:
        train                               Train model on given input data
        test                                Test model on given input data
        predict                             Predict for given data
        ofo                                 Use online f-measure optimization
        version                             Print napkinXC version
        help                                Print help

    Args:
        General:
            -i, --input                     Input dataset
            -o, --output                    Output (model) dir
            -m, --model                     Model type (default = plt):
                                            Models: ovr, br, hsm, plt, oplt, svbopFull, svbopHf, brMips, svbopMips
            --ensemble                      Number of models in ensemble (default = 1)
            -t, --threads                   Number of threads to use (default = 0)
                                            Note: -1 to use #cpus - 1, 0 to use #cpus
            --hash                          Size of features space (default = 0)
                                            Note: 0 to disable hashing
            --featuresThreshold             Prune features below given threshold (default = 0.0)
            --seed                          Seed (default = system time)
            --verbose                       Verbose level (default = 2)

        Base classifiers:
            --optimizer                     Optimizer used for training binary classifiers (default = liblinear)
                                            Optimizers: liblinear, sgd, adagrad, fobos
            --bias                          Value of the bias features (default = 1)
            --inbalanceLabelsWeighting      Increase the weight of minority labels in base classifiers (default = 1)
            --weightsThreshold              Threshold value for pruning models weights (default = 0.1)

        LIBLINEAR: (more about LIBLINEAR: https://github.com/cjlin1/liblinear)
            -s, --liblinearSolver           LIBLINEAR solver (default for log loss = L2R_LR_DUAL, for l2 loss = L2R_L2LOSS_SVC_DUAL)
                                            Supported solvers: L2R_LR_DUAL, L2R_LR, L1R_LR,
                                                L2R_L2LOSS_SVC_DUAL, L2R_L2LOSS_SVC, L2R_L1LOSS_SVC_DUAL, L1R_L2LOSS_SVC
            -c, --liblinearC                LIBLINEAR cost coefficient, inverse of regularization strength, must be a positive float,
                                            smaller values specify stronger regularization (default = 10.0)
            --eps, --liblinearEps           LIBLINEAR tolerance of termination criterion (default = 0.1)

        SGD/AdaGrad:
            -l, --lr, --eta                 Step size (learning rate) for online optimizers (default = 1.0)
            --epochs                        Number of training epochs for online optimizers (default = 1)
            --adagradEps                    Defines starting step size for AdaGrad (default = 0.001)

        Tree:
            -a, --arity                     Arity of tree nodes (default = 2)
            --maxLeaves                     Maximum degree of pre-leaf nodes (default = 100)
            --tree                          File with tree structure
            --treeType                      Type of a tree to build if file with structure is not provided
                                            Tree types: hierarchicalKmeans, huffman, completeKaryInOrder, completeKaryRandom,
                                                balancedInOrder, balancedRandom, onlineComplete

        K-Means tree:
            --kmeansEps                     Tolerance of termination criterion of the k-means clustering
                                            used in hierarchical k-means tree building procedure (default = 0.001)
            --kmeansBalanced                Use balanced K-Means clustering (default = 1)

        Prediction:
            --topK                          Predict top-k labels (default = 5)
            --threshold                     Predict labels with probability above the threshold (default = 0)
            --thresholds                    Path to a file with threshold for each label
            --setUtility                    Type of set-utility function for prediction using svbopFull, svbopHf, svbopMips models.
                                            Set-utility functions: uP, uF1, uAlfa, uAlfaBeta, uDeltaGamma
                                            See: https://arxiv.org/abs/1906.08129

        Set-Utility:
            --alpha
            --beta
            --delta
            --gamma

        Test:
            --measures                      Evaluate test using set of measures (default = "p@1,r@1,c@1,p@3,r@3,c@3,p@5,r@5,c@5")
                                            Measures: acc (accuracy), p (precision), r (recall), c (coverage), hl (hamming loss)
                                            p@k (precision at k), r@k (recall at k), c@k (coverage at k), s (prediction size)
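To make the prediction and evaluation options more concrete, here is a Python sketch of top-k and threshold-based label selection together with the usual definitions of precision at k and recall at k from the extreme classification literature. It only illustrates the ideas behind `--topK`, `--threshold`, and the `p@k`/`r@k` measures; the function names are hypothetical, and napkinXC's own implementation may differ in details such as tie-breaking, whether the threshold comparison is strict, or how examples without relevant labels are counted.

```python
from typing import Dict, List, Sequence, Set


def predict_top_k(scores: Dict[int, float], k: int) -> List[int]:
    """Return the k labels with the highest scores, best first."""
    return sorted(scores, key=lambda label: scores[label], reverse=True)[:k]


def predict_above_threshold(scores: Dict[int, float], threshold: float) -> List[int]:
    """Return labels whose score is strictly above the threshold, best first."""
    selected = [label for label, score in scores.items() if score > threshold]
    return sorted(selected, key=lambda label: scores[label], reverse=True)


def precision_at_k(predictions: Sequence[List[int]], truth: Sequence[Set[int]], k: int) -> float:
    """Mean over examples of |top-k predictions & true labels| / k."""
    hits = [len(set(pred[:k]) & true) / k for pred, true in zip(predictions, truth)]
    return sum(hits) / len(hits)


def recall_at_k(predictions: Sequence[List[int]], truth: Sequence[Set[int]], k: int) -> float:
    """Mean over examples of |top-k predictions & true labels| / |true labels|."""
    # Assumption: an example with no true labels contributes 0.
    hits = [len(set(pred[:k]) & true) / len(true) if true else 0.0
            for pred, true in zip(predictions, truth)]
    return sum(hits) / len(hits)
```

With `scores = {1: 0.9, 2: 0.2, 3: 0.6, 4: 0.05}`, both `predict_top_k(scores, 2)` and `predict_above_threshold(scores, 0.5)` return `[1, 3]`. Top-k always yields exactly k labels, while a threshold yields a variable number of labels per example.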
@@ -0,0 +1,37 @@
.. napkinXC documentation master file

Welcome to napkinXC's documentation!
====================================

.. note:: Documentation is currently a work in progress!

napkinXC is an extremely simple and fast library for extreme multi-class and multi-label classification
that implements the following methods both in Python and C++:

* Probabilistic Label Trees (PLTs) - for multi-label log-time training and prediction,
* Hierarchical softmax (HSM) - for multi-class log-time training and prediction,
* Binary Relevance (BR) - multi-label baseline,
* One Versus Rest (OVR) - multi-class baseline.

All the methods decompose multi-class and multi-label problems into a set of binary learning problems.
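For the two baselines the decomposition is direct: Binary Relevance trains one independent binary classifier per label, and One Versus Rest trains one binary classifier per class and predicts the highest-scoring class. The Python sketch below shows only this reduction and uses hypothetical helper names; the scores passed to the prediction functions stand in for the outputs of trained binary classifiers, and the default BR threshold of 0.5 is an assumption.

```python
from typing import List, Sequence, Set


def binary_targets(label_sets: Sequence[Set[int]], num_labels: int) -> List[List[int]]:
    """Row j is the 0/1 target vector of the binary problem for label j."""
    return [[1 if j in labels else 0 for labels in label_sets] for j in range(num_labels)]


def br_predict(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binary Relevance: each label is decided independently (multi-label)."""
    return [j for j, score in enumerate(scores) if score > threshold]


def ovr_predict(scores: Sequence[float]) -> int:
    """One Versus Rest: the class with the highest binary score wins (multi-class)."""
    return max(range(len(scores)), key=lambda j: scores[j])
```

In the multi-class case every label set contains exactly one class, so the same `binary_targets` reduction produces the OVR training problems; only the prediction rule differs.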


For now, a detailed description of the methods and their parameters can be found in this paper:
`Probabilistic Label Trees for Extreme Multi-label Classification <https://arxiv.org/pdf/2009.11218.pdf>`_
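The core idea of PLTs can be illustrated with a small Python sketch. Following the standard PLT formulation, every tree node v has a binary classifier estimating the conditional probability η(x, v) that v is relevant given that its parent is, and the marginal probability of a label is the product of these estimates along the path from the root to the label's leaf. Since each factor is at most 1, the product can only shrink going down the tree, so a subtree whose node already falls below a threshold can be skipped. The tree and probabilities below are made-up values standing in for trained node classifiers; this is not napkinXC's implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical binary tree over four labels (leaves l0..l3).
CHILDREN: Dict[str, List[str]] = {
    "root": ["a", "b"],
    "a": ["l0", "l1"],
    "b": ["l2", "l3"],
}
PARENT: Dict[str, str] = {c: p for p, cs in CHILDREN.items() for c in cs}

# Made-up node-classifier outputs for one instance x:
# ETA[v] = P(z_v = 1 | z_parent(v) = 1, x); ETA["root"] = P(x has any relevant label).
ETA: Dict[str, float] = {
    "root": 1.0, "a": 0.9, "b": 0.2,
    "l0": 0.8, "l1": 0.1, "l2": 0.5, "l3": 0.5,
}


def leaf_marginal(leaf: str, eta: Dict[str, float], parent: Dict[str, str]) -> float:
    """P(label = 1 | x): product of the conditional estimates on the root-to-leaf path."""
    prob = 1.0
    node: Optional[str] = leaf
    while node is not None:
        prob *= eta[node]
        node = parent.get(node)  # None once we step past the root
    return prob


def labels_above_threshold(threshold: float, eta: Dict[str, float],
                           children: Dict[str, List[str]], root: str = "root"
                           ) -> List[Tuple[str, float]]:
    """Depth-first search that prunes subtrees whose path product is below the threshold."""
    found: List[Tuple[str, float]] = []
    stack = [(root, eta[root])]
    while stack:
        node, prob = stack.pop()
        if prob < threshold:
            continue  # every factor is <= 1, so no leaf below can reach the threshold
        kids = children.get(node, [])
        if not kids:
            found.append((node, prob))
        for child in kids:
            stack.append((child, prob * eta[child]))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

With a threshold of 0.5, the search returns only `l0` (marginal 0.9 × 0.8 = 0.72) and never visits the leaves under `b`, whose path product 0.2 is already below the threshold.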



.. toctree::
   :maxdepth: 1
   :caption: Contents:

   quick_start
   exe_usage
   python_api


Indices and tables
------------------

* :ref:`genindex`
* :ref:`search`