diff --git a/.gitignore b/.gitignore
index 16290d9e..95098827 100644
--- a/.gitignore
+++ b/.gitignore
@@ -10,3 +10,6 @@ tasks/xnli/XNLI-1.0*
 tasks/xnli/multinli_1.0*
 .??*swp
 .idea
+__pycache__
+nllb
+dist
diff --git a/README.md b/README.md
index 96d96ff0..526f9632 100644
--- a/README.md
+++ b/README.md
@@ -3,6 +3,7 @@
 LASER is a library to calculate and use multilingual sentence embeddings.
 
 **NEWS**
+* 2023/11/16 Released [**laser_encoders**](laser_encoders), a pip-installable package supporting LASER-2 and LASER-3 models
 * 2023/06/26 [**xSIM++**](https://arxiv.org/abs/2306.12907) evaluation pipeline and data [**released**](tasks/xsimplusplus/README.md)
 * 2022/07/06 Updated LASER models with support for over 200 languages are [**now available**](nllb/README.md)
 * 2022/07/06 Multilingual similarity search (**xsim**) evaluation pipeline [**released**](tasks/xsim/README.md)
@@ -26,7 +27,27 @@ a language family which is covered by other languages.
 A detailed description of how the multilingual sentence embeddings are trained can
 be found [here](https://arxiv.org/abs/2205.12654), together with an experimental evaluation.
 
-## Dependencies
+## The core sentence embedding package: `laser_encoders`
+We provide a package `laser_encoders` with minimal dependencies.
+It supports LASER-2 (a single encoder for the languages listed [below](#supported-languages))
+and LASER-3 (147 language-specific encoders described [here](nllb/README.md)).
+
+The package can be installed simply with `pip install laser_encoders` and used as below:
+
+```python
+from laser_encoders import LaserEncoderPipeline
+encoder = LaserEncoderPipeline(lang="eng_Latn")
+embeddings = encoder.encode_sentences(["Hi!", "This is a sentence encoder."])
+print(embeddings.shape)  # (2, 1024)
+```
+
+The laser_encoders [readme file](laser_encoders) provides more examples of its installation and usage.
+
+## The full LASER kit
+Apart from `laser_encoders`, we provide support for LASER-1 (the original multilingual encoder)
+and for various LASER applications listed below.
+
+### Dependencies
 * Python >= 3.7
 * [PyTorch 1.0](http://pytorch.org/)
 * [NumPy](http://www.numpy.org/), tested with 1.15.4
@@ -42,7 +63,8 @@ be found [here](https://arxiv.org/abs/2205.12654), together with an experimental evaluation.
 * [pandas](https://pypi.org/project/pandas), data analysis toolkit (`pip install pandas`)
 * [Sentencepiece](https://github.com/google/sentencepiece), subword tokenization (installed automatically)
 
-## Installation
+### Installation
+* install the `laser_encoders` package in editable mode, e.g. by running `pip install -e .` from the repository root
 * set the environment variable 'LASER' to the root of the installation, e.g.
   `export LASER="${HOME}/projects/laser"`
 * download encoders from Amazon s3 by e.g. `bash ./nllb/download_models.sh`
diff --git a/install_external_tools.sh b/install_external_tools.sh
index 9fba8417..6aee045f 100755
--- a/install_external_tools.sh
+++ b/install_external_tools.sh
@@ -181,6 +181,10 @@ InstallMecab () {
 #
 ###################################################################
 
+echo "Installing the laser_encoders package in editable mode"
+
+pip install -e .
+
 echo "Installing external tools"
 
 InstallMosesTools
diff --git a/laser_encoders/README.md b/laser_encoders/README.md
index 4c508824..8a35c8d7 100644
--- a/laser_encoders/README.md
+++ b/laser_encoders/README.md
@@ -17,10 +17,15 @@ You can find a full list of requirements [here](requirements.txt)
 
 ## Installation
 
-You can install laser_encoders using pip:
+You can install the `laser_encoders` package from PyPI:
 
 ```sh
- pip install laser_encoders
+pip install laser_encoders
+```
+
+Alternatively, you can install it in editable mode from a local clone of this repository:
+```sh
+pip install -e .
 ```
 
 ## Usage