Fix outdated Dockerfile and Flask app #251

Merged: 26 commits, Dec 1, 2023

Changes from 7 commits

Commits (26)
1ebcf5d  Add requirements needed for laser_encoders (Paulooh007, Sep 30, 2023)
4def6e5  Add script to use laser_encoder (Paulooh007, Sep 30, 2023)
4cadec7  Update flask app to use laser_encoders (Paulooh007, Sep 30, 2023)
4256bbc  Update Dockerfile to build image for laser_encoder (Paulooh007, Sep 30, 2023)
473ec93  Update README to setup docker (Paulooh007, Sep 30, 2023)
0aa213d  Update README (Paulooh007, Sep 30, 2023)
1a9937c  style: Format code and sort imports using black and isort (Paulooh007, Sep 30, 2023)
cbc82ed  Update Dockerfile to include maintainer (Paulooh007, Sep 30, 2023)
dd4e527  Update README for docker setup (Paulooh007, Oct 1, 2023)
91ca272  MLH fellowship contribution: adding the `laser_encoders` module (#249) (avidale, Nov 21, 2023)
0837068  Remove unesssary file in docker directory (Paulooh007, Nov 21, 2023)
dc4bf52  Enable pip installing laser_encoders from local directory (Paulooh007, Nov 21, 2023)
77ff495  Fix pip install error while building docker container (Paulooh007, Nov 21, 2023)
f201332  Add error handling for unsupported languages in /vectorize endpoint (Paulooh007, Nov 21, 2023)
7db9310  Add language model download to Docker build process (Paulooh007, Nov 21, 2023)
2522e03  Merge remote-tracking branch 'upstream/main' into docker-laser (Paulooh007, Nov 25, 2023)
429a908  Create cache for encoder to improve subsequent request speed (Paulooh007, Nov 25, 2023)
7bad4a9  Add build arguments to predownload encoders and tokenizers (Paulooh007, Nov 25, 2023)
ec76ed9  Update README on usage (Paulooh007, Nov 25, 2023)
6150978  Update README (Paulooh007, Nov 25, 2023)
81d0385  Change default lang to 2 letter code (Paulooh007, Nov 27, 2023)
2202aab  Update README to indicate language used in default build (Paulooh007, Nov 27, 2023)
e059d55  Update Dockerfile to use toml file instead of requirements file (Paulooh007, Nov 27, 2023)
f51fb0a  Improve caching for laser2 languages (Paulooh007, Nov 28, 2023)
2f07eb2  Fix faulty caching logic (Paulooh007, Nov 29, 2023)
b4a67cd  Merge branch 'facebookresearch:main' into docker-laser (Paulooh007, Dec 1, 2023)
56 changes: 9 additions & 47 deletions docker/Dockerfile
@@ -1,62 +1,24 @@
FROM continuumio/miniconda3

MAINTAINER Gilles Bodart <[email protected]>
# Install build-essential (compiler and development tools)
RUN apt-get update && \
apt-get install -y build-essential && \
rm -rf /var/lib/apt/lists/*

RUN conda create -n env python=3.6
RUN conda create -n env python=3.9
RUN echo "source activate env" > ~/.bashrc
ENV PATH /opt/conda/envs/env/bin:$PATH

RUN apt-get -qq -y update
RUN apt-get -qq -y upgrade
RUN apt-get -qq -y install \
gcc \
g++ \
wget \
curl \
git \
make \
unzip \
sudo \
vim

# Use C.UTF-8 locale to avoid issues with ASCII encoding
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

# Set the working directory to /app
WORKDIR /app

COPY ./requirements.txt /app/requirements.txt

# Install any needed packages specified in requirements.txt
RUN pip install --trusted-host pypi.python.org -r requirements.txt --verbose


# Download LASER from FB
RUN git clone https://github.com/facebookresearch/LASER.git

ENV LASER /app/LASER
WORKDIR $LASER

RUN bash ./install_models.sh


#Installing FAISS

RUN conda install --name env -c pytorch faiss-cpu -y

RUN bash ./install_external_tools.sh

COPY ./decode.py $LASER/tasks/embed/decode.py


# Make port 80 available to the world outside this container
WORKDIR /app
COPY ./requirements.txt /app/requirements.txt

RUN echo "Hello World" > test.txt
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt --verbose

RUN $LASER/tasks/embed/embed.sh test.txt en test_embed.raw
RUN python $LASER/tasks/embed/decode.py test_embed.raw
COPY ./encode.py /app/encode.py

#Open the port 80
EXPOSE 80
61 changes: 52 additions & 9 deletions docker/README.md
@@ -1,19 +1,62 @@
## Docker
## LASER Docker Image

An image docker has been created to help you with the settings of an environment here are the step to follow :
This image provides a convenient way to run LASER in a Docker container.
To build the image, run the following command from the root of the LASER directory:

* Open a command prompt on the root of your LASER project
* Execute the command `docker build --tag=laser docker`
* Once the image is built run `docker run -it laser`
```
docker build --tag=laser docker
```
Once the image is built, you can run it with the following command:

A REST server on top of the embed task is under developement,
to run it you'll have to expose a local port [CHANGEME_LOCAL_PORT] by executing the next line instead of the last command. It'll overinde the command line entrypoint of your docker container.
```
docker run -it laser
```
**Note:** If you want to expose a local port to the REST server on top of the embed task, you can do so by executing the following command instead of the last command:

* `docker run -p [CHANGEME_LOCAL_PORT]:80 -it laser python app.py`
```
docker run -it -p [CHANGEME_LOCAL_PORT]:80 laser
```
This will override the command line entrypoint of the Docker container.

Example:

```
docker run -it -p 8081:80 laser
```

This Flask server will serve a REST Api that can be use by calling your server with this URL :

* http://127.0.0.1:[CHANGEME_LOCAL_PORT]/vectorize?q=[YOUR_SENTENCE_URL_ENCODED]&lang=[LANGUAGE]
```
http://127.0.0.1:[CHANGEME_LOCAL_PORT]/vectorize?q=[YOUR_SENTENCE_URL_ENCODED]&lang=[LANGUAGE]
```

Example:

```
http://127.0.0.1:8081/vectorize?q=ki%20lo%20'orukọ%20ẹ&lang=yor
```

Sample response:
```
{
"content": "ki lo 'orukọ ẹ",
"embedding": [
[
-0.10241681337356567,
0.11120740324258804,
-0.26641348004341125,
-0.055699944496154785,
....
....
....
-0.034048307687044144,
0.11005636304616928,
-0.3238321840763092,
-0.060631975531578064,
-0.19269055128097534,
]
}
```

Here is an example of how you can send requests to it with python:

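The Python example itself is collapsed in this diff view. As a rough sketch of what such a client could look like (not the PR's actual code), assuming the container is mapped to local port 8081 as in the example above and the `requests` package is available:

```
# Hypothetical client sketch for the /vectorize endpoint; the parameter names
# ("q" and "lang") follow the URL format documented in this README.
import requests

resp = requests.get(
    "http://127.0.0.1:8081/vectorize",
    params={"q": "ki lo 'orukọ ẹ", "lang": "yor"},  # requests URL-encodes the query
)
resp.raise_for_status()

data = resp.json()
print(data["content"])
print(len(data["embedding"][0]))  # LASER sentence embeddings are 1024-dimensional
```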
85 changes: 26 additions & 59 deletions docker/app.py
@@ -1,78 +1,45 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from flask import Flask, request, jsonify
import os
import socket
import tempfile
from pathlib import Path

import numpy as np
from LASER.source.lib.text_processing import Token, BPEfastApply
from LASER.source.embed import *
from flask import Flask, jsonify, request
from laser_encoders import initialize_encoder, initialize_tokenizer

app = Flask(__name__)
app.config['JSON_AS_ASCII'] = False


@app.route("/")
def root():
    print("/")
    html = "<h3>Hello {name}!</h3>" \
           "<b>Hostname:</b> {hostname}<br/>"
    html = "<h3>Hello {name}!</h3>" "<b>Hostname:</b> {hostname}<br/>"
    return html.format(name=os.getenv("LASER", "world"), hostname=socket.gethostname())


@app.route("/vectorize")
@app.route("/vectorize", methods=["GET"])
def vectorize():
content = request.args.get('q')
lang = request.args.get('lang')
embedding = ''
if lang is None or not lang:
lang = "en"
# encoder
model_dir = Path(__file__).parent / "LASER" / "models"
encoder_path = model_dir / "bilstm.93langs.2018-12-26.pt"
bpe_codes_path = model_dir / "93langs.fcodes"
print(f' - Encoder: loading {encoder_path}')
encoder = SentenceEncoder(encoder_path,
max_sentences=None,
max_tokens=12000,
sort_kind='mergesort',
cpu=True)
with tempfile.TemporaryDirectory() as tmp:
tmpdir = Path(tmp)
ifname = tmpdir / "content.txt"
bpe_fname = tmpdir / 'bpe'
bpe_oname = tmpdir / 'out.raw'
with ifname.open("w") as f:
f.write(content)
if lang != '--':
tok_fname = tmpdir / "tok"
Token(str(ifname),
str(tok_fname),
lang=lang,
romanize=True if lang == 'el' else False,
lower_case=True,
gzip=False,
verbose=True,
over_write=False)
ifname = tok_fname
BPEfastApply(str(ifname),
str(bpe_fname),
str(bpe_codes_path),
verbose=True, over_write=False)
ifname = bpe_fname
EncodeFile(encoder,
str(ifname),
str(bpe_oname),
verbose=True,
over_write=False,
buffer_size=10000)
dim = 1024
X = np.fromfile(str(bpe_oname), dtype=np.float32, count=-1)
X.resize(X.shape[0] // dim, dim)
embedding = X
body = {'content': content, 'embedding': embedding.tolist()}
    content = request.args.get("q")
    lang = request.args.get(
        "lang", "en"
    )  # Default to English if 'lang' is not provided

    if content is None:
        return jsonify({"error": "Missing input content"})

    encoder = initialize_encoder(lang=lang)
    tokenizer = initialize_tokenizer(lang=lang)

    # Tokenize the input content
    tokenized_sentence = tokenizer.tokenize(content)

    # Encode the tokenized sentence
    embeddings = encoder.encode_sentences([tokenized_sentence])
    embeddings_list = embeddings.tolist()

    body = {"content": content, "embedding": embeddings_list}
    return jsonify(body)


if __name__ == "__main__":
    app.run(debug=True, port=80, host='0.0.0.0')
    app.run(debug=True, port=80, host="0.0.0.0")
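
This version of `app.py` initializes the encoder and tokenizer on every request. Later commits in the PR (429a908, f51fb0a, 2f07eb2) add caching so subsequent requests are faster; those changes are not part of the 7-commit view shown here. A minimal sketch of the general idea, using `functools.lru_cache` rather than the PR's actual implementation:

```
# Hypothetical caching sketch; the PR's real code may differ.
from functools import lru_cache

from laser_encoders import initialize_encoder, initialize_tokenizer


@lru_cache(maxsize=None)
def get_encoder(lang: str):
    # Loading a LASER encoder is expensive, so keep one instance per language.
    return initialize_encoder(lang=lang)


@lru_cache(maxsize=None)
def get_tokenizer(lang: str):
    return initialize_tokenizer(lang=lang)
```

Inside `vectorize()`, the calls to `initialize_encoder(lang=lang)` and `initialize_tokenizer(lang=lang)` would then become `get_encoder(lang)` and `get_tokenizer(lang)`.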
9 changes: 9 additions & 0 deletions docker/encode.py
@@ -0,0 +1,9 @@
from laser_encoders import initialize_encoder, initialize_tokenizer

tokenizer = initialize_tokenizer(lang="yor")
tokenized_sentence = tokenizer.tokenize("Eku aro")

encoder = initialize_encoder(lang="yor")
embeddings = encoder.encode_sentences([tokenized_sentence])

print("Embeddings Shape", embeddings.shape)
18 changes: 13 additions & 5 deletions docker/requirements.txt
@@ -1,6 +1,14 @@
Flask
scipy
numpy
Cython
fairseq==0.12.2
numpy==1.25.0
pytest==7.4.0
Requests==2.31.0
sacremoses==0.0.53
sentencepiece==0.1.99
tqdm==4.65.0
Flask==2.3.3

--extra-index-url https://download.pytorch.org/whl/cpu
torch
transliterate

--extra-index-url https://test.pypi.org/simple/
laser-encoders==0.0.3