Albert models

Deploy a full OpenAI API with vLLM that supports all embedding models

🔥 News:

  • [2024-07-11] Pass the .env file to Docker Compose so additional vLLM environment variables can be set
  • [2024-07-10] Add an API swagger to test all endpoints
  • [2024-07-10] Support an API_KEY to protect your model API
  • [2024-07-10] Use the /v1 base URL directly for Langchain integration

Coming soon:

  • Use pytest for unit tests (tests.py file)

vLLM is one of the state-of-the-art libraries for deploying a Large Language Model (LLM) behind a high-performance generation API. Although it can deploy an API that follows OpenAI conventions, vLLM does not currently support all embedding models on the /v1/embeddings endpoint (see this discussion).

This repository makes it easy to add the /v1/embeddings endpoint by deploying an embedding model with HuggingFace Text Embeddings Inference (TEI) and serving everything on a single port. The aim of this repository is a complete API that is lightweight and easy to use and maintain!

The API offers the following OpenAI endpoints (a usage sketch follows the list):

  • /v1/models
  • /v1/completions
  • /v1/chat/completions
  • /v1/embeddings
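
As a quick illustration, here is a minimal sketch that calls the chat and embeddings endpoints with the official openai Python client. The base URL, API key, and model names are assumptions: adjust them to your deployment.

from openai import OpenAI

# Assumed deployment values: adjust to your setup.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="mysecretkey")

# Chat completion, served by vLLM (the model name is a placeholder).
chat = client.chat.completions.create(
    model="< language model >",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(chat.choices[0].message.content)

# Embeddings, served by TEI behind the same port (placeholder model name).
emb = client.embeddings.create(model="< embeddings model >", input="Hello!")
print(len(emb.data[0].embedding))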

⚙️ How it works?

🍿 The swagger

The API exposes a swagger to test all endpoints.

📦 Models

Currently, this architecture supports almost all LLM and embedding models. The response of the /v1/models endpoint adds a new "type" key, which takes the value "text-generation" or "text-embeddings-inference" depending on the nature of the model (language or embeddings). These values correspond to the labels given to models on HuggingFace. Example:

{
    "object": "list", 
    "data": [
        {
            "model": < language model >,
            "type": "text-generation",
            ...
        },
        {
            "model": < embeddings model >,
            "type": "text-embeddings-inference",
            ...
        }
    ]
}
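
A client can use this "type" key to pick the right model for each endpoint. A minimal sketch with the requests library, assuming the API runs on http://localhost:8080 with the API key mysecretkey sent as a standard Bearer token:

import requests

# Assumed deployment values: adjust to your setup.
response = requests.get(
    "http://localhost:8080/v1/models",
    headers={"Authorization": "Bearer mysecretkey"},
)
models = response.json()["data"]

# Split the models by the "type" key added by this API.
llms = [m for m in models if m["type"] == "text-generation"]
embedders = [m for m in models if m["type"] == "text-embeddings-inference"]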

🚀 Quickstart

  • First, configure a .env file or modify the .env.example file in this repository. For more information about the configuration, please refer to the configuration section.

  • Then, run the containers with Docker Compose:

    docker compose --env-file .env.example up --detach

🔧 Configuration

To pass additional vLLM environment variables (see https://docs.vllm.ai/en/stable/serving/env_vars.html), add them to the .env file.

The following variables are available:

  • EMBEDDINGS_HF_REPO_ID: HuggingFace repository ID of the embeddings model. Please refer to the HuggingFace Text Embeddings Inference documentation to find supported models.
  • LLM_HF_REPO_ID: HuggingFace repository ID of the LLM. Please refer to the vLLM documentation to find supported models.
  • TEI_ARGS: arguments for Text Embeddings Inference (format: --arg1 --arg2). Please refer to the HuggingFace Text Embeddings Inference documentation for more information.
  • VLLM_ARGS: arguments for vLLM (format: --arg1 --arg2). Please refer to the vLLM documentation for more information.
  • HF_TOKEN: HuggingFace API token for private models on the HuggingFace Hub (optional).
  • API_KEY: API key to protect your model API (optional).
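
For illustration, here is a minimal .env sketch; every value below is a placeholder, not a recommendation:

# All values are illustrative placeholders: replace them with your own.
EMBEDDINGS_HF_REPO_ID=< embeddings model repo id >
LLM_HF_REPO_ID=< language model repo id >
TEI_ARGS=
VLLM_ARGS=
HF_TOKEN=< optional HuggingFace token >
API_KEY=mysecretkey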

🦜 Langchain integration

You can use the deployed API with Langchain to create embedding vectors for your vector store. For example:

from langchain_community.embeddings import HuggingFaceHubEmbeddings

# Point the Langchain embeddings wrapper at the deployed /v1 base URL.
embeddings = HuggingFaceHubEmbeddings(model="http://localhost:8080/v1")
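
The object can then be used like any other Langchain embeddings backend; a short usage sketch with arbitrary example texts:

# Embed documents for a vector store and a query for retrieval.
vectors = embeddings.embed_documents(["Albert is deployed!"])
query_vector = embeddings.embed_query("Who is Albert?")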

🔦 Tests

To test if your deployment is up, you can use the following command:

python tests.py --base-url http://localhost:8080 --api-key mysecretkey --debug

⚠️ The vllm container may take several minutes to start, particularly if it has to download the model.
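
If you prefer to wait programmatically instead of retrying by hand, here is a small polling sketch. The base URL and API key are the same assumptions as above, with the key sent as a standard Bearer token:

import time

import requests

# Assumed deployment values: adjust to your setup.
BASE_URL = "http://localhost:8080"
API_KEY = "mysecretkey"

# Poll /v1/models until the API answers, since vLLM can take minutes to start.
while True:
    try:
        response = requests.get(
            f"{BASE_URL}/v1/models",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=5,
        )
        if response.status_code == 200:
            print("API is up.")
            break
    except requests.exceptions.RequestException:
        pass
    time.sleep(10)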
