# Compute Embeddings from ML models

This repository sets up an NVIDIA Triton Inference Server, which simplifies the deployment of AI models at scale in production. Triton natively supports multiple framework backends, including TensorFlow, PyTorch, ONNX Runtime, Python, and custom backends.
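
As a quick illustration, the sketch below queries a model deployed on Triton over its HTTP API and retrieves the resulting embeddings. The server URL, model name (`embedding_model`), and tensor names (`INPUT_IDS`, `EMBEDDINGS`) are hypothetical placeholders; the actual values depend on the model's `config.pbtxt` in your model repository.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server exposed on the default HTTP port.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Example input: a batch of token IDs (placeholder values).
# Tensor name, shape, and dtype must match the model's config.pbtxt.
token_ids = np.array([[101, 2023, 2003, 1037, 3231, 102]], dtype=np.int64)
infer_input = httpclient.InferInput("INPUT_IDS", token_ids.shape, "INT64")
infer_input.set_data_from_numpy(token_ids)

# Ask for the embedding tensor by its (hypothetical) output name.
requested_output = httpclient.InferRequestedOutput("EMBEDDINGS")

response = client.infer(
    model_name="embedding_model",  # hypothetical model name
    inputs=[infer_input],
    outputs=[requested_output],
)

embeddings = response.as_numpy("EMBEDDINGS")
print(embeddings.shape)
```

Because Triton exposes every backend through the same HTTP/gRPC interface, the same client code works whether the embedding model runs on the TensorFlow, PyTorch, ONNX Runtime, or Python backend.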