## v0.7.0
This release switches all examples to use cloud-hosted, GPU-accelerated LLM and embedding models from the Nvidia API Catalog by default. It also deprecates support for deploying on-prem models using the NeMo Inference Framework Container and adds support for deploying accelerated generative AI models across cloud, data center, and workstation using the latest Nvidia NIM-LLM.
### Added
- Added model auto-download and caching support for `nemo-retriever-embedding-microservice` and `nemo-retriever-reranking-microservice`. The updated steps to deploy the services can be found here; a connection sketch follows this list.
- Multimodal RAG Example enhancements
  - Moved to the PDF Plumber library for parsing text and images (see the parsing sketch after this list).
  - Added `pgvector` vector DB support (see the `pgvector` sketch after this list).
  - Added support to ingest files with the `.pptx` extension.
  - Improved accuracy of image parsing by using tesseract-ocr.
- Added a new notebook showcasing a RAG use case using accelerated, NIM-based, on-prem deployed models.
- Added a new experimental example showcasing how to create a developer-focused RAG chatbot using RAPIDS cuDF source code and API documentation.
- Added a new experimental example demonstrating how NVIDIA Morpheus, NIMs, and RAG pipelines can be integrated to create LLM-based agent pipelines.
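
A hedged connection sketch for the locally deployed retriever microservices, assuming they expose OpenAI-compatible `/v1` endpoints on the ports shown and that a recent `langchain-nvidia-ai-endpoints` accepting `base_url` is installed; the URLs and model ids are placeholders, not values from this repository.

```python
from langchain_core.documents import Document
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, NVIDIARerank

# Assumed local endpoints for the two microservices; adjust host/port to
# match the deployment, and set model ids to whatever was auto-downloaded.
embedder = NVIDIAEmbeddings(base_url="http://localhost:8080/v1",
                            model="NV-Embed-QA")
reranker = NVIDIARerank(base_url="http://localhost:8081/v1")

vectors = embedder.embed_documents(["GPU-accelerated retrieval"])
docs = reranker.compress_documents(
    documents=[Document(page_content="GPU-accelerated retrieval")],
    query="How is retrieval accelerated?",
)
print(len(vectors[0]), "dims;", len(docs), "reranked docs")
```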
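A minimal sketch of the multimodal parsing flow described above, assuming `pdfplumber` and `pytesseract` are installed and the Tesseract binary is available; the file name and page-image workflow are illustrative, not the example's actual code.

```python
import pdfplumber
import pytesseract

# Illustrative sketch only: extract text with pdfplumber, then OCR a
# rasterized version of each page to recover text embedded in images.
with pdfplumber.open("slides.pdf") as pdf:  # "slides.pdf" is a placeholder
    for page_number, page in enumerate(pdf.pages, start=1):
        text = page.extract_text() or ""
        print(f"page {page_number}: {len(text)} characters of text")

        # Rasterize the page and run OCR over it, roughly mirroring the
        # tesseract-ocr accuracy improvement noted above.
        page_image = page.to_image(resolution=150).original
        ocr_text = pytesseract.image_to_string(page_image)
        print(f"page {page_number}: {len(ocr_text)} characters via OCR")
```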
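A hedged sketch of the new `pgvector` option using the community LangChain integration; the connection string, collection name, and embedding model are placeholders rather than the example's actual configuration, and an `NVIDIA_API_KEY` is assumed in the environment.

```python
from langchain_community.vectorstores import PGVector
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

# Placeholder connection details; a real deployment would point at the
# pgvector-enabled Postgres instance used by the example.
CONNECTION_STRING = "postgresql+psycopg2://user:password@localhost:5432/vectordb"

store = PGVector.from_texts(
    texts=["NVIDIA NIM exposes OpenAI-compatible endpoints."],
    embedding=NVIDIAEmbeddings(model="snowflake/arctic-embed-l"),
    collection_name="rag_demo",
    connection_string=CONNECTION_STRING,
)
print(store.similarity_search("What endpoints does NIM expose?", k=1))
```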
### Changed
- All examples now use Llama 3 models from the Nvidia API Catalog by default. A summary of the updated examples and the models they use is available here; a usage sketch follows this list.
- Switched the default embedding model of all examples to the Snowflake `arctic-embed-l` model.
- Added more verbose logs and support for configuring the chain server's log level using the `LOG_LEVEL` environment variable (see the logging sketch after this list).
- Bumped the versions of the `langchain-nvidia-ai-endpoints` and `sentence-transformers` packages and the `milvus` containers.
- Updated the base containers to use the Ubuntu 22.04 image `nvcr.io/nvidia/base/ubuntu:22.04_20240212`.
- Added `llama-index-readers-file` as a dependency to avoid runtime package installation within the chain server.
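
A minimal sketch of the new defaults via `langchain-nvidia-ai-endpoints`, assuming an `NVIDIA_API_KEY` in the environment; the model ids shown are representative API Catalog ids, and the exact model each example pins is listed in the summary linked above.

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

# Representative API Catalog model ids; individual examples may pin others.
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")
embedder = NVIDIAEmbeddings(model="snowflake/arctic-embed-l")

print(llm.invoke("Say hello in five words.").content)
print(len(embedder.embed_query("hello")), "embedding dimensions")
```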
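A hedged sketch of how a `LOG_LEVEL` environment variable typically maps onto Python logging; it mirrors the behavior described above but is not the chain server's actual implementation.

```python
import logging
import os

# Read LOG_LEVEL (e.g. DEBUG, INFO, WARNING); fall back to INFO when the
# variable is unset or unrecognized.
level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
logging.basicConfig(level=getattr(logging, level_name, logging.INFO))

logging.getLogger("chain_server").debug("verbose logging enabled")
```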
### Deprecated
- Deprecated support for on-prem LLM model deployment using the NeMo Inference Framework Container. Developers can use Nvidia NIM-LLM to deploy TensorRT-optimized models on-prem and plug them into the existing examples (see the sketch after this list).
- Deprecated Kubernetes operator support.
- The `nvolveqa_40k` embedding model was deprecated from the Nvidia API Catalog. Updated all notebooks and experimental artifacts to use the Nvidia `embed-qa-4` model instead.
- Removed notebooks numbered 00-04, which used on-prem LLM model deployment via the deprecated NeMo Inference Framework Container.
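
A hedged sketch of pointing an existing example at a self-hosted NIM-LLM endpoint instead of the deprecated container, assuming the microservice serves an OpenAI-compatible API at the URL shown; the address and model id are placeholders, not values from this repository.

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Placeholder address for a locally running NIM-LLM microservice; adjust
# the host, port, and model id to match the actual deployment.
llm = ChatNVIDIA(
    base_url="http://localhost:8000/v1",
    model="meta/llama3-8b-instruct",
)
print(llm.invoke("Which GPU am I running on?").content)
```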