## v0.6.0
This release adds the ability to switch between API Catalog models and on-prem models served with NIM-LLM, and adds documentation on how to build a RAG application from scratch. It also releases a containerized end-to-end RAG evaluation application integrated with the RAG chain-server APIs.
### Added
- Ability to switch between API Catalog models and on-prem models using NIM-LLM.
- New API endpoint `/health`
  - Provides a health check for the chain server.
- Containerized evaluation application for RAG pipeline accuracy measurement.
- Observability support for LangChain-based examples.
- New Notebooks
  - Added a Chat with NVIDIA Financial Data notebook.
  - Added a notebook showcasing LangGraph agent handling.
- A simple RAG example template showcasing how to build an example from scratch.
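The new `/health` endpoint above can be smoke-tested with a plain HTTP GET once the chain server is running. The sketch below stubs a minimal local server so it is self-contained; the JSON response shape is an assumption for illustration, not the chain server's actual payload.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Minimal stand-in for the chain server's /health endpoint."""

    def do_GET(self):
        if self.path == "/health":
            # Assumed response body; the real chain server may return a different shape.
            body = json.dumps({"message": "Service is up."}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep test output quiet

# Bind to an ephemeral port and serve in the background.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Against a real deployment this URL would be the chain server's address.
url = f"http://127.0.0.1:{server.server_port}/health"
with urllib.request.urlopen(url, timeout=5) as resp:
    status = resp.status
    payload = json.loads(resp.read())
server.shutdown()
```

In practice you would point the same GET at your deployed chain server's host and port instead of the stub.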
### Changed
- Renamed example `csv_rag` to `structured_data_rag`.
- Model engine name updates
  - The `nv-ai-foundation` and `nv-api-catalog` LLM engines are renamed to `nvidia-ai-endpoints`.
  - The `nv-ai-foundation` embedding engine is renamed to `nvidia-ai-endpoints`.
- Embedding model updates
  - The `developer_rag` example uses the UAE-Large-V1 embedding model.
  - API Catalog examples use `ai-embed-qa-4` instead of `nvolveqa_40k` as the embedding model.
- Ingested data now persists across multiple sessions.
- Updated `langchain-nvidia-ai-endpoints` to version 0.0.11, enabling support for models like Llama 3.
- File-extension-based validation now raises an error for unsupported file types.
- The default output token length in the UI has been increased from 250 to 1024 for more comprehensive responses.
- Stricter chain-server API validation to enhance API security.
- Updated the versions of llama-index and pymilvus.
- Updated the pgvector container to `pgvector/pgvector:pg16`.
- LLM model updates
  - The Multiturn Chatbot now uses the `ai-mixtral-8x7b-instruct` model for response generation.
  - Structured Data RAG now uses `ai-llama3-70b` for response and code generation.
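The file-extension validation change above can be pictured with a short sketch. The supported-extension set and the function name here are illustrative assumptions, not the chain server's actual code:

```python
import os

# Illustrative allow-list; the chain server's real set of supported types may differ.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md"}

def validate_upload(filename: str) -> str:
    """Return the normalized extension if supported, else raise ValueError."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {filename!r}")
    return ext
```

A server using a check like this would typically map the raised error to a 4xx response before any ingestion work begins, rather than failing partway through document processing.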