This repository provides an end-to-end example of applying LLMOps practices to large language models (LLMs) on Amazon SageMaker. It demonstrates a sample pipeline for training, optimizing, deploying, monitoring, and managing LLMs on SageMaker using infrastructure-as-code principles.
Currently implemented:
End-to-End:
Inference (see the deployment sketch after this list):
- Deploy Llama 3 on Amazon SageMaker
- Deploy Llama 3.2 Vision on Amazon SageMaker
- Deploy Mixtral 8x7B on Amazon SageMaker
- Deploy QwQ-32B-Preview on Amazon SageMaker
- Scale LLM Inference on Amazon SageMaker with Multi-Replica Endpoints
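The inference examples above share a common deployment pattern: wrap the model in a Hugging Face LLM serving container and deploy it to a real-time SageMaker endpoint. Below is a minimal sketch of that pattern using the SageMaker Python SDK; the model ID, TGI version, instance type, and token placeholder are illustrative assumptions, not values prescribed by this repository.

```python
import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Resolves the execution role when running inside SageMaker;
# pass an IAM role ARN explicitly when running locally.
role = sagemaker.get_execution_role()

# Hugging Face TGI (Text Generation Inference) container for LLM serving.
# The version is pinned here purely for illustration.
image_uri = get_huggingface_llm_image_uri("huggingface", version="2.0.2")

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        # Gated models such as Llama require a Hugging Face token.
        "HF_MODEL_ID": "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder
        "HUGGING_FACE_HUB_TOKEN": "<your-hf-token>",           # placeholder
        "SM_NUM_GPUS": "1",  # tensor parallel degree
        "MAX_INPUT_LENGTH": "4096",
        "MAX_TOTAL_TOKENS": "8192",
    },
)

# Provisions a GPU instance and creates a real-time endpoint.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # illustrative instance type
    container_startup_health_check_timeout=600,
)

# Quick smoke test against the new endpoint.
response = predictor.predict({
    "inputs": "What is Amazon SageMaker?",
    "parameters": {"max_new_tokens": 128},
})
print(json.dumps(response, indent=2))
```

Remember to call `predictor.delete_endpoint()` when you are done, since real-time endpoints bill for as long as they run.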
Training:
The repository currently contains:
- `scripts/`: Scripts for training and deploying LLMs on SageMaker
- `notebooks/`: Examples and tutorials for using the pipeline
- `demo/`: Demo applications and utilities for testing deployed models (see the invocation sketch after this list)
- `assets/`: Images and other static assets used in documentation
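The utilities in `demo/` exercise models that are already deployed. A minimal sketch of that interaction is shown below, calling an endpoint directly through `boto3`; the endpoint name and payload are placeholder assumptions.

```python
import json
import boto3

# Placeholder: use the name of an endpoint you created earlier.
ENDPOINT_NAME = "llama-3-endpoint"

runtime = boto3.client("sagemaker-runtime")

payload = {
    "inputs": "Explain LLMOps in one sentence.",
    "parameters": {"max_new_tokens": 64, "temperature": 0.7},
}

response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Body=json.dumps(payload),
)

# The response body is a streaming object; read and decode it as JSON.
print(json.loads(response["Body"].read()))
```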
Before you can start, make sure you have met the following requirements (a quick verification sketch follows the list):
- AWS Account with appropriate service quotas
- AWS CLI installed
- AWS IAM user configured in the CLI with permissions to create and manage SageMaker resources
- Hugging Face account for accessing gated models (e.g. Llama)
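To confirm the CLI and IAM requirements above are in place, the sketch below (an assumption of this guide, not a script shipped in the repository) checks that your credentials resolve and that the configured identity can call SageMaker:

```python
import boto3

# Confirms the CLI/SDK credentials resolve to a valid AWS identity.
identity = boto3.client("sts").get_caller_identity()
print(f"Account: {identity['Account']}\nARN: {identity['Arn']}")

# Listing endpoints is a cheap probe of SageMaker permissions;
# an AccessDeniedException here means the IAM policy needs work.
for ep in boto3.client("sagemaker").list_endpoints(MaxResults=5)["Endpoints"]:
    print(ep["EndpointName"], ep["EndpointStatus"])
```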
Contributions are welcome! Please open issues and pull requests.
This repository is licensed under the MIT License.