Welcome to the reproducible benchmark recipes repository for GPUs! This repository contains recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.
- Identify your requirements: Determine the model, GPU type, workload, framework, and orchestrator you are interested in.
- Select a recipe: Based on your requirements, use the Benchmark support matrix to find a recipe that meets your needs.
- Follow the recipe: Each recipe provides procedures to complete the following tasks:
  - Prepare your environment
  - Run the benchmark
  - Analyze the benchmark results, including the detailed logs available for further analysis
Model | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
---|---|---|---|---|---|
GPT3-175B | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
Llama-3-70B | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
Llama-3.1-70B | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
Mixtral-8x7B | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
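
The "select a recipe" step amounts to filtering the support matrix by your requirements. The following is a minimal sketch of that lookup, not part of this repository: the `SUPPORT_MATRIX` list, its dictionary keys, and the `find_recipes` helper are hypothetical names introduced for illustration, and the rows are an illustrative subset of the table above.

```python
# Hypothetical, hand-copied subset of the support matrix above.
# The key names are assumptions chosen for this sketch.
SUPPORT_MATRIX = [
    {"model": "GPT3-175B", "machine": "A3 Mega (NVIDIA H100)",
     "framework": "NeMo", "workload": "Pre-training", "orchestrator": "GKE"},
    {"model": "Llama-3-70B", "machine": "A3 Mega (NVIDIA H100)",
     "framework": "NeMo", "workload": "Pre-training", "orchestrator": "GKE"},
    {"model": "Llama-3.1-70B", "machine": "A3 Mega (NVIDIA H100)",
     "framework": "NeMo", "workload": "Pre-training", "orchestrator": "GKE"},
]


def find_recipes(matrix, **requirements):
    """Return the rows that match every requirement (case-insensitive)."""
    def matches(row):
        return all(
            str(row.get(key, "")).lower() == str(value).lower()
            for key, value in requirements.items()
        )
    return [row for row in matrix if matches(row)]


if __name__ == "__main__":
    # Example: list every NeMo pre-training recipe that runs on GKE.
    for row in find_recipes(SUPPORT_MATRIX, framework="NeMo",
                            workload="pre-training", orchestrator="GKE"):
        print(row["model"], "on", row["machine"])
```

If no row matches, the helper returns an empty list, which indicates that the matrix has no recipe for that combination.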
- training/: Contains recipes to reproduce training benchmarks with GPUs.
- src/: Contains shared dependencies required to run benchmarks, such as Docker and Helm charts.
- docs/: Contains supporting documentation for the recipes, such as explanations of benchmark methodologies or configurations.
If you have any questions or find any problems with this repository, please report them through GitHub issues.
This is not an officially supported Google product. The code in this repository is for demonstrative purposes only.