Releases: aws-neuron/aws-neuron-sdk
Neuron SDK Release - August 29, 2023
This release adds support for Llama 2 model training (tutorial) using the neuronx-nemo-megatron library, and for Llama 2 model inference using the transformers-neuronx library (tutorial).
Please follow the instructions in the setup guide to upgrade to the latest Neuron release.
Note
Please install transformers-neuronx from https://pip.repos.neuron.amazonaws.com/ to get the latest features and improvements.
This release does not support Llama 2 models that use Grouped-Query Attention (GQA).
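For readers unfamiliar with Grouped-Query Attention, here is a minimal, framework-independent sketch of how GQA maps many query heads onto a smaller set of shared key/value heads. It is not transformers-neuronx code: the function names and the 64-query-head / 8-KV-head configuration are hypothetical and chosen only for illustration.

```python
def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the index of the shared KV head used by query head `q_head`."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads  # query heads per KV head
    return q_head // group_size


def expand_kv_heads(kv_heads: list, num_q_heads: int) -> list:
    """Repeat each KV head so there is one copy per query head.

    One way to map GQA weights onto a plain multi-head attention layout
    that expects equal numbers of query and KV heads.
    """
    if num_q_heads % len(kv_heads) != 0:
        raise ValueError("num_q_heads must be a multiple of len(kv_heads)")
    group_size = num_q_heads // len(kv_heads)
    return [kv for kv in kv_heads for _ in range(group_size)]


if __name__ == "__main__":
    # Hypothetical configuration: 64 query heads sharing 8 KV heads.
    print([kv_head_for_query_head(h, 64, 8) for h in (0, 7, 8, 63)])  # [0, 0, 1, 7]
    print(expand_kv_heads(["kv0", "kv1"], 4))  # ['kv0', 'kv0', 'kv1', 'kv1']
```

With multi-head attention, the number of KV heads equals the number of query heads and each group has size 1; GQA reduces KV-cache size by sharing each KV head across a group.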
Neuron SDK Release - August 28, 2023
What’s New
This release introduces support for GPT-NeoX 20B model training in neuronx-distributed, including ZeRO-1 optimizer capability. It also adds support for Stable Diffusion XL and CLIP model inference in torch-neuronx. Neuron 2.13 also introduces the AWS Neuron Reference for NeMo Megatron library, which supports distributed training of LLMs such as GPT-3 175B. This release also introduces other new features, performance optimizations, minor enhancements and bug fixes.
This release introduces the following:
What’s New | Details | Instances |
---|---|---|
AWS Neuron Reference for Nemo Megatron library | Modified versions of the open-source packages NeMo and Apex that have been adapted for use with AWS Neuron and AWS EC2 Trn1 instances. GPT-3 model training support (tutorial). See more at neuronx-nemo-megatron GitHub repo. | Trn1/Trn1n |
Transformers Neuron (transformers-neuronx) for Inference | Latency optimizations for Llama and GPT-2 model inference. Neuron Persistent Cache support (developer guide). See more at Transformers Neuron (transformers-neuronx) release notes. | Inf2, Trn1/Trn1n |
Neuron Distributed (neuronx-distributed) for Training | Now stable; experimental status removed. ZeRO-1 optimizer support with tensor parallelism (tutorial). Sequence parallel support (API guide). GPT-NeoX model training support (sample script) (tutorial). See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed). | Trn1/Trn1n |
Neuron Distributed (neuronx-distributed) for Inference | KV cache support for LLM inference (tutorial) (release notes). | Inf2, Trn1/Trn1n |
PyTorch Neuron (torch-neuronx) | Seedable dropout enabled by default for training. camembert-base training script (sample script). Inference support for new models, including Stable Diffusion XL, CLIP (clip-vit-base-patch32, clip-vit-large-patch14), Vision Perceiver, Language Perceiver and T5. | Trn1/Trn1n, Inf2 |
Neuron Tools | New data types supported by the --check option of the Neuron Collective Communication Test Utility (NCCOM-TEST): fp16, bf16, (u)int8, (u)int16 and (u)int32. Neuron SysFS support for FLOP count (flop_count) and connected Neuron Device IDs (connected_devices); see Neuron Sysfs User Guide. See more at Neuron System Tools. | Inf1, Inf2, Trn1/Trn1n |
Neuron Runtime | Runtime version and capture time added to NTFF. Async DMA copy support to improve Neuron Device copy times for all instance types. Improved logging and error messages for collectives timeouts and when loading NEFFs. See more at Neuron Runtime Release Notes. | Inf1, Inf2, Trn1/Trn1n |
End of Support Announcements and Documentation Updates | Announcing end of support for AWS Neuron reference for Megatron-LM starting Neuron 2.13. See more at AWS Neuron reference for Megatron-LM no longer supported. Announcing end of support for torch-neuron version 1.9 starting Neuron 2.14. See more at Announcing end of support for torch-neuron version 1.9. Added TensorFlow 2.x (tensorflow-neuronx) analyze_model API section. See more at TensorFlow 2.x (tensorflow-neuronx) analyze_model API. Upgraded numpy version to 1.21.6 in various training scripts for text classification. Updated the bert-japanese training script to use the multilingual-sentiments dataset; see hf-bert-jp (https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/hf_bert_jp). See more at Neuron Documentation Release Notes. | Inf1, Inf2, Trn1/Trn1n |
Minor enhancements and bug fixes. | See Neuron Components Release Notes | Trn1/Trn1n, Inf2, Inf1 |
Known Issues and Limitations | See 2.13.0 Known Issues and Limitations | Trn1/Trn1n, Inf2, Inf1 |
Release Artifacts | See Release Artifacts | Trn1/Trn1n, Inf2, Inf1 |
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
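KV caching, listed above for neuronx-distributed inference, is what makes token-by-token LLM decoding cheap: keys and values computed for earlier tokens are kept and reused, so each new step only computes projections for the newest token. The sketch below illustrates the idea in plain Python for a single attention head. It is not the neuronx-distributed implementation; the class and function names are hypothetical, and it assumes identity Q/K/V projections for brevity.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Keys and values for every token decoded so far (one attention head)."""
    keys: list[list[float]] = field(default_factory=list)
    values: list[list[float]] = field(default_factory=list)


def attend(query: list[float], cache: KVCache) -> list[float]:
    """Scaled dot-product attention of one query over all cached keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in cache.keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(cache.values[0])
    return [sum(w * v[i] for w, v in zip(weights, cache.values)) for i in range(dim)]


def decode_step(token: list[float], cache: KVCache) -> list[float]:
    """Process one new token, reusing cached K/V for all earlier tokens.

    Assumption for brevity: the Q, K and V projections are the identity,
    so the token vector is used directly as query, key and value.
    """
    cache.keys.append(token)    # only the newest token's K/V are computed...
    cache.values.append(token)  # ...everything older is read from the cache
    return attend(token, cache)


if __name__ == "__main__":
    cache = KVCache()
    for tok in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
        out = decode_step(tok, cache)
        print(len(cache.keys), [round(x, 3) for x in out])
```

Without the cache, every step would recompute keys and values for the whole prefix; with it, per-step projection work stays constant while the cache grows by one entry per token.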
Neuron SDK Release - August 19, 2023
Patch release that fixes a jemalloc conflict for all Neuron customers using Ubuntu 22. Previous releases shipped with a dependency on jemalloc that could cause compilation failures on Ubuntu 22 only. Please follow the instructions in the setup guide to upgrade to the latest Neuron release.
Neuron SDK Release - August 9, 2023
Patch release that improves the reliability of the Neuron Runtime for customers using memory-constrained instances. The Neuron Runtime now requires less contiguous memory to initialize the NeuronCores associated with an application, which allows it to start up even when only small amounts of contiguous memory remain on an instance.
Neuron SDK Release - July 19, 2023
What’s New
This release introduces the ZeRO-1 optimizer for model training in torch-neuronx and experimental support for GPT-NeoX, BLOOM, Llama and Llama 2 (coming soon) models in transformers-neuronx. This release also adds support for model inference serving on Triton Inference Server for Inf2 and Trn1 instances, the lazy_load API and async_load API for model loading in torch-neuronx, as well as other new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:
What’s New | Details | Instances |
---|---|---|
ZeRO-1 optimizer for model training in torch-neuronx | Support for the ZeRO Stage 1 optimizer (ZeroRedundancyOptimizer() API) for training models using torch-neuronx. See tutorial at ZeRO-1 Tutorial. | Inf2, Trn1/Trn1n |
Support for new models and Enhancements in transformers-neuronx | [Experimental] Support for inference of GPT-NeoX, BLOOM and Llama models. [Experimental] Support for Llama 2 coming soon; please monitor the transformers-neuronx repository for updates. Removed constraints on tp_degree in tensor-parallel configurations for GPT2, OPT and BLOOM. Added multi-query / multi-group attention support for GPT2. See more at Transformers Neuron (transformers-neuronx) release notes. | Inf2, Trn1/Trn1n |
Support for Inf2 and Trn1 instances on Triton Inference Server | Support for model inference serving on Triton for Inf2 and Trn1 instances. See more at Triton Server Python Backend. See tutorial at Triton on SageMaker - Deploying on Inf2. | Inf2, Trn1 |
Support for new computer vision models | Performance optimizations in the Stable Diffusion 2.1 model script and added [Experimental] support for Stable Diffusion 1.5 models. [Experimental] Script for training the CLIP model for image classification. [Experimental] Script for inference of the multimodal Perceiver model. Please check the aws-neuron-samples repository. | Inf2, Trn1/Trn1n |
New Features in neuronx-distributed for training | Added parallel cross entropy loss function. See more at API Reference Guide for Tensor Parallelism (neuronx-distributed). | Trn1/Trn1n |
lazy_load and async_load API for model loading in inference and performance enhancements in torch-neuronx | Added lazy_load and async_load APIs to accelerate model loading for inference. See more at PyTorch Neuron (torch-neuronx) Lazy and Asynchronous Loading API. Optimized the DataParallel API to load onto multiple cores simultaneously when the specified device IDs are consecutive. See more at PyTorch Neuron (torch-neuronx) release notes. | Inf2, Trn1/Trn1n |
[Experimental] Asynchronous Execution support and Enhancements in Neuron Runtime | Added an experimental asynchronous execution feature, which can reduce latency by roughly 12% for training workloads. See more at Neuron Runtime Configuration. AllReduce with an all-to-all communication pattern enabled for 16 ranks on Trn1/Trn1n within the instance (intranode). See more at Neuron Runtime Release Notes. | Inf1, Inf2, Trn1/Trn1n |
Support for distribution_strategy compiler option in neuronx-cc | Support for the optional --distribution_strategy compiler option, which enables compiler-specific optimizations based on the distribution strategy used. See more at Neuron Compiler CLI Reference Guide (neuronx-cc). | Inf2, Trn1/Trn1n |
New Micro Benchmarking Performance User Guide and Documentation Updates | Added a best-practices user guide for benchmarking the performance of Neuron devices. See more at Benchmarking Guide and Helper scripts. Announcing end of support for Ubuntu 18. See more at Announcing end of support for Ubuntu 18. Removed support for the Distributed Data Parallel (DDP) tutorial. Improved sidebar navigation in the documentation. See more at Neuron Documentation Release Notes. | Inf1, Inf2, Trn1/Trn1n |
Minor enhancements and bug fixes. | See Neuron Components Release Notes | Trn1/Trn1n, Inf2, Inf1 |
Known Issues and Limitations | See 2.12.0 Known Issues and Limitations | Trn1/Trn1n, Inf2, Inf1 |
Release Artifacts | See Release Artifacts | Trn1/Trn1n, Inf2, Inf1 |
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.
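The core idea behind the ZeRO Stage 1 optimizer introduced in this release is to partition optimizer state across data-parallel ranks: each rank keeps state for, and updates, only its own slice of the parameters, and the updated slices are then all-gathered. The single-process simulation below sketches that idea. It is not the torch-neuronx ZeroRedundancyOptimizer API; the class and helper names are hypothetical, it uses SGD with momentum instead of Adam for brevity, and it assumes gradients have already been averaged across ranks.

```python
from __future__ import annotations

import math


def shard_bounds(num_params: int, world_size: int, rank: int) -> tuple[int, int]:
    """Contiguous [start, end) slice of the flat parameter list owned by `rank`."""
    shard = math.ceil(num_params / world_size)
    start = min(rank * shard, num_params)
    return start, min(start + shard, num_params)


class ShardedMomentumSGD:
    """Hypothetical ZeRO-1-style optimizer: state is kept only for this rank's shard."""

    def __init__(self, num_params: int, world_size: int, rank: int,
                 lr: float = 0.1, beta: float = 0.9):
        self.start, self.end = shard_bounds(num_params, world_size, rank)
        self.lr, self.beta = lr, beta
        self.momentum = [0.0] * (self.end - self.start)  # ~1/world_size of full state

    def step(self, params: list[float], grads: list[float]) -> list[float]:
        """Update only the owned slice; `grads` are assumed averaged across ranks."""
        updated = []
        for i, idx in enumerate(range(self.start, self.end)):
            self.momentum[i] = self.beta * self.momentum[i] + grads[idx]
            updated.append(params[idx] - self.lr * self.momentum[i])
        return updated


def all_gather(slices: list[list[float]]) -> list[float]:
    """Concatenate every rank's updated slice so all ranks see full parameters again."""
    return [x for s in slices for x in s]


if __name__ == "__main__":
    world_size, params, grads = 4, [1.0] * 10, [1.0] * 10
    optimizers = [ShardedMomentumSGD(len(params), world_size, r) for r in range(world_size)]
    params = all_gather([opt.step(params, grads) for opt in optimizers])
    print([len(opt.momentum) for opt in optimizers])  # [3, 3, 3, 1]
    print(params)
```

The parameters and gradients are still replicated on every rank; only the optimizer state is divided, which is what distinguishes Stage 1 from the later ZeRO stages.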
Neuron SDK Release - June 14, 2023
What’s New
This release introduces Neuron Distributed, a new Python library that simplifies training and inference of large models, and improves usability with features such as S3 model caching, a standalone profiler tool and support for Ubuntu 22, as well as other new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:
What’s New | Details | Instances |
---|---|---|
New Features and Performance Enhancements in transformers-neuronx | Support for int8 inference. See example at int8 weight storage support. Improved prompt context encoding performance. See more at Transformers Neuron (transformers-neuronx) Developer Guide. Improved collective communications performance for tensor parallel inference on Inf2 and Trn1. See more at Transformers Neuron (transformers-neuronx) release notes. | Inf2, Trn1/Trn1n |
Neuron Profiler Tool | Support for a standalone tool to profile and get visualized insights on the execution of models on Trainium and Inferentia devices. See more at Neuron Profile User Guide. | Inf1, Inf2, Trn1/Trn1n |
Neuron Compilation Cache through S3 | Support for sharing compiled models across Inf2 and Trn1 nodes through S3. See more at PyTorch Neuron neuron_parallel_compile CLI (torch-neuronx). | Inf2, Trn1/Trn1n |
New script to scan a model for supported/unsupported operators | Script to scan a model for supported/unsupported operators before training; the scan output lists supported and unsupported operators at both the XLA operator and PyTorch operator level. See a sample tutorial at Analyze for Training Tutorial. | Inf2, Trn1/Trn1n |
Neuron Distributed Library [Experimental] | New Python library based on PyTorch that enables distributed training and inference of large models. Initial support for tensor parallelism. See more at Neuron Distributed [Experimental]. | Inf2, Trn1/Trn1n |
Neuron Calculator and Documentation Updates | New Neuron Calculator documentation section to help determine the number of NeuronCores needed for LLM inference. Added App Note: Generative LLM inference with Neuron. See more at Neuron Documentation Release Notes. | Inf1, Inf2, Trn1/Trn1n |
Enhancements to Neuron SysFS | Support for a detailed breakdown of memory usage across the NeuronCores. See more at Neuron Sysfs User Guide. | Inf1, Inf2, Trn1/Trn1n |
Support for Ubuntu 22 | See Setup Guide for setup instructions on Ubuntu 22. | Inf1, Inf2, Trn1/Trn1n |
Minor enhancements and bug fixes. | See Neuron Components Release Notes | Trn1/Trn1n, Inf2, Inf1 |
Release Artifacts | See Release Artifacts | Trn1/Trn1n, Inf2, Inf1 |
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.
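To illustrate what int8 weight storage means in general, the sketch below shows symmetric per-tensor int8 quantization: each weight is stored as a signed byte plus one shared float scale, and is dequantized before use. This is one common scheme chosen as an assumption; these notes do not say which scheme transformers-neuronx uses, and the function names are hypothetical.

```python
from __future__ import annotations


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w ~= q * scale, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights, e.g. just before the matrix multiply."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25, 0.0]
    q, scale = quantize_int8(w)
    w_hat = dequantize_int8(q, scale)
    print(q, max(abs(a - b) for a, b in zip(w, w_hat)))
```

Storing one byte per weight instead of four (fp32) or two (fp16/bf16) cuts weight memory accordingly, at the cost of a rounding error of at most half the scale per weight.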
Neuron SDK Release - May 1, 2023
What’s New
This release introduces new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:
What’s New | Details | Instances |
---|---|---|
Initial support for computer vision models inference | Added Stable Diffusion 2.1 model script for text-to-image generation. Added VGG model script for image classification. Added UNet model script for image segmentation. Please check the aws-neuron-samples repository. | Inf2, Trn1/Trn1n |
Profiling support in PyTorch Neuron (torch-neuronx) for Inference with TensorBoard | See more at Profiling PyTorch Neuron (torch-neuronx) with TensorBoard | Inf2, Trn1/Trn1n |
New Features and Performance Enhancements in transformers-neuronx | Support for the HuggingFace generate function. Model serialization support, including model saving, loading and weight swapping. Improved prompt context encoding performance. See transformers_neuronx_readme for examples and usage. See more at Transformers Neuron (transformers-neuronx) release notes. | Inf2, Trn1/Trn1n |
Support models larger than 2GB in TensorFlow 2.x Neuron (tensorflow-neuronx) | See Special Flags for details. (tensorflow-neuronx) | Trn1/Trn1n, Inf2 |
Support models larger than 2GB in TensorFlow 2.x Neuron (tensorflow-neuron) | See Special Flags for details. (tensorflow-neuron) | Inf1 |
Performance Enhancements in PyTorch C++ Custom Operators [Experimental] | Support for using multiple GPSIMD cores in custom C++ operators. See Custom Operators (Experimental). | Trn1/Trn1n |
Weight Deduplication Feature (Inf1) | Support for sharing weights when loading multiple instance versions of the same model on different NeuronCores. See more at Neuron Runtime Configuration. | Inf1 |
nccom-test - Collective Communication Benchmarking Tool | Supports benchmarking sweeps on various Neuron collective communication operations. See NCCOM-TEST (Beta) for more details. | Trn1/Trn1n, Inf2 |
Minor enhancements and bug fixes. | See Neuron Components Release Notes | Trn1/Trn1n, Inf2, Inf1 |
Release Artifacts | See Release Artifacts | Trn1/Trn1n, Inf2, Inf1 |
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.
Neuron SDK Release - April 19, 2023
Minor patch release that adds support for compiling deserialized TorchScript models and for multi-node training in Amazon EKS. The fixes included in this release are critical for training and deploying models with Amazon SageMaker or Amazon EKS.
Neuron SDK Release - March 28, 2023
What’s New
This release adds support for EC2 Trn1n instances, introduces new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:
What’s New | Details | Instances |
---|---|---|
Support for EC2 Trn1n instances | Updated Neuron Runtime for Trn1n instances. Overall documentation update to include Trn1n instances. | Trn1n |
New Analyze API in PyTorch Neuron (torch-neuronx) | A new API that returns a list of supported and unsupported PyTorch operators for a model. See PyTorch Neuron (torch-neuronx) Analyze API for Inference. | Trn1, Inf2 |
Support models that are larger than 2GB in PyTorch Neuron (torch-neuron) on Inf1 | See the separate_weights flag to torch_neuron.trace() for supporting models that are larger than 2GB. | Inf1 |
Performance Improvements | Up to 10% higher throughput when training the GPT-3 6.7B model on multiple nodes. | Trn1 |
Dynamic Batching support in TensorFlow 2.x Neuron (tensorflow-neuronx) | See Special Flags for details. | Trn1, Inf2 |
NeuronPerf support for Trn1/Inf2 instances | Added Trn1/Inf2 support for PyTorch Neuron (torch-neuronx) and TensorFlow 2.x Neuron (tensorflow-neuronx). | Trn1, Inf2 |
Hierarchical All-Reduce and Reduce-Scatter collective communication | Added support for hierarchical All-Reduce and Reduce-Scatter in Neuron Runtime to enable better scalability of distributed workloads. | Trn1, Inf2 |
New Tutorials added | Added tutorial to fine-tune the T5 model. Added tutorial to demonstrate use of Libtorch with PyTorch Neuron (torch-neuronx) for inference. | Trn1, Inf2 |
New sample scripts for 3D Parallelism training in Megatron-LM reference for Neuron | See GPT3-65b and GPT3-175b examples. | Trn1, Inf2 |
Minor enhancements and bug fixes. | See Neuron Components Release Notes | Trn1, Inf2, Inf1 |
Release included packages | See Release Content | Trn1, Inf2, Inf1 |
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.
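The hierarchical All-Reduce added in this release follows a general two-level pattern: reduce within each node over fast local links, combine across nodes with only one participant per node, then broadcast back within each node. The Neuron Runtime's exact algorithm is not described in these notes; the single-process sketch below only illustrates the generic scheme, and all function names are hypothetical.

```python
from __future__ import annotations

from functools import reduce


def add(a: list[float], b: list[float]) -> list[float]:
    """Elementwise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def flat_all_reduce(values: list[list[float]]) -> list[list[float]]:
    """Reference: every rank ends up with the elementwise sum over all ranks."""
    total = reduce(add, values)
    return [list(total) for _ in values]


def hierarchical_all_reduce(values: list[list[float]], ranks_per_node: int) -> list[list[float]]:
    """Two-level sum all-reduce over ranks grouped into nodes of `ranks_per_node`."""
    if len(values) % ranks_per_node != 0:
        raise ValueError("number of ranks must be a multiple of ranks_per_node")
    nodes = [values[i:i + ranks_per_node] for i in range(0, len(values), ranks_per_node)]
    # Phase 1 (intra-node): reduce each node's contributions onto one leader rank.
    node_partials = [reduce(add, node) for node in nodes]
    # Phase 2 (inter-node): only the leaders exchange data across the slower network.
    global_total = reduce(add, node_partials)
    # Phase 3 (intra-node): each leader broadcasts the result to the ranks on its node.
    return [list(global_total) for _ in values]


if __name__ == "__main__":
    ranks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # 2 nodes x 2 ranks
    result = hierarchical_all_reduce(ranks, ranks_per_node=2)
    assert result == flat_all_reduce(ranks)
    print(result[0])  # [16.0, 20.0]
```

The result is identical to a flat all-reduce; the benefit in a real cluster is that inter-node traffic involves one participant per node instead of every rank.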
Neuron SDK Release - February 24, 2023
What’s New
This release adds support for EC2 Inf2 instances, introduces initial inference support with TensorFlow 2.x Neuron (tensorflow-neuronx) on Trn1 and Inf2, and introduces minor enhancements and bug fixes.
This release introduces the following:
What’s New | Details |
---|---|
Support for EC2 Inf2 instances | Inference support for Inf2 instances in PyTorch Neuron (torch-neuronx). Inference support for Inf2 instances in TensorFlow 2.x Neuron (tensorflow-neuronx). Overall documentation update to include Inf2 instances. |
TensorFlow 2.x Neuron (tensorflow-neuronx) support | This release introduces initial inference support with TensorFlow 2.x Neuron (tensorflow-neuronx) on Trn1 and Inf2. |
New Neuron GitHub samples | New sample scripts for deploying LLM models with transformers-neuronx in the aws-neuron-samples GitHub repository. New sample scripts for deploying models with torch-neuronx in the aws-neuron-samples GitHub repository. |
Minor enhancements and bug fixes. | See Neuron Components Release Notes |
Release included packages | See Release Content |
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.