Releases: aws-neuron/aws-neuron-sdk
Neuron SDK Release - April 1, 2024
What's New
Neuron 2.18 introduces stable (out of beta) support for PyTorch 2.1, adds new features and performance improvements for LLM training and inference, and updates Neuron DLAMIs and Neuron DLCs to support this release.
Training highlights: The LLM training user experience with NeuronX Distributed (NxD) is improved by introducing asynchronous checkpointing. This release also adds support for auto-partitioning for pipeline parallelism in NxD and introduces pipeline parallelism in PyTorch Lightning Trainer (beta).
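The NxD checkpointing API itself is documented in the NeuronX Distributed guides; as a minimal pure-Python illustration of what asynchronous checkpointing buys (the training loop keeps running while the snapshot is written to disk), here is a sketch. All names below are illustrative, not the NxD API:

```python
import os
import pickle
import tempfile
import threading

def async_save_checkpoint(state_dict, path):
    """Serialize a snapshot of the state dict on a background thread so the
    training loop does not block on disk I/O. The shallow copy is taken on
    the caller's thread, before training mutates the state further."""
    snapshot = dict(state_dict)
    def _write():
        with open(path, "wb") as f:
            pickle.dump(snapshot, f)
    t = threading.Thread(target=_write)
    t.start()
    return t  # caller joins before the next save, or at shutdown

# usage: save while (conceptually) continuing to train
path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
thread = async_save_checkpoint({"step": 100, "lr": 3e-4}, path)
thread.join()  # here we join immediately only so the example is deterministic
```

In a real training loop the join happens just before the next checkpoint, so serialization overlaps with compute instead of stalling it.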
Inference highlights: Speculative Decoding support (beta) in the Transformers NeuronX (TNx) library improves LLM inference throughput and output token latency (TPOT) by up to 25% for LLMs such as Llama-2-70B. TNx also improves weight loading performance by adding support for the SafeTensors checkpoint format. Inference using bucketing in PyTorch NeuronX and NeuronX Distributed is improved by a new auto-bucketing feature. This release also adds new samples for Mixtral-8x7B-v0.1 and mistralai/Mistral-7B-Instruct-v0.2 in TNx.
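The TNx implementation is documented in the library's developer guide; as a toy sketch of the idea behind speculative decoding (a cheap draft model proposes several tokens, the target model verifies them in bulk and keeps the agreed prefix), here is a greedy-decoding version in plain Python. The token "models" here are simple callables, not real LLMs:

```python
def speculative_decode(target_next, draft_next, prompt, k=4, max_new=8):
    """Greedy speculative decoding sketch: the draft proposes k tokens
    autoregressively; the target keeps the longest prefix it agrees with
    and contributes one correction token at the first mismatch. When the
    draft is usually right, far fewer target calls are needed per token."""
    out = list(prompt)
    while len(out) < len(prompt) + max_new:
        # draft proposes k tokens
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # target verifies position by position; stop at first disagreement
        accepted = []
        for t in proposal:
            want = target_next(out + accepted)
            if want == t:
                accepted.append(t)
            else:
                accepted.append(want)  # target's correction still yields a token
                break
        out.extend(accepted)
    return out[:len(prompt) + max_new]

# toy integer-token models: both simply count upward, so they always agree
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + 1
print(speculative_decode(target, draft, [0], k=4, max_new=8))
# -> [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Under greedy decoding this produces exactly the target model's output; the real TNx feature applies the same verify-and-accept idea to sampled LLM tokens on Neuron devices.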
Neuron DLAMI and Neuron DLC support highlights: This release introduces a new Multi Framework DLAMI for Ubuntu 22 that customers can use to easily get started with the latest Neuron SDK on the multiple frameworks that Neuron supports, as well as SSM parameter support for DLAMIs to automate retrieval of the latest DLAMI ID in cloud automation flows. It also adds new Neuron Training and Inference Deep Learning Containers (DLCs) for PyTorch 2.1, a new dedicated GitHub repository to host Neuron container Dockerfiles, and a public Neuron container registry to host Neuron container images.
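To sketch how the SSM parameter support is typically consumed from automation code: the helper below builds a parameter name and shows the lookup call. The exact parameter path is an assumption for illustration; check the Neuron DLAMI documentation for the parameter names actually published:

```python
def neuron_dlami_ssm_parameter(dlami_type="multi-framework", os_name="ubuntu-22.04"):
    """Build an SSM parameter name for the latest Neuron DLAMI ID.
    NOTE: this path layout is an assumption for illustration; consult the
    Neuron DLAMI docs for the parameter names published for your OS/flavor."""
    return f"/aws/service/neuron/dlami/{dlami_type}/{os_name}/latest/image_id"

name = neuron_dlami_ssm_parameter()
print(name)

# Resolving it to an AMI ID requires AWS credentials (not run here):
# import boto3
# ami_id = boto3.client("ssm").get_parameter(Name=name)["Parameter"]["Value"]
```

In CloudFormation or CDK flows, referencing the SSM parameter directly (instead of hard-coding an AMI ID) keeps launch templates pinned to the latest DLAMI automatically.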
Neuron SDK Release - February 13, 2024
What's New
Neuron 2.17 improves the performance of small collective communication operators (smaller than 16 MB) by up to 30%, which improves large language model (LLM) inference performance by up to 10%. This release also includes improvements in :ref:`Neuron Profiler <neuron-profile-ug>` and other minor enhancements and bug fixes.
For more detailed release notes of the new features and resolved issues, see :ref:`components-rn`.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see :ref:`model_architecture_fit`.
Neuron Components Release Notes
Inf1, Trn1/Trn1n and Inf2 common packages
Component | Instance/s | Package/s | Details |
---|---|---|---|
Neuron Runtime | Trn1/Trn1n, Inf1, Inf2 | Trn1/Trn1n, Inf2: aws-neuronx-runtime-lib (.deb, .rpm) Inf1: Runtime is linked into the ML framework packages | :ref:neuron-runtime-rn |
Neuron Runtime Driver | Trn1/Trn1n, Inf1, Inf2 | aws-neuronx-dkms (.deb, .rpm) | :ref:neuron-driver-release-notes |
Neuron System Tools | Trn1/Trn1n, Inf1, Inf2 | aws-neuronx-tools (.deb, .rpm) | :ref:neuron-tools-rn |
Containers | Trn1/Trn1n, Inf1, Inf2 | aws-neuronx-k8-plugin (.deb, .rpm) aws-neuronx-k8-scheduler (.deb, .rpm) aws-neuronx-oci-hooks (.deb, .rpm) | :ref:neuron-k8-rn :ref:neuron-containers-release-notes |
NeuronPerf (Inference only) | Trn1/Trn1n, Inf1, Inf2 | neuronperf (.whl) | :ref:neuronperf_rn |
TensorFlow Model Server Neuron | Trn1/Trn1n, Inf1, Inf2 | tensorflow-model-server-neuronx (.deb, .rpm) | :ref:tensorflow-modeslserver-neuronx-rn |
Neuron Documentation | Trn1/Trn1n, Inf1, Inf2 | | :ref:neuron-documentation-rn |
Neuron SDK Release - January 18, 2024
Patch release with compiler bug fixes and updates to the Neuron Device Plugin and Neuron Kubernetes Scheduler.
Neuron SDK Release - December 21, 2023
What’s New
Neuron 2.16 adds support for Llama-2-70B training and inference, upgrades to PyTorch 2.1 (beta), adds support for PyTorch Lightning Trainer (beta), and delivers performance improvements along with Amazon Linux 2023 support.
Training highlights: NeuronX Distributed library LLM training performance is improved by up to 15%. The LLM training user experience is improved by introducing support for PyTorch Lightning Trainer (beta) and a new model optimizer wrapper that minimizes the changes needed to partition models using NeuronX Distributed primitives.
Inference highlights: PyTorch inference now allows dynamically swapping different fine-tuned weights into an already loaded model, and Transformers NeuronX delivers overall improvements to LLM inference throughput and latency. Two new reference model samples are added for Llama-2-70B and Mistral-7B model inference.
User experience: This release introduces two new capabilities: A new tool, Neuron Distributed Event Tracing (NDET) which improves debuggability, and the support of profiling collective communication operators in the Neuron Profiler tool.
More release content can be found in the table below and each component release notes.
What’s New | Details | Instances |
---|---|---|
Transformers NeuronX (transformers-neuronx) for Inference | [Beta] Support for Grouped Query Attention (GQA). See developer guide [Beta] Support for Llama-2-70B model inference using Grouped Query Attention. See tutorial [Beta] Support for Mistral-7B-Instruct-v0.1 model inference. See sample code See more at Transformers Neuron (transformers-neuronx) release notes | Inf2, Trn1/Trn1n |
NeuronX Distributed (neuronx-distributed) for Training | [Beta] Support for PyTorch Lightning to train models using tensor parallelism and data parallelism . See api guide , developer guide and tutorial Support for Model and Optimizer Wrapper training API that handles the parallelization. See api guide and Developer guide for model and optimizer wrapper (neuronx-distributed ) New save_checkpoint and load_checkpoint APIs to save/load checkpoints during distributed training. See Developer guide for save/load checkpoint (neuronx-distributed ) Support for a new Query-Key-Value(QKV) module that provides the ability to replicate the Key Value heads and adds flexibility to use higher Tensor parallel degree during Training. See api guide and tutorial See more at Neuron Distributed Release Notes (neuronx-distributed) | Trn1/Trn1n |
NeuronX Distributed (neuronx-distributed) for Inference | Support weight deduplication among TP shards by adding the ability to save weights separately from NEFF files. See developer guide Llama-2-7B model inference script ([html] [notebook]) See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed) | Inf2, Trn1/Trn1n |
PyTorch NeuronX (torch-neuronx) | [Beta] Support for PyTorch 2.1. See Introducing PyTorch 2.1 Support (Beta). See llama-2-13b inference sample. Support to separate model weights from NEFF files and a new replace_weights API to replace the separated weights. See PyTorch Neuron (torch-neuronx) Weight Replacement API for Inference and PyTorch NeuronX Tracing API for Inference [Beta] Script for training stabilityai/stable-diffusion-2-1-base and runwayml/stable-diffusion-v1-5 models. See script [Beta] Script for training facebook/bart-large model. See script [Beta] Script for stabilityai/stable-diffusion-2-inpainting model inference. See script | Trn1/Trn1n, Inf2 |
Neuron Tools | New Neuron Distributed Event Tracing (NDET) tool to help visualize execution trace logs and diagnose errors in multi-node workloads. See Neuron Distributed Event Tracing (NDET) User Guide Support for multi-worker jobs in neuron-profile. See Neuron Profile User Guide See more at Neuron System Tools | Inf1/Inf2/Trn1/Trn1n |
Documentation Updates | Added setup guide instructions for AL2023 OS. See Setup Guide Added announcement for name change of Neuron Components. See Announcing Name Change for Neuron Components Added announcement for End of Support for PyTorch 1.10 . See Announcing End of Support for PyTorch Neuron version 1.10 Added announcement for End of Support for PyTorch 2.0 Beta. See Announcing End of Support for PyTorch NeuronX version 2.0 (beta) See more at Neuron Documentation Release Notes | Inf1, Inf2, Trn1/Trn1n |
Minor enhancements and bug fixes. | See Neuron Components Release Notes | Trn1/Trn1n, Inf2, Inf1 |
Known Issues and Limitations | See 2.16.0 Known Issues and Limitations | Trn1/Trn1n, Inf2, Inf1 |
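The new QKV module described above replicates Key/Value heads so that models using Grouped Query Attention can run at tensor-parallel degrees larger than their KV head count. As a conceptual sketch (not the NxD API), the replication factor works out like this:

```python
def kv_replication_factor(num_q_heads, num_kv_heads, tp_degree):
    """For GQA models the KV head count can be smaller than the desired
    tensor-parallel degree; replicating each KV head lets every TP rank
    own at least one. Returns the number of copies needed per KV head.
    Conceptual sketch only -- not the neuronx-distributed API."""
    if tp_degree <= num_kv_heads:
        if num_kv_heads % tp_degree != 0:
            raise ValueError("KV heads must divide evenly across TP ranks")
        return 1  # no replication needed; each rank gets >= 1 real KV head
    if tp_degree % num_kv_heads != 0:
        raise ValueError("TP degree must be a multiple of the KV head count")
    return tp_degree // num_kv_heads

# Llama-2-70B uses 64 query heads and 8 KV heads: running at tp_degree=32
# requires replicating each KV head 4x so all 32 ranks hold a KV head.
print(kv_replication_factor(64, 8, 32))  # -> 4
print(kv_replication_factor(64, 8, 8))   # -> 1
```

This is why the feature "adds flexibility to use higher tensor parallel degree": without replication, the TP degree would be capped at the KV head count.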
Neuron SDK Release - November 17, 2023
Patch release to fix performance-related issues when training with the neuronx-nemo-megatron library. Refer to the 2.15.2 compiler release notes for additional information.
Neuron SDK Release - November 9, 2023
Patch release to fix execution overhead issues in Neuron Runtime that were inadvertently introduced in 2.15 release. Refer to 2.15.1 runtime release notes for additional information.
Neuron SDK Release - October 26, 2023
What’s New
This release adds support for PyTorch 2.0 (beta), increases performance for both training and inference workloads, and adds the ability to train models like Llama-2-70B using neuronx-distributed. With this release, we are also adding pipeline parallelism support to neuronx-distributed, enabling full 3D parallelism to easily scale training to large model sizes. Neuron 2.15 also introduces support for training resnet50, milesial/Pytorch-UNet, and deepmind/vision-perceiver-conv models using torch-neuronx, as well as new sample code for flan-t5-xl model inference using neuronx-distributed, in addition to other performance optimizations, minor enhancements, and bug fixes.
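"Full 3D parallelism" means every worker belongs to a tensor-parallel, a pipeline-parallel, and a data-parallel group simultaneously. As an illustrative sketch of one common rank layout (not necessarily the ordering neuronx-distributed uses internally):

```python
def rank_to_coords(rank, tp, pp, dp):
    """Map a flat worker rank onto a tp x pp x dp device grid, with the
    tensor-parallel dimension varying fastest (adjacent ranks share the
    heaviest communication). Illustrative layout only."""
    assert 0 <= rank < tp * pp * dp, "rank out of range for this grid"
    tp_rank = rank % tp
    pp_rank = (rank // tp) % pp
    dp_rank = rank // (tp * pp)
    return tp_rank, pp_rank, dp_rank

# 32 workers arranged as tp=8, pp=2, dp=2
print(rank_to_coords(0, 8, 2, 2))   # -> (0, 0, 0)
print(rank_to_coords(13, 8, 2, 2))  # -> (5, 1, 0)
print(rank_to_coords(31, 8, 2, 2))  # -> (7, 1, 1)
```

Putting tensor parallelism on the fastest-varying axis keeps its frequent, bandwidth-heavy collectives within the most tightly connected group of devices.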
What’s New | Details | Instances |
---|---|---|
Neuron Distributed (neuronx-distributed) for Training | Pipeline parallelism support. See API Reference Guide (neuronx-distributed ) , pp_developer_guide and pipeline_parallelism_overview Llama-2-70B model training script (sample script) (tutorial) Mixed precision support. See pp_developer_guide Support serialized checkpoint saving and loading using save_xser and load_xser parameters. See API Reference Guide (neuronx-distributed ) See more at Neuron Distributed Release Notes (neuronx-distributed) | Trn1/Trn1n |
Neuron Distributed (neuronx-distributed) for Inference | flan-t5-xl model inference script (tutorial) See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed ) | Inf2,Trn1/Trn1n |
Transformers Neuron (transformers-neuronx) for Inference | Serialization support for Llama, Llama-2, GPT2 and BLOOM models . See developer guide and tutorial See more at Transformers Neuron (transformers-neuronx) release notes | Inf2, Trn1/Trn1n |
PyTorch Neuron (torch-neuronx) | Introducing PyTorch 2.0 Beta support. See Introducing PyTorch 2.0 Support (Beta) . See llama-2-7b training , bert training and t5-3b inference samples. Scripts for training resnet50[Beta] , milesial/Pytorch-UNet[Beta] and deepmind/vision-perceiver-conv[Beta] models. | Trn1/Trn1n,Inf2 |
AWS Neuron Reference for Nemo Megatron library (neuronx-nemo-megatron) | Llama-2-70B model training sample using pipeline parallelism and tensor parallelism ( tutorial ) GPT-NeoX-20B model training using pipeline parallelism and tensor parallelism See more at AWS Neuron Reference for Nemo Megatron(neuronx-nemo-megatron) Release Notes and neuronx-nemo-megatron github repo | Trn1/Trn1n |
Neuron Compiler (neuronx-cc) | New llm-training option argument to --distribution_strategy compiler option for optimizations related to distributed training. See more at Neuron Compiler CLI Reference Guide (neuronx-cc) See more at Neuron Compiler (neuronx-cc) release notes | Inf2/Trn1/Trn1n |
Neuron Tools | alltoall Collective Communication operation, previously released in Neuron Collectives v2.15.13, was added as a testable operation in nccom-test. See NCCOM-TEST User Guide See more at Neuron System Tools | Inf1/Inf2/Trn1/Trn1n |
Documentation Updates | New App Note and Developer Guide about Activation memory reduction using sequence parallelism and activation recomputation in neuronx-distributed Added a new Model Samples and Tutorials summary page. See Model Samples and Tutorials Added Neuron SDK Classification guide. See Neuron Software Classification See more at Neuron Documentation Release Notes | Inf1, Inf2, Trn1/Trn1n |
Minor enhancements and bug fixes. | See Neuron Components Release Notes | Trn1/Trn1n, Inf2, Inf1 |
Release Artifacts | See Release Artifacts | Trn1/Trn1n, Inf2, Inf1 |
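The pipeline parallelism support listed above splits a model into stages and streams microbatches through them. A toy timetable for a simple GPipe-style forward schedule (a conceptual sketch, not NxD's actual scheduler) makes the mechanism and its "bubble" visible:

```python
def gpipe_schedule(num_stages, num_microbatches):
    """Forward-pass timetable for a GPipe-style pipeline: at tick t,
    stage s processes microbatch t - s when that index is valid.
    Returns one {stage: microbatch} dict per tick. Idle slots at the
    start and end of the run are the pipeline 'bubble'."""
    ticks = []
    for t in range(num_stages + num_microbatches - 1):
        active = {}
        for s in range(num_stages):
            mb = t - s
            if 0 <= mb < num_microbatches:
                active[s] = mb
        ticks.append(active)
    return ticks

sched = gpipe_schedule(num_stages=3, num_microbatches=4)
print(len(sched))  # -> 6 ticks to drain 4 microbatches through 3 stages
print(sched[0])    # -> {0: 0}                 (only stage 0 busy: bubble)
print(sched[2])    # -> {0: 2, 1: 1, 2: 0}     (pipeline fully occupied)
```

More microbatches per step shrink the bubble's relative cost, which is why pipeline parallelism is usually combined with gradient accumulation over many microbatches.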
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.
Neuron SDK Release - September 26, 2023
This is a patch release that fixes compiler issues in certain configurations of Llama and Llama-2 model inference using transformers-neuronx. Refer to 2.14.1 release notes for additional information.
Neuron SDK Release - September 15, 2023
What’s New
This release introduces support for Llama-2-7B model training and T5-3B model inference using neuronx-distributed. It also adds support for Llama-2-13B model training using neuronx-nemo-megatron. Neuron 2.14 also adds support for Stable Diffusion XL (Refiner and Base) model inference using torch-neuronx. This release also introduces other new features, performance optimizations, minor enhancements, and bug fixes.
This release introduces the following:
Note
This release deprecates the --model-type=transformer-inference compiler flag. Users are highly encouraged to migrate to the --model-type=transformer compiler flag.
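For build scripts that assemble compiler arguments programmatically, the migration is a one-line substitution. The helper below is purely illustrative (it is not a Neuron tool), and the second flag in the example is shown only to demonstrate that other arguments pass through untouched:

```python
def migrate_compiler_flags(flags):
    """Rewrite the deprecated --model-type=transformer-inference flag to
    --model-type=transformer, leaving all other flags unchanged.
    Illustrative helper, not part of the Neuron SDK."""
    return [
        "--model-type=transformer" if f == "--model-type=transformer-inference" else f
        for f in flags
    ]

old = ["--model-type=transformer-inference", "--target=trn1"]
print(migrate_compiler_flags(old))
# -> ['--model-type=transformer', '--target=trn1']
```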
What’s New | Details | Instances |
---|---|---|
AWS Neuron Reference for Nemo Megatron library (neuronx-nemo-megatron) | Llama-2-13B model training support ( tutorial ) ZeRO-1 Optimizer support that works with tensor parallelism and pipeline parallelism See more at AWS Neuron Reference for Nemo Megatron(neuronx-nemo-megatron) Release Notes and neuronx-nemo-megatron github repo | Trn1/Trn1n |
Neuron Distributed (neuronx-distributed) for Training | pad_model API to pad attention heads that do not divide evenly by the number of NeuronCores, allowing users to use any supported tensor-parallel degree. See API Reference Guide (neuronx-distributed) Llama-2-7B model training support (sample script) (tutorial) See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed) | Trn1/Trn1n |
Neuron Distributed (neuronx-distributed) for Inference | T5-3B model inference support (tutorial) pad_model API to pad attention heads that do not divide evenly by the number of NeuronCores, allowing users to use any supported tensor-parallel degree. See API Reference Guide (neuronx-distributed) See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed) | Inf2, Trn1/Trn1n |
Transformers Neuron (transformers-neuronx) for Inference | Introducing --model-type=transformer compiler flag that deprecates --model-type=transformer-inference compiler flag. See more at Transformers Neuron (transformers-neuronx) release notes | Inf2, Trn1/Trn1n |
PyTorch Neuron (torch-neuronx) | Performance optimizations in torch_neuronx.analyze API. See PyTorch Neuron (torch-neuronx) Analyze API for Inference Stable Diffusion XL(Refiner and Base) model inference support ( sample script) | Trn1/Trn1n,Inf2 |
Neuron Compiler (neuronx-cc) | New --O compiler option that enables different optimizations with tradeoff between faster model compile time and faster model execution. See more at Neuron Compiler CLI Reference Guide (neuronx-cc) See more at Neuron Compiler (neuronx-cc) release notes | Inf2/Trn1/Trn1n |
Neuron Tools | Neuron SysFS support for showing connected devices on trn1.32xl, inf2.24xl and inf2.48xl instances. See Neuron Sysfs User Guide See more at Neuron System Tools | Inf1/Inf2/Trn1/Trn1n |
Documentation Updates | Neuron Calculator now supports multiple model configurations for Tensor Parallel Degree computation. See Neuron Calculator Announcement to deprecate --model-type=transformer-inference flag. See Announcing deprecation for --model-type=transformer-inference compiler flag See more at Neuron Documentation Release Notes | Inf1, Inf2, Trn1/Trn1n |
Minor enhancements and bug fixes. | See Neuron Components Release Notes | Trn1/Trn1n, Inf2, Inf1 |
Release Artifacts | See Release Artifacts | Trn1/Trn1n, Inf2, Inf1 |
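The pad_model rows above describe padding attention heads so they shard evenly across NeuronCores. The arithmetic behind that transformation is simply rounding the head count up to the next multiple of the tensor-parallel degree, sketched here (conceptually; this is not the pad_model implementation):

```python
def padded_num_heads(num_heads, tp_degree):
    """Round the attention head count up to the next multiple of the
    tensor-parallel degree, as a pad_model-style transformation would,
    so each NeuronCore receives the same number of heads. The extra
    padded heads carry zero weights and do not change model outputs."""
    remainder = num_heads % tp_degree
    if remainder == 0:
        return num_heads
    return num_heads + (tp_degree - remainder)

# T5-3B has 32 attention heads: tp_degree=8 divides evenly, while
# tp_degree=24 only becomes usable by padding up to 48 heads.
print(padded_num_heads(32, 8))   # -> 32
print(padded_num_heads(32, 24))  # -> 48
```

This is what lets "any supported tensor-parallel degree" work even when the model's native head count does not divide by it, at the cost of some wasted compute on the padded heads.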
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.
Neuron SDK Release - September 01, 2023
This release adds support for Llama 2 model training using the neuronx-nemo-megatron library (tutorial), and adds support for Llama 2 model inference using the transformers-neuronx library (tutorial).
Please follow the instructions in the setup guide to upgrade to the latest Neuron release.
Note
Please install transformers-neuronx from https://pip.repos.neuron.amazonaws.com/ to get the latest features and improvements.
This release does not support the Llama 2 model with Grouped-Query Attention.