Releases: aws-neuron/aws-neuron-sdk
Neuron SDK Release - April 1, 2024
What's New
Neuron 2.18 introduces stable (out of beta) support for PyTorch 2.1, adds new features and performance improvements for LLM training and inference, and updates Neuron DLAMIs and Neuron DLCs to support this release.
Training highlights: The LLM training user experience with NeuronX Distributed (NxD) is improved by introducing asynchronous checkpointing. This release also adds support for auto-partitioning for pipeline parallelism in NxD and introduces pipeline parallelism in PyTorch Lightning Trainer (beta).
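The NxD checkpointing API itself is documented in the NeuronX Distributed guides; as a minimal pure-Python illustration of what asynchronous checkpointing buys (the training loop keeps running while the snapshot is written to disk), here is a sketch. All names below are illustrative, not the NxD API:

```python
import os
import pickle
import tempfile
import threading

def async_save_checkpoint(state_dict, path):
    """Serialize a snapshot of the state dict on a background thread so the
    training loop does not block on disk I/O. The shallow copy is taken on
    the caller's thread, before training mutates the state further."""
    snapshot = dict(state_dict)
    def _write():
        with open(path, "wb") as f:
            pickle.dump(snapshot, f)
    t = threading.Thread(target=_write)
    t.start()
    return t  # caller joins before the next save, or at shutdown

# usage: save while (conceptually) continuing to train
path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
thread = async_save_checkpoint({"step": 100, "lr": 3e-4}, path)
thread.join()  # here we join immediately only so the example is deterministic
```

In a real training loop the join happens just before the next checkpoint, so serialization overlaps with compute instead of stalling it.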
Inference highlights: Speculative Decoding support (beta) in the Transformers NeuronX (TNx) library improves LLM inference throughput and output token latency (TPOT) by up to 25% for LLMs such as Llama-2-70B. TNx also improves weight loading performance by adding support for the SafeTensors checkpoint format. Inference using bucketing in PyTorch NeuronX and NeuronX Distributed is improved by a new auto-bucketing feature. This release also adds new samples for Mixtral-8x7B-v0.1 and mistralai/Mistral-7B-Instruct-v0.2 in TNx.
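The TNx implementation is documented in the library's developer guide; as a toy sketch of the idea behind speculative decoding (a cheap draft model proposes several tokens, the target model verifies them in bulk and keeps the agreed prefix), here is a greedy-decoding version in plain Python. The token "models" here are simple callables, not real LLMs:

```python
def speculative_decode(target_next, draft_next, prompt, k=4, max_new=8):
    """Greedy speculative decoding sketch: the draft proposes k tokens
    autoregressively; the target keeps the longest prefix it agrees with
    and contributes one correction token at the first mismatch. When the
    draft is usually right, far fewer target calls are needed per token."""
    out = list(prompt)
    while len(out) < len(prompt) + max_new:
        # draft proposes k tokens
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # target verifies position by position; stop at first disagreement
        accepted = []
        for t in proposal:
            want = target_next(out + accepted)
            if want == t:
                accepted.append(t)
            else:
                accepted.append(want)  # target's correction still yields a token
                break
        out.extend(accepted)
    return out[:len(prompt) + max_new]

# toy integer-token models: both simply count upward, so they always agree
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + 1
print(speculative_decode(target, draft, [0], k=4, max_new=8))
# -> [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Under greedy decoding this produces exactly the target model's output; the real TNx feature applies the same verify-and-accept idea to sampled LLM tokens on Neuron devices.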
Neuron DLAMI and Neuron DLC support highlights: This release introduces a new Multi Framework DLAMI for Ubuntu 22 that customers can use to easily get started with the latest Neuron SDK on the multiple frameworks that Neuron supports, as well as SSM parameter support for DLAMIs to automate retrieval of the latest DLAMI ID in cloud automation flows. It also adds new Neuron Training and Inference Deep Learning Containers (DLCs) for PyTorch 2.1, a new dedicated GitHub repository to host Neuron container Dockerfiles, and a public Neuron container registry to host Neuron container images.
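To sketch how the SSM parameter support is typically consumed from automation code: the helper below builds a parameter name and shows the lookup call. The exact parameter path is an assumption for illustration; check the Neuron DLAMI documentation for the parameter names actually published:

```python
def neuron_dlami_ssm_parameter(dlami_type="multi-framework", os_name="ubuntu-22.04"):
    """Build an SSM parameter name for the latest Neuron DLAMI ID.
    NOTE: this path layout is an assumption for illustration; consult the
    Neuron DLAMI docs for the parameter names published for your OS/flavor."""
    return f"/aws/service/neuron/dlami/{dlami_type}/{os_name}/latest/image_id"

name = neuron_dlami_ssm_parameter()
print(name)

# Resolving it to an AMI ID requires AWS credentials (not run here):
# import boto3
# ami_id = boto3.client("ssm").get_parameter(Name=name)["Parameter"]["Value"]
```

In CloudFormation or CDK flows, referencing the SSM parameter directly (instead of hard-coding an AMI ID) keeps launch templates pinned to the latest DLAMI automatically.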
Neuron SDK Release - February 13, 2024
What's New
Neuron 2.17 improves the performance of small collective communication operators (smaller than 16 MB) by up to 30%, which improves large language model (LLM) inference performance by up to 10%. This release also includes improvements in :ref:`Neuron Profiler <neuron-profile-ug>` and other minor enhancements and bug fixes.
For more detailed release notes of the new features and resolved issues, see :ref:`components-rn`.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see :ref:`model_architecture_fit`.
Neuron Components Release Notes
Inf1, Trn1/Trn1n and Inf2 common packages
Component | Instance/s | Package/s | Details |
---|---|---|---|
Neuron Runtime | Trn1/Trn1n, Inf1, Inf2 | Trn1/Trn1n, Inf2: aws-neuronx-runtime-lib (.deb, .rpm) Inf1: Runtime is linked into the ML framework packages | :ref:neuron-runtime-rn |
Neuron Runtime Driver | Trn1/Trn1n, Inf1, Inf2 | aws-neuronx-dkms (.deb, .rpm) | :ref:neuron-driver-release-notes |
Neuron System Tools | Trn1/Trn1n, Inf1, Inf2 | aws-neuronx-tools (.deb, .rpm) | :ref:neuron-tools-rn |
Containers | Trn1/Trn1n, Inf1, Inf2 | aws-neuronx-k8-plugin (.deb, .rpm) aws-neuronx-k8-scheduler (.deb, .rpm) aws-neuronx-oci-hooks (.deb, .rpm) | :ref:neuron-k8-rn :ref:neuron-containers-release-notes |
NeuronPerf (Inference only) | Trn1/Trn1n, Inf1, Inf2 | neuronperf (.whl) | :ref:neuronperf_rn |
TensorFlow Model Server Neuron | Trn1/Trn1n, Inf1, Inf2 | tensorflow-model-server-neuronx (.deb, .rpm) | :ref:tensorflow-modeslserver-neuronx-rn |
Neuron Documentation | Trn1/Trn1n, Inf1, Inf2 | | :ref:neuron-documentation-rn |
Neuron SDK Release - January 18, 2024
Patch release with compiler bug fixes and updates to the Neuron Device Plugin and Neuron Kubernetes Scheduler.
Neuron SDK Release - December 21, 2023
What’s New
Neuron 2.16 adds support for Llama-2-70B training and inference, upgrades to PyTorch 2.1 (beta), adds support for PyTorch Lightning Trainer (beta), and delivers performance improvements along with Amazon Linux 2023 support.
Training highlights: NeuronX Distributed library LLM training performance is improved by up to 15%. The LLM training user experience is improved by introducing support for PyTorch Lightning Trainer (beta) and a new model optimizer wrapper that minimizes the changes needed to partition models using NeuronX Distributed primitives.
Inference highlights: PyTorch inference now allows dynamically swapping different fine-tuned weights into an already loaded model, and Transformers NeuronX delivers overall improvements to LLM inference throughput and latency. Two new reference model samples are added for Llama-2-70B and Mistral-7B model inference.
User experience: This release introduces two new capabilities: A new tool, Neuron Distributed Event Tracing (NDET) which improves debuggability, and the support of profiling collective communication operators in the Neuron Profiler tool.
More release content can be found in the table below and each component release notes.
What’s New | Details | Instances |
---|---|---|
Transformers NeuronX (transformers-neuronx) for Inference | [Beta] Support for Grouped Query Attention (GQA). See developer guide [Beta] Support for Llama-2-70B model inference using Grouped Query Attention. See tutorial [Beta] Support for Mistral-7B-Instruct-v0.1 model inference. See sample code See more at Transformers Neuron (transformers-neuronx) release notes | Inf2, Trn1/Trn1n |
NeuronX Distributed (neuronx-distributed) for Training | [Beta] Support for PyTorch Lightning to train models using tensor parallelism and data parallelism . See api guide , developer guide and tutorial Support for Model and Optimizer Wrapper training API that handles the parallelization. See api guide and Developer guide for model and optimizer wrapper (neuronx-distributed ) New save_checkpoint and load_checkpoint APIs to save/load checkpoints during distributed training. See Developer guide for save/load checkpoint (neuronx-distributed ) Support for a new Query-Key-Value(QKV) module that provides the ability to replicate the Key Value heads and adds flexibility to use higher Tensor parallel degree during Training. See api guide and tutorial See more at Neuron Distributed Release Notes (neuronx-distributed) | Trn1/Trn1n |
NeuronX Distributed (neuronx-distributed) for Inference | Support weight deduplication among TP shards by adding the ability to save weights separately from NEFF files. See developer guide Llama-2-7B model inference script ([html] [notebook]) See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed) | Inf2, Trn1/Trn1n |
PyTorch NeuronX (torch-neuronx) | [Beta] Support for PyTorch 2.1. See Introducing PyTorch 2.1 Support (Beta). See llama-2-13b inference sample. Support to separate model weights from NEFF files and a new replace_weights API to replace the separated weights. See PyTorch Neuron (torch-neuronx) Weight Replacement API for Inference and PyTorch NeuronX Tracing API for Inference [Beta] Script for training stabilityai/stable-diffusion-2-1-base and runwayml/stable-diffusion-v1-5 models. See script [Beta] Script for training facebook/bart-large model. See script [Beta] Script for stabilityai/stable-diffusion-2-inpainting model inference. See script | Trn1/Trn1n, Inf2 |
Neuron Tools | New Neuron Distributed Event Tracing (NDET) tool to help visualize execution trace logs and diagnose errors in multi-node workloads. See Neuron Distributed Event Tracing (NDET) User Guide Support for multi-worker jobs in neuron-profile. See Neuron Profile User Guide See more at Neuron System Tools | Inf1/Inf2/Trn1/Trn1n |
Documentation Updates | Added setup guide instructions for AL2023 OS. See Setup Guide Added announcement for name change of Neuron Components. See Announcing Name Change for Neuron Components Added announcement for End of Support for PyTorch 1.10 . See Announcing End of Support for PyTorch Neuron version 1.10 Added announcement for End of Support for PyTorch 2.0 Beta. See Announcing End of Support for PyTorch NeuronX version 2.0 (beta) See more at Neuron Documentation Release Notes | Inf1, Inf2, Trn1/Trn1n |
Minor enhancements and bug fixes. | See Neuron Components Release Notes | Trn1/Trn1n, Inf2, Inf1 |
Known Issues and Limitations | See 2.16.0 Known Issues and Limitations | Trn1/Trn1n, Inf2, Inf1 |
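The new QKV module described above replicates Key/Value heads so that models using Grouped Query Attention can run at tensor-parallel degrees larger than their KV head count. As a conceptual sketch (not the NxD API), the replication factor works out like this:

```python
def kv_replication_factor(num_q_heads, num_kv_heads, tp_degree):
    """For GQA models the KV head count can be smaller than the desired
    tensor-parallel degree; replicating each KV head lets every TP rank
    own at least one. Returns the number of copies needed per KV head.
    Conceptual sketch only -- not the neuronx-distributed API."""
    if tp_degree <= num_kv_heads:
        if num_kv_heads % tp_degree != 0:
            raise ValueError("KV heads must divide evenly across TP ranks")
        return 1  # no replication needed; each rank gets >= 1 real KV head
    if tp_degree % num_kv_heads != 0:
        raise ValueError("TP degree must be a multiple of the KV head count")
    return tp_degree // num_kv_heads

# Llama-2-70B uses 64 query heads and 8 KV heads: running at tp_degree=32
# requires replicating each KV head 4x so all 32 ranks hold a KV head.
print(kv_replication_factor(64, 8, 32))  # -> 4
print(kv_replication_factor(64, 8, 8))   # -> 1
```

This is why the feature "adds flexibility to use higher tensor parallel degree": without replication, the TP degree would be capped at the KV head count.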
Neuron SDK Release - November 17, 2023
Patch release to fix performance-related issues when training with the neuronx-nemo-megatron library. Refer to the 2.15.2 compiler release notes for additional information.
Neuron SDK Release - November 9, 2023
Patch release to fix execution overhead issues in Neuron Runtime that were inadvertently introduced in 2.15 release. Refer to 2.15.1 runtime release notes for additional information.
Neuron SDK Release - October 26, 2023
What’s New
This release adds support for PyTorch 2.0 (beta), increases performance for both training and inference workloads, and adds the ability to train models like Llama-2-70B using neuronx-distributed. With this release, we are also adding pipeline parallelism support to neuronx-distributed, enabling full 3D parallelism to easily scale training to large model sizes. Neuron 2.15 also introduces support for training resnet50, milesial/Pytorch-UNet, and deepmind/vision-perceiver-conv models using torch-neuronx, as well as new sample code for flan-t5-xl model inference using neuronx-distributed, in addition to other performance optimizations, minor enhancements, and bug fixes.
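"Full 3D parallelism" means every worker belongs to a tensor-parallel, a pipeline-parallel, and a data-parallel group simultaneously. As an illustrative sketch of one common rank layout (not necessarily the ordering neuronx-distributed uses internally):

```python
def rank_to_coords(rank, tp, pp, dp):
    """Map a flat worker rank onto a tp x pp x dp device grid, with the
    tensor-parallel dimension varying fastest (adjacent ranks share the
    heaviest communication). Illustrative layout only."""
    assert 0 <= rank < tp * pp * dp, "rank out of range for this grid"
    tp_rank = rank % tp
    pp_rank = (rank // tp) % pp
    dp_rank = rank // (tp * pp)
    return tp_rank, pp_rank, dp_rank

# 32 workers arranged as tp=8, pp=2, dp=2
print(rank_to_coords(0, 8, 2, 2))   # -> (0, 0, 0)
print(rank_to_coords(13, 8, 2, 2))  # -> (5, 1, 0)
print(rank_to_coords(31, 8, 2, 2))  # -> (7, 1, 1)
```

Putting tensor parallelism on the fastest-varying axis keeps its frequent, bandwidth-heavy collectives within the most tightly connected group of devices.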
What’s New | Details | Instances |
---|---|---|
Neuron Distributed (neuronx-distributed) for Training | Pipeline parallelism support. See API Reference Guide (neuronx-distributed ) , pp_developer_guide and pipeline_parallelism_overview Llama-2-70B model training script (sample script) (tutorial) Mixed precision support. See pp_developer_guide Support serialized checkpoint saving and loading using save_xser and load_xser parameters. See API Reference Guide (neuronx-distributed ) See more at Neuron Distributed Release Notes (neuronx-distributed) | Trn1/Trn1n |
Neuron Distributed (neuronx-distributed) for Inference | flan-t5-xl model inference script (tutorial) See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed ) | Inf2,Trn1/Trn1n |
Transformers Neuron (transformers-neuronx) for Inference | Serialization support for Llama, Llama-2, GPT2 and BLOOM models . See developer guide and tutorial See more at Transformers Neuron (transformers-neuronx) release notes | Inf2, Trn1/Trn1n |
PyTorch Neuron (torch-neuronx) | Introducing PyTorch 2.0 Beta support. See Introducing PyTorch 2.0 Support (Beta) . See llama-2-7b training , bert training and t5-3b inference samples. Scripts for training resnet50[Beta] , milesial/Pytorch-UNet[Beta] and deepmind/vision-perceiver-conv[Beta] models. | Trn1/Trn1n,Inf2 |
AWS Neuron Reference for Nemo Megatron library (neuronx-nemo-megatron) | Llama-2-70B model training sample using pipeline parallelism and tensor parallelism ( tutorial ) GPT-NeoX-20B model training using pipeline parallelism and tensor parallelism See more at AWS Neuron Reference for Nemo Megatron(neuronx-nemo-megatron) Release Notes and neuronx-nemo-megatron github repo | Trn1/Trn1n |
Neuron Compiler (neuronx-cc) | New llm-training option argument to --distribution_strategy compiler option for optimizations related to distributed training. See more at Neuron Compiler CLI Reference Guide (neuronx-cc) See more at Neuron Compiler (neuronx-cc) release notes | Inf2/Trn1/Trn1n |
Neuron Tools | alltoall Collective Communication operation, previously released in Neuron Collectives v2.15.13, was added as a testable operation in nccom-test. See NCCOM-TEST User Guide See more at Neuron System Tools | Inf1/Inf2/Trn1/Trn1n |
Documentation Updates | New App Note and Developer Guide about Activation memory reduction using sequence parallelism and activation recomputation in neuronx-distributed Added a new Model Samples and Tutorials summary page. See Model Samples and Tutorials Added Neuron SDK Classification guide. See Neuron Software Classification See more at Neuron Documentation Release Notes | Inf1, Inf2, Trn1/Trn1n |
Minor enhancements and bug fixes. | See Neuron Components Release Notes | Trn1/Trn1n, Inf2, Inf1 |
Release Artifacts | See Release Artifacts | Trn1/Trn1n, Inf2, Inf1 |
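The pipeline parallelism support listed above splits a model into stages and streams microbatches through them. A toy timetable for a simple GPipe-style forward schedule (a conceptual sketch, not NxD's actual scheduler) makes the mechanism and its "bubble" visible:

```python
def gpipe_schedule(num_stages, num_microbatches):
    """Forward-pass timetable for a GPipe-style pipeline: at tick t,
    stage s processes microbatch t - s when that index is valid.
    Returns one {stage: microbatch} dict per tick. Idle slots at the
    start and end of the run are the pipeline 'bubble'."""
    ticks = []
    for t in range(num_stages + num_microbatches - 1):
        active = {}
        for s in range(num_stages):
            mb = t - s
            if 0 <= mb < num_microbatches:
                active[s] = mb
        ticks.append(active)
    return ticks

sched = gpipe_schedule(num_stages=3, num_microbatches=4)
print(len(sched))  # -> 6 ticks to drain 4 microbatches through 3 stages
print(sched[0])    # -> {0: 0}                 (only stage 0 busy: bubble)
print(sched[2])    # -> {0: 2, 1: 1, 2: 0}     (pipeline fully occupied)
```

More microbatches per step shrink the bubble's relative cost, which is why pipeline parallelism is usually combined with gradient accumulation over many microbatches.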
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.
Neuron SDK Release - September 26, 2023
This is a patch release that fixes compiler issues in certain configurations of Llama and Llama-2 model inference using transformers-neuronx. Refer to 2.14.1 release notes for additional information.
Neuron SDK Release - September 15, 2023
What’s New
This release introduces support for Llama-2-7B model training and T5-3B model inference using neuronx-distributed. It also adds support for Llama-2-13B model training using neuronx-nemo-megatron. Neuron 2.14 also adds support for Stable Diffusion XL (Refiner and Base) model inference using torch-neuronx. This release also introduces other new features, performance optimizations, minor enhancements, and bug fixes.
This release introduces the following:
Note
This release deprecates the --model-type=transformer-inference compiler flag. Users are highly encouraged to migrate to the --model-type=transformer compiler flag.
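For build scripts that assemble compiler arguments programmatically, the migration is a one-line substitution. The helper below is purely illustrative (it is not a Neuron tool), and the second flag in the example is shown only to demonstrate that other arguments pass through untouched:

```python
def migrate_compiler_flags(flags):
    """Rewrite the deprecated --model-type=transformer-inference flag to
    --model-type=transformer, leaving all other flags unchanged.
    Illustrative helper, not part of the Neuron SDK."""
    return [
        "--model-type=transformer" if f == "--model-type=transformer-inference" else f
        for f in flags
    ]

old = ["--model-type=transformer-inference", "--target=trn1"]
print(migrate_compiler_flags(old))
# -> ['--model-type=transformer', '--target=trn1']
```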
What’s New | Details | Instances |
---|---|---|
AWS Neuron Reference for Nemo Megatron library (neuronx-nemo-megatron) | Llama-2-13B model training support ( tutorial ) ZeRO-1 Optimizer support that works with tensor parallelism and pipeline parallelism See more at AWS Neuron Reference for Nemo Megatron(neuronx-nemo-megatron) Release Notes and neuronx-nemo-megatron github repo | Trn1/Trn1n |
Neuron Distributed (neuronx-distributed) for Training | pad_model API to pad attention heads that do not divide evenly by the number of NeuronCores, allowing users to use any supported tensor-parallel degree. See API Reference Guide (neuronx-distributed) Llama-2-7B model training support (sample script) (tutorial) See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed) | Trn1/Trn1n |
Neuron Distributed (neuronx-distributed) for Inference | T5-3B model inference support (tutorial) pad_model API to pad attention heads that do not divide evenly by the number of NeuronCores, allowing users to use any supported tensor-parallel degree. See API Reference Guide (neuronx-distributed) See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed) | Inf2, Trn1/Trn1n |
Transformers Neuron (transformers-neuronx) for Inference | Introducing --model-type=transformer compiler flag that deprecates --model-type=transformer-inference compiler flag. See more at Transformers Neuron (transformers-neuronx) release notes | Inf2, Trn1/Trn1n |
PyTorch Neuron (torch-neuronx) | Performance optimizations in torch_neuronx.analyze API. See PyTorch Neuron (torch-neuronx) Analyze API for Inference Stable Diffusion XL(Refiner and Base) model inference support ( sample script) | Trn1/Trn1n,Inf2 |
Neuron Compiler (neuronx-cc) | New --O compiler option that enables different optimizations with tradeoff between faster model compile time and faster model execution. See more at Neuron Compiler CLI Reference Guide (neuronx-cc) See more at Neuron Compiler (neuronx-cc) release notes | Inf2/Trn1/Trn1n |
Neuron Tools | Neuron SysFS support for showing connected devices on trn1.32xl, inf2.24xl and inf2.48xl instances. See Neuron Sysfs User Guide See more at Neuron System Tools | Inf1/Inf2/Trn1/Trn1n |
Documentation Updates | Neuron Calculator now supports multiple model configurations for Tensor Parallel Degree computation. See Neuron Calculator Announcement to deprecate --model-type=transformer-inference flag. See Announcing deprecation for --model-type=transformer-inference compiler flag See more at Neuron Documentation Release Notes | Inf1, Inf2, Trn1/Trn1n |
Minor enhancements and bug fixes. | See Neuron Components Release Notes | Trn1/Trn1n, Inf2, Inf1 |
Release Artifacts | See Release Artifacts | Trn1/Trn1n, Inf2, Inf1 |
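The pad_model rows above describe padding attention heads so they shard evenly across NeuronCores. The arithmetic behind that transformation is simply rounding the head count up to the next multiple of the tensor-parallel degree, sketched here (conceptually; this is not the pad_model implementation):

```python
def padded_num_heads(num_heads, tp_degree):
    """Round the attention head count up to the next multiple of the
    tensor-parallel degree, as a pad_model-style transformation would,
    so each NeuronCore receives the same number of heads. The extra
    padded heads carry zero weights and do not change model outputs."""
    remainder = num_heads % tp_degree
    if remainder == 0:
        return num_heads
    return num_heads + (tp_degree - remainder)

# T5-3B has 32 attention heads: tp_degree=8 divides evenly, while
# tp_degree=24 only becomes usable by padding up to 48 heads.
print(padded_num_heads(32, 8))   # -> 32
print(padded_num_heads(32, 24))  # -> 48
```

This is what lets "any supported tensor-parallel degree" work even when the model's native head count does not divide by it, at the cost of some wasted compute on the padded heads.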
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.
Neuron SDK Release - September 01, 2023
This release adds support for Llama 2 model training using the neuronx-nemo-megatron library (tutorial), and adds support for Llama 2 model inference using the transformers-neuronx library (tutorial).
Please follow the instructions in the setup guide to upgrade to the latest Neuron release.
Note
Please install transformers-neuronx from https://pip.repos.neuron.amazonaws.com/ to get the latest features and improvements.
This release does not support the Llama 2 model with Grouped-Query Attention.