Releases · aws-neuron/aws-neuron-sdk

30 Aug 04:34

v2.13.1

02b8afc

Neuron SDK Release - August 29, 2023

This release adds support for Llama 2 model training using the neuronx-nemo-megatron library (tutorial) and for Llama 2 model inference using the transformers-neuronx library (tutorial).

Please follow the instructions in the setup guide to upgrade to the latest Neuron release.

Note

Please install transformers-neuronx from https://pip.repos.neuron.amazonaws.com/ to get the latest features and improvements.

This release does not support Llama 2 models with Grouped-Query Attention.


29 Aug 18:28

aws-mesharma

v2.13.0

f25691d

Neuron SDK Release - August 28, 2023

What’s New

This release introduces support for GPT-NeoX 20B model training in neuronx-distributed, including ZeRO-1 optimizer capability. It also adds support for Stable Diffusion XL and CLIP model inference in torch-neuronx. Neuron 2.13 also introduces the AWS Neuron Reference for Nemo Megatron library, which supports distributed training of LLMs such as GPT-3 175B. This release also introduces other new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:

What’s New	Details	Instances
AWS Neuron Reference for Nemo Megatron library	Modified versions of the open-source packages NeMo and Apex, adapted for use with AWS Neuron and AWS EC2 Trn1 instances. GPT-3 model training support (tutorial). See more at the neuronx-nemo-megatron GitHub repo.	Trn1/Trn1n
Transformers Neuron (transformers-neuronx) for Inference	Latency optimizations for Llama and GPT-2 model inference. Neuron Persistent Cache support (developer guide). See more at Transformers Neuron (transformers-neuronx) release notes.	Inf2, Trn1/Trn1n
Neuron Distributed (neuronx-distributed) for Training	Now stable (no longer Experimental). ZeRO-1 optimizer support with tensor parallelism (tutorial). Sequence Parallel support (API guide). GPT-NeoX model training support (sample script) (tutorial). See more at Neuron Distributed Release Notes (neuronx-distributed) and API Reference Guide (neuronx-distributed).	Trn1/Trn1n
Neuron Distributed (neuronx-distributed) for Inference	KV Cache support for LLM inference (tutorial) (release notes).	Inf2, Trn1/Trn1n
PyTorch Neuron (torch-neuronx)	Seedable dropout enabled by default for training. camembert-base training script (sample script). New model inference support, including Stable Diffusion XL, CLIP (clip-vit-base-patch32, clip-vit-large-patch14), Vision Perceiver, Language Perceiver and T5.	Trn1/Trn1n, Inf2
Neuron Tools	New data type support for the Neuron Collective Communication Test Utility (NCCOM-TEST) --check option: fp16, bf16, (u)int8, (u)int16 and (u)int32. Neuron SysFS support for FLOP count (flop_count) and connected Neuron Device IDs (connected_devices); see Neuron Sysfs User Guide. See more at Neuron System Tools.	Inf1, Inf2, Trn1/Trn1n
Neuron Runtime	Runtime version and Capture Time support added to NTFF. Async DMA copies support to improve Neuron Device copy times for all instance types. Logging and error message improvements for Collectives timeouts and when loading NEFFs. See more at Neuron Runtime Release Notes.	Inf1, Inf2, Trn1/Trn1n
End of Support Announcements and Documentation Updates	Announcing end of support for AWS Neuron reference for Megatron-LM starting Neuron 2.13; see more at AWS Neuron reference for Megatron-LM no longer supported. Announcing end of support for torch-neuron version 1.9 starting Neuron 2.14; see more at Announcing end of support for torch-neuron version 1.9. Added TensorFlow 2.x (tensorflow-neuronx) analyze_model API section; see more at TensorFlow 2.x (tensorflow-neuronx) analyze_model API. Upgraded numpy version to 1.21.6 in various training scripts for Text Classification. Updated bert-japanese training script to use the multilingual-sentiments dataset; see hf-bert-jp (https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/hf_bert_jp). See more at Neuron Documentation Release Notes.	Inf1, Inf2, Trn1/Trn1n
Minor enhancements and bug fixes.	See Neuron Components Release Notes	Trn1/Trn1n, Inf2, Inf1
Known Issues and Limitations	See 2.13.0 Known Issues and Limitations	Trn1/Trn1n, Inf2, Inf1
Release Artifacts	See Release Artifacts	Trn1/Trn1n, Inf2, Inf1

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
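
The KV cache support listed in the table above refers to a general decoding technique: keys and values computed for earlier tokens are stored and reused, so each new token needs only its own projections. The following is a minimal, library-independent sketch of that idea, not the neuronx-distributed API. The function names, the toy scalar projection weights (0.5 and 2.0) and the use of the raw token as the query are all illustrative assumptions.

```python
import math


def project(token, weight):
    """Toy linear projection: scale every element of the token embedding."""
    return [weight * x for x in token]


def attend(query, keys, values):
    """Single-head softmax attention over all keys/values seen so far."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    dim = len(values[0])
    return [sum(e / norm * v[d] for e, v in zip(exps, values)) for d in range(dim)]


def decode_without_cache(tokens):
    """Recompute every key and value projection at every step."""
    outputs, projections = [], 0
    for t in range(len(tokens)):
        keys = [project(tok, 0.5) for tok in tokens[: t + 1]]
        values = [project(tok, 2.0) for tok in tokens[: t + 1]]
        projections += 2 * (t + 1)
        outputs.append(attend(tokens[t], keys, values))
    return outputs, projections


def decode_with_cache(tokens):
    """Project each token once and append the result to the cache."""
    outputs, projections = [], 0
    key_cache, value_cache = [], []
    for tok in tokens:
        key_cache.append(project(tok, 0.5))
        value_cache.append(project(tok, 2.0))
        projections += 2
        outputs.append(attend(tok, key_cache, value_cache))
    return outputs, projections
```

Both functions return the same attention outputs; the cached version performs a number of projections that grows linearly with sequence length, while the uncached version grows quadratically.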

[What’s New](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/index.html#id10)


19 Aug 22:14

aws-mesharma

v2.12.2

782980d

Neuron SDK Release - August 19, 2023

Patch release to fix a jemalloc conflict affecting Neuron customers who use Ubuntu 22. Previous releases shipped with a dependency on jemalloc that may lead to compilation failures on Ubuntu 22 only. Please follow the instructions in the setup guide to upgrade to the latest Neuron release.


10 Aug 02:14

aws-mesharma

v2.12.1

7bc5dd3

Neuron SDK Release - August 9, 2023

Patch release to improve the reliability of Neuron Runtime for customers using memory-constrained instances. The Neuron Runtime now requires less contiguous memory to initialize the Neuron Cores associated with an application. This reduction allows bring-up when only small amounts of contiguous memory remain on an instance.


20 Jul 01:20

aws-mesharma

v2.12.0

12ad249

Neuron SDK Release - July 19, 2023

What’s New

This release introduces the ZeRO-1 optimizer for model training in torch-neuronx and experimental support for GPT-NeoX, BLOOM, Llama and Llama 2 (coming soon) models in transformers-neuronx. This release also adds support for model inference serving on Triton Inference Server for Inf2 and Trn1 instances, the lazy_load and async_load APIs for model loading in torch-neuronx, as well as other new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:

What’s New	Details	Instances
ZeRO-1 optimizer for model training in torch-neuronx	Support for the ZeRO-Stage-1 optimizer (ZeroRedundancyOptimizer() API) for training models using torch-neuronx. See tutorial at ZeRO-1 Tutorial.	Inf2, Trn1/Trn1n
Support for new models and Enhancements in transformers-neuronx	[Experimental] Support for inference of GPT-NeoX, BLOOM and Llama models. [Experimental] Support for Llama 2 coming soon; please monitor the transformers-neuronx repository for updates. Removed constraints on tp_degree in tensor-parallel configurations for GPT2, OPT and BLOOM. Added multi-query / multi-group attention support for GPT2. See more at Transformers Neuron (transformers-neuronx) release notes.	Inf2, Trn1/Trn1n
Support for Inf2 and Trn1 instances on Triton Inference Server	Support for model inference serving on Triton for Inf2 and Trn1 instances. See more at Triton Server Python Backend. See tutorial at Triton on SageMaker - Deploying on Inf2.	Inf2, Trn1
Support for new computer vision models	Performance optimizations in the Stable Diffusion 2.1 model script and added [Experimental] support for Stable Diffusion 1.5 models. [Experimental] Script for training a CLIP model for Image Classification. [Experimental] Script for inference of the Multimodal Perceiver model. Please check the aws-neuron-samples repository.	Inf2, Trn1/Trn1n
New Features in neuronx-distributed for training	Added parallel cross entropy loss function. See more at API Reference Guide for Tensor Parallelism (neuronx-distributed).	Trn1/Trn1n
lazy_load and async_load API for model loading in inference and performance enhancements in torch-neuronx	Added lazy_load and async_load APIs to accelerate model loading for inference; see more at PyTorch Neuron (torch-neuronx) Lazy and Asynchronous Loading API. Optimized the DataParallel API to load onto multiple cores simultaneously when the specified device IDs are consecutive; see more at PyTorch Neuron (torch-neuronx) release notes.	Inf2, Trn1/Trn1n
[Experimental] Asynchronous Execution support and Enhancements in Neuron Runtime	Added experimental asynchronous execution feature, which can reduce latency by roughly 12% for training workloads; see more at Neuron Runtime Configuration. AllReduce with All-to-all communication pattern enabled for 16 ranks on Trn1/Trn1n within the instance (intranode); see more at Neuron Runtime Release Notes.	Inf1, Inf2, Trn1/Trn1n
Support for distribution_strategy compiler option in neuronx-cc	Support for the optional --distribution_strategy compiler option to enable compiler-specific optimizations based on the distribution strategy used. See more at Neuron Compiler CLI Reference Guide (neuronx-cc).	Inf2, Trn1/Trn1n
New Micro Benchmarking Performance User Guide and Documentation Updates	Added best practices user guide for benchmarking performance of Neuron devices; see more at Benchmarking Guide and Helper scripts. Announcing end of support for Ubuntu 18; see more at Announcing end of support for Ubuntu 18. Removed support for the Distributed Data Parallel (DDP) Tutorial. Improved sidebar navigation in the documentation. See more at Neuron Documentation Release Notes.	Inf1, Inf2, Trn1/Trn1n
Minor enhancements and bug fixes.	See Neuron Components Release Notes	Trn1/Trn1n, Inf2, Inf1
Known Issues and Limitations	See 2.12.0 Known Issues and Limitations	Trn1/Trn1n, Inf2, Inf1
Release Artifacts	See Release Artifacts	Trn1/Trn1n, Inf2, Inf1

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
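
The parallel cross entropy loss mentioned in the table above is used when the vocabulary dimension of the logits is split across tensor-parallel ranks. The sketch below illustrates the general idea in plain Python and is not the neuronx-distributed implementation. Each "shard" is an ordinary list standing in for one rank's slice of the logits, and the comments mark where a real implementation would use collective operations. All function names are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Reference loss computed on the full, unsharded logits."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_z - logits[target]


def parallel_cross_entropy(logit_shards, target):
    """Same loss, computed from vocabulary-sharded logits."""
    # Step 1: each rank takes a local max; an all-reduce(max) combines them.
    global_max = max(max(shard) for shard in logit_shards)

    # Step 2: each rank sums exp(logit - max) locally; an all-reduce(sum) combines them.
    sum_exp = sum(
        sum(math.exp(x - global_max) for x in shard) for shard in logit_shards
    )

    # Step 3: only the rank that owns the target index contributes its logit;
    # every other rank contributes zero to an all-reduce(sum).
    target_logit = 0.0
    offset = 0
    for shard in logit_shards:
        if offset <= target < offset + len(shard):
            target_logit += shard[target - offset]
        offset += len(shard)

    return global_max + math.log(sum_exp) - target_logit
```

Because only scalars (a max, a sum and one logit) cross rank boundaries, the full logits never need to be gathered on a single rank.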

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.
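
As a rough illustration of what a ZeRO-1 optimizer does, the sketch below shards optimizer state (here, momentum buffers) across data-parallel ranks so that each rank stores and updates only its own slice of the parameters. This is a conceptual, single-process simulation and not the torch-neuronx ZeroRedundancyOptimizer API. The class and function names, the contiguous sharding scheme and the momentum-SGD update are assumptions chosen for brevity.

```python
from dataclasses import dataclass, field


def shard_indices(num_params, world_size, rank):
    """Contiguous slice of parameter indices owned by `rank`."""
    per_rank = -(-num_params // world_size)  # ceiling division
    start = min(rank * per_rank, num_params)
    stop = min(start + per_rank, num_params)
    return range(start, stop)


@dataclass
class ShardedMomentumSGD:
    """Momentum SGD that keeps state only for the parameters this rank owns."""

    rank: int
    world_size: int
    lr: float = 0.1
    momentum: float = 0.9
    velocity: dict = field(default_factory=dict)

    def step(self, params, grads):
        updated = {}
        for i in shard_indices(len(params), self.world_size, self.rank):
            v = self.momentum * self.velocity.get(i, 0.0) + grads[i]
            self.velocity[i] = v
            updated[i] = params[i] - self.lr * v
        return updated


def zero1_step(params, grads, optimizers):
    """Each rank updates its shard; merging the shards stands in for an all-gather."""
    new_params = list(params)
    for opt in optimizers:
        for i, value in opt.step(params, grads).items():
            new_params[i] = value
    return new_params
```

With two simulated ranks and five parameters, rank 0 holds three momentum buffers and rank 1 holds two, rather than each rank holding all five.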


28 Jun 22:08

aws-mesharma

v2.11.0

b3d340a

Neuron SDK Release - June 14, 2023

What’s New

This release introduces Neuron Distributed, a new Python library to simplify training and inference of large models, and improves usability with features such as S3 model caching, a standalone profiler tool and support for Ubuntu 22, along with other new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:

What’s New	Details	Instances
New Features and Performance Enhancements in transformers-neuronx	Support for int8 inference; see example at int8 weight storage support. Improved prompt context encoding performance; see more at Transformers Neuron (transformers-neuronx) Developer Guide. Improved collective communications performance for Tensor Parallel inference on Inf2 and Trn1. See more at Transformers Neuron (transformers-neuronx) release notes.	Inf2, Trn1/Trn1n
Neuron Profiler Tool	Support for use as a standalone tool to profile and get visualized insights into the execution of models on Trainium and Inferentia devices. See more at Neuron Profile User Guide.	Inf1, Inf2, Trn1/Trn1n
Neuron Compilation Cache through S3	Support for sharing compiled models across Inf2 and Trn1 nodes through S3. See more at PyTorch Neuron neuron_parallel_compile CLI (torch-neuronx).	Inf2, Trn1/Trn1n
New script to scan a model for supported/unsupported operators	Script to scan a model for supported/unsupported operators before training. The scan output includes supported and unsupported operators at both the XLA operator and PyTorch operator levels. See a sample tutorial at Analyze for Training Tutorial.	Inf2, Trn1/Trn1n
Neuron Distributed Library [Experimental]	New Python library based on PyTorch enabling distributed training and inference of large models. Initial support for tensor parallelism. See more at Neuron Distributed [Experimental].	Inf2, Trn1/Trn1n
Neuron Calculator and Documentation Updates	New Neuron Calculator documentation section to help determine the number of Neuron Cores needed for LLM inference. Added App Note Generative LLM inference with Neuron. See more at Neuron Documentation Release Notes.	Inf1, Inf2, Trn1/Trn1n
Enhancements to Neuron SysFS	Support for detailed breakdown of memory usage across the NeuronCores. See more at Neuron Sysfs User Guide.	Inf1, Inf2, Trn1/Trn1n
Support for Ubuntu 22	See more at Setup Guide for setup instructions on Ubuntu 22.	Inf1, Inf2, Trn1/Trn1n
Minor enhancements and bug fixes.	See Neuron Components Release Notes	Trn1/Trn1n, Inf2, Inf1
Release Artifacts	See Release Artifacts	Trn1/Trn1n, Inf2, Inf1

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.


02 May 14:07

awsjoshir

v2.10.0

f5d2d79

Neuron SDK Release - May 1, 2023

What’s New

This release introduces new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:

What’s New	Details	Instances
Initial support for computer vision models inference	Added Stable Diffusion 2.1 model script for Text to Image Generation. Added VGG model script for Image Classification. Added UNet model script for Image Segmentation. Please check the aws-neuron-samples repository.	Inf2, Trn1/Trn1n
Profiling support in PyTorch Neuron (torch-neuronx) for Inference with TensorBoard	See more at Profiling PyTorch Neuron (torch-neuronx) with TensorBoard	Inf2, Trn1/Trn1n
New Features and Performance Enhancements in transformers-neuronx	Support for the HuggingFace generate function. Model Serialization support, including model saving, loading and weight swapping. Improved prompt context encoding performance. See transformers_neuronx_readme for examples and usage. See more at Transformers Neuron (transformers-neuronx) release notes.	Inf2, Trn1/Trn1n
Support models larger than 2GB in TensorFlow 2.x Neuron (tensorflow-neuronx)	See Special Flags for details. (tensorflow-neuronx)	Trn1/Trn1n, Inf2
Support models larger than 2GB in TensorFlow 2.x Neuron (tensorflow-neuron)	See Special Flags for details. (tensorflow-neuron)	Inf1
Performance Enhancements in PyTorch C++ Custom Operators [Experimental]	Support for using multiple GPSIMD Cores in Custom C++ Operators. See Custom Operators (Experimental).	Trn1/Trn1n
Weight Deduplication Feature (Inf1)	Support for sharing weights when loading multiple instances of the same model on different NeuronCores. See more at Neuron Runtime Configuration.	Inf1
nccom-test - Collective Communication Benchmarking Tool	Supports benchmarking sweeps on various Neuron Collective Communication operations. See NCCOM-TEST (Beta) for more details.	Trn1/Trn1n, Inf2
Minor enhancements and bug fixes.	See Neuron Components Release Notes	Trn1/Trn1n, Inf2, Inf1
Release Artifacts	See Release Artifacts	Trn1/Trn1n, Inf2, Inf1

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.


19 Apr 18:42

aws-mesharma

v2.9.1

7e4a689

Neuron SDK Release - April 19, 2023

Minor patch release to add support for deserialized TorchScript model compilation and for multi-node training in EKS. Fixes included in this release are critical to enable training and deploying models with Amazon SageMaker or Amazon EKS.


29 Mar 00:41

aws-mesharma

v2.9.0

945dda8

Neuron SDK Release - March 28, 2023

What’s New

This release adds support for EC2 Trn1n instances, introduces new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:

What’s New	Details	Instances
Support for EC2 Trn1n instances	Updated Neuron Runtime for Trn1n instances. Overall documentation update to include Trn1n instances.	Trn1n
New Analyze API in PyTorch Neuron (torch-neuronx)	A new API that returns a list of supported and unsupported PyTorch operators for a model. See PyTorch Neuron (torch-neuronx) Analyze API for Inference.	Trn1, Inf2
Support models that are larger than 2GB in PyTorch Neuron (torch-neuron) on Inf1	See the separate_weights flag to torch_neuron.trace() to support models that are larger than 2GB	Inf1
Performance Improvements	Up to 10% higher throughput when training GPT3 6.7B model on multi-node	Trn1
Dynamic Batching support in TensorFlow 2.x Neuron (tensorflow-neuronx)	See Special Flags for details.	Trn1, Inf2
NeuronPerf support for Trn1/Inf2 instances	Added Trn1/Inf2 support for PyTorch Neuron (torch-neuronx) and TensorFlow 2.x Neuron (tensorflow-neuronx)	Trn1, Inf2
Hierarchical All-Reduce and Reduce-Scatter collective communication	Added support for hierarchical All-Reduce and Reduce-Scatter in Neuron Runtime to enable better scalability of distributed workloads.	Trn1, Inf2
New Tutorials added	Added tutorial to fine-tune the T5 model. Added tutorial to demonstrate use of Libtorch with PyTorch Neuron (torch-neuronx) for inference.	Trn1, Inf2
New sample scripts for 3D Parallelism training in Megatron-LM reference for Neuron	See GPT3-65b and GPT3-175b examples.	Trn1, Inf2
Minor enhancements and bug fixes.	See Neuron Components Release Notes	Trn1, Inf2, Inf1
Release included packages	See Release Content	Trn1, Inf2, Inf1

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.
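
To illustrate why a hierarchical All-Reduce can scale better than a flat one, the sketch below simulates one common formulation in plain Python: reduce within each node, all-reduce across nodes using one value per node, then broadcast within each node. This is an assumption about the general technique, not a description of how Neuron Runtime implements it. The function names and the `ranks_per_node` parameter are hypothetical.

```python
def flat_all_reduce(values):
    """Every rank ends up with the sum over all ranks."""
    total = sum(values)
    return [total] * len(values)


def hierarchical_all_reduce(values, ranks_per_node):
    """Same result as flat_all_reduce, computed in three stages."""
    if ranks_per_node <= 0 or len(values) % ranks_per_node != 0:
        raise ValueError("rank count must be a positive multiple of ranks_per_node")

    # Group ranks by node: ranks 0..k-1 on node 0, k..2k-1 on node 1, and so on.
    nodes = [
        values[i : i + ranks_per_node]
        for i in range(0, len(values), ranks_per_node)
    ]

    # Stage 1: reduce inside each node (fast intra-node links).
    node_sums = [sum(node) for node in nodes]

    # Stage 2: all-reduce across nodes, one participant per node (slower inter-node links).
    global_sum = sum(node_sums)

    # Stage 3: broadcast the result back to every rank inside each node.
    return [global_sum] * len(values)
```

In this formulation only one value per node crosses the inter-node network in stage 2, instead of one value per rank.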


24 Feb 23:51

aws-mesharma

v2.8.0

93e9f5d

Neuron SDK Release - February 24, 2023

What’s New

This release adds support for EC2 Inf2 instances, introduces initial inference support with TensorFlow 2.x Neuron (tensorflow-neuronx) on Trn1 and Inf2, and introduces minor enhancements and bug fixes.

This release introduces the following:

What’s New	Details
Support for EC2 Inf2 instances	Inference support for Inf2 instances in PyTorch Neuron (torch-neuronx). Inference support for Inf2 instances in TensorFlow 2.x Neuron (tensorflow-neuronx). Overall documentation update to include Inf2 instances.
TensorFlow 2.x Neuron (tensorflow-neuronx) support	This release introduces initial inference support with TensorFlow 2.x Neuron (tensorflow-neuronx) on Trn1 and Inf2.
New Neuron GitHub samples	New sample scripts for deploying LLM models with transformers-neuronx under the aws-neuron-samples GitHub repository. New sample scripts for deploying models with torch-neuronx under the aws-neuron-samples GitHub repository.
Minor enhancements and bug fixes.	See Neuron Components Release Notes
Release included packages	See Release Content

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

[What’s New](https://awsdocs-neuron-staging.readthedocs-hosted.com/en/release_2.8.0/release-notes/index.html#id6)
