
Neuron SDK Release - June 14, 2023

@aws-mesharma aws-mesharma released this 28 Jun 22:08

What’s New

This release introduces Neuron Distributed, a new Python library that simplifies training and inference of large models. It also improves usability with S3 model caching, a standalone profiler tool, and support for Ubuntu 22, along with other new features, performance optimizations, minor enhancements, and bug fixes. This release introduces the following:

- **New features and performance enhancements in transformers-neuronx** — Support for int8 inference; see the example at int8 weight storage support. Improved prompt context encoding performance; see the Transformers Neuron (transformers-neuronx) Developer Guide. Improved collective communications performance for tensor-parallel inference on Inf2 and Trn1; see the Transformers Neuron (transformers-neuronx) release notes. *(Inf2, Trn1/Trn1n)*
- **Neuron Profiler tool** — Support for Neuron Profile as a standalone tool to profile models and get visualized insights into their execution on Trainium and Inferentia devices. See the Neuron Profile User Guide. *(Inf1, Inf2, Trn1/Trn1n)*
- **Neuron compilation cache through S3** — Support for sharing compiled models across Inf2 and Trn1 nodes through S3. See PyTorch Neuron neuron_parallel_compile CLI (torch-neuronx). *(Inf2, Trn1/Trn1n)*
- **New script to scan a model for supported/unsupported operators** — Scans a model before training; the output lists supported and unsupported operators at both the XLA operator and PyTorch operator level. See a sample tutorial at Analyze for Training Tutorial. *(Inf2, Trn1/Trn1n)*
- **Neuron Distributed library [Experimental]** — New Python library, based on PyTorch, enabling distributed training and inference of large models, with initial support for tensor parallelism. See Neuron Distributed [Experimental]. *(Inf2, Trn1/Trn1n)*
- **Neuron Calculator and documentation updates** — New Neuron Calculator documentation section to help determine the number of NeuronCores needed for LLM inference. Added the app note Generative LLM inference with Neuron. See the Neuron Documentation Release Notes. *(Inf1, Inf2, Trn1/Trn1n)*
- **Enhancements to Neuron SysFS** — Support for a detailed breakdown of memory usage across the NeuronCores. See the Neuron Sysfs User Guide. *(Inf1, Inf2, Trn1/Trn1n)*
- **Support for Ubuntu 22** — See the Setup Guide for setup instructions on Ubuntu 22. *(Inf1, Inf2, Trn1/Trn1n)*
- **Minor enhancements and bug fixes** — See the Neuron Components Release Notes. *(Inf1, Inf2, Trn1/Trn1n)*
- **Release artifacts** — See Release Artifacts. *(Inf1, Inf2, Trn1/Trn1n)*
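The int8 inference support in transformers-neuronx is about storing weights in int8 and dequantizing them for compute. A minimal NumPy sketch of the idea — this is not the transformers-neuronx API; the function names and the per-output-channel scaling scheme are illustrative assumptions:

```python
import numpy as np

def quantize_int8(w):
    """Store a float weight matrix as int8 plus per-output-column scales."""
    scale = np.abs(w).max(axis=0) / 127.0          # one scale per output column
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(x, q, scale):
    """Dequantize on the fly: x @ (q * scale), but with weights stored as int8."""
    return (x @ q.astype(np.float32)) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)
x = rng.standard_normal((4, 64)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(int8_matmul(x, q, s) - x @ w).max()
```

Storing weights as int8 halves weight memory versus bf16/fp16 at the cost of a small quantization error; see the int8 weight storage support example for how transformers-neuronx actually exposes this.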
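The S3 compilation cache lets one node compile and every other Inf2/Trn1 node reuse the artifacts. A sketch of the workflow — the environment variable name and the bucket path are assumptions to verify against the neuron_parallel_compile documentation:

```shell
# Sketch only: NEURON_COMPILE_CACHE_URL is the cache-location variable as I
# understand the torch-neuronx docs; confirm the exact name/flags before use.
# "my-bucket" is a placeholder bucket name.
export NEURON_COMPILE_CACHE_URL="s3://my-bucket/neuron-cache"

# Pre-populate the shared cache from one node...
neuron_parallel_compile torchrun --nproc_per_node=32 train.py

# ...then other nodes pointed at the same URL skip recompilation.
torchrun --nproc_per_node=32 train.py
```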
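The operator scan reports, before a training run, which of a model's operators the compiler supports. A toy, self-contained sketch of that kind of report — the supported set and operator names here are illustrative, not the real tool's coverage or output format:

```python
# Illustrative only: this set does NOT reflect actual Neuron compiler coverage.
SUPPORTED = {"aten::linear", "aten::relu", "aten::layer_norm", "aten::softmax"}

def scan(ops):
    """Split a model's operator list into supported/unsupported buckets."""
    return {"supported": sorted(set(ops) & SUPPORTED),
            "unsupported": sorted(set(ops) - SUPPORTED)}

model_ops = ["aten::linear", "aten::relu", "aten::custom_fancy_op"]
report = scan(model_ops)
print(report)
# {'supported': ['aten::linear', 'aten::relu'], 'unsupported': ['aten::custom_fancy_op']}
```

The real script produces this breakdown at both the PyTorch and XLA operator level; see the Analyze for Training Tutorial.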
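Tensor parallelism, which Neuron Distributed initially supports, shards a layer's weight matrix across NeuronCores so each core computes a slice of the output. A NumPy sketch of column-parallel sharding — illustrative of the technique only, not the neuronx-distributed API:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 16))   # activations, replicated on every core
w = rng.standard_normal((16, 8))   # full weight matrix
tp_degree = 2                      # pretend we shard across 2 cores

# Column-parallel: each "core" holds a contiguous slice of W's output columns.
shards = np.split(w, tp_degree, axis=1)
partials = [x @ s for s in shards]       # each core computes its own slice
y = np.concatenate(partials, axis=1)     # an all-gather reassembles the output

assert np.allclose(y, x @ w)             # sharded result matches the full matmul
```

Each core only ever stores 1/tp_degree of the weights, which is what makes models too large for one core fit across several.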
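The Neuron Calculator automates sizing questions like "how many NeuronCores does this LLM need?". A back-of-the-envelope version of the weight-memory part of that estimate — the 16 GiB-per-core figure and the helper name are illustrative assumptions, and real sizing must also account for KV cache, activations, and batch size:

```python
import math

def min_cores_for_weights(n_params, bytes_per_param=2, core_mem_gib=16):
    """Rough lower bound: NeuronCores needed just to hold the weights.
    core_mem_gib=16 is an assumed per-core memory figure, not a spec."""
    weight_bytes = n_params * bytes_per_param
    return math.ceil(weight_bytes / (core_mem_gib * 2**30))

# A 13B-parameter model in bf16 (2 bytes/param) is ~26 GB of weights,
# so under the 16 GiB/core assumption it needs at least 2 NeuronCores.
cores = min_cores_for_weights(13_000_000_000)
print(cores)  # 2
```

Use the Neuron Calculator documentation for real sizing; this only shows the shape of the arithmetic.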

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.