Neuron SDK Release - June 14, 2023
What’s New
This release introduces Neuron Distributed, a new Python library that simplifies training and inference of large models. It also improves usability with features such as S3 model caching, a standalone profiler tool, and support for Ubuntu 22, along with other new features, performance optimizations, minor enhancements, and bug fixes. This release introduces the following:
What’s New | Details | Instances |
---|---|---|
New Features and Performance Enhancements in transformers-neuronx | Support for int8 inference; see the example at int8 weight storage support. Improved prompt context encoding performance; see the Transformers Neuron (transformers-neuronx) Developer Guide. Improved collective communications performance for tensor-parallel inference on Inf2 and Trn1; see the Transformers Neuron (transformers-neuronx) release notes. | Inf2, Trn1/Trn1n |
Neuron Profiler Tool | Support for a standalone tool to profile and get visualized insights into the execution of models on Trainium and Inferentia devices. See the Neuron Profile User Guide. | Inf1, Inf2, Trn1/Trn1n |
Neuron Compilation Cache through S3 | Support for sharing compiled models across Inf2 and Trn1 nodes through S3. See PyTorch Neuron neuron_parallel_compile CLI (torch-neuronx). | Inf2, Trn1/Trn1n |
New script to scan a model for supported/unsupported operators | Script to scan a model for supported and unsupported operators before training; the scan output lists supported and unsupported operators at both the XLA operator and PyTorch operator level. See a sample tutorial at Analyze for Training Tutorial. | Inf2, Trn1/Trn1n |
Neuron Distributed Library [Experimental] | New Python library based on PyTorch that enables distributed training and inference of large models, with initial support for tensor parallelism. See Neuron Distributed [Experimental]. | Inf2, Trn1/Trn1n |
Neuron Calculator and Documentation Updates | New Neuron Calculator documentation section to help determine the number of NeuronCores needed for LLM inference. Added the app note Generative LLM inference with Neuron. See Neuron Documentation Release Notes. | Inf1, Inf2, Trn1/Trn1n |
Enhancements to Neuron SysFS | Support for a detailed breakdown of memory usage across the NeuronCores. See the Neuron Sysfs User Guide. | Inf1, Inf2, Trn1/Trn1n |
Support for Ubuntu 22 | See the Setup Guide for setup instructions on Ubuntu 22. | Inf1, Inf2, Trn1/Trn1n |
Minor enhancements and bug fixes | See Neuron Components Release Notes. | Trn1/Trn1n, Inf2, Inf1 |
Release Artifacts | See Release Artifacts. | Trn1/Trn1n, Inf2, Inf1 |
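The Neuron Distributed library's initial feature, tensor parallelism, splits a layer's weights across workers so that each computes only a slice of the output. The sketch below is a conceptual illustration of a column-parallel linear layer, not the neuronx-distributed API: NumPy stands in for NeuronCores, and all names in it are hypothetical.

```python
# Conceptual sketch of tensor parallelism (column-parallel linear layer).
# NumPy stands in for the accelerator; this is NOT the neuronx-distributed API.
import numpy as np

rng = np.random.default_rng(0)

batch, d_in, d_out = 4, 8, 6
x = rng.normal(size=(batch, d_in))     # activations, replicated on every worker
W = rng.normal(size=(d_in, d_out))     # full weight matrix of one linear layer

# Unsharded reference result: y = x @ W
y_full = x @ W

# Tensor parallelism: split W column-wise across two "workers".
# Each worker stores only its shard and computes its slice of the output.
W_shards = np.split(W, 2, axis=1)      # two (d_in, d_out // 2) shards
y_shards = [x @ w for w in W_shards]   # independent local matmuls

# A collective (all-gather along the feature axis) reassembles the output.
y_tp = np.concatenate(y_shards, axis=1)

assert np.allclose(y_tp, y_full)
```

In a real deployment the shards live on separate NeuronCores and the concatenation is a collective communication step, which is why the improved collective performance noted above matters for tensor-parallel inference.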
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.