Neuron SDK Release - September 15, 2023
What’s New
This release introduces support for Llama-2-7B model training and T5-3B model inference using neuronx-distributed. It also adds support for Llama-2-13B model training using neuronx-nemo-megatron. Neuron 2.14 also adds support for Stable Diffusion XL (Refiner and Base) model inference using torch-neuronx. This release also introduces other new features, performance optimizations, minor enhancements, and bug fixes.
This release introduces the following:
Note
This release deprecates the --model-type=transformer-inference compiler flag. Users are highly encouraged to migrate to the --model-type=transformer compiler flag; a minimal migration sketch is shown below.
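As a hedged sketch of the migration, assuming (as in the transformers-neuronx samples) that compiler flags are passed through the NEURON_CC_FLAGS environment variable before compilation:

```python
import os

# Deprecated as of this release:
# os.environ["NEURON_CC_FLAGS"] = "--model-type=transformer-inference"

# Migrate to the new flag. Set it before the model is compiled, e.g. before
# calling to_neuron() on a transformers-neuronx model.
os.environ["NEURON_CC_FLAGS"] = "--model-type=transformer"
```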
What’s New | Details | Instances |
---|---|---|
AWS Neuron Reference for Nemo Megatron library (neuronx-nemo-megatron) | Llama-2-13B model training support (tutorial). ZeRO-1 optimizer support that works with tensor parallelism and pipeline parallelism. See more at AWS Neuron Reference for Nemo Megatron (neuronx-nemo-megatron) Release Notes and the neuronx-nemo-megatron GitHub repo. | Trn1/Trn1n |
Neuron Distributed (neuronx-distributed) for Training | Llama-2-7B model training support (sample script) (tutorial). pad_model API to pad attention heads whose count does not divide evenly by the number of NeuronCores, allowing users to use any supported tensor-parallel degree; see the API Reference Guide (neuronx-distributed) and the sketch after this table. See more at Neuron Distributed Release Notes (neuronx-distributed). | Trn1/Trn1n |
Neuron Distributed (neuronx-distributed) for Inference | T5-3B model inference support (tutorial). pad_model API to pad attention heads whose count does not divide evenly by the number of NeuronCores, allowing users to use any supported tensor-parallel degree; see the API Reference Guide (neuronx-distributed) and the sketch after this table. See more at Neuron Distributed Release Notes (neuronx-distributed). | Inf2, Trn1/Trn1n |
Transformers Neuron (transformers-neuronx) for Inference | Introduces the --model-type=transformer compiler flag, which deprecates the --model-type=transformer-inference compiler flag. See more at Transformers Neuron (transformers-neuronx) release notes. | Inf2, Trn1/Trn1n |
PyTorch Neuron (torch-neuronx) | Performance optimizations in the torch_neuronx.analyze API; see PyTorch Neuron (torch-neuronx) Analyze API for Inference and the sketch after this table. Stable Diffusion XL (Refiner and Base) model inference support (sample script). | Trn1/Trn1n, Inf2 |
Neuron Compiler (neuronx-cc) | New --O compiler option that enables different optimizations, trading off faster model compile time against faster model execution (see the sketch after this table). See more at the Neuron Compiler CLI Reference Guide (neuronx-cc) and Neuron Compiler (neuronx-cc) release notes. | Inf2, Trn1/Trn1n |
Neuron Tools | Neuron SysFS support for showing connected devices on trn1.32xl, inf2.24xl, and inf2.48xl instances; see the Neuron Sysfs User Guide and the sketch after this table. See more at Neuron System Tools. | Inf1, Inf2, Trn1/Trn1n |
Documentation Updates | Neuron Calculator now supports multiple model configurations for tensor-parallel-degree computation; see Neuron Calculator. Announcement of the deprecation of the --model-type=transformer-inference compiler flag; see Announcing deprecation for --model-type=transformer-inference compiler flag. See more at Neuron Documentation Release Notes. | Inf1, Inf2, Trn1/Trn1n |
Minor enhancements and bug fixes | See Neuron Components Release Notes. | Trn1/Trn1n, Inf2, Inf1 |
Release Artifacts | See Release Artifacts. | Trn1/Trn1n, Inf2, Inf1 |
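For the pad_model API referenced in the neuronx-distributed rows above, a minimal sketch follows. The import path and signature shown are assumptions based on the API Reference Guide (neuronx-distributed), and build_model is a hypothetical helper; consult the guide for the authoritative definition.

```python
# Minimal sketch of attention-head padding for tensor parallelism. The import
# path and signature below are assumptions; see the neuronx-distributed
# API Reference Guide for the exact definition.
from neuronx_distributed.parallel_layers.pad import pad_model

TP_DEGREE = 8   # desired tensor-parallel degree
N_HEADS = 12    # attention heads in the model; 12 is not divisible by 8

model = build_model()  # hypothetical helper returning a transformer model
model = pad_model(model, tp_degree=TP_DEGREE, n_heads=N_HEADS)
# After padding, the model can be sharded at tensor-parallel degree 8 even
# though its original head count is not a multiple of that degree.
```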
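The torch_neuronx.analyze API named in the PyTorch Neuron row reports which operators in a model are supported on Neuron; this release optimizes its performance. A minimal sketch, using a toy model purely for illustration:

```python
import torch
import torch_neuronx

# Toy model purely for illustration.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.GELU()).eval()
example_input = torch.rand(1, 4)

# analyze() traces the model and reports operator support on Neuron.
report = torch_neuronx.analyze(model, example_input)
print(report)
```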
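The new --O compiler option can be forwarded from torch-neuronx through the compiler_args parameter of torch_neuronx.trace. The level spelling below ("--O1") is an assumption; see the Neuron Compiler CLI Reference Guide (neuronx-cc) for the accepted values and their compile-time versus execution-speed tradeoffs.

```python
import torch
import torch_neuronx

model = torch.nn.Linear(4, 4).eval()
example_input = torch.rand(1, 4)

# Forward the new optimization-level option to neuronx-cc. "--O1" is an
# assumed spelling for a level that favors compile time; check the Neuron
# Compiler CLI Reference Guide for the actual accepted values.
trace = torch_neuronx.trace(model, example_input, compiler_args=["--O1"])
```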
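For the new SysFS connected-devices listing in the Neuron Tools row, the node can be read like any other sysfs file. The path below is an assumption based on the general layout in the Neuron Sysfs User Guide and may differ by driver version.

```python
from pathlib import Path

# Assumed sysfs location for the connected-devices node of device 0; the
# authoritative path is documented in the Neuron Sysfs User Guide.
node = Path("/sys/devices/virtual/neuron_device/neuron0/connected_devices")
if node.exists():
    print(node.read_text().strip())  # IDs of devices connected to neuron0
```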
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see Model Architecture Fit Guidelines.