Releases · aws-neuron/aws-neuron-sdk

This release brings significant throughput improvements for inference on a variety of models. For example, ResNet50 throughput increased by 63% (measured at 1800 img/sec on inf1.xlarge, up from 1100 img/sec, and at 2300 img/sec on inf1.2xlarge). BERTbase throughput improved by 36% compared to the re:Invent launch (up to 26100 seq/sec from 19200 seq/sec on inf1.24xlarge), and BERTlarge improved by 15% (230 seq/sec, compared to 200 seq/sec on inf1.2xlarge). In addition to the performance boost, this release includes various bug fixes, new tech notes on GitHub that dive deep into how Neuron performance features work, and overall improved documentation based on customer input.
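As a quick sanity check on the percentages quoted above, the relative gains can be recomputed from the before/after throughput figures. This is only an illustrative sketch: the `measurements` dictionary and the `percent_improvement` helper are names made up for this example, and the numbers are copied from the paragraph above rather than measured here.

```python
# Before/after throughput figures as quoted in the release notes.
# The dictionary keys and the helper name are illustrative only.
measurements = {
    "ResNet50 on inf1.xlarge (img/sec)": (1100, 1800),
    "BERTbase on inf1.24xlarge (seq/sec)": (19200, 26100),
    "BERTlarge on inf1.2xlarge (seq/sec)": (200, 230),
}


def percent_improvement(old, new):
    """Return the relative throughput gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


improvements = {
    name: percent_improvement(old, new)
    for name, (old, new) in measurements.items()
}

for name, gain in improvements.items():
    print(f"{name}: +{gain:.1f}%")
```

The ResNet50 figure works out to roughly 63.6%, which the notes state as 63%; the BERTbase and BERTlarge figures come out at about 35.9% and 15%.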

We continue to work on new features and on improving performance further. To stay up to date, follow this repository and watch the AWS Neuron developer forum.

Important to know:

Size of neural network. The current Neuron compiler release is limited in the size of neural network it can effectively optimize. The size of a neural network is influenced by a number of factors, including: a) the type of neural network (CNN, LSTM, MLP), b) the number of layers, and c) the sizes of the inputs (tensor dimensions, batch size, ...). As a result, we limit CNN models like ResNet to an input size of 480x480 (fp16/32) with batch size=4; LSTM models like GNMT to 900 time steps; and MLP models like BERT to sequence length=128 with batch=8.
Computer-vision object detection and segmentation models are not yet supported.
INT8 data type is not currently supported by the Neuron compiler.
Neuron does not support TensorFlow 2 or PyTorch 1.4.0.
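The size limits listed above can be expressed as a small pre-compilation check. The sketch below is not part of the Neuron SDK: the constants and function names are hypothetical, and it assumes each stated limit is an inclusive upper bound, which the notes do not spell out.

```python
# Hypothetical helpers, not a Neuron SDK API. Limits are copied from the
# "Size of neural network" note and assumed to be inclusive upper bounds.
CNN_MAX_SIDE = 480          # CNN (e.g. ResNet): input up to 480x480
CNN_MAX_BATCH = 4           # CNN: batch size up to 4
LSTM_MAX_TIME_STEPS = 900   # LSTM (e.g. GNMT): up to 900 time steps
BERT_MAX_SEQ_LEN = 128      # BERT: sequence length up to 128
BERT_MAX_BATCH = 8          # BERT: batch size up to 8


def cnn_within_limits(height, width, batch_size):
    """Check a CNN input shape against the stated 480x480, batch=4 limit."""
    return (
        height <= CNN_MAX_SIDE
        and width <= CNN_MAX_SIDE
        and batch_size <= CNN_MAX_BATCH
    )


def lstm_within_limits(time_steps):
    """Check an LSTM sequence against the stated 900 time step limit."""
    return time_steps <= LSTM_MAX_TIME_STEPS


def bert_within_limits(seq_len, batch_size):
    """Check a BERT input against the stated seq length=128, batch=8 limit."""
    return seq_len <= BERT_MAX_SEQ_LEN and batch_size <= BERT_MAX_BATCH


if __name__ == "__main__":
    print("ResNet 224x224, batch 1:", cnn_within_limits(224, 224, 1))
    print("GNMT, 1000 time steps:", lstm_within_limits(1000))
    print("BERT seq 128, batch 8:", bert_within_limits(128, 8))
```

A check like this only reflects the limits stated in this release; whether a given model actually compiles also depends on the other factors listed, such as the number of layers.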

Neuron SDK Release - January 27, 2020

Neuron SDK Release - December 20, 2019