Intel® Optimizations for TensorFlow 2.9.1
This release of Intel® Optimized TensorFlow is based on the TensorFlow v2.9.1 tag and is built with support for oneDNN (oneAPI Deep Neural Network Library). For features and fixes introduced in TensorFlow 2.9.1, see also the TensorFlow 2.9.1 release notes.
This release note covers both Intel® Optimizations for TensorFlow* and official TensorFlow v2.9.1. In official TensorFlow v2.9.1, oneDNN optimizations are enabled by default in the Linux x86 packages on CPUs with neural-network-focused hardware features such as AVX512_VNNI, AVX512_BF16, and AMX, which are found on Intel Cascade Lake and newer CPUs.
Major features:
• Please see the TensorFlow 2.9.1 release notes
• The environment variable TF_ENABLE_ONEDNN_OPTS no longer needs to be set to "1" to turn on oneDNN optimizations on Intel Cascade Lake and newer CPUs running Linux
• Further performance improvements for Bfloat16 models with AMX optimizations; more operations now support the BF16 datatype
• Supported platforms: Linux, Windows 10, and Windows 11.
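On platforms where the default does not apply, the switch can still be set by hand. The variable is read from the environment when TensorFlow starts, so it has to be in place before the import. The following is a minimal sketch: the helper name `set_onednn_opts` is hypothetical, and the use of "0" to turn the optimizations off comes from the upstream TensorFlow documentation rather than from these notes.

```python
import os


def set_onednn_opts(enabled: bool) -> str:
    """Set TF_ENABLE_ONEDNN_OPTS for this process and return the value written.

    Hypothetical helper for illustration only. Call it before
    `import tensorflow`, otherwise the setting may not take effect.
    """
    value = "1" if enabled else "0"
    os.environ["TF_ENABLE_ONEDNN_OPTS"] = value
    return value


if __name__ == "__main__":
    set_onednn_opts(True)
    # Import TensorFlow only after the variable has been set, e.g.:
    # import tensorflow as tf
    print(os.environ["TF_ENABLE_ONEDNN_OPTS"])
```

Setting the variable in the shell before launching Python has the same effect and avoids any dependence on import order.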
Improvements:
• Updated oneDNN to version v2.6
• Performance enhancement for models like 3D-UNet and Yolo-V4 with addition of 3D Convolution-Add fusion and Mish (Softplus-Tanh-Mul) fusions
• Improved performance on MatMul operations with smaller shapes like (50x50).
• Enabled user mode scratchpad for inner-product (FusedMatMul & quantized MatMul) for better memory usage control
• Eliminated unnecessary data copying in per-class NMS computation for SSD-Resnet34, which reduces memory usage and improves performance
• Throughput improvement for recommendation models by parallelizing UnSortedSegmentOp
• Added auto_mixed_precision_mkl as an optimizer option to be enabled for saved_model in eager mode
• Improvement in saved_model inference performance by removing eager check in remapper for oneDNN specific optimizations.
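As an illustration of the auto_mixed_precision_mkl option listed above, the sketch below turns it on through TensorFlow's optimizer configuration API. It assumes the option is passed as a key to `tf.config.optimizer.set_experimental_options`; the constant and function names are hypothetical, and whether a given model benefits depends on the CPU's BF16 support.

```python
# Grappler option named in these notes; True requests the pass.
OPTIMIZER_OPTIONS = {"auto_mixed_precision_mkl": True}


def enable_bf16_auto_mixed_precision() -> dict:
    """Apply OPTIMIZER_OPTIONS and return the options TensorFlow reports back.

    Hypothetical helper. TensorFlow is imported lazily so that the constant
    above can be inspected on machines without TensorFlow installed.
    """
    import tensorflow as tf

    tf.config.optimizer.set_experimental_options(OPTIMIZER_OPTIONS)
    return tf.config.optimizer.get_experimental_options()
```

A typical use would be to call `enable_bf16_auto_mixed_precision()` once at startup, before loading a model with `tf.saved_model.load` and running inference.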
Bug fixes:
• Issues resolved in TensorFlow 2.9.1
• Issues resolved in oneDNN v2.6
• Fixed all static analysis scan findings
• Fixed failure related to transformer-mlperf model training with BF16 datatype
• Fixed a failure in //tensorflow/python/framework:node_file_writer_test exposed after eager_op_as_function feature was enabled by default
• Fixed gruv2_test_gpu and layer_correctness_test_gpu tests
Versions and components
• Intel optimized TensorFlow based on TensorFlow v2.9.1: https://github.com/Intel-tensorflow/tensorflow/tree/v2.9.1
• TensorFlow v2.9.1: https://github.com/tensorflow/tensorflow/tree/v2.9.1
• oneDNN: https://github.com/oneapi-src/oneDNN/releases/tag/v2.6
• Model Zoo: https://github.com/IntelAI/models
Known issues
• Open issues for oneDNN optimizations
• Bfloat16 is not guaranteed to work on AVX or AVX2
• The conv3d_backprop_filter_v2_grad_test_cpu and Mkl_fused_op_test unit tests fail; this will be fixed in the next release.
• On Windows, to use oneDNN-enabled TensorFlow, users need to run "set TF_ENABLE_ONEDNN_OPTS=1". Also, if the PC has hyperthreading enabled, users need to bind the ML application to one logical core per CPU to get the best runtime performance.
• To get the best performance on Windows, use the initialization script at the following link: https://github.com/IntelAI/models/blob/r2.7/benchmarks/common/windows_intel1dnn_setenv.bat
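The Windows guidance above can be sketched in Python as follows. This is an illustration under stated assumptions, not a replacement for the initialization script: it reads "one logical core per CPU" as one logical core per physical core, it assumes sibling hyperthreads are numbered consecutively (0 and 1 share a core, 2 and 3 share the next, and so on), and the helper name is hypothetical.

```python
import os
from typing import List


def one_logical_core_per_physical(logical_count: int,
                                  threads_per_core: int = 2) -> List[int]:
    """Return one logical CPU index per physical core.

    Assumption: sibling hyperthreads have consecutive indices, so taking
    every `threads_per_core`-th index selects one thread per core.
    """
    if logical_count <= 0 or threads_per_core <= 0:
        raise ValueError("counts must be positive")
    return list(range(0, logical_count, threads_per_core))


if __name__ == "__main__":
    # Same effect as running `set TF_ENABLE_ONEDNN_OPTS=1`, but only for
    # this process; must happen before TensorFlow is imported.
    os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"
    cores = one_logical_core_per_physical(os.cpu_count() or 1)
    print(cores)
```

The resulting list can then be handed to whatever affinity mechanism is in use, for example the third-party psutil package's `Process.cpu_affinity`. The actual core numbering should be checked on the target machine before relying on this layout.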