Intel® Optimizations for TensorFlow 2.8.0
This release of Intel® Optimized TensorFlow is based on the TensorFlow v2.8.0 tag and is built with support for oneDNN (oneAPI Deep Neural Network Library). For features and fixes introduced in TensorFlow 2.8.0, see also the TensorFlow 2.8.0 release notes.
This release note covers both Intel® Optimizations for TensorFlow* and official TensorFlow v2.8.0 with oneDNN enabled (via setting the environment variable TF_ENABLE_ONEDNN_OPTS to 1).
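For the official TensorFlow build, the variable has to be visible to the process before TensorFlow is loaded. The following is a minimal sketch of one way to do that from Python; the helper name enable_onednn_opts is hypothetical, and the TensorFlow import is left as a comment so the snippet runs without TensorFlow installed.

```python
import os


def enable_onednn_opts(enabled: bool = True) -> str:
    """Set TF_ENABLE_ONEDNN_OPTS for the current process (hypothetical helper)."""
    value = "1" if enabled else "0"
    os.environ["TF_ENABLE_ONEDNN_OPTS"] = value
    return value


# Set the variable first, then import TensorFlow so the setting is picked up.
enable_onednn_opts(True)
# import tensorflow as tf
```

Setting the variable in the shell before launching Python has the same effect and avoids any dependence on import order.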
Major features
• Please see the TensorFlow 2.8.0 release notes
• Performance improvements for Bfloat16 models with AMX optimizations.
• Enabled support for the 12th Gen Intel® Core™ platform (code named Alder Lake).
• The oneDNN blocked format is no longer supported, i.e., setting TF_ENABLE_MKL_NATIVE_FORMAT=0 will not enable it.
• Enabling AMX optimizations no longer requires setting DNNL_MAX_CPU_ISA=AVX512_CORE_AMX.
• Supported platforms: Linux, Windows 10, and Windows 11.
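As an illustration of how a model might be run in Bfloat16, the sketch below uses the Keras mixed precision API, which is one common way to request bfloat16 compute in TensorFlow. It is an assumption that this is the path a given model takes to reach the optimizations listed above; the helper name use_bfloat16_policy is hypothetical, and the function returns False when TensorFlow is not installed.

```python
POLICY_NAME = "mixed_bfloat16"  # Keras mixed precision policy name


def use_bfloat16_policy() -> bool:
    """Request bfloat16 compute for Keras layers created afterwards.

    Hypothetical helper: returns False if TensorFlow is unavailable.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return False
    tf.keras.mixed_precision.set_global_policy(POLICY_NAME)
    return True
```

Call use_bfloat16_policy() before building the model, since the policy applies to layers constructed after it is set.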
Improvements
• Updated oneDNN to version 2.5.1
• Changed the oneDNN namespace from “mkldnn” to “dnnl” and cleaned up the source code by removing unnecessary header files, dangling methods, and data members left over from older MKL-DNN support
• Improved _FusedMatMul operation, which enhances the performance of models like BERT
• Added LayerNormalization ops fusion and BatchMatMul – Mul – AddV2 fusion to improve performance of Transformer based language models
• Improved performance of EfficientNet and EfficientDet models with the addition of swish (Sigmoid – Mul) fusion
• Removed unnecessary transpose elimination to enhance performance for 3DUnet model
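To make the swish (Sigmoid – Mul) pattern concrete, the sketch below shows the two-step form and a single-expression equivalent in plain Python. It only illustrates the arithmetic that a fusion can collapse into one operation; it is not the oneDNN or TensorFlow implementation, and the function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def swish_unfused(x: float) -> float:
    """Two separate steps: a Sigmoid op followed by a Mul op."""
    s = sigmoid(x)  # intermediate result a graph would materialize
    return x * s


def swish_fused(x: float) -> float:
    """Same value computed in one expression, with no intermediate result."""
    return x / (1.0 + math.exp(-x))
```

Both functions return the same value up to floating-point rounding; the benefit of fusing in a real graph comes from skipping the intermediate tensor between the two ops.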
Bug fixes
• Issues resolved in TensorFlow 2.8
• Issues resolved in oneDNN 2.5.1
• Fixed undefined behavior in cases where a different number of threads is used at primitive creation and at execution
• All static scan analysis findings are fixed.
• Fixed a bug with _FusedConv3D op registration
• Fixed run_eager_op_as_function_test and nn_fused_batchnorm_deterministic test failures
• Transformer-LT performance degradation is fixed
• Wide-and-deep INT8 performance degradation is fixed.
Versions and components
• Intel optimized TensorFlow based on TensorFlow v2.8.0: https://github.com/Intel-tensorflow/tensorflow/tree/v2.8.0
• TensorFlow v2.8.0: https://github.com/tensorflow/tensorflow/tree/v2.8.0
• oneDNN: https://github.com/oneapi-src/oneDNN/releases/tag/v2.5.1
• Model Zoo: https://github.com/IntelAI/models
Known issues
• Open issues: see the open issues for oneDNN optimizations
• Bfloat16 is not guaranteed to work on AVX or AVX2
• The Mkl_fused_op_test unit test fails; this will be fixed in the next release.
• On Windows, to use oneDNN-enabled TensorFlow, users need to run “set TF_ENABLE_ONEDNN_OPTS=1”. Also, if the PC has hyperthreading enabled, users need to bind the ML application to one logical core per CPU to get the best runtime performance.
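One way to do the binding on Windows is the affinity option of the cmd.exe start command, which takes a hexadecimal mask of logical processors. The sketch below builds such a mask under two assumptions that should be checked on the target machine: that "one logical core per CPU" means one per physical core, and that hyperthread siblings are numbered as adjacent pairs (0 and 1, 2 and 3, and so on). The function name affinity_mask and the script name train.py are hypothetical.

```python
def affinity_mask(n_logical: int, threads_per_core: int = 2) -> int:
    """Bit mask selecting the first logical processor of each physical core.

    Assumes sibling hyperthreads are numbered adjacently (an assumption).
    """
    mask = 0
    for cpu in range(0, n_logical, threads_per_core):
        mask |= 1 << cpu
    return mask


# Example: 8 logical processors, 2 hyperthreads per physical core.
mask = affinity_mask(8)
command = f"start /affinity {mask:X} python train.py"  # train.py is a placeholder
```

For 8 logical processors this selects processors 0, 2, 4, and 6, which is the mask 55 in hexadecimal.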