
Intel® Optimizations for TensorFlow 2.10.0

Released by @justkw on 21 Sep 21:25 · commit 468e669

This release of Intel® Optimized TensorFlow is based on the TensorFlow v2.10.0 tag and is built with support for oneDNN (oneAPI Deep Neural Network Library). For the features and fixes introduced in TensorFlow 2.10.0, please also see the TensorFlow 2.10 release notes.

This release note covers both Intel® Optimizations for TensorFlow* and official TensorFlow v2.10, which enables oneDNN optimizations by default in Linux x86 packages on CPUs with neural-network-focused hardware features such as AVX512_VNNI, AVX512_BF16, and AMX, found on Intel Cascade Lake and newer CPUs.
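
A quick way to confirm that oneDNN kernels are actually being dispatched is oneDNN's own ONEDNN_VERBOSE environment variable, which must be set before TensorFlow (and thus oneDNN) is loaded. A minimal sketch; the exact log format is version-dependent, and the matmul is just an arbitrary op that maps to a oneDNN primitive:

```python
import os

# Must be set before TensorFlow (and hence oneDNN) is loaded.
os.environ["ONEDNN_VERBOSE"] = "1"  # print each oneDNN primitive execution to stderr

import tensorflow as tf

a = tf.random.uniform([256, 256])
b = tf.random.uniform([256, 256])
c = tf.matmul(a, b)  # emits onednn_verbose lines if oneDNN kernels are active
print(c.shape)
```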

Major features:

· Please see the TensorFlow 2.10.0 release notes
· Further performance improvements for bfloat16 models with AMX optimizations; more operations now support the BF16 datatype (see the sketch after this list).
· Supported platforms: Linux.
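
To exercise the BF16/AMX improvements mentioned above, one common route is the Keras mixed-precision policy. A minimal sketch, assuming a CPU with AVX512_BF16 or AMX support (on older hardware, bfloat16 may fall back to slow emulation):

```python
import tensorflow as tf

# Run layer computations in bfloat16 while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

x = tf.random.uniform([32, 64])
y = tf.random.uniform([32], maxval=10, dtype=tf.int32)
model.fit(x, y, epochs=1)
```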

Improvements:

· Updated oneDNN to v2.6.1
· Improved MatMul performance on 2-socket Broadwell systems.
· Improved SSD-MobileNet 300x300 inference performance.
· Generalized the BFloat16 pattern matcher, a performance enhancement for GAN models.
· Added BFloat16 support for the div and log ops.
· Enabled a user-mode scratchpad for inner-product primitives (FusedMatMul and quantized MatMul) for better control of memory usage.
· Renamed the grappler config flag auto_mixed_precision_mkl (API change; see the sketch after this list).
· Improved performance of ResNet50 in eager mode on single-socket (1S) systems.
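
For the grappler flag rename above, the option is toggled through tf.config.optimizer.set_experimental_options. A minimal sketch; the new option name auto_mixed_precision_onednn_bfloat16 is an assumption based on the TF 2.10 rename (the pre-2.10 name was auto_mixed_precision_mkl), so check the TensorFlow 2.10 release notes for the authoritative spelling:

```python
import tensorflow as tf

# Enable oneDNN bfloat16 auto mixed precision via the renamed grappler flag.
# NOTE: the key below is assumed; before TF 2.10 it was "auto_mixed_precision_mkl".
tf.config.optimizer.set_experimental_options(
    {"auto_mixed_precision_onednn_bfloat16": True}
)
print(tf.config.optimizer.get_experimental_options())
```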

Bug fixes:

· TensorFlow 2.10.0 resolved issues
· oneDNN v2.6 resolved issues
· Fixed all static scan analysis findings.
· Fixed the conv3d_backprop_filter_v2_grad_test_cpu issue.
· Fixed the mkl_fused_ops_test failure and disabled the blocked format.
· Fixed a failure in //tensorflow/python/kernel_tests/nn_ops:conv_ops_test_cpu that was exposed after a security-vulnerability test for the raw_ops.Conv2DBackpropInput function was enabled by default.
· Fixed shape inference for the INT8 convolutions test.
· Fixed the pooling_ops_3d_test unit test failure.
· Fixed a "ValueError: operands could not be broadcast together with shapes (0,) (96,)" bug in optimize-for-inference.
· Fixed two major bugs in //tensorflow/python:quantized_ops_test and //tensorflow/python:dequantized_ops_test.
· Fixed a segmentation fault in tf.matmul and tf.einsum with batched input tensors in intel-tensorflow-avx512 by adding the primitive's name to the MKL primitive cache key, avoiding collisions in the cache (see the sketch after this list).
· Fixed the quantization_ops:quantization_ops_test unit test failure.
· Fixed a memory corruption issue by disabling the oneDNN primitive cache.
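
For context on the tf.matmul/tf.einsum segmentation-fault fix above, the failing pattern was an ordinary batched contraction. A minimal sketch of the shapes involved, for illustration only (this is not the original reproducer):

```python
import tensorflow as tf

# Batched inputs of the kind whose cached primitives previously collided.
a = tf.random.uniform([8, 32, 64])  # [batch, m, k]
b = tf.random.uniform([8, 64, 16])  # [batch, k, n]

c1 = tf.matmul(a, b)                  # batched MatMul -> [8, 32, 16]
c2 = tf.einsum("bmk,bkn->bmn", a, b)  # equivalent einsum contraction

print(c1.shape, c2.shape)
```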

Versions and components

• Intel optimized TensorFlow based on TensorFlow v2.10.0: https://github.com/Intel-tensorflow/tensorflow/tree/v2.10.0
• TensorFlow v2.10.0: https://github.com/tensorflow/tensorflow/tree/v2.10.0
• oneDNN: https://github.com/oneapi-src/oneDNN/releases/tag/v2.6.1
• Model Zoo: https://github.com/IntelAI/models

Known issues

  • BFloat16 is not guaranteed to work on AVX or AVX2 platforms.
  • On Windows, to use oneDNN-enabled TensorFlow, users need to run “set TF_ENABLE_ONEDNN_OPTS=1” (see the sketch after this list). Also, if the PC has hyperthreading enabled, users need to bind the ML application to one logical core per CPU in order to get the best runtime performance.
  • Use the initialization script from the following link to get the best performance on Windows: https://github.com/IntelAI/models/blob/r2.7/benchmarks/common/windows_intel1dnn_setenv.bat
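
Equivalently, the Windows flag above can be set from Python before TensorFlow is first imported, which avoids shell-specific `set` syntax; core pinning itself still happens outside Python (e.g. with `start /affinity` in cmd). A minimal sketch:

```python
import os

# Must run before the first `import tensorflow`, or the flag has no effect.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

import tensorflow as tf
print(tf.__version__)  # oneDNN-optimized kernels are now eligible for dispatch
```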