r1.15.5-deeprec2210
liutongxuan released this on 17 Nov 12:39
Major Features and Improvements
Embedding
- Support HBM-DRAM-SSD storage in EmbeddingVariable multi-tier storage.
- Support frequency-based initialization of multi-tier EmbeddingVariable when restoring a model.
- Support looking up the storage location of ids in EmbeddingVariable.
- Support kv_initialized_op for GPU Embedding Variable.
- Support restore compatibility of EmbeddingVariable using init_from_proto.
- Improve performance of apply/gather ops for EmbeddingVariable.
- Add Eviction Manager in EmbeddingVariable Multi-tier storage.
- Add unified thread pool for cache of Multi-tier storage in EmbeddingVariable.
- Save frequencies and versions of features in SSDHash and LevelDB storage of EmbeddingVariable.
- Avoid invalid eviction when using HBM-DRAM storage of EmbeddingVariable.
- Prevent access to uninitialized data when using EmbeddingVariable.
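For readers new to the idea, the following is a toy sketch of tiered storage with eviction and location lookup. It is not DeepRec's implementation: the class `TwoTierStore`, its methods, and the LRU policy are all hypothetical and chosen only for illustration, with plain Python dicts standing in for the fast (HBM or DRAM) and slow (DRAM or SSD) tiers.

```python
from collections import OrderedDict


class TwoTierStore:
    """Toy two-tier key-value store (hypothetical, for illustration only)."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # fast tier; insertion order tracks recency
        self.slow = {}             # slow tier; unbounded in this sketch

    def lookup(self, key, default):
        """Return the value for key, creating it from default if unseen."""
        if key in self.fast:
            self.fast.move_to_end(key)  # mark as most recently used
            return self.fast[key]
        # Promote from the slow tier, or initialize a new entry.
        value = self.slow.pop(key, default)
        self.fast[key] = value
        self._evict()
        return value

    def _evict(self):
        # Demote least recently used entries until the fast tier fits.
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value

    def location(self, key):
        """Report which tier currently holds key, or None if absent."""
        if key in self.fast:
            return "fast"
        if key in self.slow:
            return "slow"
        return None
```

With a fast tier of capacity 2, looking up ids 1, 2, 3 in order demotes id 1 to the slow tier; looking up id 1 again promotes it and demotes id 2.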
Graph & Grappler Optimization
- Optimize Async EmbeddingLookup by placement optimization.
- Place VarHandlerOp in the main compute graph for SmartStage.
- Support independent thread pool for stage subgraph to avoid thread contention.
- Implement device placement optimization.
Runtime Optimization
- Support CUDA Graph execution by adding CUDA Graph mode session.
- Support CUDA Graph execution in JIT mode.
- Support intra-task cost estimation in the Executor's CostModel.
- Support tf.stream and tf.colocate Python APIs for CUDA multi-stream.
- Support an embedding subgraph partition policy when using CUDA multi-stream.
- Optimize CUDA multi-stream by merging copy stream into compute stream.
Ops & Hardware Acceleration
- Add a list of Quantized* and _MklQuantized* ops.
- Implement GPU version of SparseFillEmptyRows.
- Implement a C version of spin_lock to support multiple architectures.
- Upgrade the OneDNN version to v2.7.
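As background on what SparseFillEmptyRows computes, here is a minimal pure-Python sketch of its semantics as documented for TensorFlow: every row with no entries receives one entry at column 0 holding the default value, and a per-row indicator records which rows were empty. The function name and list-based representation are assumptions made for illustration; the real op works on tensors and also returns a reverse index map, which this sketch omits.

```python
def fill_empty_rows(indices, values, num_rows, default_value):
    """Return (indices, values, empty_row_indicator) with every row populated.

    indices: list of (row, col) pairs in row-major order.
    values: list of values, one per index pair.
    """
    has_entry = [False] * num_rows
    for row, _ in indices:
        has_entry[row] = True
    empty_row_indicator = [not seen for seen in has_entry]

    merged = list(zip(indices, values))
    for row, empty in enumerate(empty_row_indicator):
        if empty:
            # Empty rows get a single entry at column 0.
            merged.append(((row, 0), default_value))
    merged.sort(key=lambda item: item[0])  # restore row-major order

    out_indices = [idx for idx, _ in merged]
    out_values = [val for _, val in merged]
    return out_indices, out_values, empty_row_indicator
```

This is useful before an embedding lookup, where every example needs at least one id per sparse feature.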
Distributed
- Support distributed training using SOK based on EmbeddingVariable.
- Add NETWORK_MAX_CONNECTION_TIMEOUT to make the connection timeout configurable in StarServer.
- Upgrade the SOK version to v4.2.
IO
- Add TF_NEED_PARQUET_DATASET to enable ParquetDataset.
Serving
- Optimize embedding lookup performance by disabling the feature filter when serving.
- Improve the error codes returned to users when parsing a request or response fails.
- Support an independent thread pool for model updates to avoid performance jitter.
ModelZoo
- Add MaskNet Model.
- Add PLE Model.
- Support variable type BF16 in DCN model.
BugFix
- Fix tf.nn.embedding_lookup interface bug and session hang bug when enabling async embedding.
- Fix warmup failure when the user sets a warmup file path.
- Fix build failure in ev_allocator.cc and hash.cc on ARM.
- Fix build failure in arrow when building on ARM.
- Fix redefined error in NEON header file for ARM.
- Fix _mm_malloc build failure in sparsehash on ARM.
- Fix warmup failure when using session_group.
- Fix save graph construction bug when creating a partitioned EmbeddingVariable through the feature_column API.
- Fix the colocation error when using EmbeddingVariable in distributed training.
- Fix HostNameToIp failures by replacing gethostbyname with getaddrinfo in StarServer.
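To illustrate the getaddrinfo approach mentioned in the last fix, the sketch below resolves a host name with Python's `socket.getaddrinfo`. StarServer itself is C++, so this is only an analogue: the helper name `host_name_to_ips` is hypothetical and not taken from the DeepRec source. In general, gethostbyname is IPv4-only and obsolete in POSIX, while getaddrinfo handles both IPv4 and IPv6.

```python
import socket


def host_name_to_ips(host):
    """Resolve a host name to a de-duplicated list of IP address strings."""
    # getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples;
    # sockaddr[0] is the address string for both AF_INET and AF_INET6.
    results = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    ips = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        if sockaddr[0] not in ips:
            ips.append(sockaddr[0])
    return ips
```

Calling `host_name_to_ips("localhost")` typically returns the loopback address or addresses configured on the machine; resolution failures raise `socket.gaierror`.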
More details of features: https://deeprec.readthedocs.io/zh/latest/
Release Images
CPU Image
alideeprec/deeprec-release:deeprec2210-cpu-py36-ubuntu18.04
GPU Image
alideeprec/deeprec-release:deeprec2210-gpu-py36-cu116-ubuntu18.04
Thanks to our Contributors
Duyi-Wang, Locke, shijieliu, Honglin Zhu, chenxujun, GosTraight2020, LALBJ, Nanno