
Releases: openucx/ucx

v1.11.0-rc4

19 Jul 20:01
8ff239c
Pre-release

1.11.0 RC4 (July 19, 2021)

Features:

Core

  • Added support for UCX monitoring using virtual file system (VFS)/FUSE
  • Added support for applications with static CUDA runtime linking
  • Added support for a configuration file
  • Updated clang format configuration

UCP

  • Added rendezvous API for active messages
  • Added user-defined name to context, worker, and endpoint objects
  • Added flag to silence request leak check
  • Added API for endpoint performance evaluation
  • Added API - ucp_request_query
  • Added API - ucp_lib_query
  • Ported connection manager to a new UCT API
  • Added multi-lane bandwidth optimizations for new protocols
  • Added support for multi-rail over lanes with BW ratio >= 1/4
  • Added support for tracking outstanding requests and aborting those in case of connection failure
  • Refactored keep-alive protocol
  • Added device id to wireup protocol
  • Added support for up to 128 transport layer resources in UCP context
  • Added support for CUDA memory allocations with ucp_mem_map
  • Increased UCP_WORKER_MAX_EP_CONFIG to 64
  • Adjusted memory type zcopy threshold when UCX_ZCOPY_THRESH is set
  • Refactored wireup, rendezvous, get, and zcopy protocols
  • Added put zcopy multi-rail
  • Improved logging for new protocols
  • Added system topology information
  • Added new eager offload protocols
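To make the "BW ratio >= 1/4" rule concrete, here is a minimal C sketch, assuming a lane qualifies when its bandwidth is at least a quarter of the fastest lane's and that data is then split across qualifying lanes in proportion to bandwidth. The type `demo_lane_t` and function `demo_multirail_split()` are hypothetical illustrations, not UCX code; UCX's actual lane selection likely weighs further factors such as latency and memory type.

```c
#include <stddef.h>

/* Hypothetical lane descriptor; not a UCX type. */
typedef struct {
    double bw; /* bandwidth, e.g. in bytes/second */
} demo_lane_t;

/*
 * Select lanes whose bandwidth is at least 1/4 of the fastest lane and
 * split `length` bytes across them in proportion to bandwidth.
 * Writes the per-lane byte count to `chunk` (0 for rejected lanes) and
 * returns the number of selected lanes.
 */
static size_t demo_multirail_split(const demo_lane_t *lanes, size_t num_lanes,
                                   size_t length, size_t *chunk)
{
    double max_bw = 0.0, total_bw = 0.0;
    size_t i, selected = 0, assigned = 0, last = 0;

    /* Find the fastest lane */
    for (i = 0; i < num_lanes; ++i) {
        if (lanes[i].bw > max_bw) {
            max_bw = lanes[i].bw;
        }
    }

    /* Sum the bandwidth of qualifying lanes: bw / max_bw >= 1/4 */
    for (i = 0; i < num_lanes; ++i) {
        if (lanes[i].bw * 4.0 >= max_bw) {
            total_bw += lanes[i].bw;
        }
    }

    /* Assign each qualifying lane a bandwidth-proportional share */
    for (i = 0; i < num_lanes; ++i) {
        if ((max_bw > 0.0) && (lanes[i].bw * 4.0 >= max_bw)) {
            chunk[i]  = (size_t)((double)length * lanes[i].bw / total_bw);
            assigned += chunk[i];
            last      = i;
            ++selected;
        } else {
            chunk[i] = 0;
        }
    }

    /* Give any rounding leftovers to the last selected lane */
    if (selected > 0) {
        chunk[last] += length - assigned;
    }
    return selected;
}
```

For lanes of 100, 50 and 20 units, the 20-unit lane is rejected (20 × 4 < 100) and a 300-byte message is split 200/100 between the other two.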

UCT

  • Extended connection establishment API
  • Added active message (AM) alignment in iface params
  • Added active message short IOV API
  • Added support for interface query by operation and memory type
  • Added API to get allocation base address and length
  • Added md_dereg_v2 API

UCS

  • Added log filter by source file name
  • Added checking for last element in fraglist queue
  • Added a method to get IP address from sockaddr
  • Added memory usage limits to registration cache
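The UCS helper for extracting an IP address from a `sockaddr` is not reproduced here. As a sketch of the same idea in portable POSIX C, the hypothetical `demo_sockaddr_ip_str()` below dispatches on the address family and formats the address with `inet_ntop()`.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: write the textual IP address held in `sa`
 * (AF_INET or AF_INET6) into `buf`. Returns 0 on success, -1 otherwise.
 */
static int demo_sockaddr_ip_str(const struct sockaddr *sa, char *buf,
                                socklen_t buf_len)
{
    const void *addr;

    switch (sa->sa_family) {
    case AF_INET:
        addr = &((const struct sockaddr_in *)sa)->sin_addr;
        break;
    case AF_INET6:
        addr = &((const struct sockaddr_in6 *)sa)->sin6_addr;
        break;
    default:
        return -1; /* unsupported address family */
    }

    /* inet_ntop() returns NULL if the buffer is too small */
    return (inet_ntop(sa->sa_family, addr, buf, buf_len) != NULL) ? 0 : -1;
}
```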

UCM

  • Improved x86 parser to recognize some mov flavors

CUDA

  • Added registration for whole CUDA allocations
  • Added CUDA-IPC keepalive
  • Adjusted performance estimations
  • Improved logging
  • Added allocation methods for CUDA pinned/managed memory
  • Added support for a global cuda_ipc cache

RDMA CORE (IB, ROCE, etc.)

  • Added report of QP info in case of completion with error
  • Refactored FC send operations
  • Added support for DevX unique QPN allocation
  • Optimized endpoint lookup for DCI
  • Added support for RDMA sub-function (SF)
  • Added support for DCI via DEVX
  • Added DCI pool per LAG port
  • Added support for RoCE IP reachability check using a subnet mask
  • Added active message short IOV for UD/DC/RC mlx and UD/RC verbs transports
  • Added endpoint keepalive check for UD
  • Suppressed warning if device can't be opened
  • Added support for multiple flush cancel without completion
  • Devices with an invalid GID are now ignored
  • Added support for SRQ linked list reordering
  • Added flush by flow control on old devices
  • Added support for configurable rdma_resolve_addr/route timeout
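A subnet-mask reachability check reduces to comparing network prefixes. The sketch below is IPv4-only and uses a hypothetical `demo_ipv4_same_subnet()` that takes a prefix length; it assumes the prefix is already known, whereas UCX presumably obtains it from the local interface configuration and also has to handle IPv6.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/*
 * Hypothetical check: two IPv4 addresses are considered reachable if they
 * share the same network prefix of `prefix_len` bits.
 * Returns 1 (same subnet), 0 (different subnet) or -1 (invalid input).
 */
static int demo_ipv4_same_subnet(const char *local, const char *remote,
                                 unsigned prefix_len)
{
    struct in_addr a, b;
    uint32_t mask;

    if ((inet_pton(AF_INET, local, &a) != 1) ||
        (inet_pton(AF_INET, remote, &b) != 1) || (prefix_len > 32)) {
        return -1;
    }

    /* Netmask in host byte order; a /0 prefix is special-cased because
     * shifting a 32-bit value by 32 is undefined behavior */
    mask = (prefix_len == 0) ? 0 : (UINT32_MAX << (32 - prefix_len));

    return (ntohl(a.s_addr) & mask) == (ntohl(b.s_addr) & mask);
}
```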

Shared memory

  • Added active message short IOV support for posix, sysv, and self transports

TCP

  • Added support for peer failure in case of CONNECT_TO_EP
  • Added support for active message short IOV

Java

  • Added full support for UCP Java API

Tests

  • Added length/mem_type parameters to the UCP client-server example
  • Added port sockaddr tests for a new API
  • Added send-recv test between client/server with different UCX_IB_NUM_PATHS values
  • Added support for CUDA and CUDA managed memory in io_demo
  • Added support for a custom watchdog timeout from command line
  • Extended memtype hook tests

Tools

  • Added UCP active message support to perftest
  • Added error handling option to perftest
  • Added wakeup option
  • Added performance tests for AM short IOV

CI

  • Added RHEL 7.6 with MOFED 4.7
  • Added Fedora 34, RHEL 7.2, 7.4
  • Added PGI support from HPC-SDK module
  • Added docker image with CUDA 11.2
  • Added IODEMO test
  • Added Ubuntu 20.04
  • Added test for connection manager fallback in client-server testing
  • Added loopback interface for TCP testing

Bugfixes:

Build

  • Fixes in libnuma detection macro
  • Fixes for cross compilation support
  • Fixes for --without-dc compilation

Continuous Integration

  • Fixes in Azure pipeline build system
  • Fixes in Coverity CI
  • Fixes in Azure release pipeline

Packaging

  • Fixes in DEB package: added essential system dependencies

Documentation

  • Fixes in UCP, UCT, Readme, FAQ, and Read-the-docs documentation

Tests

  • Fixes in CMA peer failure test
  • Fixes in SRQ tests
  • Fixes in the usage of requests_wait
  • Fixes in test_uct_query
  • Fixes addressing race conditions on client user data in test_uct_sockaddr
  • Fixes in IODEMO app
  • Fixes in error handling flow for perftest
  • Fixes in perftest batch tests
  • Fixes addressing hang issues for rendezvous protocol in UCP client server example

UCP

  • Fixes in endpoint error handling
  • Fixes in error reporting for failed CM lanes
  • Fixes in progress worker flush
  • Fixes in rendezvous pipeline flow
  • Fixes in recursive protocol selection
  • Fixes in error handling for AM_ZCOPY
  • Fixes in length check condition in RMA PUT short
  • Fixes in failure handling for rendezvous offload send
  • Fixes in offload completion with inlined data
  • Fixes in statistics calculations for rendezvous protocol
  • Fixes in ucp_worker_query() thread mode for SERIALIZED
  • Fixes preventing leaks of UCP requests

ROCM

  • Fixes in device memory registration and de-registration
  • Fixes for missing mem_query definition in rocm_copy
  • Fixes addressing build failure due to const violation
  • Fixes in sockaddr_accessibility test for rocm_copy and rocm_ipc
  • Fixes in bandwidth estimation for rocm_ipc

RDMA CORE (IB, ROCE, etc.)

  • Fixes addressing deadlock between DCI resources and RDMA_READ credits
  • Fixes in DSCP for RoCE DCT
  • Fixes in flush(cancel) flow
  • Fixes preventing segfault in uct_rdmacm_cm_ep_str
  • Fixes in scatter-gather entries logging
  • Fixes for compilation with experimental verbs
  • Fixes in UD dgid filtering
  • Fixes in destroying domain resources
  • Fixes in PCIe bandwidth calculation
  • Fixes addressing CQ creation failure using legacy ibv API
  • Fixes in iov2sge converter
  • Fixes in port width check on HDR100
  • Fixes in SL selection
  • Fixes in hardware tag matching compilation
  • Fixes in uct_rdmacm_cm_cqs hash key
  • Fixes for compilation with rdma-core 20
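For reference when reading the PCIe bandwidth fix: the raw per-direction PCIe link bandwidth is transfer rate × lane count × encoding efficiency ÷ 8. The C sketch below encodes the published per-generation rates and line encodings (8b/10b for Gen1/Gen2, 128b/130b from Gen3 onward). `demo_pcie_bw()` is hypothetical and ignores packet-level (TLP/DLLP) overhead, so it gives an upper bound rather than UCX's exact formula.

```c
#include <stddef.h>

/* Per-generation PCIe parameters. */
typedef struct {
    double   gts;     /* transfer rate per lane, GT/s */
    unsigned enc_num; /* payload bits per encoded block */
    unsigned enc_den; /* total bits per encoded block */
} demo_pcie_gen_t;

static const demo_pcie_gen_t demo_pcie_gens[] = {
    {2.5, 8, 10},     /* Gen1: 8b/10b */
    {5.0, 8, 10},     /* Gen2: 8b/10b */
    {8.0, 128, 130},  /* Gen3: 128b/130b */
    {16.0, 128, 130}, /* Gen4: 128b/130b */
    {32.0, 128, 130}, /* Gen5: 128b/130b */
};

/*
 * Raw per-direction link bandwidth in bytes/second for PCIe generation
 * `gen` (1..5) with `lanes` lanes, after line encoding only.
 * Returns a negative value for an unknown generation.
 */
static double demo_pcie_bw(unsigned gen, unsigned lanes)
{
    const demo_pcie_gen_t *g;

    if ((gen < 1) ||
        (gen > sizeof(demo_pcie_gens) / sizeof(demo_pcie_gens[0]))) {
        return -1.0;
    }

    g = &demo_pcie_gens[gen - 1];
    return g->gts * 1e9 * lanes * g->enc_num / g->enc_den / 8.0;
}
```

A Gen3 x16 link, for example, comes out at roughly 15.75 GB/s per direction.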

Java

  • Fixes in tag sender mask

UCT

  • Fixes in reachability of loopback ifaces
  • Fixes addressing possible uninitialized memory accesses
  • Fixes in error flow for endpoints created upon receiving connection request
  • Fixes in TCP keepalive to avoid false-positive error detection
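For context on the keepalive fix, these are the Linux socket options that govern TCP keepalive: the idle time before probing starts, the interval between probes, and the number of unanswered probes before the peer is declared dead (roughly idle + interval × count seconds in total). Settings that are too aggressive can flag a slow but healthy peer as failed. `demo_set_tcp_keepalive()` is a hypothetical, Linux-specific sketch; it is not the UCX TCP transport's code and does not reflect the values UCX uses.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Enable TCP keepalive on `fd` with the given idle time and probe
 * interval (both in seconds) and probe count. TCP_KEEPIDLE,
 * TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific options.
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
static int demo_set_tcp_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    if ((setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) ||
        (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s,
                    sizeof(idle_s)) < 0) ||
        (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s,
                    sizeof(intvl_s)) < 0) ||
        (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)) {
        return -1;
    }
    return 0;
}
```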

UCM

  • Fixes addressing heap corruption caused by ucp_set_event_handler()
  • Fixes in mmap events test

v1.11.0-rc3

06 Jul 14:49
84dcd80
Pre-release

Features:

Core

  • Added support for UCX monitoring using virtual file system (VFS)/FUSE
  • Added support for applications with static CUDA runtime linking
  • Added support for a configuration file
  • Updated clang format configuration

UCP

  • Added rendezvous API for active messages
  • Added user-defined name to context, worker, and endpoint objects
  • Added flag to silence request leak check
  • Added API for endpoint performance evaluation
  • Added API - ucp_request_query
  • Added API - ucp_lib_query
  • Ported connection manager to a new UCT API
  • Added multi-lane bandwidth optimizations for new protocols
  • Added support for multi-rail over lanes with BW ratio >= 1/4
  • Added support for tracking outstanding requests and aborting those in case of connection failure
  • Refactored keep-alive protocol
  • Added device id to wireup protocol
  • Added support for up to 128 transport layer resources in UCP context
  • Added support for CUDA memory allocations with ucp_mem_map
  • Increased UCP_WORKER_MAX_EP_CONFIG to 64
  • Adjusted memory type zcopy threshold when UCX_ZCOPY_THRESH is set
  • Refactored wireup, rendezvous, get, and zcopy protocols
  • Added put zcopy multi-rail
  • Improved logging for new protocols
  • Added system topology information
  • Added new eager offload protocols

UCT

  • Extended connection establishment API
  • Added active message (AM) alignment in iface params
  • Added active message short IOV API
  • Added support for interface query by operation and memory type
  • Added API to get allocation base address and length
  • Added md_dereg_v2 API

UCS

  • Added log filter by source file name
  • Added checking for last element in fraglist queue
  • Added a method to get IP address from sockaddr
  • Added memory usage limits to registration cache

UCM

  • Improved x86 parser to recognize some mov flavors

CUDA

  • Added registration for whole CUDA allocations
  • Added CUDA-IPC keepalive
  • Adjusted performance estimations
  • Improved logging
  • Added allocation methods for CUDA pinned/managed memory
  • Added support for a global cuda_ipc cache

RDMA CORE (IB, ROCE, etc.)

  • Added report of QP info in case of completion with error
  • Refactored FC send operations
  • Added support for DevX unique QPN allocation
  • Optimized endpoint lookup for DCI
  • Added support for RDMA sub-function (SF)
  • Added support for DCI via DEVX
  • Added DCI pool per LAG port
  • Added support for RoCE IP reachability check using a subnet mask
  • Added active message short IOV for UD/DC/RC mlx and UD/RC verbs transports
  • Added endpoint keepalive check for UD
  • Suppressed warning if device can't be opened
  • Added support for multiple flush cancel without completion
  • Devices with an invalid GID are now ignored
  • Added support for SRQ linked list reordering
  • Added flush by flow control on old devices
  • Added support for configurable rdma_resolve_addr/route timeout

Shared memory

  • Added active message short IOV support for posix, sysv, and self transports

TCP

  • Added support for peer failure in case of CONNECT_TO_EP
  • Added support for active message short IOV

Java

  • Added full support for UCP Java API

Tests

  • Added length/mem_type parameters to the UCP client-server example
  • Added port sockaddr tests for a new API
  • Added send-recv test between client/server with different UCX_IB_NUM_PATHS values
  • Added support for CUDA and CUDA managed memory in io_demo
  • Added support for a custom watchdog timeout from command line
  • Extended memtype hook tests

Tools

  • Added UCP active message support to perftest
  • Added error handling option to perftest
  • Added wakeup option
  • Added performance tests for AM short IOV

CI

  • Added RHEL 7.6 with MOFED 4.7
  • Added Fedora 34, RHEL 7.2, 7.4
  • Added PGI support from HPC-SDK module
  • Added docker image with CUDA 11.2
  • Added IODEMO test
  • Added Ubuntu 20.04
  • Added test for connection manager fallback in client-server testing
  • Added loopback interface for TCP testing

Bugfixes:

Build

  • Fixes in libnuma detection macro
  • Fixes for cross compilation support
  • Fixes for --without-dc compilation

Continuous Integration

  • Fixes in Azure pipeline build system
  • Fixes in Coverity CI
  • Fixes in Azure release pipeline

Packaging

  • Fixes in DEB package: added essential system dependencies

Documentation

  • Fixes in UCP, UCT, Readme, FAQ, and Read-the-docs documentation

Tests

  • Fixes in CMA peer failure test
  • Fixes in SRQ tests
  • Fixes in the usage of requests_wait
  • Fixes in test_uct_query
  • Fixes addressing race conditions on client user data in test_uct_sockaddr
  • Fixes in IODEMO app
  • Fixes in error handling flow for perftest
  • Fixes in perftest batch tests
  • Fixes addressing hang issues for rendezvous protocol in UCP client server example

UCP

  • Fixes in endpoint error handling
  • Fixes in error reporting for failed CM lanes
  • Fixes in progress worker flush
  • Fixes in rendezvous pipeline flow
  • Fixes in recursive protocol selection
  • Fixes in error handling for AM_ZCOPY
  • Fixes in length check condition in RMA PUT short
  • Fixes in failure handling for rendezvous offload send
  • Fixes in offload completion with inlined data
  • Fixes in statistics calculations for rendezvous protocol
  • Fixes in ucp_worker_query() thread mode for SERIALIZED
  • Fixes preventing leaks of UCP requests

ROCM

  • Fixes in device memory registration and de-registration
  • Fixes for missing mem_query definition in rocm_copy
  • Fixes addressing build failure due to const violation
  • Fixes in sockaddr_accessibility test for rocm_copy and rocm_ipc
  • Fixes in bandwidth estimation for rocm_ipc

RDMA CORE (IB, ROCE, etc.)

  • Fixes addressing deadlock between DCI resources and RDMA_READ credits
  • Fixes in DSCP for RoCE DCT
  • Fixes in flush(cancel) flow
  • Fixes preventing segfault in uct_rdmacm_cm_ep_str
  • Fixes in scatter-gather entries logging
  • Fixes for compilation with experimental verbs
  • Fixes in UD dgid filtering
  • Fixes in destroying domain resources
  • Fixes in PCIe bandwidth calculation
  • Fixes addressing CQ creation failure using legacy ibv API
  • Fixes in iov2sge converter
  • Fixes in port width check on HDR100
  • Fixes in SL selection
  • Fixes in hardware tag matching compilation
  • Fixes in uct_rdmacm_cm_cqs hash key

Java

  • Fixes in tag sender mask

UCT

  • Fixes in reachability of loopback ifaces
  • Fixes addressing possible uninitialized memory accesses
  • Fixes in error flow for endpoints created upon receiving connection request

UCM

  • Fixes addressing heap corruption caused by ucp_set_event_handler()
  • Fixes in mmap events test

v1.11.0-rc1

23 Jun 14:46
5a42a81
Pre-release

TBD

v1.10.1

13 May 02:01
6a5856e

1.10.1 (May 12, 2021)

Bugfixes:

  • Fixes in Infiniband port speed detection for HDR100
  • Fixes in building gtest-all.cc and sock.c with GCC11
  • Fixes addressing performance degradation with cuda memory on a self endpoint
  • Fixes in JUCX listener connection handler
  • Fixes in configuration of loopback TCP transport (disabled by default)
  • Fixes in RPM dependency on libibverbs
  • Fixes in ABI backward compatibility for active message protocol
  • Fixes in the DC transport - adding support for full-handshake mode (off by default)
  • Fixes in Active Messages short reply protocol
  • Fixes for segmentation fault while listening for connections
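The HDR100 item concerns port width reporting. In the `active_width` field of libibverbs' `struct ibv_port_attr`, the codes 1, 2, 4, 8 and 16 stand for 1X, 4X, 8X, 12X and 2X links, and HDR100 ports report 2X. The hypothetical helpers below sketch a conversion that handles the 2X case; code that only knows 1X/4X/8X/12X would misreport such a port. They are not UCX functions.

```c
/*
 * Convert the libibverbs `active_width` code to a lane count.
 * Code 16 (2X) is the width reported by HDR100 ports.
 * Returns 0 for an unknown code.
 */
static unsigned demo_ib_width_to_lanes(unsigned active_width)
{
    switch (active_width) {
    case 1:  return 1;  /* 1X */
    case 2:  return 4;  /* 4X */
    case 4:  return 8;  /* 8X */
    case 8:  return 12; /* 12X */
    case 16: return 2;  /* 2X, e.g. HDR100 */
    default: return 0;  /* unknown: caller should treat as an error */
    }
}

/* Nominal port data rate: lane count times per-lane rate (Gb/s). */
static double demo_ib_port_gbps(unsigned active_width, double lane_gbps)
{
    return demo_ib_width_to_lanes(active_width) * lane_gbps;
}
```

With the nominal 50 Gb/s HDR lane rate, `demo_ib_port_gbps(16, 50.0)` gives 100 Gb/s, matching the HDR100 name.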

v1.10.1-rc2

10 May 21:41
f633e85
Pre-release

1.10.1 RC2 (May 10, 2021)

Bugfixes:

  • Fixes in Infiniband port speed detection for HDR100
  • Fixes in building gtest-all.cc and sock.c with GCC11
  • Fixes addressing performance degradation with cuda memory on a self endpoint
  • Fixes in JUCX listener connection handler
  • Fixes in configuration of loopback TCP transport (disabled by default)
  • Fixes in RPM dependency on libibverbs
  • Fixes in ABI backward compatibility for active message protocol
  • Added support for DC full-handshake mode (off by default)
  • Fixes in Active Messages short reply protocol
  • Fixes for segmentation fault while listening for connections

v1.10.1-rc1

22 Apr 23:57
cbcc551
Pre-release

1.10.1-rc1

Bugfixes:

  • Fix Infiniband port speed detection for HDR100
  • Fix build issues in gtest-all.cc and sock.c with GCC11
  • Fix performance degradation with cuda memory on self endpoint
  • Fix bug in JUCX listener connection handler

v1.10.0

09 Mar 23:04
20697e5

Features:

Core

  • Added support for Nvidia HPC SDK
  • Added support for latest PGI and Clang
  • Added support for ROCM-3.7+ (warning generated if older version detected)
  • Added support for GCC11

Architecture

  • Added Arm SVE memcpy()
  • Redesigned Arm WFE support
  • Improved clear_cache performance for Arm
  • Added architecture detection for Zhaoxin CPU

CI

  • Added release builds on CUDA 11
  • Enabled performance validation in gtest
  • Added new OS for release CI

UCP

  • Added locality awareness to the transport selection logic for GPU devices
  • Added put/offload/short and put/offload/zcopy protocols
  • Added receive message nbx routine
  • Reworked AM implementation and API, which adds support for RNDV semantics
  • Added support for multi-lane connection manager over TCP
  • Added support for printing AM tls with info log level
  • Implemented flush and destroy for UCT EPs on UCP worker
  • Reduced UCP request size
  • Added support for keepalive protocol
  • Added support for multi-fragment protocol
  • Added implementation for protocol progress for eager, bcopy, and multicopy
  • Improved protocol selection logic
  • Added new protocols for UCP get operation
  • Added bcopy protocols with support for GPU memory
  • Added RNDV protocol implementation for GPU devices (CUDA, ROCm)
  • Set SOCKADDR_CM_ENABLE=y by default
  • Added support for fast-path short with new tag protocols
  • Added a new parameter to control the CM listener's backlog
  • Added support for sending AM RTS over the short message protocol
  • Added support for shared memory multi-lane when CM is used
  • Added missing async locks

UCT

  • Added API for keepalive_timeout value
  • Added uct_completion.status
  • Allowed transports to access multiple mem_types
  • Removed status arg from uct_completion_callback_t
  • Restructured uct_mem_alloc/uct_md_mem_alloc to use mem_type
  • Updated documentation for uct_listener_params
  • Lowered the log level for certain network errors
  • Added cuda_copy wakeup feature
  • Added wakeup support for shared memory
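The two `uct_completion` changes above move the completion status from a callback argument into the completion object itself. The self-contained C mock below mirrors that pattern with hypothetical `demo_` names rather than the real UCT types: each finished operation decrements a counter, this sketch keeps the first error it sees, and the callback reads the aggregated status from `self->status` once the counter reaches zero.

```c
/* Hypothetical stand-ins for status codes; real code uses ucs_status_t. */
typedef enum {
    DEMO_OK  = 0,
    DEMO_ERR = -1
} demo_status_t;

typedef struct demo_completion demo_completion_t;

/* New-style callback: no status argument, status lives in the object. */
typedef void (*demo_completion_cb_t)(demo_completion_t *self);

struct demo_completion {
    demo_completion_cb_t func;   /* called when count drops to zero */
    int                  count;  /* number of outstanding operations */
    demo_status_t        status; /* must be initialized to DEMO_OK */
};

/* Called once per finished operation; records the first error seen. */
static void demo_invoke_completion(demo_completion_t *comp,
                                   demo_status_t status)
{
    if ((status != DEMO_OK) && (comp->status == DEMO_OK)) {
        comp->status = status;
    }
    if (--comp->count == 0) {
        comp->func(comp);
    }
}

/* Example user callback: saves the aggregated status. */
static int demo_last_status = 1; /* sentinel: callback not called yet */

static void demo_on_done(demo_completion_t *self)
{
    demo_last_status = self->status;
}
```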

UCS

  • Added "inf" and "auto" values to time units
  • Added on-stack constructors for array and string buffer
  • Added ucs_ptr_map_t data structure
  • Added bool CSWAP
  • Improved logging
  • Added optimization for namespace processing
  • Fixes for connection matching functionality
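A boolean compare-and-swap reports whether the swap happened instead of returning the previous value, which simplifies retry loops. UCS's own atomic helpers are not reproduced here; the sketch below builds the same primitive on C11 `atomic_compare_exchange_strong()`, with hypothetical function names, and uses it for a lock-free increment.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Boolean compare-and-swap: atomically set *ptr to `swap` if it still
 * equals `compare`. Returns true if the swap happened.
 */
static bool demo_atomic_bool_cswap64(_Atomic uint64_t *ptr, uint64_t compare,
                                     uint64_t swap)
{
    /* `compare` is a local copy, so its update on failure is discarded */
    return atomic_compare_exchange_strong(ptr, &compare, swap);
}

/* Lock-free increment built on the boolean CSWAP; returns the new value. */
static uint64_t demo_atomic_inc(_Atomic uint64_t *ptr)
{
    uint64_t old;

    do {
        old = atomic_load(ptr);
    } while (!demo_atomic_bool_cswap64(ptr, old, old + 1));

    return old + 1;
}
```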

CUDA

  • Added support for global IPC cache

RDMA CORE (IB, ROCE, etc.)

  • Added support for auto detection of adaptive routing settings
  • Added an option to poll TX CQ every progress iteration
  • Added local and remote addresses to the reject error message
  • Added support for UAR allocation with non-cacheable memory type
  • Added support for multiple flush cancel without completion
  • Added async events callback support
  • Added detection for ConnectX-6, ConnectX-7 and BlueField-1/2 devices
  • Added support for connection matching for UD
  • Added a check for AM ordering
  • Added better support for non-4K MTU values
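InfiniBand MTUs are reported as small enum codes rather than byte counts, and a non-4K MTU changes how many packets a message needs. The sketch below converts the libibverbs `enum ibv_mtu` encoding (1 = 256 bytes up to 5 = 4096 bytes) to bytes and counts packets; both helpers are hypothetical and not UCX functions.

```c
#include <stddef.h>

/*
 * Convert the libibverbs `enum ibv_mtu` code to bytes
 * (1 -> 256, 2 -> 512, 3 -> 1024, 4 -> 2048, 5 -> 4096).
 * Returns 0 for an unknown code.
 */
static unsigned demo_ib_mtu_to_bytes(int mtu_enum)
{
    if ((mtu_enum < 1) || (mtu_enum > 5)) {
        return 0;
    }
    return 128u << mtu_enum;
}

/*
 * Number of packets needed to carry a `len`-byte message at the given
 * MTU code. A zero-length message still occupies one packet.
 * Returns 0 for an unknown MTU code.
 */
static size_t demo_ib_num_packets(size_t len, int mtu_enum)
{
    size_t mtu = demo_ib_mtu_to_bytes(mtu_enum);

    if (mtu == 0) {
        return 0;
    }
    return (len == 0) ? 1 : (len + mtu - 1) / mtu;
}
```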

Java (preview)

  • Added support for a different javadoc executable path for different Java versions
  • Added UCS memory type constants
  • Added support for building on Java 10+
  • Added support for io-vector datatype
  • Removed libjucx from packages

Tests

  • Added CI for CUDA 11
  • Added test_ucp_sockaddr_protocols.stream_short
  • Reimplemented tests using NBX API
  • Added flush(cancel) test
  • Added memory_wait mode to perftest
  • Added support for clang 10
  • Refactored RMA and atomic tests, added memtype support
  • Added test for uct_md_mem_query()
  • Added request interrupt support
  • Added support for connection manager fallbacks
  • Added new ucp request test checking for leaks from the ptr_map

Documentation

  • Added glossaries

Bugfixes:

Portability

  • Fixes in print functions to use portable format macros such as PRIx64
  • Fixes for Arm v8 cross compilation support
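The portability fix concerns length modifiers such as `%lx`, which only match `uint64_t` where `long` is 64 bits; the `<inttypes.h>` macros expand to the correct modifier on each platform. A minimal sketch with a hypothetical `demo_format_region()` helper:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a 64-bit address and length portably. PRIx64/PRIu64 expand to
 * the right conversion for uint64_t on every platform, unlike "%lx".
 * Returns snprintf()'s result.
 */
static int demo_format_region(char *buf, size_t len, uint64_t addr,
                              uint64_t length)
{
    return snprintf(buf, len, "addr 0x%" PRIx64 " len %" PRIu64, addr,
                    length);
}
```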

Continuous Integration

  • Fixes in Github release flow
  • Fixes in docker image

Packaging

  • Removed deb package dependencies
  • Fixes in SPEC to make the RPM relocatable

Documentation

  • Fixes in documentation for ucp_am_recv_data_nbx
  • Fixes in quick start example
  • Fixes in installation instruction
  • Updated author list

Tests

  • Fixes for failures under valgrind runtime
  • Fixes in mmap tests for 0-length RMA
  • Fixes in definition of LAST_WQE wait timeout
  • Fixes in ROCm for mem_buffer test
  • Fixes in test name printing format
  • Fixes in tcp_sockcm test

UCP

  • Fixes in worker cleanup flow
  • Fixes in RNDV RTS flow
  • Fix in length check condition for RMA PUT short
  • Fixes in handling failures from AM Bcopy
  • Fix in the release flow of deferred data
  • Fixes for invalid ID and handling of status in RNDV
  • Fixes in short active message reply protocol

CUDA

  • Fixes in managed memory support
  • Fixes in topology detection

RDMA CORE (IB, ROCE, etc.)

  • Fixes in assert definitions
  • Fixes in printing an error about invalid AM Bcopy length for UD
  • Fixes for thread safety support
  • Fixes to get ROCE device name according to GID
  • Fixes for SL selection
  • Fixes in creating STRICT_ORDER key
  • Fixes addressing performance degradation in UD transport due to excess async events
  • Fixes in QP destroy
  • Fixes for CQ creation failure using old Verbs API

UGNI

  • Fixes in disable logic in config
  • Fixes for clang 11 warnings

Java

  • Fixes in build dependencies
  • Fixes in constructing UcpRequest object on error
  • Fixes in exception handling on endpoint closure request
  • Fixes for segfault in UcpErrorHandler

UCP

  • Fixes in datatype support for get_zcopy RNDV
  • Fixes in connection manager disconnect
  • Fixes in assert definitions
  • Fixes in completion flow for failed EP
  • Fixes in flush error handling flow
  • Fixes in latency calculations for wireup protocol
  • Fixes in offload completion with inlined data
  • Fixes in unpacking flow
  • Fixes in error handling for various protocols

UCT

  • Fixes in flush TX
  • Fixes in checks for enabling GPU Direct RDMA

UCS

  • Fixes for crashes on incorrect value set in config
  • Fixes in ptr_array
  • Fixes in maximal size for ucs_snprintf_safe()
  • Fixes in compilation warning
  • Fixes in ucs_aarch64_dsb(_op) definition

TCP

  • Fixes in default route interface confirmation flow
  • Fixes in PUT protocol
  • Fixes in max connection limit and improved error reporting

UCM

  • Fixes for crash on prevent unload
  • Fixes in libucm_rocm
  • Fixes for a few race conditions

v1.10.0-rc5

27 Feb 15:00
b54f3b2
Pre-release

1.10.0-rc5 (February 26, 2021)

Features:

Core

  • Added support for Nvidia HPC SDK
  • Added support for latest PGI and Clang
  • Added support for ROCM-3.7+ (warning generated if older version detected)
  • Added support for GCC11

Architecture

  • Added Arm SVE memcpy()
  • Redesigned Arm WFE support
  • Improved clear_cache performance for Arm
  • Added architecture detection for Zhaoxin CPU

CI

  • Added release builds on CUDA 11
  • Enabled performance validation in gtest
  • Added new OS for release CI

UCP

  • Added locality awareness to the transport selection logic for GPU devices
  • Added put/offload/short and put/offload/zcopy protocols
  • Added receive message nbx routine
  • Reworked AM implementation and API, which adds support for RNDV semantics
  • Added support for multi-lane connection manager over TCP
  • Added support for printing AM tls with info log level
  • Implemented flush and destroy for UCT EPs on UCP worker
  • Reduced UCP request size
  • Added support for keepalive protocol
  • Added support for multi-fragment protocol
  • Added implementation for protocol progress for eager, bcopy, and multicopy
  • Improved protocol selection logic
  • Added new protocols for UCP get operation
  • Added bcopy protocols with support for GPU memory
  • Added RNDV protocol implementation for GPU devices (CUDA, ROCm)
  • Set SOCKADDR_CM_ENABLE=y by default
  • Added support for fast-path short with new tag protocols
  • Added a new parameter to control the CM listener's backlog
  • Added support for sending AM RTS over the short message protocol
  • Added support for shared memory multi-lane when CM is used
  • Added missing async locks

UCT

  • Added API for keepalive_timeout value
  • Added uct_completion.status
  • Allowed transports to access multiple mem_types
  • Removed status arg from uct_completion_callback_t
  • Restructured uct_mem_alloc/uct_md_mem_alloc to use mem_type
  • Updated documentation for uct_listener_params
  • Lowered the log level for certain network errors
  • Added cuda_copy wakeup feature
  • Added wakeup support for shared memory

UCS

  • Added "inf" and "auto" values to time units
  • Added on-stack constructors for array and string buffer
  • Added ucs_ptr_map_t data structure
  • Added bool CSWAP
  • Improved logging
  • Added optimization for namespace processing
  • Fixes for connection matching functionality

CUDA

  • Added support for global IPC cache

RDMA CORE (IB, ROCE, etc.)

  • Added support for auto detection of adaptive routing settings
  • Added an option to poll TX CQ every progress iteration
  • Added local and remote addresses to the reject error message
  • Added support for UAR allocation with non-cacheable memory type
  • Added support for multiple flush cancel without completion
  • Added async events callback support
  • Added detection for ConnectX-6, ConnectX-7 and BlueField-1/2 devices
  • Added support for connection matching for UD
  • Added a check for AM ordering
  • Added better support for non-4K MTU values

Java (preview)

  • Added support for a different javadoc executable path for different Java versions
  • Added UCS memory type constants
  • Added support for building on Java 10+
  • Added support for io-vector datatype

Tests

  • Added CI for CUDA 11
  • Added test_ucp_sockaddr_protocols.stream_short
  • Reimplemented tests using NBX API
  • Added flush(cancel) test
  • Added memory_wait mode to perftest
  • Added support for clang 10
  • Refactored RMA and atomic tests, added memtype support
  • Added test for uct_md_mem_query()
  • Added request interrupt support
  • Added support for connection manager fallbacks
  • Added new ucp request test checking for leaks from the ptr_map

Documentation

  • Added glossaries

Bugfixes:

Portability

  • Fixes in print functions to use portable format macros such as PRIx64
  • Fixes for Arm v8 cross compilation support

Continuous Integration

  • Fixes in Github release flow
  • Fixes in docker image

Packaging

  • Removed deb package dependencies
  • Fixes in SPEC to make the RPM relocatable

Documentation

  • Fixes in documentation for ucp_am_recv_data_nbx
  • Fixes in quick start example
  • Fixes in installation instruction
  • Updated author list

Tests

  • Fixes for failures under valgrind runtime
  • Fixes in mmap tests for 0-length RMA
  • Fixes in definition of LAST_WQE wait timeout
  • Fixes in ROCm for mem_buffer test
  • Fixes in test name printing format
  • Fixes in tcp_sockcm test

UCP

  • Fixes in worker cleanup flow
  • Fixes in RNDV RTS flow
  • Fix in length check condition for RMA PUT short
  • Fixes in handling failures from AM Bcopy
  • Fix in the release flow of deferred data
  • Fixes for invalid ID and handling of status in RNDV
  • Fixes in short active message reply protocol

CUDA

  • Fixes in managed memory support
  • Fixes in topology detection

RDMA CORE (IB, ROCE, etc.)

  • Fixes in assert definitions
  • Fixes in printing an error about invalid AM Bcopy length for UD
  • Fixes for thread safety support
  • Fixes to get ROCE device name according to GID
  • Fixes for SL selection
  • Fixes in creating STRICT_ORDER key
  • Fixes addressing performance degradation in UD transport due to excess async events
  • Fixes in QP destroy
  • Fixes for CQ creation failure using old Verbs API

UGNI

  • Fixes in disable logic in config
  • Fixes for clang 11 warnings

Java

  • Fixes in build dependencies
  • Fixes in constructing UcpRequest object on error
  • Fixes in exception handling on endpoint closure request
  • Fixes for segfault in UcpErrorHandler

UCP

  • Fixes in datatype support for get_zcopy RNDV
  • Fixes in connection manager disconnect
  • Fixes in assert definitions
  • Fixes in completion flow for failed EP
  • Fixes in flush error handling flow
  • Fixes in latency calculations for wireup protocol
  • Fixes in offload completion with inlined data
  • Fixes in unpacking flow
  • Fixes in error handling for various protocols

UCT

  • Fixes in flush TX
  • Fixes in checks for enabling GPU Direct RDMA

UCS

  • Fixes for crashes on incorrect value set in config
  • Fixes in ptr_array
  • Fixes in maximal size for ucs_snprintf_safe()
  • Fixes in compilation warning
  • Fixes in ucs_aarch64_dsb(_op) definition

TCP

  • Fixes in default route interface confirmation flow
  • Fixes in PUT protocol
  • Fixes in max connection limit and improved error reporting

UCM

  • Fixes for crash on prevent unload
  • Fixes in libucm_rocm
  • Fixes for a few race conditions

v1.10.0-rc4

21 Feb 17:35
96422ce
Pre-release

1.10.0-rc4 (February 20, 2021)

Features:

Core

  • Added support for Nvidia HPC SDK
  • Added support for latest PGI and Clang
  • Added support for ROCM-3.7+ (warning generated if older version detected)
  • Added support for GCC11

Architecture

  • Added Arm SVE memcpy()
  • Redesigned Arm WFE support
  • Improved clear_cache performance for Arm
  • Added architecture detection for Zhaoxin CPU

CI

  • Added release builds on CUDA 11
  • Enabled performance validation in gtest
  • Added new OS for release CI

UCP

  • Added locality awareness to the transport selection logic for GPU devices
  • Added put/offload/short and put/offload/zcopy protocols
  • Added receive message nbx routine
  • Reworked AM implementation and API, which adds support for RNDV semantics
  • Added support for multi-lane connection manager over TCP
  • Added support for printing AM tls with info log level
  • Implemented flush and destroy for UCT EPs on UCP worker
  • Reduced UCP request size
  • Added support for keepalive protocol
  • Added support for multi-fragment protocol
  • Added implementation for protocol progress for eager, bcopy, and multicopy
  • Improved protocol selection logic
  • Added new protocols for UCP get operation
  • Added bcopy protocols with support for GPU memory
  • Added RNDV protocol implementation for GPU devices (CUDA, ROCm)
  • Set SOCKADDR_CM_ENABLE=y by default
  • Added support for fast-path short with new tag protocols
  • Added a new parameter to control the CM listener's backlog
  • Added support for sending AM RTS over the short message protocol
  • Added support for shared memory multi-lane when CM is used
  • Added missing async locks

UCT

  • Added API for keepalive_timeout value
  • Added uct_completion.status
  • Allowed transports to access multiple mem_types
  • Removed status arg from uct_completion_callback_t
  • Restructured uct_mem_alloc/uct_md_mem_alloc to use mem_type
  • Updated documentation for uct_listener_params
  • Lowered the log level for certain network errors
  • Added cuda_copy wakeup feature
  • Added wakeup support for shared memory

UCS

  • Added "inf" and "auto" values to time units
  • Added on-stack constructors for array and string buffer
  • Added ucs_ptr_map_t data structure
  • Added bool CSWAP
  • Improved logging
  • Added optimization for namespace processing
  • Fixes for connection matching functionality

CUDA

  • Added support for global IPC cache

RDMA CORE (IB, ROCE, etc.)

  • Added support for auto detection of adaptive routing settings
  • Added an option to poll TX CQ every progress iteration
  • Added local and remote addresses to the reject error message
  • Added support for UAR allocation with non-cacheable memory type
  • Added support for multiple flush cancel without completion
  • Added async events callback support
  • Added detection for ConnectX-6, ConnectX-7 and BlueField-1/2 devices
  • Added support for connection matching for UD
  • Added a check for AM ordering
  • Added better support for non-4K MTU values

Java (preview)

  • Added support for a different javadoc executable path for different Java versions
  • Added UCS memory type constants
  • Added support for building on Java 10+
  • Added support for io-vector datatype.

Tests

  • Added CI for CUDA 11
  • Added test_ucp_sockaddr_protocols.stream_short
  • Reimplemented tests using NBX API
  • Added flush(cancel) test
  • Added memory_wait mode to perftest
  • Added support for clang 10
  • Refactored RMA and atomic tests and added memtype support
  • Added test for uct_md_mem_query()
  • Added request interrupt support
  • Added support for connection manager fallbacks
  • Added new ucp request test checking for leaks from the ptr_map

Documentation

  • Added glossaries

Bugfixes:

Portability

  • Fixes in print functions to use format macros such as PRIx64
  • Fixes for Arm v8 cross compilation support

Continuous Integration

  • Fixes in GitHub release flow
  • Fixes in docker image

Packaging

  • Removed deb package dependencies
  • Fixes in the SPEC file to make the RPM relocatable

Documentation

  • Fixes in documentation for ucp_am_recv_data_nbx
  • Fixes in quick start example
  • Fixes in installation instructions
  • Updates to the author list

Tests

  • Fixes for failures under valgrind runtime
  • Fixes in mmap tests for 0-length RMA
  • Fixes in definition of LAST_WQE wait timeout
  • Fixes in ROCm for mem_buffer test
  • Fixes in test name printing format
  • Fixes in tcp_sockcm test

UCP

  • Fixes in worker cleanup flow
  • Fixes in RNDV RTS flow
  • Fix in length check condition for RMA PUT short
  • Fixes in handling failures from AM Bcopy
  • Fix in the release flow of deferred data
  • Fixes for invalid ID and handling of status in RNDV

CUDA

  • Fixes in managed memory support
  • Fixes in topology detection

RDMA CORE (IB, ROCE, etc.)

  • Fixes in assert definitions
  • Fixes in printing an error about invalid AM Bcopy length for UD
  • Fixes for thread safety support
  • Fixes to get ROCE device name according to GID
  • Fixes for SL selection
  • Fixes in creating the STRICT_ORDER key
  • Fixes addressing performance degradation in UD transport due to excess async events
  • Fixes in QP destroy
  • Fixes for CQ creation failure using old Verbs API

UGNI

  • Fixes in disable logic in config
  • Fixes for clang 11 warnings

Java

  • Fixes in build dependencies
  • Fixes in constructing UcpRequest object on error
  • Fixes in exception handling on endpoint closure request
  • Fixes for segfault in UcpErrorHandler

UCP

  • Fixes in datatype support for get_zcopy RNDV
  • Fixes in connection manager disconnect
  • Fixes in assert definitions
  • Fixes in completion flow for failed EP
  • Fixes in flush error handling flow
  • Fixes in latency calculations for wireup protocol
  • Fixes in offload completion with inlined data
  • Fixes in unpacking flow
  • Fixes in error handling for various protocols

UCT

  • Fixes in flush TX
  • Fixes in checks for enabling GPU Direct RDMA

UCS

  • Fixes for crashes on incorrect value set in config
  • Fixes in ptr_array
  • Fixes in maximal size for ucs_snprintf_safe()
  • Fixes for compilation warnings
  • Fixes in ucs_aarch64_dsb(_op) definition

TCP

  • Fixes in default route interface confirmation flow
  • Fixes in PUT protocol
  • Fixes in max connection limit and improved error reporting

UCM

  • Fixes for a crash on prevent unload
  • Fixes in libucm_rocm
  • Fixes for a few race conditions

v1.10.0-rc3

15 Feb 16:27
c334359
v1.10.0-rc3 Pre-release

1.10.0-rc3 (February 15, 2021)

Features:

Core

  • Added support for Nvidia HPC SDK
  • Added support for latest PGI and Clang
  • Added support for ROCm 3.7+ (a warning is generated if an older version is detected)
  • Added support for GCC 11

Architecture

  • Added Arm SVE memcpy()
  • Redesigned Arm WFE support
  • Improved clear_cache performance for Arm
  • Added architecture detection for Zhaoxin CPU

CI

  • Added release builds on CUDA 11
  • Enabled performance validation in gtest

UCP

  • Added locality awareness to the transport selection logic for GPU devices
  • Added put/offload/short and put/offload/zcopy protocols
  • Added receive message nbx routine
  • Reworked AM implementation and API, which adds support for RNDV semantics
  • Added support for multi-lane connection manager over TCP
  • Added support for printing AM tls with info log level
  • Implemented flush and destroy for UCT EPs on the UCP worker
  • Reduced UCP request size
  • Added support for keepalive protocol
  • Added support for multi-fragment protocol
  • Added implementation for protocol progress for eager, bcopy, and multicopy
  • Improved protocol selection logic
  • Added new protocols for UCP get operation
  • Added bcopy protocols with support for GPU memory
  • Added RNDV protocol implementation for GPU devices (CUDA, ROCm)
  • Set SOCKADDR_CM_ENABLE=y by default
  • Added support for fast-path short with new tag protocols
  • Added a new parameter to control the CM listener's backlog
  • Added support for sending AM RTS over the short message protocol
  • Added support for shared memory multi-lane when CM is used

UCT

  • Added API for keepalive_timeout value
  • Added uct_completion.status
  • Allowed transports to access multiple mem_types
  • Removed status arg from uct_completion_callback_t
  • Restructured uct_mem_alloc/uct_md_mem_alloc to use mem_type
  • Updated documentation for uct_listener_params
  • Lowered the log level for certain network errors
  • Added cuda_copy wakeup feature
  • Added wakeup support for shared memory

UCS

  • Added "inf" and "auto" values to time units
  • Added on-stack constructors for array and string buffer
  • Added ucs_ptr_map_t data structure
  • Added bool CSWAP
  • Improved logging
  • Added optimization for namespace processing
  • Fixes for connection matching functionality

RDMA CORE (IB, ROCE, etc.)

  • Added support for auto detection of adaptive routing settings
  • Added an option to poll TX CQ every progress iteration
  • Added local and remote addresses to the reject error message
  • Added support for UAR allocation with non-cacheable memory type
  • Added support for multiple flush cancel without completion
  • Added async events callback support
  • Added detection for ConnectX-6, ConnectX-7 and BlueField-1/2 devices
  • Added support for connection matching for UD
  • Added a check for AM ordering
  • Added better support for non-4K MTU values

Java (preview)

  • Added support for a different javadoc executable path for different Java versions
  • Added UCS memory type constants
  • Added support for building on Java 10+
  • Added support for io-vector datatype.

Tests

  • Added CI for CUDA 11
  • Added test_ucp_sockaddr_protocols.stream_short
  • Reimplemented tests using NBX API
  • Added flush(cancel) test
  • Added memory_wait mode to perftest
  • Added support for clang 10
  • Refactored RMA and atomic tests and added memtype support
  • Added test for uct_md_mem_query()
  • Added request interrupt support
  • Added support for connection manager fallbacks
  • Added new ucp request test checking for leaks from the ptr_map

Documentation

  • Added glossaries

Bugfixes:

Portability

  • Fixes in print functions to use format macros such as PRIx64
  • Fixes for Arm v8 cross compilation support

Continuous Integration

  • Fixes in GitHub release flow
  • Fixes in docker image

Packaging

  • Removed deb package dependencies
  • Fixes in the SPEC file to make the RPM relocatable

Documentation

  • Fixes in documentation for ucp_am_recv_data_nbx
  • Fixes in quick start example
  • Fixes in installation instructions
  • Updates to the author list

Tests

  • Fixes for failures under valgrind runtime
  • Fixes in mmap tests for 0-length RMA
  • Fixes in definition of LAST_WQE wait timeout
  • Fixes in ROCm for mem_buffer test
  • Fixes in test name printing format
  • Fixes in tcp_sockcm test

UCP

  • Fixes in worker cleanup flow
  • Fixes in RNDV RTS flow
  • Fix in length check condition for RMA PUT short
  • Fixes in handling failures from AM Bcopy
  • Fix in the release flow of deferred data
  • Fixes for invalid ID and handling of status in RNDV

CUDA

  • Fixes in managed memory support

RDMA CORE (IB, ROCE, etc.)

  • Fixes in assert definitions
  • Fixes in printing an error about invalid AM Bcopy length for UD
  • Fixes for thread safety support
  • Fixes to get ROCE device name according to GID
  • Fixes for SL selection
  • Fixes in creating the STRICT_ORDER key
  • Fixes addressing performance degradation in UD transport due to excess async events
  • Fixes in QP destroy
  • Fixes for CQ creation failure using old Verbs API

UGNI

  • Fixes in disable logic in config
  • Fixes for clang 11 warnings

Java

  • Fixes in build dependencies
  • Fixes in constructing UcpRequest object on error
  • Fixes in exception handling on endpoint closure request
  • Fixes for segfault in UcpErrorHandler

UCP

  • Fixes in datatype support for get_zcopy RNDV
  • Fixes in connection manager disconnect
  • Fixes in assert definitions
  • Fixes in completion flow for failed EP
  • Fixes in flush error handling flow
  • Fixes in latency calculations for wireup protocol
  • Fixes in offload completion with inlined data
  • Fixes in unpacking flow
  • Fixes in error handling for various protocols

UCT

  • Fixes in flush TX
  • Fixes in checks for enabling GPU Direct RDMA

UCS

  • Fixes for crashes on incorrect value set in config
  • Fixes in ptr_array
  • Fixes in maximal size for ucs_snprintf_safe()
  • Fixes for compilation warnings
  • Fixes in ucs_aarch64_dsb(_op) definition

TCP

  • Fixes in default route interface confirmation flow
  • Fixes in PUT protocol
  • Fixes in max connection limit and improved error reporting

UCM

  • Fixes for a crash on prevent unload
  • Fixes in libucm_rocm
  • Fixes for a few race conditions