Version 0.6.3 - Memory pool support, compatibility improvements, minor changes

@eyalroz eyalroz released this 29 Apr 17:33
· 171 commits to master since this release

Changes since v0.6.2:

Memory pool support (#249) and related changes

  • Added a cuda::memory::pool_t proxy class
  • Memory pools are created using cuda::memory::pool::create() or via device methods
  • IPC: Can import (and export) pools and their allocations to/from other processes
  • #485 Moved the physical_allocation namespace up from memory::virtual_ into memory, as it is used also for memory pools

Bug fixes

  • #508 With CUDA >= 11.3, we no longer give up on creating a module from a compiled program just because no CUBIN is available; instead, we try to use the PTX, as with earlier CUDA versions.
  • #493 cuda::launch_config_builder_t::overall_size() now takes a cuda::grid::overall_dimension_t rather than a cuda::grid::dimension_t (the two types differ in size).
  • #492 Avoiding inclusion of cooperative_groups.h from within the API headers
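
To illustrate why the parameter type in #493 matters, here is a minimal standalone sketch. It does not use the library: the two aliases are hypothetical stand-ins, and their widths (32-bit for a single dimension, 64-bit for an overall dimension) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; the widths are assumptions made for this sketch.
using dimension_t         = std::uint32_t; // one grid or block dimension
using overall_dimension_t = std::uint64_t; // threads along a whole grid axis

// Overall number of threads along one axis, computed in the wider type.
inline overall_dimension_t overall_size(dimension_t num_blocks, dimension_t block_size)
{
    assert(block_size > 0 && "a block must have at least one thread");
    return static_cast<overall_dimension_t>(num_blocks) * block_size;
}

// The same quantity forced through the narrower type, which can wrap around.
inline dimension_t narrowed_overall_size(dimension_t num_blocks, dimension_t block_size)
{
    return static_cast<dimension_t>(overall_size(num_blocks, block_size));
}
```

With 2^23 blocks of 2^10 threads each, overall_size() yields 2^33, while narrowed_overall_size() wraps around to 0. A value of that magnitude cannot be passed through a 32-bit parameter intact.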

Other API changes

  • #511 Can now create CUDA runtime errors with a fully user-overridden what() message (probably not very interesting outside the API's internals).
  • #510 When the NVRTC or PTX compiler complains about an invalid option, the thrown exception now includes the options that were passed.
  • #499 No longer exposing deprecated surface-related API functions with CUDA 12 and later.
  • #498 The launch config builder class now supports num_blocks() as an alias for the grid_size() method.
  • #488 cuda::memory::host::allocate() now returns a cuda::memory::region_t, for better consistency.
  • #486 Some changes to cuda::kernel::wrap.
  • Renamed cuda::memory::attribute_value_type_t -> cuda::memory::attribute_value_t
  • #483 It's now easier to convert memory::region_t's into typed spans.
  • #482 Improvements to the built-in cuda::span (which is used when std::span is unavailable) - making it somewhat more compatible with std::span
  • Made more comparison operators constexpr and noexcept
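
As a rough illustration of what #483 is about, the sketch below converts an untyped region into a typed view. It is standalone and does not reproduce the library's API: region_t, typed_span and as_span() here are simplified, hypothetical stand-ins invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for an untyped memory region.
struct region_t {
    void*       start;
    std::size_t size_in_bytes;
};

// Hypothetical minimal span-like view, usable before C++20's std::span.
template <typename T>
struct typed_span {
    T*          data;
    std::size_t size; // number of elements, not bytes

    T& operator[](std::size_t i) const { return data[i]; }
    T* begin() const { return data; }
    T* end() const { return data + size; }
};

// Reinterpret the region as a sequence of T's; the byte size must divide evenly.
template <typename T>
typed_span<T> as_span(region_t region)
{
    assert(region.size_in_bytes % sizeof(T) == 0 &&
           "region size is not a multiple of the element size");
    return typed_span<T>{ static_cast<T*>(region.start),
                          region.size_in_bytes / sizeof(T) };
}
```

Given a region covering an array of four ints, as_span<int>(region) yields a view with size 4 that can be indexed or used in a range-based for loop.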

Compatibility

  • Wrappers now build correctly (again) with --std=c++20.
  • #501 Added a new NVRTC error code introduced in CUDA 12.1
  • #500 When using CUDA 12, use the term "LTO IR" rather than "NVVM" as appropriate
  • #494 Work around an MSVC issue with variadic template-templates
  • #491 Avoiding some warnings issued by MSVC
  • #480 Added an example program which is built with each C++ version after C++11 that the compiler supports

Build issues

  • Now requiring CMake version 3.25. You can download an up-to-date version from Kitware's website; it doesn't require any special installation.
  • #490 Switched from depending on CUDA::nvToolsExt to depending on CUDA::nvtx3, for CUDA versions 10.0 and above.