Version 0.6.3 - Memory pool support, compatibility improvements, minor changes
Changes since v0.6.2:
Memory pool support (#249) and related changes
- Added a `cuda::memory::pool_t` proxy class
- Memory pools are created using `cuda::memory::pool::create()` or via device methods
- IPC: Can import (and export) pools and their allocations to/from other processes
- #485 Moved the `physical_allocation` namespace up from `memory::virtual_` into `memory`, as it is used also for memory pools
Bug fixes
- #508 With CUDA >= 11.3, we no longer give up on creating a module from a compiled program just because no CUBIN is available; instead, we try to use the PTX, as with earlier CUDA versions.
- #493 `cuda::launch_config_builder_t::overall_size()` now takes a `cuda::grid::overall_dimension_t` rather than a `cuda::grid::dimension_t` (not the same size).
- #492 Avoiding inclusion of `cooperative_groups.h` from within the API headers
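
To illustrate why the parameter type in #493 matters, here is a minimal, self-contained sketch. The two type aliases below are hypothetical stand-ins and are assumed to be 32 and 64 bits wide respectively; they are not the library's actual definitions, and the function names are made up for this example. The sketch shows that the total number of threads in a grid can exceed what a per-axis dimension type can hold.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration only (assumed widths, not the library's definitions).
using dimension_t         = std::uint32_t; // assumed: a single grid or block axis
using overall_dimension_t = std::uint64_t; // assumed: a total count over the whole grid

// Widen before multiplying, so the product fits in the wider type.
constexpr overall_dimension_t overall_threads(dimension_t num_blocks, dimension_t block_size)
{
    return static_cast<overall_dimension_t>(num_blocks) * block_size;
}

// Multiply in the narrow type: the result wraps around modulo 2^32.
constexpr dimension_t overall_threads_narrow(dimension_t num_blocks, dimension_t block_size)
{
    return num_blocks * block_size;
}

// 2^23 blocks of 2^10 threads each give 2^33 threads overall.
static_assert(overall_threads(8388608u, 1024u) == 8589934592ull,
              "the wide type holds the full product");
static_assert(overall_threads_narrow(8388608u, 1024u) == 0u,
              "the narrow type wraps around to zero");
```

Under these assumptions, passing an overall size through the narrower type would silently truncate large launches, which is the kind of problem a distinct overall-dimension type avoids.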
Other API changes
- #511 Can now create CUDA runtime errors with a fully user-overridden `what()` message (probably not very interesting outside the API's internals).
- #510 When the NVRTC/PTX compiler complains about an invalid option, we now include the options passed in the thrown exceptions
- #499 No longer exposing deprecated surface-related API functions with CUDA 12 and later.
- #498 The launch config builder class now supports `num_blocks()` as an alias for the `grid_size()` method.
- #488 `cuda::memory::host::allocate()` now returns a `cuda::memory::region_t`, for better consistency.
- #486 Some changes to `cuda::kernel::wrap`.
- Renamed `cuda::memory::attribute_value_type_t` -> `cuda::memory::attribute_value_t`
- #483 It's now easier to convert `memory::region_t`'s into typed spans.
- #482 Improvements to the built-in `cuda::span` (which is used when `std::span` is unavailable) - making it somewhat more compatible with `std::span`
- Make more comparison operators `constexpr` and `noexcept`
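
As a rough picture of what converting an untyped memory region into a typed span (#483) involves, here is a minimal sketch. The `region`, `typed_span` and `as_span` names are hypothetical stand-ins invented for this example; they do not reproduce the library's `memory::region_t` or `cuda::span` interfaces. The sketch only checks that the byte size divides evenly into elements; alignment is assumed to be correct.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-ins for illustration only; not the library's definitions.
struct region {                // untyped memory: start address plus size in bytes
    void*       start;
    std::size_t size_in_bytes;
};

template <typename T>
struct typed_span {            // minimal span: element pointer plus element count
    T*          data;
    std::size_t size;
    T& operator[](std::size_t i) const { return data[i]; }
};

// Reinterpret a region as a sequence of T elements.
// Throws if the region's byte size is not a whole number of elements.
template <typename T>
typed_span<T> as_span(region r)
{
    if (r.size_in_bytes % sizeof(T) != 0) {
        throw std::invalid_argument("region size is not a multiple of the element size");
    }
    return { static_cast<T*>(r.start), r.size_in_bytes / sizeof(T) };
}
```

For example, a region covering an array of 8 `float`s would yield a span of size 8 under this sketch, while a 12-byte region requested as `double`s would be rejected.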
Compatibility
- Wrappers now build correctly (again) with `--std=c++20`.
- #501 Added a new NVRTC error code introduced in CUDA 12.1
- #500 When using CUDA 12, use the term "LTO IR" rather than "NVVM" as appropriate
- #494 Work around an MSVC issue with variadic template-templates
- #491 Avoiding some warnings issued by MSVC
- #480 Add example program built with each C++ version after 11 supported by the compiler
Build issues
- Now requiring CMake version 3.25. You can download an up-to-date version from Kitware's website; it doesn't require any special installation.
- #490 Switched from depending on `CUDA::nvToolkitExt` to depending on `CUDA::nvtx`, for CUDA versions 10.0 and above.