Cudastf #1
Commits on Jul 17, 2024
- cc316db
- Use libcu++ void_t everywhere (NVIDIA#1977) (72f77c1)
  * Make libcu++ void_t available in C++11
  * Replace uses of CUB's and Thrust's void_t with libcu++'s
  * Deprecate Thrust's void_t
  * Deprecate libcu++'s __void_t
- be91914
Commits on Jul 18, 2024
- 64e7a06
- 87d0849
- 92b4b0b
- Add CI slack notifications. (NVIDIA#1961) (56d99db)
  * Fix divide by zero in `parse-job-times.py`
  * Add CI slack notifications.
- Allow nightly workflow to be manually invoked. (NVIDIA#2007) (fc457b4)
  [skip-rapids][skip-vdc][skip-matrix][skip-docs]
- Need to use a different approach to reuse secrets in reusable workflows vs. actions. (NVIDIA#2008) (eb62dc6)
  [skip-matrix][skip-vdc][skip-docs][skip-rapids]
Commits on Jul 19, 2024
- Enable RAPIDS builds for manually dispatched workflows. (NVIDIA#2009) (97e699f)
  [skip-rapids][skip-vdc][skip-matrix][skip-docs]
- 2ff83a2
- Add github token to nightly workflow-results action. (NVIDIA#2012) (8a5e56a)
  This is needed to fetch the runtime info from the GHA API. Only the nightly workflow is modified; the PR tests are unaffected. [skip-matrix][skip-vdc][skip-rapids][skip-docs]
Commits on Jul 20, 2024
- e5fcebe
Commits on Jul 22, 2024
- 496d88d
- e61bafe
- Spell value initialization where used by thrust vectors (NVIDIA#1990) (b8116c3)
  Co-authored-by: Georgii Evtushenko <[email protected]>
- Do not redefine the __ELF__ macro (NVIDIA#2018) (1b16af7)
  We currently employ a workaround against an nvc++ bug where `__ELF__` is not properly defined. However, we should not define that macro if it is already present. Fixes [BUG]: incompatible redefinition of macro "__ELF__" with NVC++ host compiler (NVIDIA#1995)
Commits on Jul 23, 2024
- Port thrust::merge[_by_key] to CUB (NVIDIA#1817) (8635429)
  * Refactor thrust/CUB merge
  * Port thrust::merge[_by_key] to cub::DeviceMerge
  Fixes NVIDIA#1763
  Co-authored-by: Georgii Evtushenko <[email protected]>
- 53fe08f
Commits on Jul 24, 2024
- 18cd90f
- Add tests to ensure that we properly propagate common_type for complex types (NVIDIA#2025) (f6d3d0b)
  Addresses NVIDIA#2013
- a69c8ac
- 82a3ed0
Commits on Jul 25, 2024
- Avoid ADL issues with thrust::distance (NVIDIA#2053) (46759c5)
  We almost always pull in `cuda::std::distance`, which would be ambiguous with `thrust::distance` if the used type pulls in `cuda::std::` within a thrust algorithm.
- Simplify thrust::detail::wrapped_function (NVIDIA#2019) (e25344c)
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
- Add a test for Thrust scan with non-commutative op (NVIDIA#2024) (5ba23b6)
  * Add a test for Thrust scan with non-commutative op
  * Fix printing mismatching sequences of non-addable types before C++17 in Thrust unit tests
- Update memory_resource docs (NVIDIA#1883) (30eaa9c)
  Move the docs to rst format and also fix some minor issues in the documentation.
  Co-authored-by: Bernhard Manfred Gruber <[email protected]>
- Temporarily switch nightly H100 CI to build-only. (NVIDIA#2060) (04db77a)
  These are on a testing pool and the machines are currently experiencing infra failures.
- Do not rely on conversions between float and extended floating point types (NVIDIA#2046) (1797742)
  The issue is that our tests rely extensively on those conversions, which makes them incredibly painful to test.
Commits on Jul 26, 2024
- Experimental wrapper types for cudaEvent_t that provide a modern C++ interface. (NVIDIA#2017) (a4cd52e)
  * Wrapper types for `cudaEvent_t` that provide a modern C++ interface:
    `cuda::experimental::event_ref` is a non-owning wrapper around a `cudaEvent_t`;
    `cuda::experimental::event` is an owning wrapper around a `cudaEvent_t`;
    `cuda::experimental::timed_event` is a `cuda::experimental::event` that also records the time at which it was recorded.
  * Apparently `__event` is a word of power for msvc.
  * Represent the elapsed time between two events with nanoseconds instead of microseconds: according to the CUDA docs for `cudaEventElapsedTime`, the elapsed time has sub-microsecond resolution, so it is more appropriate to represent it in nanoseconds.
  * Prune unused headers, switch to rst-friendly doxygen comment style.
  * Add class synopsis comments.
  * Construct with a stream_ref and record the event on construction.
  * Review feedback.
  * Tests for `cudax::event` and `cudax::timed_event`.
  * Change `event_ref::wait` to use `cudaEventSynchronize`.
  * Use a struct for windows instead.
  * Do not include superfluous config header.
  * Add clang-format rule for cudax.
  * Spell `cudax_add_catch2_test` correctly.
  * Fix formatting issues.
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
- [CUDAX] Add a dummy device struct for now (NVIDIA#2066) (c60d687)
  * Add dummy device struct
  * Add `__scoped_device` for changing and restoring the current device
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
  Co-authored-by: Eric Niebler <[email protected]>
- Allow (somewhat) different input value types for merge (NVIDIA#2075) (4b9de3b)
  * Add a cuDF-inspired test for merge_by_key
  * Allow CUB MergePath to support iterators with different value types
  * Allow different input value types for merge, as long as they are convertible to the value type of the first iterator. This weakens the publicly documented guarantee of equal value types to restore the old behavior of the thrust implementation replaced in NVIDIA#1817.
- Avoid ::result_type for partial sums in TBB reduce_by_key (NVIDIA#1998) (b761538)
  This allows us to get rid of partial_sum_type, which still uses the C++11-deprecated function object API ::result_type.
  Co-authored-by: Georgii Evtushenko <[email protected]>
Commits on Jul 27, 2024
- a8db0a9
- 94c86b6
- ad57b1e
- 59d7a4b
Commits on Jul 29, 2024
- 8a185fe
Commits on Jul 30, 2024
- Add support to cudax::device for querying a device's attributes (NVIDIA#2084) (731c84c)
  This commit encodes all of the device attributes as query objects. Whereas the CUDA runtime has `cudaDevAttrMaxThreadsPerBlock`, cudax has `device::attrs::max_threads_per_block`. Querying a device looks like:
  ```
  device dev0(0);
  auto tpb = dev0.attr(device::attrs::max_threads_per_block);
  ```
  This syntax also works, which is intended to help people already familiar with the C-style interface:
  ```
  device dev0(0);
  auto tpb = dev0.attr<::cudaDevAttrMaxThreadsPerBlock>();
  ```
  The attribute tags are convertible to `::cudaDeviceAttr`, so that `device::attrs::max_threads_per_block == ::cudaDevAttrMaxThreadsPerBlock`. Some effort has been made to make the query return types correct. For instance, `dev0.attr(device::attrs::managed_memory)` returns a `bool`, and `dev0.attr(device::attrs::compute_mode)` returns a value from the `::cudaComputeMode` enumeration. The attributes that return enumerations provide convenience aliases for the enum values, so this:
  ```
  if (dev0.attr<::cudaDevAttrComputeMode>() == ::cudaComputeModeDefault)
  ```
  is equivalent to:
  ```
  if (dev0.attr(device::attrs::compute_mode) == device::attrs::compute_mode._default)
  ```
  Closes NVIDIA#2083
  Co-authored-by: pciolkosz <[email protected]>
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
- [CUDAX] Add experimental owning abstraction for cudaStream_t (NVIDIA#2093) (15e2ce0)
  * Construct with a stream_ref and record the event on construction
  Co-authored-by: Eric Niebler <[email protected]>
- Do not query NVRTC for cuda runtime header (NVIDIA#2102) (1e67aa7)
  This may fail as discussed in NVIDIA#2095. Fixes 2095
- 6dfc8dd
- Improve binary function objects and replace thrust implementation (NVIDIA#1872) (4188fb0)
  * Improve binary function objects and replace thrust implementation
  * Simplify use of ::cuda::std binary function objects
  * Replace _CCCL_CONSTEXPR_CXX14 with constexpr in all libcudacxx binary function objects that are imported in thrust.
  * Determine partial sum type without ::result_type
  * Ignore _LIBCUDACXX_DEPRECATED_IN_CXX11 for doxygen
  Co-authored-by: Bernhard Manfred Gruber <[email protected]>
- d92ef23
- Add script to update RAPIDS version. (NVIDIA#2082) (d4f928e)
  * Add script to update RAPIDS version.
  * Update to 24.10.
- Update bad links (NVIDIA#2080) (ce95739)
  * fix broken links
  * revert repo.toml
  * linkchecker fixes
  * fix .cuh errors
  * lint
- c0cfbd0
Commits on Jul 31, 2024
- Add internal wrapper for cuda driver APIs (NVIDIA#2070) (7a3dae7)
  * Add a header to interact with driver APIs
  * Add a test for the driver API interaction
  * Format
  * Fix formatting
- Use common_type for complex pow (NVIDIA#1800) (694e963)
  * Use `common_type` for complex `pow`
  Previously we would rely on our internal `__promote` function. However, that could have surprising results, e.g. `pow(complex<float>, int)` would return `complex<double>`. With C++23, this situation got clarified and we should use `common_type` to determine the return type.
- Rename device to device_ref, add immovable device as a place to cache properties (NVIDIA#2110) (a2a3824)
- Use the float flavors of the cmath functions in the extended floating point fallbacks (NVIDIA#2106) (bddcd20)
  Fixes NVIDIA#2078
- [PoC]: Implement cuda::experimental::uninitialized_buffer (NVIDIA#1831) (27253d7)
  * Drop `cuda::get_property` CPO. It serves no purpose, as it only ever forwards via ADL, and also breaks older nvcc.
  * Ensure that we test memory resources
  * Implement `cuda::uninitialized_buffer`. `cuda::uninitialized_buffer` provides an allocation of `N` elements of type `T`, utilizing a `cuda::mr::resource` to allocate the storage. `cuda::uninitialized_buffer` takes care of alignment and deallocation of the storage. The user is required to ensure that the lifetime of the memory resource exceeds the lifetime of the buffer.
- 2600135
Commits on Aug 1, 2024
- Ensure that cuda_memory_resource allocates memory on the proper device (NVIDIA#2073) (39b926a)
  * Ensure that `cuda_memory_resource` allocates memory on the proper device
  * Move `__ensure_current_device` to its own header
- Clarify compatibility wrt. template specializations (NVIDIA#2138) (ce4b904)
  We do not want users to specialize arbitrary templates in CCCL unless otherwise stated. This PR makes this clear in the README.md.
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
- fadb135
- Make cuda::std::tuple trivially copyable (NVIDIA#2127) (4634d81)
  This is similar to the situation with `cuda::std::pair`. We have a lot of users that rely on types being trivially copyable so that they can utilize memcpy and friends. Previously, `cuda::std::tuple` did not satisfy this because it needs to handle reference types. Given that we already specialize `__tuple_leaf` depending on whether the class is empty or not, we can simply add a third specialization that handles the trivially copyable types and one that synthesizes assignment.
  Co-authored-by: Bernhard Manfred Gruber <[email protected]>
- Fix missing copy of docs artifacts (NVIDIA#2162) (cc0b3d1)
  Also fix typo in the link.
- cbe01b0
- Fix g++-14 warning on uninitialized copying (NVIDIA#2157) (02378eb)
  ```
  In function bool cuda::std::__4::__dispatch_memmove(_Up*, _Tp*, size_t) ...
  error: *(unsigned char*)(&privatized_decode_op[0]) may be used uninitialized [-Werror=maybe-uninitialized]
  ...
  *(unsigned char*)(&privatized_decode_op[0]) was declared here
  1528 | PrivatizedDecodeOpT privatized_decode_op[NUM_ACTIVE_CHANNELS]{};
  ```
Commits on Aug 2, 2024
- Fix flaky heterogeneous tests (NVIDIA#2085) (cba0345)
  * Fix flaky heterogeneous tests by ensuring only *one* writer exists in parallel between H/D
  * Fix up copy-paste mistake
  * Make host atomics simpler by removing the ugly alignment type
  * Fix deadlocks introduced into barrier/semaphore tests
  * Revert removing hacky atomic wrapping stuff
  * Fix unused warning bug in GCC-6
- Fix multiple definition of InclusiveScanKernel (NVIDIA#2169) (24ed47d)
  ```
  Linking CXX executable bin/cub.cpp14.catch2_test.lid_0
  FAILED: bin/cub.cpp14.catch2_test.lid_0
  ...
  /usr/bin/ld: cub/test/CMakeFiles/cub.cpp14.test.warp_scan_api.dir/catch2_test_warp_scan_api.cu.o: in function `InclusiveScanKernel(int*)':
  /usr/local/cuda-12.7/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxInitDefs.h:473: multiple definition of `InclusiveScanKernel(int*)'; cub/test/CMakeFiles/cub.cpp14.test.block_scan_api.dir/catch2_test_block_scan_api.cu.o:/usr/local/cuda-12.7/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxInitDefs.h:468: first defined here
  collect2: error: ld returned 1 exit status
  ```
Commits on Aug 3, 2024
- [CUDAX] Add a global constexpr cudax::devices range for all devices in the system (NVIDIA#2100) (a8ca75c)
  * Add `cuda::devices` vector. The number of cuda devices can be determined by calling `cuda::devices.size()`. `cuda::devices` is a range of `cuda::device` objects.
- d0254e4
Commits on Aug 5, 2024
- a903dc6
- Fix trivial_copy_device_to_device execution space (NVIDIA#2164) (9459e4a)
  * Fix trivial_copy_device_to_device execution space
  * Typo
  * Format
  * Extra empty line
- c65a965
- e519f25
- Fix the clang-format path in the devcontainers (NVIDIA#2194) (fe27d99)
  In the devcontainers, `clang-format` is now installed into `/usr/bin/clang-format`.
- Mount a build directory for CCCL projects if WSL is detected (NVIDIA#2035) (d1e7c1c)
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commits on Aug 6, 2024
- 2118 [CUDAX] Change the RAII device swapper to use driver API and add it in places where it was missing (NVIDIA#2192) (75929cb)
  * Change __scoped_device to use driver API
  * Switch to use driver API based dev setter
  * Remove constexpr from operator device()
  * Fix comments and includes
  * Fall back to the non-versioned get-entry-point API pre 12.5. We need to use the versioned variant to get the correct cuStreamGetCtx; there is a v2 version of it in 12.5, and fortunately the versioned get entry point is available there too.
  * Fix unused local variable
  * Fix warnings in ensure_current_device test
  * Move ensure current device out of detail
  * Add LIBCUDACXX_ENABLE_EXCEPTIONS to tests cmake
- Fix singular vs plural typo in thread scope documentation. (NVIDIA#2198) (1b6dbd4)
  * Fix singular vs plural typo in thread scope documentation.
  * Better grammar fix.
- [CUDAX] fixing some minor issues with device attribute queries (NVIDIA#2183) (2db4fa7)
  * [cudax] give the `cudaDevAttrMemoryPoolSupportedHandleTypes` attribute the correct type
  * move attribute definitions from `device_ref` to `device`
Commits on Aug 7, 2024
- Integrate Python docs (NVIDIA#2196) (b0e09d0)
  * pass docs build options to repo.sh
  * Integrate Python docs
  * update CI
  * Apply suggestions from code review
  Co-authored-by: Georgii Evtushenko <[email protected]>
- [FEA] Atomics codegen refactor (NVIDIA#1993) (62336ad)
  * Initial draft of new atomics backend
  * Change atomic fetch ops back to tag dispatch
  * Save wip
  * Add load/store and support for MMIO
  * Begin working on exch
  * Enable formatting exchange
  * Several signed-ness fixes
  * Make atomics ptx tests build. Lit tests are a WIP.
  * Fix load/store, some volatileness, and min/max
  * Formatting and enabled codegen in all builds
  * Make integral.pass.cpp pass
  * Make the rest of the atomics tests pass
  * Use 128b ld/st instead of vector load as it is not atomic across the whole atom
  * Fix copy-paste mistake in load/store
  * Whitespace fixup
  * Fix 128b .exch using .cas operands
  * Make codegen link fmt as PRIVATE
  * Simplify MMIO down to a static array.
  * Static -> Inline for codegen functions. Replace endl with '\n'.
  * Supply the output stream directly to `fmt::format`
  * Update fmtlib.
  * Revert `fmt::format(out...)` changes. They don't work on MSVC.
  * Fixup libcudacxx codegen CMake stuff
  * Remove sneaky cstddef include that was auto-added
  * [pre-commit.ci] auto code formatting
  Co-authored-by: Allison Piper <[email protected]>
  Co-authored-by: Bernhard Manfred Gruber <[email protected]>
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- [CUDAX] add __launch_transform to transform arguments to cudax::launch prior to launching the kernel (NVIDIA#2202) (47b8f5c)
Commits on Aug 8, 2024
- Cleanup common testing headers and correct asserts in launch testing (NVIDIA#2204) (39fd05e)
  * Cleanup common testing headers
  * Add test/common to cmake and fix formatting
- [CUDAX] Add an API to get device_ref from stream and add comparison operator to device_ref (NVIDIA#2203) (c9a7b6a)
  * Add a way to compare device_refs
  * Add a way to query device_ref from a stream
  * Fix Windows missing cast
  * Apply suggestions from code review
  * Disallow device comparison with int
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
- Update devcontainer docs for WSL (NVIDIA#2200) (3ebf8cc)
  * Update README.md (×5)
  * [pre-commit.ci] auto code formatting
  * Why was 6 afraid of 7?
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Commits on Aug 9, 2024
- Add cudax::distribute<threadsPrBlock>(numElements) as a way to evenly distribute elements over thread blocks (NVIDIA#2210) (f95f211)
- Rework mdspan concept emulation (NVIDIA#2213) (8e20c9a)
  It is proving difficult to handle for msvc, and the one we are using in libcu++ is much cleaner. Gets NVIDIA#2160 compiling on MSVC.
- Un-doc functions taking debug_synchronous (NVIDIA#2209) (7473934)
  * undoc functions taking debug_synchronous
- CUDA vector_add sample project (NVIDIA#2160) (a3a5f9c)
  Co-authored-by: pciolkosz <[email protected]>
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commits on Aug 12, 2024
- Avoid constraint recursion in the resource concept (NVIDIA#2215) (6ee3415)
  Drive-by: avoid potential overload ambiguity in `__launch_transform`.
Commits on Aug 13, 2024
- aaf1340
- Fix including <complex> when bad CUDA bfloat/half macros are used. (NVIDIA#2226) (098fb29)
  * Add <complex> test for bad macros being defined
  * Fix <complex> failing upon inclusion when bad macros are defined
  * Rather use explicit specializations and some evil hackery to get the complex interop to work
  * Fix typos
  * Inline everything
  * Move workarounds together
  * Use conversion functions instead of explicit specializations
  * Drop unneeded conversions
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
- d7c83fe
Commits on Aug 14, 2024
-
Commit 64d28d1
-
Commit 6213a5e
-
Commit 2e44b2c
-
Commit 352638b
-
Commit dded5f1
-
Commit 1981c49
-
Fix ForEachCopyN for non-contiguous iterators (NVIDIA#2220) (commit 73df2b0)
By falling back to a non-load-vectorizing code path. Fixes: NVIDIA#2207
-
Commit cbce14b
Commits on Aug 15, 2024
-
Allow installing components when downstream (NVIDIA#2096) (commit e423412)
In the @acts-project we adopt an (admittedly somewhat unconventional) build system in which software A depends on B, and B depends on CCCL. The setup is that we want to install B into a prefix, and then build A against B. The problem is that we use CMake to dynamically fetch CCCL via the "FetchContent" mechanism, which downloads CCCL and adds it as a subdirectory. The core problem is that installing software B which has included CCCL does not actually install CCCL in the same prefix, so software A cannot then load software B, as CCCL is not installed. The reason this happens is that CMakeLists.txt:28 (at the time of writing) returns from the CMake configuration stage early, and leaves the CUB, Thrust, and libcudacxx directories unincluded (see lines 70 to 72). Although this is, again, an unconventional and rare scenario, it should be easy to add support for this kind of build, and I hope the CCCL devs would agree that it might be worth doing. In this commit, I remove the early return and replace it with additional if-statements. This commit should leave any existing workflows completely untouched, but should make it easier to use CCCL in the way we do in @acts-project.
-
Commit 532ff47
Commits on Aug 16, 2024
-
Fix and simplify <bit> (NVIDIA#2197) (commit 16d4fd3)
* Fix and simplify <bit>
* Make logic for non-constant evaluation simpler in C++14 and greater in <bit>
* Remove use of `std::` in `<bit>`
* Change bitops tests to prevent constant folding of runtime checks
* Move bit and split implementation details from main header
* Remove volatile from tests in bitops
* Make Windows happy by using `unsigned long`
* Work around being unable to use {} in C++ constexpr functions
* Add a 'default to constexpr' interpretation of is_constant_evaluated for internal use in bitops
* Make Windows happy by reusing the default-to-constexpr hack
* Make bitops tests definitely actually do runtime
* Move <bit> fallbacks into relevant headers
* Fix fallbacks being guarded by MSVC ifdef
* Keep the license
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
Commit fed3ec1
-
Make `cuda::std::min` constexpr in C++11 (NVIDIA#2249) (commit 4a5dcc4)
This should fix our rmm builds.
-
Commit ba9e9bb
Commits on Aug 19, 2024
-
Workaround GCC 13 issue with empty histogram decoder op (NVIDIA#2252) (commit 51c1b22)
* Workaround GCC 13 issue
* Update cub/cub/device/dispatch/dispatch_histogram.cuh
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commits on Aug 20, 2024
-
Commit da9b7dd
-
Commit f871aeb
-
Add tests for transform_iterator's reference type (NVIDIA#2221) (commit 38d5787)
* Fix typo
* Add tests for transform_iterator's reference type
-
Small tuning script output improvements (NVIDIA#2262) (commit c92e8d4)
* Report mismatched values in first_val
* Improve output while search.py is running
-
Fix thrust::vector ctor selection for (int, int) (NVIDIA#2261) (commit 7bec0ce)
thrust::device_vector<int> v(5, 10) should create a vector with 5 integers of value 10, and not attempt the iterator-pair constructor.
Commits on Aug 21, 2024
-
Add support for large numbers of items to DeviceScan (NVIDIA#2171) (commit 06e334f)
* make DeviceScan offset type a template parameter
* update tests to use device interface
* move thrust scan to unsigned offset types
* adjust benchmarks to account for used offset types
* use dynamic dispatch to unsigned type
* add tparam docs for NumItemsT
* fix warning about different signedness comparison
* add check for negative num_items in thrust::scan
* fix unused param in is_negative
-
Use/test radix sort for int128, half, bfloat16 in Thrust (NVIDIA#2168) (commit 1e1af8d)
int128 was already working but not covered by a test.
-
Implement C API for device reduction (NVIDIA#2256) (commit 5a4881b)
* Implement C device reduce
* Fix device-specific module loading
* Don't need a context at build step now
* Address review feedback
-
Move cooperative module (NVIDIA#2269) (commit 2c1080d)
* Move cuda/cooperative to cuda_cooperative
* Update paths to cooperative module
* Fix literal include paths
-
Move compiler version macros into libcu++ (NVIDIA#2250) (commit 529f910)
* Move compiler version macros into libcu++
* Drop some pre-C++11 code paths around thrust::complex
Commits on Aug 23, 2024
-
Commit d62e979
Commits on Aug 25, 2024
-
Add thrust::tabulate_output_iterator (NVIDIA#2282) (commit 0d0d2d3)
* adds tabulate output iterator
* uses cccl exec space macros
* addresses review comments
* fixes documentation and example
* moves to using alias template instead of member type
-
Commit a15adf3
Commits on Aug 26, 2024
-
Flatten forwarding headers (NVIDIA#2284) (commit c1c1d96)
* Flatten `<cuda/std/utility>`; we do not need the indirection anymore
* Flatten `<cuda/std/iterator>`; we do not need the indirection anymore
* Flatten `<cuda/std/expected>`, `<cuda/std/mdspan>`, `<cuda/std/ranges>`, `<cuda/std/__new_>`, `<cuda/std/cassert>`, and `<cuda/std/initializer_list>`
* Clean up `<cuda/functional>` and `<cuda/std/functional>`; those headers don't need any of the deep nesting we had, so clean them up and simplify their dependencies
* Flatten `<cuda/std/type_traits>`, `<cuda/std/__memory_>`, and `<cuda/std/concepts>`
* Drop unused `libcudacxx/include/cuda/std/detail/libcxx/include/ccomplex`
* Flatten `<cuda/std/cfloat>` and `<cuda/std/version>`
* Add missing include to `thrust/functional.h` and fix the cyclic dependency with actor.h
* Suppress MSVC warning
-
2270: static compute capability queries (NVIDIA#2271) (commit 03247ab)
* Architecture traits first draft
* Add missing sm 60, sm 80, and sm 90 arches
* Move to 100*major + 10*minor format and remove template arch_traits getter
* Add Ada and clean up some things
* Move includes after system header pragma
Co-authored-by: Elias Stehle <[email protected]>, Bernhard Manfred Gruber <[email protected]>, Georgii Evtushenko <[email protected]>
-
Commit 9d4c3a8
Commits on Aug 27, 2024
-
Implement any_resource, an owning wrapper around a memory resource (NVIDIA#2266) (commit f53e725)
* Implement `any_resource`, an owning wrapper around any resource; addresses NVIDIA#1426
* Continue development of @miscco's `any_resource` and address review feedback
* Mark all deallocation functions as `noexcept`, then back out a questionable addition of `noexcept`
* Fix test failures, build breaks, and the cudax docs build; exclude more symbols from the cudax docs
* More portability fixes and doxygen tweaks; deduplicate `basic_any_resource` constructors to satisfy doxygen
* Don't use `if constexpr` when compiling as C++14
* Mollify MSVC: accommodate integer size differences and eliminate a shadow warning treated as error
* Handle duplicate properties without triggering compiler warnings
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
Commit e8939e9
-
Use NV_IF_TARGET to conditionally compile CUDAX tests (NVIDIA#2297) (commit 92e006b)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Make for_each compatible with NVRTC (NVIDIA#2288) (commit f80972b)
* Make for_each compatible with NVRTC
* Whitespace fixup
* Apply suggestion to improve includes
-
Refactor CMake so more cudax samples can be easily added (NVIDIA#2296) (commit a5b0a23)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Commit dd90bed
-
Implement std::bit_cast (NVIDIA#2258) (commit 0a1cddb)
* Implement `std::bit_cast`; this backports C++20 `std::bit_cast` to be available in all standard modes. As this requires compiler builtin support, we have a non-constexpr workaround with the usual memcpy implementation. Fixes NVIDIA#2257
* Add additional constraint in the fallback mode
* Use bit_cast in cub
* Fix formatting and a typo
Commits on Aug 28, 2024
-
Clean up the <cuda/std/bit> header (NVIDIA#2299) (commit 490a20f)
* Move enum `endian` to its own file
* Move `std::rotl` and `std::rotr` to their own file
* Move `std::has_single_bit` to its own file
* Move `countr_{one, zero}` to its own file
* Move `countl_{one, zero}` to their own file
* Move `bit_ceil`, `bit_floor` and `bit_width` to their own file
* Clean up the `<bit>` header
-
Change cudax::uninitialized_buffer to own its memory resource with cudax::any_resource (NVIDIA#2293) (commit 198208a)
* Change `uninitialized_buffer` to own its memory resource using `any_resource` (builds on the `any_resource` work from NVIDIA#2266)
* Use fully qualified name
* Drop `__host__ __device__` from uninitialized_buffer, then revert that change (reverts commit 5115b08)
* Just do the cursed thing
* Add missing include
* Adopt the doc string
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
Documentation typos (NVIDIA#2302) (commit ec5bd08)
* Add segmented [radix] sort tests for 64-bit indices
* Add test for device segmented sort pairs with 64-bit indices
* Add a macro guard to protect tests that are not supported yet
* Fix documentation typos
* Remove unrelated files
-
Add thrust::inclusive_scan with init_value support (NVIDIA#1940) (commit e311e89)
* Add thrust::inclusive_scan with init value, sequential
* Add thrust::inclusive_scan cuda par with init value
* Add thrust::async::inclusive_scan with init value
* Add thrust::inclusive_scan tbb with init value
* Consolidate init overloads into a single overload that accepts both init and binary_op
* Add cuda::std::__accumulator_t and use it for value_type in scan algorithms
* Consolidate the two tbb::inclusive_scan bodies
* Replace cub::accumulator_t with cuda::std::__accumulator_t
-
Commit 942f59f
-
Add documentation for any_resource (NVIDIA#2309) (commit 7d4be26)
* Add documentation for `any_resource`
* Apply suggestions from code review
Co-authored-by: Bernhard Manfred Gruber <[email protected]>
Commits on Aug 29, 2024
-
Implement P0843 inplace_vector (NVIDIA#1936) (commit eb87e56)
This implements `inplace_vector`, a resizable container with a fixed capacity that stores its elements in a local array. Because everything is local, we are more or less safe regarding host/device issues, as long as users do not pass around references to it. The exception guarantees are not 100% clear yet, so I implemented them on a best-effort basis. We might need to revisit what we guarantee in case of an exception.
-
Clean up __config and unify most visibility macros (NVIDIA#2285) (commit 10b0d2b)
* Unify definitions of `_LIBCUDACXX_TYPE_VIS`, `_LIBCUDACXX_HIDDEN`, `_LIBCUDACXX_TEMPLATE_VIS`, and `_LIBCUDACXX_ENUM_VIS`
* Drop `_LIBCUDACXX_EXCEPTION_ABI`, `_LIBCUDACXX_DLL_VIS`, `_LIBCUDACXX_METHOD_TEMPLATE_IMPLICIT_INSTANTIATION_VIS`, `_LIBCUDACXX_CLASS_TEMPLATE_INSTANTIATION_VIS`, `_LIBCUDACXX_EXTERN_TEMPLATE_TYPE_VIS`, and `_LIBCUDACXX_CRT_FUNC`
* Drop `_LIBCUDACXX_EXPORTED_FROM_ABI` in favor of `_CCCL_VISIBILITY_DEFAULT`
* Drop `_LIBCUDACXX_OVERRIDABLE_FUNC_VIS` in favor of `_CCCL_HOST_DEVICE`
* Drop `_LIBCUDACXX_TYPE_VIS`, `_LIBCUDACXX_ENUM_VIS`, and `_LIBCUDACXX_TEMPLATE_VIS` in favor of `_CCCL_TYPE_VISIBILITY_DEFAULT`
* Drop `_LIBCUDACXX_HIDDEN` in favor of `_CCCL_VISIBILITY_HIDDEN`
* Drop `_LIBCUDACXX_ALWAYS_INLINE` in favor of `_CCCL_ALWAYS_INLINE`
* Drop a bunch of unused defines that we inherited from libc++
-
Add a fast, low-memory "limited" mode to CUB testing (NVIDIA#2317) (commit 11fc50b)
* Add CCCL_SEED_COUNT_OVERRIDE env var. This overrides the number of test sections generated by CUB_SEED. It is intended for limiting to a single case for sanitizer runs, etc., but may be useful for fuzzing as well.
* Add new env opts for CUB's check allocators: `CCCL_DEVICE_MEMORY_LIMIT` limits total memory allocations to this number of bytes; `CCCL_DEBUG_CHECKED_ALLOC_FAILURES` prints info about each checked alloc failure prior to throwing exceptions.
* Add `limited` job for CUB: forces CUB_SEED to only generate a single test case, limits total global device memory usage to 8 GiB, and prints allocation details when the checked allocator fails.
* Add error checking to build scripts.
* Add a `-configure` option to CI scripts that just configures the build and exits.
* Add std for new cub job so it reuses the same build job as the other CUB tests.
-
[CUDAX] Add event_ref::is_done() and update event tests (NVIDIA#2304) (commit d862315)
* Add event_ref.is_done() and update event tests
* Update cudax/include/cuda/experimental/__event/event_ref.cuh
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
Commit e42d7b7
-
Commit a7837d3
Commits on Aug 30, 2024
-
Commit 16096d4
-
[CUDAX] Add compute_capability device attribute and handle arch_traits for future architectures (NVIDIA#2328) (commit a9fa9a1)
* Add combined compute capability query
* Store arch traits in device object
* Craft arch_traits for future architectures
* Add missing return and missing noexcept
* Add a comment describing compute_capability_t
-
Disable exec checks on ranges CPOs (NVIDIA#2331) (commit 95c6ba9)
We need those for the cudax containers coming in.
-
Enable exceptions by default (NVIDIA#2329) (commit 206e745)
* Use a negative error code for exit in terminate
* Add a global CCCL configuration for disabling exceptions
* Make the terminate test a runfail test; we return a non-zero return code, and we cannot fail a test that does not run
-
Make the thrust dispatch mechanisms configurable (NVIDIA#2310) (commit 89702de)
The current dispatch mechanism trades compile time and binary size for performance and flexibility. Allow users to tune that depending on their needs.
Co-authored-by: Jake Hemstad <[email protected]>
-
Commit a7996f0
Commits on Aug 31, 2024
-
Compiler version improvements (NVIDIA#2316) (commit bb6c7b1)
* Remove Intel LLVM references
* Remove IBM/XLC/PowerPC references
* Add NVHPC version
-
Fix hardcoding __THRUST_HOST_SYSTEM_NAMESPACE to cpp (NVIDIA#2341) (commit 0a40182)
This change was erroneously introduced in 91b78d8. Fixes: NVIDIA#2098
Commits on Sep 2, 2024
-
Improvements to the Cuda Core C library infrastructure (NVIDIA#2336) (commit 709ddec)
* Move towards libfmt to allow different host compilers, then revert that change (reverts commit 3bd8d95)
* Try and filter changes to the Cuda C library
* Update ci/inspect_changes.sh
Co-authored-by: Leo Fang <[email protected]>
Commits on Sep 3, 2024
-
Commit 498251c
-
[CUDAX] Make uninitialized_buffer usable with launch (NVIDIA#2342) (commit c6b777b)
* Make `cudax::uninitialized_buffer` usable with `cudax::launch`
* Test passing a `const` `uninitialized_buffer` to `launch`
-
Commit 5c6b6df
-
Update Memory Model docs for HMM (NVIDIA#2272) (commit 4297b07)
* Update Memory Model docs for HMM
* Document p2p2NativeAtomicSupported
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
Commit 457e4d7
-
Harden thrust algorithms against evil iterators that overload `operator,` (NVIDIA#2349) (commit 6b76188)
We need to guard against such iterators in libcu++, so our tests conventionally contain iterators that delete `operator,`. To allow using thrust with such iterators we need to add the void casts.
-
Avoid circular concept definition with memory resources (NVIDIA#2351) (commit 707ee73)
We cannot constrain the hidden friend comparison functions because that would lead to constraint recursion in the `resource` concept. However, we do not actually need to do that, because we can rely on C++20 operator rewriting to avoid the hidden friend altogether. In that case the non-template operator== takes precedence and all is fine.
-
Add IWYU export pragma on config headers (NVIDIA#2352) (commit a154e7b)
When working in the devcontainers, the clangd/clang-format integration causes extra #includes to be added automatically. They get added because clang's "include what you use" tool (IWYU) doesn't consider transitive includes when determining what headers need to be included. This PR adds the export IWYU pragma to the #includes in CCCL's config headers so that a #include of <cuda/__cccl_config> or <cuda/std/detail/__config> behaves as if all of the transitive headers were included directly. This should be enough to keep IWYU from automatically adding headers it shouldn't.
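The pragma in question looks like this; the header names below are illustrative of the pattern, not the exact CCCL files:

```cpp
// In a config/umbrella header: mark transitive includes as exported so
// IWYU treats them as provided by this header rather than suggesting
// direct includes at every use site.
#include <detail/platform_config.h> // IWYU pragma: export

// A block form also exists for groups of includes:
// IWYU pragma: begin_exports
#include <detail/header_a.h>
#include <detail/header_b.h>
// IWYU pragma: end_exports
```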
Commits on Sep 4, 2024
-
Add cuda_parallel to CI (NVIDIA#2338) (commit 1e9125e)
* Add python cuda_parallel to CI
* Fix cuda.parallel installation order
* WAR occupancy/V100 failure
* Update cuda_cooperative readme
Co-authored-by: Georgy Evtushenko <[email protected]>
-
[CUDAX] Branch out an experimental version of stream_ref (NVIDIA#2343) (commit 0251ae4)
* Branch out experimental version of stream_ref
* Add tests for the experimental part of stream_ref
* Move inequality check
* Remove unneeded using declaration
* Remove the NULL stream_ref constructor (after first adding a TODO for it)
* Move runtime API include after the system header declaration
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
Improve visibility macros for libcu++ (NVIDIA#2337) (commit dae826b)
* Refactor visibility handling for libcu++. We always want to apply `_CCCL_VISIBILITY_HIDDEN` to effectively all function definitions. This has been a general thorn in our side and also introduced a considerable amount of divergence from libc++. This moves the visibility definition into CCCL and then applies the libcu++-specific extras.
* Use `_CCCL_HIDE_FROM_ABI` for defaulted SMFs
* Drop `__MDSPAN_INLINE_FUNCTION` in favor of `_LIBCUDACXX_HIDE_FROM_ABI`
* Work around an nvcc 11.1 limitation; nvcc really does not like always-inline
* Placate ICC and suppress visibility issues with ICC
* Move `_LIBCUDACXX_HIDE_FROM_ABI` into visibility.h
-
Add missing cuKernelGetFunction call to reduce (NVIDIA#2355) (commit dcb7d51)
* Add missing cuKernelGetFunction call to reduce
* Move the CUfunction getter just before the occupancy query
* Fix formatting
-
Commit 046a761
Commits on Sep 5, 2024
-
Fix the cudax vector_add sample (NVIDIA#2372) (commit 3876dcc)
The cudax `vector_add` sample has not compiled since 0251ae4. It was broken by PR NVIDIA#2343, which added a type `::cuda::experimental::stream_ref` distinct from `::cuda::stream_ref`: all unqualified mentions of `stream_ref` within the `cuda::experimental` namespace were made to refer to a different type. This causes problems in `cudax/samples/vector_add/vector.cuh`, which has several unqualified uses of `stream_ref`.
-
Add -Wmissing-field-initializers to cudax (NVIDIA#2373) (commit af695d0)
Change cudaLaunchConfig_t init to {}
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
Commit 05e019a
Commits on Sep 6, 2024
-
Add benchmarks for DeviceSelect::Unique (NVIDIA#2359) (commit e0dad56)
* Add benchmarks for DeviceSelect::Unique
* Fix typo
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
CUB: enable DPX reduction (NVIDIA#2286) (commit 3adc92a)
Enable Hopper+ DPX (SIMD) reduction for `uint16_t`/`int16_t` data types and `Min`/`Max`/`Sum` operators.
-
Commit 4a32b1c
-
Add thrust::transform_inclusive_scan with init value (NVIDIA#2326) (commit 5647255)
* Add thrust::transform_inclusive_scan with init value implementations
* Add tests for thrust::transform_inclusive_scan with init
* Add more tests and rebase on bug fix from thrust::inclusive_scan
* Add docs
* Use __accumulator_t
* Fix thrust tests readability with initializer_list and docs indentation
* Fix docs bugs and use correct accumulator and intermediate result types
-
Widen histogram agent constructor to more types (NVIDIA#2380)
This allows accepting more data types beyond arrays of exact static sizes. No SASS changes on the CUB device histogram test with CTK 12.6. Fixes NVIDIA#1877 for AgentHistogram.
Commit: fcf7c91
Commit: 07fef97
Commits on Sep 8, 2024
-
Add `cub::DeviceTransform` (NVIDIA#2086)
Including benchmarks based on BabelStream.
Co-authored-by: Georgii Evtushenko <[email protected]>
Commit: 71b9f98
Commits on Sep 9, 2024
-
Update toolkit to CTK 12.6 (NVIDIA#2348)
12.6 has been released some time ago, so we should update our containers appropriately.
Co-authored-by: Allison Piper <[email protected]>
Commit: 371a434
Implement `make_integer_sequence` in terms of intrinsics whenever possible (NVIDIA#2384)
* use the `__make_integer_seq` intrinsic on MSVC when available
* use the `__integer_pack` builtin to implement `make_integer_sequence` when possible
* define and use a `_LIBCUDACXX_HAS_INTEGER_PACK` config macro
Commit: ee9b856
Commits on Sep 10, 2024
-
Implement `cuda::mr::cuda_async_memory_resource` (NVIDIA#1637)
This implements a wrapper around a `cudaMemPool_t`. Fixes NVIDIA#1514.
Commit: d5492d5
Commits on Sep 11, 2024
-
Drop implementation of `thrust::pair` and `thrust::tuple` (NVIDIA#2395)
We previously moved them back to proper class definitions, as using alias declarations broke CTAD. Thanks to @bernhardmgruber, who realized that instead of making them an alias we can just pull them in and be done with it.
Co-authored-by: Bernhard Manfred Gruber <[email protected]>
Commit: e7ade77
Pull out `_LIBCUDACXX_UNREACHABLE` into its own file (NVIDIA#2399)
* Pull out `_LIBCUDACXX_UNREACHABLE` into its own file; also make it available globally
* Drop more uses of `cstdlib`
Commit: 1c422f2
Commits on Sep 12, 2024
-
Share common compiler flags in new CCCL-level targets. (NVIDIA#2386)
* Share common compiler flags in new CCCL-level targets.
* Fix up some warnings from cudax/
* Avoid deprecation warning from allocator_traits
* Silence conversion warning in tuple_leaf
* Suppress conversion warning in complex test
* Fix more conversion warnings
* Silence warning about signedness of unary minus
* Let doxygen ignore warning suppression
* Suppress a variety of warnings on MSVC.
* Fix unused parameter warning.
* Suppress reordering warnings on ICC, too.
* More MSVC taming, more warnings, more MSVC narrowing warning fixes.
* Re-enable global suppression of MSVC warnings. There are hundreds of these, and most of them appear to be in the unit tests, not the actual Thrust headers. It's turning into a rabbit hole and dragging beyond the scope of the current PR. Filed NVIDIA#2409 to track the removal of these suppressions.
Co-authored-by: Eric Niebler <[email protected]>
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commit: 1fe25ed
Commit: cf21a40
Commits on Sep 16, 2024
-
Commit: 684cf8e
Drop thrust's diagnostic warning suppressions (NVIDIA#2392)
* Drop thrust's diagnostic warning suppressions. We have global suppressions for CCCL, so use them.
* Avoid a warning about possible loss of data when casting to the index type.
Commit: 4088134
Commits on Sep 17, 2024
-
[PoC]: Implement `cuda::experimental::uninitialized_async_buffer` (NVIDIA#1854)
This uninitialized buffer provides a stream-ordered allocation of N elements of type T, utilizing a `cuda::mr::async_resource` to allocate the storage. The buffer takes care of alignment and deallocation of the storage. The user is required to ensure that the lifetime of the memory resource exceeds the lifetime of the buffer.
Co-authored-by: Mark Harris <[email protected]>
Commit: e3c2e2b
Commits on Sep 18, 2024
-
Fix thrust package to work with newer FindOpenMP.cmake. (NVIDIA#2421)
The FindOpenMP module shipped with CMake started (unnecessarily...) including the SHELL: prefix on its -fopenmp flag.
Commit: 8ced877
Introduce `cccl_configure_target` cmake function. (NVIDIA#2388)
* Introduce `cccl_configure_target` cmake function. Currently this encapsulates common operations such as setting the CXX/CUDA standard dialect and binary output locations.
* Update CI scripts to prevent the unsupported cudax/msvc/c++17 config.
Commit: 8f27fba
Fix sccache errors in RAPIDS builds (NVIDIA#2417)
* set .aws/{config,credentials} to read-only, enable bash xtrace, and trap exit to print error logs [skip-matrix] [skip-vdc] [skip-docs]
* change how rapids-build-utils stops and restarts sccache
* try using sccache v0.8.1
* try killing all sccache processes, then try killing them first
* check that the .aws dir exists
* don't stop the sccache server before running the build commands
* debug why the sccache server won't start
* define the creds as envvars instead of a file mount
* set SCCACHE_IDLE_TIMEOUT=0 in CI
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commit: 2496571
Commits on Sep 19, 2024
-
Replace `CUDA C++ Core Libraries` with `CUDA Core Compute Libraries` (only in README.md) (NVIDIA#2424)
* Replace `CUDA C++ Core Libraries` with `CUDA Core Compute Libraries`.
* Remove a couple of `C++`, based on a suggestion by @miscco
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commit: 52a967f
Commit: d191102
`uninitialized_buffer::get_resource` returns a ref to an `any_resource` that can be copied (NVIDIA#2431)
* `uninitialized_buffer::get_resource` returns a ref to an `any_resource` that can be copied
* Also update `uninitialized_async_buffer`
* Fix doc string
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commit: 445fd71
Refactor `cuda::ceil_div` to take two different types (NVIDIA#2376)
Rework `cuda::ceil_div` to take different types. This replaces `cub::DivideAndRoundUp`.
Commit: b07f036
Reduce PR testing matrix. (NVIDIA#2436)
* Remove file annotation from verbose matrix warnings.
* Allow 'min', 'max', 'minmax' values for matrix `std` tags.
* Error when no supported stds are available.
* Reduce PR testing matrix:
  1. Temporarily remove the current nightly matrix pending NVKS bringup.
  2. Move the current per-PR matrix to nightly.
  3. Reduce the number of jobs in the PR matrix while maintaining decent coverage.
Before (total jobs: 437):
| 320 | `linux-amd64-cpu16` |
| 66 | `linux-amd64-gpu-v100-latest-1` |
| 28 | `linux-arm64-cpu16` |
| 23 | `windows-amd64-cpu16` |
After (total jobs: 183):
| 126 | `linux-amd64-cpu16` |
| 26 | `linux-amd64-gpu-v100-latest-1` |
| 21 | `windows-amd64-cpu16` |
| 10 | `linux-arm64-cpu16` |
* Restore old build matrix.
Commit: ee94bb9
Implement `cudax::shared_resource` (NVIDIA#2398)
We currently have two basic building blocks around memory resources, `any_resource` and `resource_ref`. However, while they make owning and sharing resources much easier, we can still run into lifetime issues. If a user wants to pass a resource into a library function that might exceed the lifetime of the resource, they would need to move it into an `any_resource`. However, they also might want to share that resource among multiple functions, e.g. a pool allocator. We need a way to properly share a resource in those circumstances.
Enter `shared_resource`. Rather than storing an `any_resource`, this holds a `shared_ptr<any_resource>`. With that we can happily copy / move them around without touching the stored resource.
Co-authored-by: Eric Niebler <[email protected]>
Co-authored-by: Mark Harris <[email protected]>
Commit: 7bd04ad
Increase the libcu++ timeout (NVIDIA#2435)
We are frequently running into the current test duration limit of 01:20, so add another 20 minutes to the timeout.
Co-authored-by: Allison Piper <[email protected]>
Commit: 5e14128
Move c/include/cccl/*.h files to c/include/cccl/c/*.h (NVIDIA#2428)
* Move c/include/cccl/*.h files to c/include/cccl/c/*.h
* Change `#warning` to `#error` (to improve the user experience).
* Add comments to preprocessor conditionals.
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Co-authored-by: Allison Piper <[email protected]>
Commit: 2fe09c8
Make `any_resource` emplaceable (NVIDIA#2425)
* Rename `async_any_resource` to `any_async_resource`
* Add a way of constructing an `any_{async_}resource` from a set of arguments and a tag type
Co-authored-by: Allison Piper <[email protected]>
Commit: 8b2bf13
Fix issues with `__host__` and `__device__` definitions (NVIDIA#2413)
We currently only checked whether we are compiling with a CUDA compiler, but not whether we are actually compiling in CUDA mode. That meant that certain macros weren't properly defined.
* Simplify and account for nvhpc
Commit: 28888eb
Commits on Sep 20, 2024
-
Make `bit_cast` play nice with extended floating point types (NVIDIA#2434)
* Move `__is_nvbf16` and `__is_nvfp16` to their own file
* Make `bit_cast` play nice with extended floating point types
Commit: 31c3eb9
Do not include our own string.h file (NVIDIA#2444)
That might conflict with the host standard library
Commit: 92bc4ac
Commit: 9641b7e
Commits on Sep 24, 2024
-
Remove a ton of lines in thrust tests (NVIDIA#2356)
* Convert redundant assignments to initializer lists in thrust tests
* done :)
Commit: aa1458d
[CUDAX] Add placeholder green context type and logical device that can hold both a green ctx and a device (NVIDIA#2446)
Commit: 6fd1e5c
Fix typo in CCCLBuildCompilerTargets.cmake (NVIDIA#2453)
* Fix typo in CCCLBuildCompilerTargets.cmake
Commit: 0f0fdc2
Commits on Sep 25, 2024
-
This drops the duplicated definition of `_CCCL_NO_SYSTEM_HEADER` from thrust, as that is now in the common compile targets (NVIDIA#2450)
Commit: 17e0c83
Consolidate packages and install rules (NVIDIA#2456)
* Migrate CMake packages to a central location.
* Remove obsolete infra tests. These tests check that consumers can `add_subdirectory(cccl/cub)` / `add_subdirectory(cccl/thrust)` to add Thrust or CUB to their build. This hasn't been sensible or supported since we migrated to the monorepo. Now that we're moving shared CMake infra to CCCL, these tests are finally actually failing.
* Migrate install rules to the top level; use the `cccl_generate_install_rules` helper to reduce redundancy.
* Remove libcudacxx export tests. These tests are redundant with CCCL-level tests, and no longer function now that install rules are centralized at the top level.
* Make the std arg for test_thrust.ps1 optional.
* Ensure that the cudax::Thrust target is only created once.
Commit: 2cbf40b
Commits on Sep 26, 2024
-
Prune CUB's ChainedPolicy by __CUDA_ARCH_LIST__ (NVIDIA#2154)
* Prune CUB's ChainedPolicy by `__CUDA_ARCH_LIST__`
* Workaround GCC 13 error:
```
/home/coder/cccl/thrust/thrust/cmake/../../thrust/iterator/detail/transform_input_output_iterator.inl:68:9: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
   68 |     *io = output_function(x);
      |     ~~~~^~~~~~~~~~~~~~~~~~~~~
```
* Apply feedback and suggestion from elstehle
* Return a CUDA error if the PTX version and arch list mismatch
* Add a comment on ptx version requirements
Co-authored-by: Elias Stehle <[email protected]>
Commit: bda69fd
Commit: cc01ce7
Add CCCL_ENABLE_WERROR flag. (NVIDIA#2463)
* Add CCCL_ENABLE_WERROR flag. Also move a stray compiler definition to the compiler interface targets.
* Fix typo
Commit: 99fb4b4
Commit: 5d45850
Commits on Sep 27, 2024
-
Propagate compiler flags down to libcu++ LIT tests (NVIDIA#2420)
In order to run libcu++ tests with clang, lit has to use the correct compiler settings. It currently appears to rely on clang automatically finding and using the default CUDA installation, but that's not necessarily the CUDA that the build itself was configured to use. Clang builds may also require building with the host libc++ instead of the default libstdc++. That requires propagation of the top-level CFLAGS specified by the user.
Commit: 0e09815
Commits on Sep 28, 2024
-
Drop remaining uses of `_LIBCUDACXX_COMPILER_*` (NVIDIA#2467)
This was breaking some other PR and we never use them anymore.
Commit: 467a44d
Avoid C++17 extension in C++11 tests (NVIDIA#2469)
Using `typename` in a template template parameter is a C++17 extension, and clang warns about it when acting as a CUDA compiler.
Commit: 7c668e8
Add span to example and templated block size (NVIDIA#2470)
* add span to example and template block size
Commit: e3800d7
Commit: 94e4e75
Commits on Sep 30, 2024
-
Removes superfluous template keyword that triggers warnings/errors with clang-19 (NVIDIA#2482)
```
error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
  349 |     IterateThreadStore<0, VOLATILE_MULTIPLE>::template Dereference(
```
Commit: 242bcce
Improve build times in several heavyweight libcudacxx tests. (NVIDIA#2478)
* Split heavy heterogeneous atomic tests
* Split integral atomics tests
* Split heavy alg.copy test
Commit: 653e546
Drop `__availability` header (NVIDIA#2484)
We always unconditionally define no availability anyway, and we do not really care about the macOS host standard library.
Commit: 0521015
Replace a few more instances of `CUDA C++ Core Libraries` with `CUDA Core Compute Libraries` (NVIDIA#2447)
* Programmatically (`sed`) replace `CUDA C++ Core Libraries` with `CUDA Core Compute Libraries`. NO MANUAL CHANGES. Command used:
```
for fn in `git grep 'CUDA C++ Core Libraries' | cut -d':' -f1 | uniq`; do
  sed -i .bak 's/CUDA C++ Core Libraries/CUDA Core Compute Libraries/g' $fn
done
```
* Extend `=======================` line in .rst file to match the new name.
* Revert changes in all .cuh, .cu, .h, .cpp files. Rationale: these files actually have a C++-specific license; it's left for later to change the license and the notices together.
* Revert changes in the docs/ subdirectory.
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commit: 725954c
Commits on Oct 1, 2024
-
Fix `common_type` specialization for extended floating point types (NVIDIA#2483)
The machinery we had in place was not really suited to specialize `common_type`, because it would take precedence over the actual implementation of `common_type`. In that case, we only specialized `common_type<__half, __half>` but not `common_type<__half, __half&>` and so on. This shows how brittle the whole thing is and that it is not extensible.
Rather than putting another bandaid over it, add a proper 5th step in the `common_type` detection that properly treats combinations of an extended floating point type with an arithmetic type. Allowing arithmetic types is necessary to keep machinery like `pow(__half, 2)` working.
Fixes NVIDIA#2419 ([BUG]: `is_common_type` trait is broken when mixing rvalue references).
* Work around MSVC declval bug
Commit: 81d05bb
Commit: 808f9c2
Commit: 57b9899
Disable system header for narrowing conversion check (NVIDIA#2465)
There is an incredible compiler bug reported in nvbug4867473 where the use of a system header changes the way some types are instantiated. The culprit seems to be that within a system header the compiler accepts narrowing conversions that it should not accept.
Work around it by moving `__is_non_narrowing_convertible` to its own header that is included before we define the system header machinery.
Commit: 190099c
Commits on Oct 2, 2024
-
Require resources to always provide at least one execution space property (NVIDIA#2489)
Currently we implicitly assumed that any resource that had no execution space property was host accessible. However, that is not a good design, as it provides a source of surprise and numerous challenges with proper type matching down the road. So rather than implicitly assuming that something is host accessible, we require the user to always provide at least one execution space property.
Commit: 59ad103
Rework builtin handling (NVIDIA#2461)
* Move builtin detection to its own file
* Try to re-enable more builtins
* Address review comments
Commit: e4f48cf
Disable execution checks for `std::equal` (NVIDIA#2491)
This is used in the `cudax::vector` PR and is the only dependency change of libcu++ that blows up the CI.
Commit: ee3bd53
Replace `_CCCL_ALWAYS_INLINE` with `_CCCL_FORCEINLINE` (NVIDIA#2439)
* replace `_CCCL_ALWAYS_INLINE` with `_CCCL_FORCEINLINE`
* Avoid the GCC optimizer doing things
* try avoiding the GCC optimizer bug a different way
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commit: 0589775
Commit: 25c57f8
Commit: 10769b4
Drop badly named `_One_of` concept (NVIDIA#2490)
* Drop badly named `_One_of` concept. First, it goes against the naming convention we have for both files and concepts. Furthermore, we should rather use the new type set facility.
* Rename to `_is_included_in`
* Fix concept issue
Commit: 5e139af
Commits on Oct 3, 2024
-
Unify assert handling in cccl (NVIDIA#2382)
We currently do not have proper assertions within CCCL. There are different approaches in CUB, Thrust, and libcu++, some of which are completely broken. This reworks the assertion handlers so that they work uniformly everywhere and can be selectively enabled.
Commit: 3eee9b2
Reduce scope of Thrust linkage in cudax. (NVIDIA#2496)
It's only used for tests, it doesn't need to be linked to all cudax targets.
Commit: bb001b7
Centralize CPM logic. (NVIDIA#2495)
* Abstract and consolidate CPM calls. * Update CPM used in example projects.
Commit: a0ec74c
Commit: c15546a
Refactor away per-project TOPLEVEL flags. (NVIDIA#2498)
* Refactor away per-project TOPLEVEL flags. These will never be toplevel projects ever again now that we're a monorepo. They're redundant with the `CCCL_ENABLE_<proj>` flags.
* Disable cudax when unstable is off.
Commit: 1cfe171
Commits on Oct 4, 2024
-
[FEA]: Validate cuda.parallel type matching in build and execution (NVIDIA#2429)
* Brute force experiment: which tests fail after adding an `assert False`?
* Add `_dtype_validation()` in python/cuda_parallel/cuda/parallel/experimental/__init__.py and fix a bug in python/cuda_parallel/tests/test_reduce_api.py
* Add `test_device_reduce_dtype_mismatch()`. Capture `dtype`s only in the ctor (not entire arrays).
* Undo change in .gitignore
* Move `min_op()` back into `test_device_reduce_success()` to unbreak the sphinx documentation. Also fix an existing typo.
* Move `test_device_reduce_dtype_mismatch()` from test_reduce_api.py to test_reduce.py
* Add TODO POINTER vs ITERATOR
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commit: e8d57c3
Commit: 583567b
Commits on Oct 5, 2024
-
Commit: c86caca
Commits on Oct 7, 2024
-
Commit: 8aaeb29
Commits on Oct 8, 2024
-
Add ForEachN from CUB to cccl/c. (NVIDIA#2378)
* Work in progress
* Initial draft of exposing CUB::for in cccl/c.
* Add a fairly cursed storage abstraction for kernel arguments.
* Fix cccl/c include path.
* Make reduce inline functions static.
* Initial draft of iterator support for C device for
* Remove small vector (for now)
* Fixup and remove debug build option from for.cu.
* Disable iterator support in foreach (for now)
* Restore LDL/STL call ABI checks that were commented out earlier.
* Fix missing context push; fix terminating instead of returning an error.
* alignof instead of signof; use `value_t` for the value type of the kernel.
* Fix missing header in for.cu
Co-authored-by: Georgii Evtushenko <[email protected]>
Commit: ee5dd3e
Adds support for large number of items in `DeviceSelect` and `DevicePartition` (NVIDIA#2400)
* adds streaming selection and partition
* ensures policy lookup uses the per-partition offset type
* mitigates perf degradation on select; fixes f32 select::if perf regression
* makes device interfaces use i64 num_items
* updates select::if large num_items tests; adds tests for large num_items for select::flagged, partition::if, and partition::flagged
* silences msvc constant conditional warning
* adds support for a streaming ctx dummy for partition with small offset types
* removes superfluous template parameter
* adds tests and support for different offset types for partition::if & ::flagged
* adds tests and support for streaming select::unique; switches unique to always use i64 offset types
* adds benchmark for partition with distinct iterators
* fixes perf for partition and partition::flagged; makes SASS identical to main for i32 partition
* updates thrust copy_if to always use i64 offset types
* makes tests on empty input more robust; restores empty-problem behaviour
* assorted fixes (syntax, formatting, implicit conversions, feature macros, merge conflicts); addresses review comments; adds a comment on const ref
Commit: 16f9a1a
Adds support for large number of items to the `DeviceScan::*ByKey` family of algorithms (NVIDIA#2477)
* experimenting with bool tile state
* fixes perf regression from different tile state
* fixes support for large offset types
* adapts interface for scanbykey
* adds tests for large number of items for scanbykey
* fixes naming
* makes thrust scan_by_key use unsigned offset types
* moves scan_by_key_op to detail ns
Commit: 951c822
Commits on Oct 9, 2024
-
Integrate c/parallel with CCCL build system and CI. (NVIDIA#2514)
Integrate c/parallel into CCCL, setup CI, etc.
Commit e149e86
-
Commit cbb0edd
-
Refactor include paths to match cudax conventions.
- Global instead of local
- Use `cuda/experimental`
- s/.h$/.cuh$/g
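The `s/.h$/.cuh$/g` step above can be sketched as a small shell helper that renames `*.h` headers to `*.cuh` under a tree. This is an illustrative reconstruction, not the script actually used; the example directory name is hypothetical:

```shell
# Illustrative sketch of the s/.h$/.cuh$/g rename step (directory
# name hypothetical). Strips the .h suffix and appends .cuh.
rename_headers() {
  find "$1" -name '*.h' | while IFS= read -r f; do
    mv "$f" "${f%.h}.cuh"
  done
}

# Example usage (hypothetical path):
# rename_headers cudax/include/cuda/experimental
```

Include directives referencing the old `.h` names would still need a separate pass (e.g. `sed` over the sources) to match the renamed files.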
Commit afa153d
-
Commit 4b2cf18
-
Split STF headers into a separate headertest unit.
These currently require different compile options from the rest of CudaX.
Commit 450136e
-
Commit 09213f6
-
Commit c587b36
-
Commit 2030832
-
Commit 7a2a842
-
Commit 71196f1
-
Fix standalone compilation of logical_data.cuh.
These functions are used in both logical_data.cuh and acquire_release.cuh. Move them to where they'll be visible to both.
Commit ebc205a
-
Commit 750db80
-
Commit 5c55fef
Commits on Oct 10, 2024
-
Commit 03d0a33