Cudastf #1
Commits on Jul 17, 2024
- cc316db
- Use libcu++ void_t everywhere (NVIDIA#1977) (72f77c1)
  * Make libcu++ void_t available in C++11
  * Replace uses of CUB's and Thrust's void_t with libcu++'s
  * Deprecate Thrust's void_t
  * Deprecate libcu++'s __void_t
- be91914
Commits on Jul 18, 2024
- 64e7a06
- 87d0849
- 92b4b0b
- Add CI slack notifications. (NVIDIA#1961) (56d99db)
  * Fix divide by zero in `parse-job-times.py`
  * Add CI slack notifications.
- Allow nightly workflow to be manually invoked. (NVIDIA#2007) (fc457b4)
  [skip-rapids][skip-vdc][skip-matrix][skip-docs]
- Need to use a different approach to reuse secrets in reusable workflows vs. actions. (NVIDIA#2008) (eb62dc6)
  [skip-matrix][skip-vdc][skip-docs][skip-rapids]
Commits on Jul 19, 2024
- Enable RAPIDS builds for manually dispatched workflows. (NVIDIA#2009) (97e699f)
  [skip-rapids][skip-vdc][skip-matrix][skip-docs]
- 2ff83a2
- Add github token to nightly workflow-results action. (NVIDIA#2012) (8a5e56a)
  This is needed to fetch the runtime info from the GHA API. Only the nightly workflow is modified; the PR tests are unaffected. [skip-matrix][skip-vdc][skip-rapids][skip-docs]
Commits on Jul 20, 2024
- e5fcebe
Commits on Jul 22, 2024
- 496d88d
- e61bafe
- Spell value initialization where used by thrust vectors (NVIDIA#1990) (b8116c3)
  Co-authored-by: Georgii Evtushenko <[email protected]>
- Do not redefine the __ELF__ macro (NVIDIA#2018) (1b16af7)
  We currently employ a workaround against an nvc++ bug where `__ELF__` is not properly defined. However, we should not define that macro if it is already present. Fixes [BUG]: incompatible redefinition of macro "__ELF__" with NVC++ host compiler (NVIDIA#1995)
Commits on Jul 23, 2024
- Port thrust::merge[_by_key] to CUB (NVIDIA#1817) (8635429)
  * Refactor thrust/CUB merge
  * Port thrust::merge[_by_key] to cub::DeviceMerge
  Fixes NVIDIA#1763
  Co-authored-by: Georgii Evtushenko <[email protected]>
- 53fe08f
Commits on Jul 24, 2024
- 18cd90f
- Add tests to ensure that we properly propagate common_type for complex types (NVIDIA#2025) (f6d3d0b)
  Addresses NVIDIA#2013
- a69c8ac
- 82a3ed0
Commits on Jul 25, 2024
- Avoid ADL issues with thrust::distance (NVIDIA#2053) (46759c5)
  We almost always pull in `cuda::std::distance`, which would be ambiguous with `thrust::distance` if the used type pulls in `cuda::std::` within a thrust algorithm.
- Simplify thrust::detail::wrapped_function (NVIDIA#2019) (e25344c)
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
- Add a test for Thrust scan with non-commutative op (NVIDIA#2024) (5ba23b6)
  * Add a test for Thrust scan with non-commutative op
  * Fix printing mismatching sequences of non-addable types before C++17 in Thrust unit tests
- Update memory_resource docs (NVIDIA#1883) (30eaa9c)
  Move the docs to rst format and also fix some minor issues in the documentation.
  Co-authored-by: Bernhard Manfred Gruber <[email protected]>
- Temporarily switch nightly H100 CI to build-only. (NVIDIA#2060) (04db77a)
  These are on a testing pool and the machines are currently experiencing infra failures.
- Do not rely on conversions between float and extended floating point types (NVIDIA#2046) (1797742)
  The issue is that our tests rely extensively on those conversions, which makes them incredibly painful to test.
Commits on Jul 26, 2024
- Experimental wrapper types for cudaEvent_t that provide a modern C++ interface. (NVIDIA#2017) (a4cd52e)
  * Wrapper types for `cudaEvent_t` that provide a modern C++ interface:
    `cuda::experimental::event_ref` is a non-owning wrapper around a `cudaEvent_t`;
    `cuda::experimental::event` is an owning wrapper around a `cudaEvent_t`;
    `cuda::experimental::timed_event` is a `cuda::experimental::event` that also records the time at which it was recorded.
  * Apparently `__event` is a word of power for msvc.
  * Represent the elapsed time between two events with nanoseconds instead of microseconds: according to the CUDA docs for `cudaEventElapsedTime`, the elapsed time has sub-microsecond resolution, so it is more appropriate to represent it in nanoseconds.
  * Prune unused headers, switch to rst-friendly doxygen comment style.
  * Add class synopsis comments.
  * Construct with a stream_ref and record the event on construction.
  * Review feedback.
  * Tests for `cudax::event` and `cudax::timed_event`.
  * Change `event_ref::wait` to use `cudaEventSynchronize`.
  * Use a struct for windows instead.
  * Do not include superfluous config header.
  * Add clang-format rule for cudax.
  * Spell `cudax_add_catch2_test` correctly.
  * Fix formatting issues.
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
- [CUDAX] Add a dummy device struct for now (NVIDIA#2066) (c60d687)
  * Add dummy device struct
  * Add `__scoped_device` for changing and restoring the current device
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
  Co-authored-by: Eric Niebler <[email protected]>
- Allow (somewhat) different input value types for merge (NVIDIA#2075) (4b9de3b)
  * Add a cuDF-inspired test for merge_by_key
  * Allow CUB MergePath to support iterators with different value types
  * Allow different input value types for merge, as long as they are convertible to the value type of the first iterator. This weakens the publicly documented guarantee of equal value types to restore the old behavior of the thrust implementation replaced in NVIDIA#1817.
- Avoid ::result_type for partial sums in TBB reduce_by_key (NVIDIA#1998) (b761538)
  This allows us to get rid of partial_sum_type, which still uses the C++11-deprecated function object API ::result_type.
  Co-authored-by: Georgii Evtushenko <[email protected]>
Commits on Jul 27, 2024
- a8db0a9
- 94c86b6
- ad57b1e
- 59d7a4b
Commits on Jul 29, 2024
- 8a185fe
Commits on Jul 30, 2024
- Add support to cudax::device for querying a device's attributes (NVIDIA#2084) (731c84c)
  This commit encodes all of the device attributes as query objects. Whereas the CUDA runtime has `cudaDevAttrMaxThreadsPerBlock`, cudax has `device::attrs::max_threads_per_block`. Querying a device looks like:
  ```
  device dev0(0);
  auto tpb = dev0.attr(device::attrs::max_threads_per_block);
  ```
  This syntax also works, which is intended to help people already familiar with the C-style interface:
  ```
  device dev0(0);
  auto tpb = dev0.attr<::cudaDevAttrMaxThreadsPerBlock>();
  ```
  The attribute tags are convertible to `::cudaDeviceAttr`, so that `device::attrs::max_threads_per_block == ::cudaDevAttrMaxThreadsPerBlock`. Some effort has been made to make the query return types correct. For instance, `dev0.attr(device::attrs::managed_memory)` returns a `bool`, and `dev0.attr(device::attrs::compute_mode)` returns a value from the `::cudaComputeMode` enumeration. The attributes that return enumerations provide convenience aliases for the enum values, so this:
  ```
  if (dev0.attr<::cudaDevAttrComputeMode>() == ::cudaComputeModeDefault)
  ```
  is equivalent to:
  ```
  if (dev0.attr(device::attrs::compute_mode) == device::attrs::compute_mode._default)
  ```
  Closes NVIDIA#2083
  Co-authored-by: pciolkosz <[email protected]>
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
- [CUDAX] Add experimental owning abstraction for cudaStream_t (NVIDIA#2093) (15e2ce0)
  * Construct with a stream_ref and record the event on construction
  Co-authored-by: Eric Niebler <[email protected]>
- Do not query NVRTC for cuda runtime header (NVIDIA#2102) (1e67aa7)
  This may fail as discussed in NVIDIA#2095. Fixes 2095
- 6dfc8dd
- Improve binary function objects and replace thrust implementation (NVIDIA#1872) (4188fb0)
  * Improve binary function objects and replace thrust implementation
  * Simplify use of ::cuda::std binary function objects
  * Replace _CCCL_CONSTEXPR_CXX14 with constexpr in all libcudacxx binary function objects that are imported in thrust.
  * Determine partial sum type without ::result_type
  * Ignore _LIBCUDACXX_DEPRECATED_IN_CXX11 for doxygen
  Co-authored-by: Bernhard Manfred Gruber <[email protected]>
- d92ef23
- Add script to update RAPIDS version. (NVIDIA#2082) (d4f928e)
  * Add script to update RAPIDS version.
  * Update to 24.10.
- Update bad links (NVIDIA#2080) (ce95739)
  * fix broken links
  * revert repo.toml
  * linkchecker fixes
  * fix .cuh errors
  * lint
- c0cfbd0
Commits on Jul 31, 2024
- Add internal wrapper for cuda driver APIs (NVIDIA#2070) (7a3dae7)
  * Add a header to interact with driver APIs
  * Add a test for the driver API interaction
  * Format
  * Fix formatting
- Use common_type for complex pow (NVIDIA#1800) (694e963)
  * Use `common_type` for complex `pow`
  Previously we would rely on our internal `__promote` function. However, that could have surprising results, e.g. `pow(complex<float>, int)` would return `complex<double>`. With C++23, this situation got clarified and we should use `common_type` to determine the return type.
- Rename device to device_ref, add immovable device as a place to cache properties (NVIDIA#2110) (a2a3824)
- Use the float flavors of the cmath functions in the extended floating point fallbacks (NVIDIA#2106) (bddcd20)
  Fixes NVIDIA#2078
- [PoC]: Implement cuda::experimental::uninitialized_buffer (NVIDIA#1831) (27253d7)
  * Drop `cuda::get_property` CPO. It serves no purpose, as it only ever forwards via ADL, and also breaks older nvcc.
  * Ensure that we test memory resources
  * Implement `cuda::uninitialized_buffer`. `cuda::uninitialized_buffer` provides an allocation of `N` elements of type `T`, utilizing a `cuda::mr::resource` to allocate the storage. `cuda::uninitialized_buffer` takes care of alignment and deallocation of the storage. The user is required to ensure that the lifetime of the memory resource exceeds the lifetime of the buffer.
- 2600135
Commits on Aug 1, 2024
- Ensure that cuda_memory_resource allocates memory on the proper device (NVIDIA#2073) (39b926a)
  * Ensure that `cuda_memory_resource` allocates memory on the proper device
  * Move `__ensure_current_device` to its own header
- Clarify compatibility wrt. template specializations (NVIDIA#2138) (ce4b904)
  We do not want users to specialize arbitrary templates in CCCL unless otherwise stated. This PR makes this clear in the README.md.
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
- fadb135
- Make cuda::std::tuple trivially copyable (NVIDIA#2127) (4634d81)
  This is similar to the situation with `cuda::std::pair`. We have a lot of users that rely on types being trivially copyable so that they can utilize memcpy and friends. Previously, `cuda::std::tuple` did not satisfy this because it needs to handle reference types. Given that we already specialize `__tuple_leaf` depending on whether the class is empty or not, we can simply add a third specialization that handles the trivially copyable types and one that synthesizes assignment.
  Co-authored-by: Bernhard Manfred Gruber <[email protected]>
- Fix missing copy of docs artifacts (NVIDIA#2162) (cc0b3d1)
  Also fix typo in the link.
- cbe01b0
- Fix g++-14 warning on uninitialized copying (NVIDIA#2157) (02378eb)
  ```
  In function bool cuda::std::__4::__dispatch_memmove(_Up*, _Tp*, size_t) ...
  error: *(unsigned char*)(&privatized_decode_op[0]) may be used uninitialized [-Werror=maybe-uninitialized]
  ...
  *(unsigned char*)(&privatized_decode_op[0]) was declared here
  1528 | PrivatizedDecodeOpT privatized_decode_op[NUM_ACTIVE_CHANNELS]{};
  ```
Commits on Aug 2, 2024
- Fix flaky heterogeneous tests (NVIDIA#2085) (cba0345)
  * Fix flaky heterogeneous tests by ensuring only *one* writer exists in parallel between H/D
  * Fix up copy-paste mistake
  * Make host atomics simpler by removing the ugly alignment type
  * Fix deadlocks introduced into barrier/semaphore tests
  * Revert removing hacky atomic wrapping stuff
  * Fix unused warning bug in GCC-6
- Fix multiple definition of InclusiveScanKernel (NVIDIA#2169) (24ed47d)
  ```
  Linking CXX executable bin/cub.cpp14.catch2_test.lid_0
  FAILED: bin/cub.cpp14.catch2_test.lid_0
  ...
  /usr/bin/ld: cub/test/CMakeFiles/cub.cpp14.test.warp_scan_api.dir/catch2_test_warp_scan_api.cu.o: in function `InclusiveScanKernel(int*)':
  /usr/local/cuda-12.7/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxInitDefs.h:473: multiple definition of `InclusiveScanKernel(int*)'; cub/test/CMakeFiles/cub.cpp14.test.block_scan_api.dir/catch2_test_block_scan_api.cu.o:/usr/local/cuda-12.7/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxInitDefs.h:468: first defined here
  collect2: error: ld returned 1 exit status
  ```
Commits on Aug 3, 2024
- [CUDAX] Add a global constexpr cudax::devices range for all devices in the system (NVIDIA#2100) (a8ca75c)
  * Add `cuda::devices` vector. The number of cuda devices can be determined by calling `cuda::devices.size()`. `cuda::devices` is a range of `cuda::device` objects.
- d0254e4
Commits on Aug 5, 2024
- a903dc6
- Fix trivial_copy_device_to_device execution space (NVIDIA#2164) (9459e4a)
  * Fix trivial_copy_device_to_device execution space
  * Typo
  * Format
  * Extra empty line
- c65a965
- e519f25
- Fix the clang-format path in the devcontainers (NVIDIA#2194) (fe27d99)
  In the devcontainers, `clang-format` is now installed into `/usr/bin/clang-format`.
- Mount a build directory for CCCL projects if WSL is detected (NVIDIA#2035) (d1e7c1c)
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commits on Aug 6, 2024
- 2118 [CUDAX] Change the RAII device swapper to use driver API and add it in places where it was missing (NVIDIA#2192) (75929cb)
  * Change __scoped_device to use driver API
  * Switch to use driver API based dev setter
  * Remove constexpr from operator device()
  * Fix comments and includes
  * Fall back to the non-versioned get-entry-point API pre 12.5. We need to use the versioned variant to get the correct cuStreamGetCtx; there is a v2 version of it in 12.5, and fortunately the versioned get entry point is available there too.
  * Fix unused local variable
  * Fix warnings in ensure_current_device test
  * Move ensure current device out of detail
  * Add LIBCUDACXX_ENABLE_EXCEPTIONS to tests cmake
- Fix singular vs plural typo in thread scope documentation. (NVIDIA#2198) (1b6dbd4)
  * Fix singular vs plural typo in thread scope documentation.
  * Better grammar fix.
- [CUDAX] fixing some minor issues with device attribute queries (NVIDIA#2183) (2db4fa7)
  * [cudax] give the `cudaDevAttrMemoryPoolSupportedHandleTypes` attribute the correct type
  * move attribute definitions from `device_ref` to `device`
Commits on Aug 7, 2024
- Integrate Python docs (NVIDIA#2196) (b0e09d0)
  * pass docs build options to repo.sh
  * Integrate Python docs
  * update CI
  * Apply suggestions from code review
  Co-authored-by: Georgii Evtushenko <[email protected]>
- [FEA] Atomics codegen refactor (NVIDIA#1993) (62336ad)
  * Initial draft of new atomics backend
  * Change atomic fetch ops back to tag dispatch
  * Save wip
  * Add load/store and support for MMIO
  * Begin working on exch
  * Enable formatting exchange
  * Several signed-ness fixes
  * Make atomics ptx tests build. Lit tests are a WIP.
  * Fix load/store, some volatileness, and min/max
  * Formatting and enabled codegen in all builds
  * Make integral.pass.cpp pass
  * Make the rest of the atomics tests pass
  * Use 128b ld/st instead of vector load as it is not atomic across the whole atom
  * Fix copy-paste mistake in load/store
  * Whitespace fixup
  * Fix 128b .exch using .cas operands
  * Make codegen link fmt as PRIVATE
  * Simplify MMIO down to a static array.
  * Static -> Inline for codegen functions. Replace endl with '\n'.
  * Supply the output stream directly to `fmt::format`
  * Update fmtlib.
  * Revert `fmt::format(out...)` changes. They don't work on MSVC.
  * Fixup libcudacxx codegen CMake stuff
  * Remove sneaky cstddef include that was auto-added
  * [pre-commit.ci] auto code formatting
  Co-authored-by: Allison Piper <[email protected]>
  Co-authored-by: Bernhard Manfred Gruber <[email protected]>
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- [CUDAX] add __launch_transform to transform arguments to cudax::launch prior to launching the kernel (NVIDIA#2202) (47b8f5c)
Commits on Aug 8, 2024
- Cleanup common testing headers and correct asserts in launch testing (NVIDIA#2204) (39fd05e)
  * Cleanup common testing headers
  * Add test/common to cmake and fix formatting
- [CUDAX] Add an API to get device_ref from stream and add comparison operator to device_ref (NVIDIA#2203) (c9a7b6a)
  * Add a way to compare device_refs
  * Add a way to query device_ref from a stream
  * Fix Windows missing cast
  * Apply suggestions from code review
  * Disallow device comparison with int
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
- Update devcontainer docs for WSL (NVIDIA#2200) (3ebf8cc)
  * Update README.md (×5)
  * [pre-commit.ci] auto code formatting
  * Why was 6 afraid of 7?
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Commits on Aug 9, 2024
- Add cudax::distribute<threadsPrBlock>(numElements) as a way to evenly distribute elements over thread blocks (NVIDIA#2210) (f95f211)
- Rework mdspan concept emulation (NVIDIA#2213) (8e20c9a)
  It is proving difficult to handle for msvc, and the one we are using in libcu++ is much cleaner. Gets NVIDIA#2160 compiling on MSVC.
- Un-doc functions taking debug_synchronous (NVIDIA#2209) (7473934)
  * undoc functions taking debug_synchronous
- CUDA vector_add sample project (NVIDIA#2160) (a3a5f9c)
  Co-authored-by: pciolkosz <[email protected]>
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commits on Aug 12, 2024
- Avoid constraint recursion in the resource concept (NVIDIA#2215) (6ee3415)
  Drive-by: avoid potential overload ambiguity in `__launch_transform`.
Commits on Aug 13, 2024
- aaf1340
- Fix including <complex> when bad CUDA bfloat/half macros are used. (NVIDIA#2226) (098fb29)
  * Add <complex> test for bad macros being defined
  * Fix <complex> failing upon inclusion when bad macros are defined
  * Rather use explicit specializations and some evil hackery to get the complex interop to work
  * Fix typos
  * Inline everything
  * Move workarounds together
  * Use conversion functions instead of explicit specializations
  * Drop unneeded conversions
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
- d7c83fe
Commits on Aug 14, 2024
-
Commit 64d28d1
-
Commit 6213a5e
-
Commit 2e44b2c
-
Commit 352638b
-
Commit dded5f1
-
Commit 1981c49
-
Fix ForEachCopyN for non-contiguous iterators (NVIDIA#2220) (commit 73df2b0)
By falling back to a non-load-vectorizing code path. Fixes: NVIDIA#2207
-
Commit cbce14b
Commits on Aug 15, 2024
-
Allow installing components when downstream (NVIDIA#2096) (commit e423412)
In the @acts-project we adopt an (admittedly somewhat unconventional) build system in which software A depends on B, and B depends on CCCL. The setup is that we want to install B into a prefix, and then build A against B. The problem is that we use CMake to dynamically fetch CCCL via the "FetchContent" mechanism, which downloads CCCL and adds it as a subdirectory. The core problem is that installing software B which has included CCCL does not actually install CCCL in the same prefix, so software A cannot then load software B, as CCCL is not installed. The reason this happens is that CMakeLists.txt:28 (at the time of writing) returns from the CMake configuration stage early, and leaves the CUB, Thrust, and libcudacxx directories unincluded (see lines 70 to 72). Although this is, again, an unconventional and rare scenario, it should be easy to add support for this kind of build, and I hope the CCCL devs would agree that it might be worth doing. In this commit, I remove the early return and replace it with additional if-statements. This commit should leave any existing workflows completely untouched, but should make it easier to use CCCL in the way we do in @acts-project.
-
Commit 532ff47
Commits on Aug 16, 2024
-
Fix and simplify <bit> (NVIDIA#2197) (commit 16d4fd3)
* Fix and simplify <bit>
* Make logic for non-constant evaluation simpler in C++14 and greater in <bit>
* Remove use of `std::` in `<bit>`
* Change bitops tests to prevent constant folding of runtime checks
* Move bit and split implementation details from main header
* Remove volatile from tests in bitops
* Make Windows happy by using `unsigned long`
* Work around being unable to use {} in C++ constexpr functions
* Add a 'default to constexpr' interpretation of is_constant_evaluated for internal use in bitops
* Make Windows happy by reusing the default-to-constexpr hack
* Make bitops tests definitely actually do runtime
* Move <bit> fallbacks into relevant headers
* Fix fallbacks being guarded by MSVC ifdef
* Keep the license
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
Commit fed3ec1
-
Make `cuda::std::min` constexpr in C++11 (NVIDIA#2249) (commit 4a5dcc4)
This should fix our rmm builds.
-
Commit ba9e9bb
Commits on Aug 19, 2024
-
Workaround GCC 13 issue with empty histogram decoder op (NVIDIA#2252) (commit 51c1b22)
* Workaround GCC 13 issue
* Update cub/cub/device/dispatch/dispatch_histogram.cuh
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commits on Aug 20, 2024
-
Commit da9b7dd
-
Commit f871aeb
-
Add tests for transform_iterator's reference type (NVIDIA#2221) (commit 38d5787)
* Fix typo
* Add tests for transform_iterator's reference type
-
Small tuning script output improvements (NVIDIA#2262) (commit c92e8d4)
* Report mismatched values in first_val
* Improve output while search.py is running
-
Fix thrust::vector ctor selection for (int, int) (NVIDIA#2261) (commit 7bec0ce)
thrust::device_vector<int> v(5, 10) should create a vector with 5 integers of value 10, and not attempt the iterator-pair constructor.
Commits on Aug 21, 2024
-
Add support for large numbers of items to DeviceScan (NVIDIA#2171) (commit 06e334f)
* make DeviceScan offset type a template parameter
* update tests to use device interface
* move thrust scan to unsigned offset types
* adjust benchmarks to account for used offset types
* use dynamic dispatch to unsigned type
* add tparam docs for NumItemsT
* fix warning about different signedness comparison
* add check for negative num_items in thrust::scan
* fix unused param in is_negative
-
Use/test radix sort for int128, half, bfloat16 in Thrust (NVIDIA#2168) (commit 1e1af8d)
int128 was already working but not covered by a test.
-
Implement C API for device reduction (NVIDIA#2256) (commit 5a4881b)
* Implement C device reduce
* Fix device-specific module loading
* Don't need a context at build step now
* Address review feedback
-
Move cooperative module (NVIDIA#2269) (commit 2c1080d)
* Move cuda/cooperative to cuda_cooperative
* Update paths to cooperative module
* Fix literal include paths
-
Move compiler version macros into libcu++ (NVIDIA#2250) (commit 529f910)
* Move compiler version macros into libcu++
* Drop some pre-C++11 code paths around thrust::complex
Commits on Aug 23, 2024
-
Commit d62e979
Commits on Aug 25, 2024
-
Add thrust::tabulate_output_iterator (NVIDIA#2282) (commit 0d0d2d3)
* adds tabulate output iterator
* uses cccl exec space macros
* addresses review comments
* fixes documentation and example
* moves to using alias template instead of member type
-
Commit a15adf3
Commits on Aug 26, 2024
-
Flatten forwarding headers (NVIDIA#2284) (commit c1c1d96)
* Flatten `<cuda/std/utility>`; we do not need the indirection anymore
* Flatten `<cuda/std/iterator>`; we do not need the indirection anymore
* Flatten `<cuda/std/expected>`, `<cuda/std/mdspan>`, `<cuda/std/ranges>`, `<cuda/std/__new_>`, `<cuda/std/cassert>`, and `<cuda/std/initializer_list>`
* Clean up `<cuda/functional>` and `<cuda/std/functional>`; those headers don't need any of the deep nesting we had, so clean them up and simplify their dependencies
* Flatten `<cuda/std/type_traits>`, `<cuda/std/__memory_>`, and `<cuda/std/concepts>`
* Drop unused `libcudacxx/include/cuda/std/detail/libcxx/include/ccomplex`
* Flatten `<cuda/std/cfloat>` and `<cuda/std/version>`
* Add missing include to `thrust/functional.h` and fix the cyclic dependency with actor.h
* Suppress MSVC warning
-
2270: static compute capability queries (NVIDIA#2271) (commit 03247ab)
* Architecture traits first draft
* Add missing sm 60, sm 80, and sm 90 arches
* Move to 100*major + 10*minor format and remove template arch_traits getter
* Add Ada and clean up some things
* Move includes after system header pragma
Co-authored-by: Elias Stehle <[email protected]>, Bernhard Manfred Gruber <[email protected]>, Georgii Evtushenko <[email protected]>
-
Commit 9d4c3a8
Commits on Aug 27, 2024
-
Implement any_resource, an owning wrapper around a memory resource (NVIDIA#2266) (commit f53e725)
* Implement `any_resource`, an owning wrapper around any resource; addresses NVIDIA#1426
* Continue development of @miscco's `any_resource` and address review feedback
* Mark all deallocation functions as `noexcept`, then back out a questionable addition of `noexcept`
* Fix test failures, build breaks, and the cudax docs build; exclude more symbols from the cudax docs
* More portability fixes and doxygen tweaks; deduplicate `basic_any_resource` constructors to satisfy doxygen
* Don't use `if constexpr` when compiling as C++14
* Mollify MSVC: accommodate integer size differences and eliminate a shadow warning treated as error
* Handle duplicate properties without triggering compiler warnings
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
Commit e8939e9
-
Use NV_IF_TARGET to conditionally compile CUDAX tests (NVIDIA#2297) (commit 92e006b)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Make for_each compatible with NVRTC (NVIDIA#2288) (commit f80972b)
* Make for_each compatible with NVRTC
* Whitespace fixup
* Apply suggestion to improve includes
-
Refactor CMake so more cudax samples can be easily added (NVIDIA#2296) (commit a5b0a23)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Commit dd90bed
-
Implement std::bit_cast (NVIDIA#2258) (commit 0a1cddb)
* Implement `std::bit_cast`; this backports C++20 `std::bit_cast` to be available in all standard modes. As this requires compiler builtin support, we have a non-constexpr workaround with the usual memcpy implementation. Fixes NVIDIA#2257
* Add additional constraint in the fallback mode
* Use bit_cast in cub
* Fix formatting and a typo
Commits on Aug 28, 2024
-
Clean up the <cuda/std/bit> header (NVIDIA#2299) (commit 490a20f)
* Move enum `endian` to its own file
* Move `std::rotl` and `std::rotr` to their own file
* Move `std::has_single_bit` to its own file
* Move `countr_{one, zero}` to its own file
* Move `countl_{one, zero}` to their own file
* Move `bit_ceil`, `bit_floor` and `bit_width` to their own file
* Clean up the `<bit>` header
-
Change cudax::uninitialized_buffer to own its memory resource with cudax::any_resource (NVIDIA#2293) (commit 198208a)
* Change `uninitialized_buffer` to own its memory resource using `any_resource` (builds on the `any_resource` work from NVIDIA#2266)
* Use fully qualified name
* Drop `__host__ __device__` from uninitialized_buffer, then revert that change (reverts commit 5115b08)
* Just do the cursed thing
* Add missing include
* Adopt the doc string
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
Documentation typos (NVIDIA#2302) (commit ec5bd08)
* Add segmented [radix] sort tests for 64-bit indices
* Add test for device segmented sort pairs with 64-bit indices
* Add a macro guard to protect tests that are not supported yet
* Fix documentation typos
* Remove unrelated files
-
Add thrust::inclusive_scan with init_value support (NVIDIA#1940) (commit e311e89)
* Add thrust::inclusive_scan with init value, sequential
* Add thrust::inclusive_scan cuda par with init value
* Add thrust::async::inclusive_scan with init value
* Add thrust::inclusive_scan tbb with init value
* Consolidate init overloads into a single overload that accepts both init and binary_op
* Add cuda::std::__accumulator_t and use it for value_type in scan algorithms
* Consolidate the two tbb::inclusive_scan bodies
* Replace cub::accumulator_t with cuda::std::__accumulator_t
-
Commit 942f59f
-
Add documentation for any_resource (NVIDIA#2309) (commit 7d4be26)
* Add documentation for `any_resource`
* Apply suggestions from code review
Co-authored-by: Bernhard Manfred Gruber <[email protected]>
Commits on Aug 29, 2024
-
Implement P0843 inplace_vector (NVIDIA#1936) (commit eb87e56)
This implements `inplace_vector`, a resizable container with a fixed capacity that stores its elements in a local array. Because everything is local, we are more or less safe regarding host/device issues, as long as users do not pass around references to it. The exception guarantees are not 100% clear yet, so I implemented them on a best-effort basis. We might need to revisit what we guarantee in case of an exception.
-
Clean up __config and unify most visibility macros (NVIDIA#2285) (commit 10b0d2b)
* Unify definitions of `_LIBCUDACXX_TYPE_VIS`, `_LIBCUDACXX_HIDDEN`, `_LIBCUDACXX_TEMPLATE_VIS`, and `_LIBCUDACXX_ENUM_VIS`
* Drop `_LIBCUDACXX_EXCEPTION_ABI`, `_LIBCUDACXX_DLL_VIS`, `_LIBCUDACXX_METHOD_TEMPLATE_IMPLICIT_INSTANTIATION_VIS`, `_LIBCUDACXX_CLASS_TEMPLATE_INSTANTIATION_VIS`, `_LIBCUDACXX_EXTERN_TEMPLATE_TYPE_VIS`, and `_LIBCUDACXX_CRT_FUNC`
* Drop `_LIBCUDACXX_EXPORTED_FROM_ABI` in favor of `_CCCL_VISIBILITY_DEFAULT`
* Drop `_LIBCUDACXX_OVERRIDABLE_FUNC_VIS` in favor of `_CCCL_HOST_DEVICE`
* Drop `_LIBCUDACXX_TYPE_VIS`, `_LIBCUDACXX_ENUM_VIS`, and `_LIBCUDACXX_TEMPLATE_VIS` in favor of `_CCCL_TYPE_VISIBILITY_DEFAULT`
* Drop `_LIBCUDACXX_HIDDEN` in favor of `_CCCL_VISIBILITY_HIDDEN`
* Drop `_LIBCUDACXX_ALWAYS_INLINE` in favor of `_CCCL_ALWAYS_INLINE`
* Drop a bunch of unused defines that we inherited from libc++
-
Add a fast, low-memory "limited" mode to CUB testing (NVIDIA#2317) (commit 11fc50b)
* Add CCCL_SEED_COUNT_OVERRIDE env var. This overrides the number of test sections generated by CUB_SEED. It is intended for limiting to a single case for sanitizer runs, etc., but may be useful for fuzzing as well.
* Add new env opts for CUB's check allocators: `CCCL_DEVICE_MEMORY_LIMIT` limits total memory allocations to this number of bytes; `CCCL_DEBUG_CHECKED_ALLOC_FAILURES` prints info about each checked alloc failure prior to throwing exceptions.
* Add `limited` job for CUB: forces CUB_SEED to only generate a single test case, limits total global device memory usage to 8 GiB, and prints allocation details when the checked allocator fails.
* Add error checking to build scripts.
* Add a `-configure` option to CI scripts that just configures the build and exits.
* Add std for new cub job so it reuses the same build job as the other CUB tests.
-
[CUDAX] Add event_ref::is_done() and update event tests (NVIDIA#2304) (commit d862315)
* Add event_ref.is_done() and update event tests
* Update cudax/include/cuda/experimental/__event/event_ref.cuh
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
Commit e42d7b7
-
Commit a7837d3
Commits on Aug 30, 2024
-
Commit 16096d4
-
[CUDAX] Add compute_capability device attribute and handle arch_traits for future architectures (NVIDIA#2328) (commit a9fa9a1)
* Add combined compute capability query
* Store arch traits in device object
* Craft arch_traits for future architectures
* Add missing return and missing noexcept
* Add a comment describing compute_capability_t
-
Disable exec checks on ranges CPOs (NVIDIA#2331) (commit 95c6ba9)
We need those for the cudax containers coming in.
-
Enable exceptions by default (NVIDIA#2329) (commit 206e745)
* Use a negative error code for exit in terminate
* Add a global CCCL configuration for disabling exceptions
* Make the terminate test a runfail test; we return a non-zero return code, and we cannot fail a test that does not run
-
Make the thrust dispatch mechanisms configurable (NVIDIA#2310) (commit 89702de)
The current dispatch mechanism trades compile time and binary size for performance and flexibility. Allow users to tune that depending on their needs.
Co-authored-by: Jake Hemstad <[email protected]>
-
Commit a7996f0
Commits on Aug 31, 2024
-
Compiler version improvements (NVIDIA#2316) (commit bb6c7b1)
* Remove Intel LLVM references
* Remove IBM/XLC/PowerPC references
* Add NVHPC version
-
Fix hardcoding __THRUST_HOST_SYSTEM_NAMESPACE to cpp (NVIDIA#2341) (commit 0a40182)
This change was erroneously introduced in 91b78d8. Fixes: NVIDIA#2098
Commits on Sep 2, 2024
-
Improvements to the Cuda Core C library infrastructure (NVIDIA#2336) (commit 709ddec)
* Move towards libfmt to allow different host compilers, then revert that change (reverts commit 3bd8d95)
* Try and filter changes to the Cuda C library
* Update ci/inspect_changes.sh
Co-authored-by: Leo Fang <[email protected]>
Commits on Sep 3, 2024
-
Commit 498251c
-
[CUDAX] Make uninitialized_buffer usable with launch (NVIDIA#2342) (commit c6b777b)
* Make `cudax::uninitialized_buffer` usable with `cudax::launch`
* Test passing a `const` `uninitialized_buffer` to `launch`
-
Commit 5c6b6df
-
Update Memory Model docs for HMM (NVIDIA#2272) (commit 4297b07)
* Update Memory Model docs for HMM
* Document p2p2NativeAtomicSupported
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
Commit 457e4d7
-
Harden thrust algorithms against evil iterators that overload `operator,` (NVIDIA#2349) (commit 6b76188)
We need to guard against such iterators in libcu++, so our tests conventionally contain iterators that delete `operator,`. To allow using thrust with such iterators we need to add the void casts.
-
Avoid circular concept definition with memory resources (NVIDIA#2351) (commit 707ee73)
We cannot constrain the hidden friend comparison functions because that would lead to constraint recursion in the `resource` concept. However, we do not actually need to do that, because we can rely on C++20 operator rewriting to avoid the hidden friend altogether. In that case the non-template operator== takes precedence and all is fine.
-
Add IWYU export pragma on config headers (NVIDIA#2352) (commit a154e7b)
When working in the devcontainers, the clangd/clang-format integration causes extra #includes to be added automatically. They get added because clang's "include what you use" tool (IWYU) doesn't consider transitive includes when determining what headers need to be included. This PR adds the export IWYU pragma to the #includes in CCCL's config headers so that a #include of <cuda/__cccl_config> or <cuda/std/detail/__config> behaves as if all of the transitive headers were included directly. This should be enough to keep IWYU from automatically adding headers it shouldn't.
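The pragma in question looks like this; the header names below are illustrative of the pattern, not the exact CCCL files:

```cpp
// In a config/umbrella header: mark transitive includes as exported so
// IWYU treats them as provided by this header rather than suggesting
// direct includes at every use site.
#include <detail/platform_config.h> // IWYU pragma: export

// A block form also exists for groups of includes:
// IWYU pragma: begin_exports
#include <detail/header_a.h>
#include <detail/header_b.h>
// IWYU pragma: end_exports
```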
Commits on Sep 4, 2024
-
Add cuda_parallel to CI (NVIDIA#2338) (commit 1e9125e)
* Add python cuda_parallel to CI
* Fix cuda.parallel installation order
* WAR occupancy/V100 failure
* Update cuda_cooperative readme
Co-authored-by: Georgy Evtushenko <[email protected]>
-
[CUDAX] Branch out an experimental version of stream_ref (NVIDIA#2343) (commit 0251ae4)
* Branch out experimental version of stream_ref
* Add tests for the experimental part of stream_ref
* Move inequality check
* Remove unneeded using declaration
* Remove the NULL stream_ref constructor (after first adding a TODO for it)
* Move runtime API include after the system header declaration
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
Improve visibility macros for libcu++ (NVIDIA#2337) (commit dae826b)
* Refactor visibility handling for libcu++. We always want to apply `_CCCL_VISIBILITY_HIDDEN` to effectively all function definitions. This has been a general thorn in our side and also introduced a considerable amount of divergence from libc++. This moves the visibility definition into CCCL and then applies the libcu++-specific extras.
* Use `_CCCL_HIDE_FROM_ABI` for defaulted SMFs
* Drop `__MDSPAN_INLINE_FUNCTION` in favor of `_LIBCUDACXX_HIDE_FROM_ABI`
* Work around an nvcc 11.1 limitation; nvcc really does not like always-inline
* Placate ICC and suppress visibility issues with ICC
* Move `_LIBCUDACXX_HIDE_FROM_ABI` into visibility.h
-
Add missing cuKernelGetFunction call to reduce (NVIDIA#2355) (commit dcb7d51)
* Add missing cuKernelGetFunction call to reduce
* Move the CUfunction getter just before the occupancy query
* Fix formatting
-
Commit 046a761
Commits on Sep 5, 2024
-
Fix the cudax vector_add sample (NVIDIA#2372) (commit 3876dcc)
The cudax `vector_add` sample has not compiled since 0251ae4. It was broken by PR NVIDIA#2343, which added a type `::cuda::experimental::stream_ref` distinct from `::cuda::stream_ref`: all unqualified mentions of `stream_ref` within the `cuda::experimental` namespace were made to refer to a different type. This causes problems in `cudax/samples/vector_add/vector.cuh`, which has several unqualified uses of `stream_ref`.
-
Add -Wmissing-field-initializers to cudax (NVIDIA#2373) (commit af695d0)
Change cudaLaunchConfig_t init to {}
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
Commit 05e019a
Commits on Sep 6, 2024
-
Add benchmarks for DeviceSelect::Unique (NVIDIA#2359) (commit e0dad56)
* Add benchmarks for DeviceSelect::Unique
* Fix typo
Co-authored-by: Michael Schellenberger Costa <[email protected]>
-
CUB: enable DPX reduction (NVIDIA#2286) (commit 3adc92a)
Enable Hopper+ DPX (SIMD) reduction for `uint16_t`/`int16_t` data types and `Min`/`Max`/`Sum` operators.
-
Commit 4a32b1c
-
Add thrust::transform_inclusive_scan with init value (NVIDIA#2326) (commit 5647255)
* Add thrust::transform_inclusive_scan with init value implementations
* Add tests for thrust::transform_inclusive_scan with init
* Add more tests and rebase on bug fix from thrust::inclusive_scan
* Add docs
* Use __accumulator_t
* Fix thrust tests readability with initializer_list and docs indentation
* Fix docs bugs and use correct accumulator and intermediate result types
-
Widen histogram agent constructor to more types (NVIDIA#2380)
This allows accepting more data types beyond arrays of exact static sizes. No SASS changes on the CUB device histogram test with CTK 12.6. Fixes NVIDIA#1877 for AgentHistogram.
Commit: fcf7c91
Commit: 07fef97
Commits on Sep 8, 2024
-
Add `cub::DeviceTransform` (NVIDIA#2086)
Including benchmarks based on BabelStream.
Co-authored-by: Georgii Evtushenko <[email protected]>
Commit: 71b9f98
Commits on Sep 9, 2024
-
Update toolkit to CTK 12.6 (NVIDIA#2348)
12.6 has been released some time ago, so we should update our containers appropriately.
Co-authored-by: Allison Piper <[email protected]>
Commit: 371a434
Implement `make_integer_sequence` in terms of intrinsics whenever possible (NVIDIA#2384)
* use the `__make_integer_seq` intrinsic on MSVC when available
* use the `__integer_pack` builtin to implement `make_integer_sequence` when possible
* define and use a `_LIBCUDACXX_HAS_INTEGER_PACK` config macro
Commit: ee9b856
Commits on Sep 10, 2024
-
Implement `cuda::mr::cuda_async_memory_resource` (NVIDIA#1637)
This implements a wrapper around a `cudaMemPool_t`. Fixes NVIDIA#1514.
Commit: d5492d5
Commits on Sep 11, 2024
-
Drop implementation of `thrust::pair` and `thrust::tuple` (NVIDIA#2395)
We previously moved them back to proper class definitions, as using alias declarations broke CTAD. Thanks to @bernhardmgruber, who realized that instead of making them an alias we can just pull them in and be done with it.
Co-authored-by: Bernhard Manfred Gruber <[email protected]>
Commit: e7ade77
Pull out `_LIBCUDACXX_UNREACHABLE` into its own file (NVIDIA#2399)
* Pull out `_LIBCUDACXX_UNREACHABLE` into its own file; also make it available globally
* Drop more uses of `cstdlib`
Commit: 1c422f2
Commits on Sep 12, 2024
-
Share common compiler flags in new CCCL-level targets. (NVIDIA#2386)
* Share common compiler flags in new CCCL-level targets.
* Fix up some warnings from cudax/
* Avoid deprecation warning from allocator_traits
* Silence conversion warning in tuple_leaf
* Suppress conversion warning in complex test
* Fix more conversion warnings
* Silence warning about signedness of unary minus
* Let doxygen ignore warning suppression
* Suppress a variety of warnings on MSVC.
* Fix unused parameter warning.
* Suppress reordering warnings on ICC, too.
* More MSVC taming, more warnings, more MSVC narrowing warning fixes.
* Re-enable global suppression of MSVC warnings. There are hundreds of these, and most of them appear to be in the unit tests, not the actual Thrust headers. It's turning into a rabbit hole and dragging beyond the scope of the current PR. Filed NVIDIA#2409 to track the removal of these suppressions.
Co-authored-by: Eric Niebler <[email protected]>
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commit: 1fe25ed
Commit: cf21a40
Commits on Sep 16, 2024
-
Commit: 684cf8e
Drop thrust's diagnostic warning suppressions (NVIDIA#2392)
* Drop thrust's diagnostic warning suppressions. We have global suppressions for CCCL, so use them.
* Avoid a warning about possible loss of data when casting to the index type.
Commit: 4088134
Commits on Sep 17, 2024
-
[PoC]: Implement `cuda::experimental::uninitialized_async_buffer` (NVIDIA#1854)
This uninitialized buffer provides a stream-ordered allocation of N elements of type T, utilizing a `cuda::mr::async_resource` to allocate the storage. The buffer takes care of alignment and deallocation of the storage. The user is required to ensure that the lifetime of the memory resource exceeds the lifetime of the buffer.
Co-authored-by: Mark Harris <[email protected]>
Commit: e3c2e2b
Commits on Sep 18, 2024
-
Fix thrust package to work with newer FindOpenMP.cmake. (NVIDIA#2421)
The FindOpenMP module shipped with CMake started (unnecessarily...) including the SHELL: prefix on its -fopenmp flag.
Commit: 8ced877
Introduce `cccl_configure_target` cmake function. (NVIDIA#2388)
* Introduce `cccl_configure_target` cmake function. Currently this encapsulates common operations such as setting the CXX/CUDA standard dialect and binary output locations.
* Update CI scripts to prevent the unsupported cudax/msvc/c++17 config.
Commit: 8f27fba
Fix sccache errors in RAPIDS builds (NVIDIA#2417)
* set .aws/{config,credentials} to read-only, enable bash xtrace, and trap exit to print error logs [skip-matrix] [skip-vdc] [skip-docs]
* change how rapids-build-utils stops and restarts sccache
* try using sccache v0.8.1
* try killing all sccache processes, then try killing them first
* check that the .aws dir exists
* don't stop the sccache server before running the build commands
* debug why the sccache server won't start
* define the creds as envvars instead of a file mount
* set SCCACHE_IDLE_TIMEOUT=0 in CI
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commit: 2496571
Commits on Sep 19, 2024
-
Replace `CUDA C++ Core Libraries` with `CUDA Core Compute Libraries` (only in README.md) (NVIDIA#2424)
* Replace `CUDA C++ Core Libraries` with `CUDA Core Compute Libraries`.
* Remove a couple of `C++`, based on a suggestion by @miscco
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commit: 52a967f
Commit: d191102
`uninitialized_buffer::get_resource` returns a ref to an `any_resource` that can be copied (NVIDIA#2431)
* `uninitialized_buffer::get_resource` returns a ref to an `any_resource` that can be copied
* Also update `uninitialized_async_buffer`
* Fix doc string
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commit: 445fd71
Refactor `cuda::ceil_div` to take two different types (NVIDIA#2376)
Rework `cuda::ceil_div` to take different types. This replaces `cub::DivideAndRoundUp`.
Commit: b07f036
Reduce PR testing matrix. (NVIDIA#2436)
* Remove file annotation from verbose matrix warnings.
* Allow 'min', 'max', 'minmax' values for matrix `std` tags.
* Error when no supported stds are available.
* Reduce PR testing matrix:
  1. Temporarily remove the current nightly matrix pending NVKS bringup.
  2. Move the current per-PR matrix to nightly.
  3. Reduce the number of jobs in the PR matrix while maintaining decent coverage.
Before (total jobs: 437):
| 320 | `linux-amd64-cpu16` |
| 66 | `linux-amd64-gpu-v100-latest-1` |
| 28 | `linux-arm64-cpu16` |
| 23 | `windows-amd64-cpu16` |
After (total jobs: 183):
| 126 | `linux-amd64-cpu16` |
| 26 | `linux-amd64-gpu-v100-latest-1` |
| 21 | `windows-amd64-cpu16` |
| 10 | `linux-arm64-cpu16` |
* Restore old build matrix.
Commit: ee94bb9
Implement `cudax::shared_resource` (NVIDIA#2398)
We currently have two basic building blocks around memory resources, `any_resource` and `resource_ref`. However, while they make owning and sharing resources much easier, we can still run into lifetime issues. If a user wants to pass a resource into a library function that might exceed the lifetime of the resource, they would need to move it into an `any_resource`. However, they also might want to share that resource among multiple functions, e.g. a pool allocator. We need a way to properly share a resource in those circumstances.
Enter `shared_resource`. Rather than storing an `any_resource`, this holds a `shared_ptr<any_resource>`. With that we can happily copy / move them around without touching the stored resource.
Co-authored-by: Eric Niebler <[email protected]>
Co-authored-by: Mark Harris <[email protected]>
Commit: 7bd04ad
Increase the libcu++ timeout (NVIDIA#2435)
We are frequently running into the current test duration limit of 01:20, so add another 20 minutes to the timeout.
Co-authored-by: Allison Piper <[email protected]>
Commit: 5e14128
Move c/include/cccl/*.h files to c/include/cccl/c/*.h (NVIDIA#2428)
* Move c/include/cccl/*.h files to c/include/cccl/c/*.h
* Change `#warning` to `#error` (to improve the user experience).
* Add comments to preprocessor conditionals.
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Co-authored-by: Allison Piper <[email protected]>
Commit: 2fe09c8
Make `any_resource` emplaceable (NVIDIA#2425)
* Rename `async_any_resource` to `any_async_resource`
* Add a way of constructing an `any_{async_}resource` from a set of arguments and a tag type
Co-authored-by: Allison Piper <[email protected]>
Commit: 8b2bf13
Fix issues with `__host__` and `__device__` definitions (NVIDIA#2413)
We currently only checked whether we are compiling with a CUDA compiler, but not whether we are actually compiling in CUDA mode. That meant that certain macros weren't properly defined.
* Simplify and account for nvhpc
Commit: 28888eb
Commits on Sep 20, 2024
-
Make `bit_cast` play nice with extended floating point types (NVIDIA#2434)
* Move `__is_nvbf16` and `__is_nvfp16` to their own file
* Make `bit_cast` play nice with extended floating point types
Commit: 31c3eb9
Do not include our own string.h file (NVIDIA#2444)
That might conflict with the host standard library
Commit: 92bc4ac
Commit: 9641b7e
Commits on Sep 24, 2024
-
Remove a ton of lines in thrust tests (NVIDIA#2356)
* Convert redundant assignments to initializer lists in thrust tests
* done :)
Commit: aa1458d
[CUDAX] Add placeholder green context type and logical device that can hold both a green ctx and a device (NVIDIA#2446)
Commit: 6fd1e5c
Fix typo in CCCLBuildCompilerTargets.cmake (NVIDIA#2453)
* Fix typo in CCCLBuildCompilerTargets.cmake
Commit: 0f0fdc2
Commits on Sep 25, 2024
-
This drops the duplicated definition of `_CCCL_NO_SYSTEM_HEADER` from thrust, as that is now in the common compile targets (NVIDIA#2450)
Commit: 17e0c83
Consolidate packages and install rules (NVIDIA#2456)
* Migrate CMake packages to a central location.
* Remove obsolete infra tests. These tests check that consumers can `add_subdirectory(cccl/cub)` / `add_subdirectory(cccl/thrust)` to add Thrust or CUB to their build. This hasn't been sensible or supported since we migrated to the monorepo. Now that we're moving shared CMake infra to CCCL, these tests are finally actually failing.
* Migrate install rules to the top level; use the `cccl_generate_install_rules` helper to reduce redundancy.
* Remove libcudacxx export tests. These tests are redundant with CCCL-level tests, and no longer function now that install rules are centralized at the top level.
* Make the std arg for test_thrust.ps1 optional.
* Ensure that the cudax::Thrust target is only created once.
Commit: 2cbf40b
Commits on Sep 26, 2024
-
Prune CUB's ChainedPolicy by __CUDA_ARCH_LIST__ (NVIDIA#2154)
* Prune CUB's ChainedPolicy by `__CUDA_ARCH_LIST__`
* Workaround GCC 13 error:
```
/home/coder/cccl/thrust/thrust/cmake/../../thrust/iterator/detail/transform_input_output_iterator.inl:68:9: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
   68 |     *io = output_function(x);
      |     ~~~~^~~~~~~~~~~~~~~~~~~~~
```
* Apply feedback and suggestion from elstehle
* Return a CUDA error if the PTX version and arch list mismatch
* Add a comment on ptx version requirements
Co-authored-by: Elias Stehle <[email protected]>
Commit: bda69fd
Commit: cc01ce7
Add CCCL_ENABLE_WERROR flag. (NVIDIA#2463)
* Add CCCL_ENABLE_WERROR flag. Also move a stray compiler definition to the compiler interface targets.
* Fix typo
Commit: 99fb4b4
Commit: 5d45850
Commits on Sep 27, 2024
-
Propagate compiler flags down to libcu++ LIT tests (NVIDIA#2420)
In order to run libcu++ tests with clang, lit has to use the correct compiler settings. It currently appears to rely on clang automatically finding and using the default CUDA installation, but that's not necessarily the CUDA that the build itself was configured to use. Clang builds may also require building with the host libc++ instead of the default libstdc++. That requires propagation of the top-level CFLAGS specified by the user.
Commit: 0e09815
Commits on Sep 28, 2024
-
Drop remaining uses of `_LIBCUDACXX_COMPILER_*` (NVIDIA#2467)
This was breaking some other PR and we never use them anymore.
Commit: 467a44d
Avoid C++17 extension in C++11 tests (NVIDIA#2469)
Using `typename` in a template template parameter is a C++17 extension, and clang warns about it when acting as a CUDA compiler.
Commit: 7c668e8
Add span to example and templated block size (NVIDIA#2470)
* add span to example and template block size
Commit: e3800d7
Commit: 94e4e75
Commits on Sep 30, 2024
-
Removes superfluous template keyword that triggers warnings/errors with clang-19 (NVIDIA#2482)
```
error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
  349 |     IterateThreadStore<0, VOLATILE_MULTIPLE>::template Dereference(
```
Commit: 242bcce
Improve build times in several heavyweight libcudacxx tests. (NVIDIA#2478)
* Split heavy heterogeneous atomic tests
* Split integral atomics tests
* Split heavy alg.copy test
Commit: 653e546
Drop `__availability` header (NVIDIA#2484)
We always unconditionally define no availability anyway, and we do not really care about the macOS host standard library.
Commit: 0521015
Replace a few more instances of `CUDA C++ Core Libraries` with `CUDA Core Compute Libraries` (NVIDIA#2447)
* Programmatically (`sed`) replace `CUDA C++ Core Libraries` with `CUDA Core Compute Libraries`. NO MANUAL CHANGES. Command used:
```
for fn in `git grep 'CUDA C++ Core Libraries' | cut -d':' -f1 | uniq`; do
  sed -i .bak 's/CUDA C++ Core Libraries/CUDA Core Compute Libraries/g' $fn
done
```
* Extend `=======================` line in .rst file to match the new name.
* Revert changes in all .cuh, .cu, .h, .cpp files. Rationale: these files actually have a C++-specific license; it's left for later to change the license and the notices together.
* Revert changes in the docs/ subdirectory.
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commit: 725954c
Commits on Oct 1, 2024
-
Fix `common_type` specialization for extended floating point types (NVIDIA#2483)
The machinery we had in place was not really suited to specialize `common_type`, because it would take precedence over the actual implementation of `common_type`. In that case, we only specialized `common_type<__half, __half>` but not `common_type<__half, __half&>` and so on. This shows how brittle the whole thing is and that it is not extensible.
Rather than putting another bandaid over it, add a proper 5th step in the `common_type` detection that properly treats combinations of an extended floating point type with an arithmetic type. Allowing arithmetic types is necessary to keep machinery like `pow(__half, 2)` working.
Fixes NVIDIA#2419 ([BUG]: `is_common_type` trait is broken when mixing rvalue references).
* Work around MSVC declval bug
Commit: 81d05bb
Commit: 808f9c2
Commit: 57b9899
Disable system header for narrowing conversion check (NVIDIA#2465)
There is an incredible compiler bug reported in nvbug4867473 where the use of a system header changes the way some types are instantiated. The culprit seems to be that within a system header the compiler accepts narrowing conversions that it should not accept.
Work around it by moving `__is_non_narrowing_convertible` to its own header that is included before we define the system header machinery.
Commit: 190099c
Commits on Oct 2, 2024
-
Require resources to always provide at least one execution space property (NVIDIA#2489)
Currently we implicitly assumed that any resource that had no execution space property was host accessible. However, that is not a good design, as it provides a source of surprise and numerous challenges with proper type matching down the road. So rather than implicitly assuming that something is host accessible, we require the user to always provide at least one execution space property.
Commit: 59ad103
Rework builtin handling (NVIDIA#2461)
* Move builtin detection to its own file
* Try to re-enable more builtins
* Address review comments
Commit: e4f48cf
Disable execution checks for `std::equal` (NVIDIA#2491)
This is used in the `cudax::vector` PR and is the only dependency change of libcu++ that blows up the CI.
Commit: ee3bd53
Replace `_CCCL_ALWAYS_INLINE` with `_CCCL_FORCEINLINE` (NVIDIA#2439)
* replace `_CCCL_ALWAYS_INLINE` with `_CCCL_FORCEINLINE`
* Avoid the GCC optimizer doing things
* try avoiding the GCC optimizer bug a different way
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commit: 0589775
Commit: 25c57f8
Commit: 10769b4
Drop badly named `_One_of` concept (NVIDIA#2490)
* Drop badly named `_One_of` concept. First, it goes against the naming convention we have for both files and concepts. Furthermore, we should rather use the new type set facility.
* Rename to `_is_included_in`
* Fix concept issue
Commit: 5e139af
Commits on Oct 3, 2024
-
Unify assert handling in cccl (NVIDIA#2382)
We currently do not have proper assertions within CCCL. There are different approaches in CUB, Thrust, and libcu++, some of which are completely broken. This reworks the assertion handlers so that they work uniformly everywhere and can be selectively enabled.
Commit: 3eee9b2
Reduce scope of Thrust linkage in cudax. (NVIDIA#2496)
It's only used for tests, it doesn't need to be linked to all cudax targets.
Commit: bb001b7
Centralize CPM logic. (NVIDIA#2495)
* Abstract and consolidate CPM calls. * Update CPM used in example projects.
Commit: a0ec74c
Commit: c15546a
Refactor away per-project TOPLEVEL flags. (NVIDIA#2498)
* Refactor away per-project TOPLEVEL flags. These will never be toplevel projects ever again now that we're a monorepo. They're redundant with the `CCCL_ENABLE_<proj>` flags.
* Disable cudax when unstable is off.
Commit: 1cfe171
Commits on Oct 4, 2024
-
[FEA]: Validate cuda.parallel type matching in build and execution (NVIDIA#2429)
* Brute force experiment: which tests fail after adding an `assert False`?
* Add `_dtype_validation()` in python/cuda_parallel/cuda/parallel/experimental/__init__.py and fix a bug in python/cuda_parallel/tests/test_reduce_api.py
* Add `test_device_reduce_dtype_mismatch()`. Capture `dtype`s only in the ctor (not entire arrays).
* Undo change in .gitignore
* Move `min_op()` back into `test_device_reduce_success()` to unbreak the sphinx documentation. Also fix an existing typo.
* Move `test_device_reduce_dtype_mismatch()` from test_reduce_api.py to test_reduce.py
* Add TODO POINTER vs ITERATOR
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Commit: e8d57c3
Commit: 583567b
Commits on Oct 5, 2024
-
Commit: c86caca
Commits on Oct 7, 2024
-
Commit: 8aaeb29
Commits on Oct 8, 2024
-
Add ForEachN from CUB to cccl/c. (NVIDIA#2378)
* Work in progress
* Initial draft of exposing CUB::for in cccl/c.
* Add a fairly cursed storage abstraction for kernel arguments.
* Fix cccl/c include path.
* Make reduce inline functions static.
* Initial draft of iterator support for C device for
* Remove small vector (for now)
* Fixup and remove debug build option from for.cu.
* Disable iterator support in foreach (for now)
* Restore LDL/STL call ABI checks that were commented out earlier.
* Fix missing context push; fix terminating instead of returning an error.
* alignof instead of signof; use `value_t` for the value type of the kernel.
* Fix missing header in for.cu
Co-authored-by: Georgii Evtushenko <[email protected]>
Commit: ee5dd3e
Adds support for large number of items in `DeviceSelect` and `DevicePartition` (NVIDIA#2400)
* adds streaming selection and partition
* ensures policy lookup uses the per-partition offset type
* mitigates perf degradation on select; fixes f32 select::if perf regression
* makes device interfaces use i64 num_items
* updates select::if large num_items tests; adds tests for large num_items for select::flagged, partition::if, and partition::flagged
* silences msvc constant conditional warning
* adds support for a streaming ctx dummy for partition with small offset types
* removes superfluous template parameter
* adds tests and support for different offset types for partition::if & ::flagged
* adds tests and support for streaming select::unique; switches unique to always use i64 offset types
* adds benchmark for partition with distinct iterators
* fixes perf for partition and partition::flagged; makes SASS identical to main for i32 partition
* updates thrust copy_if to always use i64 offset types
* makes tests on empty input more robust; restores empty-problem behaviour
* assorted fixes (syntax, formatting, implicit conversions, feature macros, merge conflicts); addresses review comments; adds a comment on const ref
Commit: 16f9a1a
Adds support for large number of items to the `DeviceScan::*ByKey` family of algorithms (NVIDIA#2477)
* experimenting with bool tile state
* fixes perf regression from different tile state
* fixes support for large offset types
* adapts interface for scanbykey
* adds tests for large number of items for scanbykey
* fixes naming
* makes thrust scan_by_key use unsigned offset types
* moves scan_by_key_op to detail ns
Commit: 951c822
Commits on Oct 9, 2024
-
Integrate c/parallel with CCCL build system and CI. (NVIDIA#2514)
Integrate c/parallel into CCCL, setup CI, etc.
Commit e149e86
-
Commit cbb0edd
-
Refactor include paths to match cudax conventions.
- Global instead of local
- Use `cuda/experimental`
- s/.h$/.cuh$/g
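The `s/.h$/.cuh$/g` step above can be sketched as a small shell helper that renames `*.h` headers to `*.cuh` under a tree. This is an illustrative reconstruction, not the script actually used; the example directory name is hypothetical:

```shell
# Illustrative sketch of the s/.h$/.cuh$/g rename step (directory
# name hypothetical). Strips the .h suffix and appends .cuh.
rename_headers() {
  find "$1" -name '*.h' | while IFS= read -r f; do
    mv "$f" "${f%.h}.cuh"
  done
}

# Example usage (hypothetical path):
# rename_headers cudax/include/cuda/experimental
```

Include directives referencing the old `.h` names would still need a separate pass (e.g. `sed` over the sources) to match the renamed files.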
Commit afa153d
-
Commit 4b2cf18
-
Split STF headers into a separate headertest unit.
These currently require different compile options from the rest of CudaX.
Commit 450136e
-
Commit 09213f6
-
Commit c587b36
-
Commit 2030832
-
Commit 7a2a842
-
Commit 71196f1
-
Fix standalone compilation of logical_data.cuh.
These functions are used in both logical_data.cuh and acquire_release.cuh. Move them to where they'll be visible to both.
Commit ebc205a
-
Commit 750db80
-
Commit 5c55fef
Commits on Oct 10, 2024
-
Commit 03d0a33