
Cudastf #1

Closed · wants to merge 462 commits into from

This pull request is big! We’re only showing the most recent 250 commits.

Commits on Jul 17, 2024

  1. cc316db
  2. Use libcu++ void_t everywhere (NVIDIA#1977)

     * Make libcu++ void_t available in C++11
     * Replace uses of CUB's and Thrust's void_t with the libcu++ one
     * Deprecate Thrust's void_t
     * Deprecate libcu++'s __void_t

     bernhardmgruber authored Jul 17, 2024 (72f77c1)
  3. be91914

Commits on Jul 18, 2024

  1. 64e7a06
  2. 87d0849
  3. 92b4b0b
  4. Add CI Slack notifications. (NVIDIA#1961)

     * Fix divide-by-zero in `parse-job-times.py`
     * Add CI Slack notifications.

     alliepiper authored Jul 18, 2024 (56d99db)
  5. Allow the nightly workflow to be manually invoked. (NVIDIA#2007)

     [skip-rapids][skip-vdc][skip-matrix][skip-docs]

     alliepiper authored Jul 18, 2024 (fc457b4)
  6. Need to use a different approach to reuse secrets in reusable workflows vs. actions. (NVIDIA#2008)

     [skip-matrix][skip-vdc][skip-docs][skip-rapids]

     alliepiper authored Jul 18, 2024 (eb62dc6)

Commits on Jul 19, 2024

  1. Enable RAPIDS builds for manually dispatched workflows. (NVIDIA#2009)

     [skip-rapids][skip-vdc][skip-matrix][skip-docs]

     alliepiper authored Jul 19, 2024 (97e699f)
  2. 2ff83a2
  3. Add GitHub token to nightly workflow-results action. (NVIDIA#2012)

     This is needed to fetch the runtime info from the GHA API.

     This only modifies the nightly workflow; the PR tests are unaffected:
     [skip-matrix][skip-vdc][skip-rapids][skip-docs]

     alliepiper authored Jul 19, 2024 (8a5e56a)

Commits on Jul 20, 2024

  1. e5fcebe

Commits on Jul 22, 2024

  1. 496d88d
  2. e61bafe
  3. b8116c3
  4. Do not redefine the __ELF__ macro (NVIDIA#2018)

     We currently employ a workaround for an nvc++ bug where `__ELF__` is not properly defined. However, we should not define that macro if it is already present.

     Fixes NVIDIA#1995: [BUG] incompatible redefinition of macro "__ELF__" with NVC++ host compiler

     miscco authored Jul 22, 2024 (1b16af7)

Commits on Jul 23, 2024

  1. Port thrust::merge[_by_key] to CUB (NVIDIA#1817)

     * Refactor Thrust/CUB merge
     * Port thrust::merge[_by_key] to cub::DeviceMerge

     Fixes NVIDIA#1763

     Co-authored-by: Georgii Evtushenko <[email protected]>
     bernhardmgruber and gevtushenko authored Jul 23, 2024 (8635429)
  2. 53fe08f

Commits on Jul 24, 2024

  1. 18cd90f
  2. f6d3d0b
  3. a69c8ac
  4. 82a3ed0

Commits on Jul 25, 2024

  1. Avoid ADL issues with thrust::distance (NVIDIA#2053)

     We almost always pull in `cuda::std::distance`, which would be ambiguous with `thrust::distance` if the type used pulls in `cuda::std::` within a Thrust algorithm.

     miscco authored Jul 25, 2024 (46759c5)
  2. Simplify thrust::detail::wrapped_function (NVIDIA#2019)

     Co-authored-by: Michael Schellenberger Costa <[email protected]>
     bernhardmgruber and miscco authored Jul 25, 2024 (e25344c)
  3. Add a test for Thrust scan with a non-commutative op (NVIDIA#2024)

     * Add a test for Thrust scan with a non-commutative op
     * Fix printing of mismatching sequences of non-addable types before C++17 in Thrust unit tests

     bernhardmgruber authored Jul 25, 2024 (5ba23b6)
  4. Update memory_resource docs (NVIDIA#1883)

     Move the docs to RST format and fix some minor issues in the documentation.

     Co-authored-by: Bernhard Manfred Gruber <[email protected]>
     miscco and bernhardmgruber authored Jul 25, 2024 (30eaa9c)
  5. Temporarily switch nightly H100 CI to build-only. (NVIDIA#2060)

     These are on a testing pool and the machines are currently experiencing infra failures.

     alliepiper authored Jul 25, 2024 (04db77a)
  6. Do not rely on conversions between float and extended floating-point types (NVIDIA#2046)

     The issue is that our tests rely extensively on those conversions, which makes them incredibly painful to test.

     miscco authored Jul 25, 2024 (1797742)

Commits on Jul 26, 2024

  1. Experimental wrapper types for cudaEvent_t that provide a modern C++ interface. (NVIDIA#2017)

     * Wrapper types for `cudaEvent_t` that provide a modern C++ interface:
       * `cuda::experimental::event_ref` is a non-owning wrapper around a `cudaEvent_t`.
       * `cuda::experimental::event` is an owning wrapper around a `cudaEvent_t`.
       * `cuda::experimental::timed_event` is a `cuda::experimental::event` that also records the time at which it was recorded.
     * Apparently `__event` is a reserved word for MSVC
     * Represent the elapsed time between two events in nanoseconds instead of microseconds: according to the CUDA docs for `cudaEventElapsedTime`, the elapsed time has sub-microsecond resolution, so it is more appropriate to represent it in nanoseconds.
     * Prune unused headers, switch to an RST-friendly Doxygen comment style
     * Add class synopsis comments
     * Construct with a stream_ref and record the event on construction
     * Review feedback
     * Tests for `cudax::event` and `cudax::timed_event`
     * Change `event_ref::wait` to use `cudaEventSynchronize`
     * Use a struct for Windows instead
     * Do not include the superfluous config header
     * Add a clang-format rule for cudax
     * Spell `cudax_add_catch2_test` correctly
     * Fix formatting issues

     Co-authored-by: Michael Schellenberger Costa <[email protected]>
     ericniebler and miscco authored Jul 26, 2024 (a4cd52e)
  2. [CUDAX] Add a dummy device struct for now (NVIDIA#2066)

     * Add dummy device struct
     * Add `__scoped_device` for changing and restoring the current device

     Co-authored-by: Michael Schellenberger Costa <[email protected]>
     Co-authored-by: Eric Niebler <[email protected]>
     3 people authored Jul 26, 2024 (c60d687)
  3. Allow (somewhat) different input value types for merge (NVIDIA#2075)

     * Add a cuDF-inspired test for merge_by_key
     * Allow CUB MergePath to support iterators with different value types
     * Allow different input value types for merge, as long as they are convertible to the value type of the first iterator. This weakens the publicly documented guarantee of equal value types to restore the old behavior of the Thrust implementation replaced in NVIDIA#1817.

     bernhardmgruber authored Jul 26, 2024 (4b9de3b)
  4. Avoid ::result_type for partial sums in TBB reduce_by_key (NVIDIA#1998)

     This allows us to get rid of partial_sum_type, which still uses the C++11-deprecated function-object API ::result_type.

     Co-authored-by: Georgii Evtushenko <[email protected]>
     bernhardmgruber and gevtushenko authored Jul 26, 2024 (b761538)

Commits on Jul 27, 2024

  1. a8db0a9
  2. 94c86b6
  3. ad57b1e
  4. 59d7a4b

Commits on Jul 29, 2024

  1. 8a185fe

Commits on Jul 30, 2024

  1. Add support to cudax::device for querying a device's attributes (NVIDIA#2084)

     This commit encodes all of the device attributes as query objects. Where the CUDA runtime has `cudaDevAttrMaxThreadsPerBlock`, cudax has `device::attrs::max_threads_per_block`. Querying a device looks like:

     ```
     device dev0(0);
     auto tpb = dev0.attr(device::attrs::max_threads_per_block);
     ```

     This syntax also works, which is intended to help people already familiar with the C-style interface:

     ```
     device dev0(0);
     auto tpb = dev0.attr<::cudaDevAttrMaxThreadsPerBlock>();
     ```

     The attribute tags are convertible to `::cudaDeviceAttr`, so that `device::attrs::max_threads_per_block == ::cudaDevAttrMaxThreadsPerBlock`.

     Some effort has been made to make the query return types correct. For instance, `dev0.attr(device::attrs::managed_memory)` returns a `bool`, and `dev0.attr(device::attrs::compute_mode)` returns a value from the `::cudaComputeMode` enumeration.

     The attributes that return enumerations provide convenience aliases for the enum values, so this:

     ```
     if (dev0.attr<::cudaDevAttrComputeMode>() == ::cudaComputeModeDefault)
     ```

     is equivalent to:

     ```
     if (dev0.attr(device::attrs::compute_mode) == device::attrs::compute_mode._default)
     ```

     Closes NVIDIA#2083

     Co-authored-by: pciolkosz <[email protected]>
     Co-authored-by: Michael Schellenberger Costa <[email protected]>
     3 people authored Jul 30, 2024 (731c84c)
  2. [CUDAX] Add experimental owning abstraction for cudaStream_t (NVIDIA#2093)

     * Construct with a stream_ref and record the event on construction

     Co-authored-by: Eric Niebler <[email protected]>
     pciolkosz and ericniebler authored Jul 30, 2024 (15e2ce0)
  3. Do not query NVRTC for the CUDA runtime header (NVIDIA#2102)

     This may fail, as discussed in NVIDIA#2095.

     Fixes NVIDIA#2095

     miscco authored Jul 30, 2024 (1e67aa7)
  4. 6dfc8dd
  5. Improve binary function objects and replace the Thrust implementation (NVIDIA#1872)

     * Improve binary function objects and replace the Thrust implementation
     * Simplify use of ::cuda::std binary function objects
     * Replace _CCCL_CONSTEXPR_CXX14 with constexpr in all libcudacxx binary function objects that are imported into Thrust
     * Determine the partial sum type without ::result_type
     * Ignore _LIBCUDACXX_DEPRECATED_IN_CXX11 for Doxygen

     Co-authored-by: Bernhard Manfred Gruber <[email protected]>
     srinivasyadav18 and bernhardmgruber authored Jul 30, 2024 (4188fb0)
  6. d92ef23
  7. Add script to update RAPIDS version. (NVIDIA#2082)

     * Add script to update RAPIDS version.
     * Update to 24.10.

     bdice authored Jul 30, 2024 (d4f928e)
  8. Update bad links (NVIDIA#2080)

     * Fix broken links
     * Revert repo.toml
     * linkchecker fixes
     * Fix .cuh errors
     * Lint

     bryevdv authored Jul 30, 2024 (ce95739)
  9. c0cfbd0

Commits on Jul 31, 2024

  1. Add internal wrapper for CUDA driver APIs (NVIDIA#2070)

     * Add a header to interact with driver APIs
     * Add a test for the driver API interaction
     * Format
     * Fix formatting

     pciolkosz authored Jul 31, 2024 (7a3dae7)
  2. Use common_type for complex pow (NVIDIA#1800)

     Previously we would rely on our internal `__promote` function. However, that could have surprising results; e.g. `pow(complex<float>, int)` would return `complex<double>`. With C++23, this situation was clarified, and we should use `common_type` to determine the return type.

     miscco authored Jul 31, 2024 (694e963)
  3. a2a3824
  4. bddcd20
  5. [PoC]: Implement cuda::experimental::uninitialized_buffer (NVIDIA#1831)

     * Drop the `cuda::get_property` CPO: it serves no purpose, as it only ever forwards via ADL, and it also breaks older nvcc
     * Ensure that we test memory resources
     * Implement `cuda::uninitialized_buffer`

     `cuda::uninitialized_buffer` provides an allocation of `N` elements of type `T`, utilizing a `cuda::mr::resource` to allocate the storage. It takes care of alignment and deallocation of the storage. The user is required to ensure that the lifetime of the memory resource exceeds the lifetime of the buffer.

     miscco authored Jul 31, 2024 (27253d7)
  6. 2600135

Commits on Aug 1, 2024

  1. Ensure that cuda_memory_resource allocates memory on the proper device (NVIDIA#2073)

     * Ensure that `cuda_memory_resource` allocates memory on the proper device
     * Move `__ensure_current_device` to its own header

     miscco authored Aug 1, 2024 (39b926a)
  2. Clarify compatibility wrt. template specializations (NVIDIA#2138)

     We do not want users to specialize arbitrary templates in CCCL unless otherwise stated. This PR makes that clear in the README.md.

     Co-authored-by: Michael Schellenberger Costa <[email protected]>
     bernhardmgruber and miscco authored Aug 1, 2024 (ce4b904)
  3. fadb135
  4. Make cuda::std::tuple trivially copyable (NVIDIA#2127)

     This is similar to the situation with `cuda::std::pair`: we have a lot of users that rely on types being trivially copyable so that they can use memcpy and friends. Previously, `cuda::std::tuple` did not satisfy this because it needs to handle reference types. Given that we already specialize `__tuple_leaf` depending on whether the class is empty or not, we can simply add a third specialization that handles trivially copyable types and one that synthesizes assignment.

     Co-authored-by: Bernhard Manfred Gruber <[email protected]>
     miscco and bernhardmgruber authored Aug 1, 2024 (4634d81)
  5. Fix missing copy of docs artifacts (NVIDIA#2162)

     Also fix a typo in the link.

     miscco authored Aug 1, 2024 (cc0b3d1)
  6. Update CODEOWNERS

     jrhemstad authored Aug 1, 2024 (cbe01b0)
  7. Fix g++-14 warning on uninitialized copying (NVIDIA#2157)

     ```
     In function bool cuda::std::__4::__dispatch_memmove(_Up*, _Tp*, size_t)
     ...
     error: *(unsigned char*)(&privatized_decode_op[0]) may be used uninitialized [-Werror=maybe-uninitialized]
     ...
     *(unsigned char*)(&privatized_decode_op[0]) was declared here
      1528 |       PrivatizedDecodeOpT privatized_decode_op[NUM_ACTIVE_CHANNELS]{};
     ```

     bernhardmgruber authored Aug 1, 2024 (02378eb)

Commits on Aug 2, 2024

  1. Fix flaky heterogeneous tests (NVIDIA#2085)

     * Fix flaky heterogeneous tests by ensuring only *one* writer exists in parallel between H/D
     * Fix up a copy-paste mistake
     * Make host atomics simpler by removing the ugly alignment type
     * Fix deadlocks introduced into barrier/semaphore tests
     * Revert removing the hacky atomic-wrapping stuff
     * Fix unused-warning bug in GCC 6

     wmaxey authored Aug 2, 2024 (cba0345)
  2. Fix multiple definition of InclusiveScanKernel (NVIDIA#2169)

     ```
     Linking CXX executable bin/cub.cpp14.catch2_test.lid_0
     FAILED: bin/cub.cpp14.catch2_test.lid_0
     ...
     /usr/bin/ld: cub/test/CMakeFiles/cub.cpp14.test.warp_scan_api.dir/catch2_test_warp_scan_api.cu.o: in function `InclusiveScanKernel(int*)':
     /usr/local/cuda-12.7/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxInitDefs.h:473: multiple definition of `InclusiveScanKernel(int*)'; cub/test/CMakeFiles/cub.cpp14.test.block_scan_api.dir/catch2_test_block_scan_api.cu.o:/usr/local/cuda-12.7/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxInitDefs.h:468: first defined here
     collect2: error: ld returned 1 exit status
     ```

     bernhardmgruber authored Aug 2, 2024 (24ed47d)

Commits on Aug 3, 2024

  1. [CUDAX] Add a global constexpr cudax::devices range for all devices in the system (NVIDIA#2100)

     * Add the `cuda::devices` vector

     The number of CUDA devices can be determined by calling `cuda::devices.size()`. `cuda::devices` is a range of `cuda::device` objects.

     ericniebler authored Aug 3, 2024 (a8ca75c)
  2. d0254e4

Commits on Aug 5, 2024

  1. a903dc6
  2. Fix trivial_copy_device_to_device execution space (NVIDIA#2164)

     * Fix trivial_copy_device_to_device execution space
     * Typo
     * Format
     * Extra empty line

     gevtushenko authored Aug 5, 2024 (9459e4a)
  3. c65a965
  4. e519f25
  5. Fix the clang-format path in the devcontainers (NVIDIA#2194)

     In the devcontainers, `clang-format` is now installed at `/usr/bin/clang-format`.

     miscco authored Aug 5, 2024 (fe27d99)
  6. Mount a build directory for CCCL projects if WSL is detected (NVIDIA#2035)

     Co-authored-by: Michael Schellenberger Costa <[email protected]>
     wmaxey and miscco authored Aug 5, 2024 (d1e7c1c)

Commits on Aug 6, 2024

  1. 2118 [CUDAX] Change the RAII device swapper to use the driver API and add it in places where it was missing (NVIDIA#2192)

     * Change __scoped_device to use the driver API
     * Switch to the driver-API-based device setter
     * Remove constexpr from operator device()
     * Fix comments and includes
     * Fall back to the non-versioned get-entry-point API before 12.5. We need to use the versioned variant to get the correct cuStreamGetCtx; there is a v2 version of it in 12.5, and fortunately the versioned get-entry-point API is available there too.
     * Fix unused local variable
     * Fix warnings in the ensure_current_device test
     * Move ensure_current_device out of detail
     * Add LIBCUDACXX_ENABLE_EXCEPTIONS to the tests' CMake

     pciolkosz authored Aug 6, 2024 (75929cb)
  2. Fix singular vs. plural typo in thread scope documentation. (NVIDIA#2198)

     * Fix singular vs. plural typo in thread scope documentation.
     * Better grammar fix.

     brycelelbach authored Aug 6, 2024 (1b6dbd4)
  3. [CUDAX] Fix some minor issues with device attribute queries (NVIDIA#2183)

     * [cudax] Give the `cudaDevAttrMemoryPoolSupportedHandleTypes` attribute the correct type
     * Move attribute definitions from `device_ref` to `device`

     ericniebler authored Aug 6, 2024 (2db4fa7)

Commits on Aug 7, 2024

  1. Integrate Python docs (NVIDIA#2196)

     * Pass docs build options to repo.sh
     * Integrate Python docs
     * Update CI
     * Apply suggestions from code review

     Co-authored-by: Georgii Evtushenko <[email protected]>
     bryevdv and gevtushenko authored Aug 7, 2024 (b0e09d0)
  2. [FEA] Atomics codegen refactor (NVIDIA#1993)

     * Initial draft of the new atomics backend
     * Change atomic fetch ops back to tag dispatch
     * Save WIP
     * Add load/store and support for MMIO
     * Begin working on exch
     * Enable formatting exchange
     * Several signedness fixes
     * Make the atomics PTX tests build (lit tests are a WIP)
     * Fix load/store, some volatile-ness, and min/max
     * Formatting, and enable codegen in all builds
     * Make integral.pass.cpp pass
     * Make the rest of the atomics tests pass
     * Use 128-bit ld/st instead of a vector load, as it is not atomic across the whole atom
     * Fix copy-paste mistake in load/store
     * Whitespace fixup
     * Fix 128-bit .exch using .cas operands
     * Make codegen link fmt as PRIVATE
     * Simplify MMIO down to a static array
     * static -> inline for codegen functions; replace endl with '\n'
     * Supply the output stream directly to `fmt::format`
     * Update fmtlib
     * Revert the `fmt::format(out...)` changes; they don't work on MSVC
     * Fix up libcudacxx codegen CMake stuff
     * Remove a sneaky cstddef include that was auto-added
     * [pre-commit.ci] auto code formatting

     Co-authored-by: Allison Piper <[email protected]>
     Co-authored-by: Bernhard Manfred Gruber <[email protected]>
     Co-authored-by: Michael Schellenberger Costa <[email protected]>
     Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
     5 people authored Aug 7, 2024 (62336ad)
  3. [CUDAX] Add __launch_transform to transform arguments to `cudax::launch` prior to launching the kernel (NVIDIA#2202)

     * Add `__launch_transform` to transform arguments to `cudax::launch` prior to launching the kernel

     ericniebler authored Aug 7, 2024 (47b8f5c)

Commits on Aug 8, 2024

  1. Clean up common testing headers and correct asserts in launch testing (NVIDIA#2204)

     * Clean up common testing headers
     * Add test/common to CMake and fix formatting

     pciolkosz authored Aug 8, 2024 (39fd05e)
  2. [CUDAX] Add an API to get a device_ref from a stream and add a comparison operator to device_ref (NVIDIA#2203)

     * Add a way to compare device_refs
     * Add a way to query the device_ref from a stream
     * Fix Windows missing cast
     * Apply suggestions from code review
     * Disallow device comparison with int

     Co-authored-by: Michael Schellenberger Costa <[email protected]>
     pciolkosz and miscco authored Aug 8, 2024 (c9a7b6a)
  3. Update devcontainer docs for WSL (NVIDIA#2200)

     * Update README.md (five iterations)
     * [pre-commit.ci] auto code formatting
     * Why was 6 afraid of 7?

     Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
     jrhemstad and pre-commit-ci[bot] authored Aug 8, 2024 (3ebf8cc)

Commits on Aug 9, 2024

  1. Add cudax::distribute<threadsPerBlock>(numElements) as a way to evenly distribute elements over thread blocks (NVIDIA#2210)

     ericniebler authored Aug 9, 2024 (f95f211)
  2. Rework mdspan concept emulation (NVIDIA#2213)

     It is proving difficult to handle for MSVC, and the one we are using in libcu++ is much cleaner.

     Gets NVIDIA#2160 compiling on MSVC.

     miscco authored Aug 9, 2024 (8e20c9a)
  3. Un-doc functions taking debug_synchronous (NVIDIA#2209)

     * Un-document functions taking debug_synchronous

     bryevdv authored Aug 9, 2024 (7473934)
  4. CUDA vector_add sample project (NVIDIA#2160)

    ---------
    
    Co-authored-by: pciolkosz <[email protected]>
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    3 people authored Aug 9, 2024
    Commit a3a5f9c

Commits on Aug 12, 2024

  1. avoid constraint recursion in the resource concept (NVIDIA#2215)

    drive-by: avoid potential overload ambiguity in `__launch_transform`
    ericniebler authored Aug 12, 2024
    Commit 6ee3415

Commits on Aug 13, 2024

  1. Commit aaf1340
  2. Fix including <complex> when bad CUDA bfloat/half macros are used. (NVIDIA#2226)
    
    * Add <complex> test for bad macros being defined
    
    * Fix <complex> failing upon inclusion when bad macros are defined
    
    * Rather use explicit specializations and some evil hackery to get the complex interop to work
    
    * Fix typos
    
    * Inline everything
    
    * Move workarounds together
    
    * Use conversion functions instead of explicit specializations
    
    * Drop unneeded conversions
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    wmaxey and miscco authored Aug 13, 2024
    Commit 098fb29
  3. Commit d7c83fe

Commits on Aug 14, 2024

  1. Commit 64d28d1
  2. Commit 6213a5e
  3. Commit 2e44b2c
  4. Commit 352638b
  5. Commit dded5f1
  6. Commit 1981c49
  7. Fix ForEachCopyN for non-contiguous iterators (NVIDIA#2220)

    By falling back to a non-load-vectorizing code path.
    
    Fixes: NVIDIA#2207
    bernhardmgruber authored Aug 14, 2024
    Commit 73df2b0
  8. Commit cbce14b

Commits on Aug 15, 2024

  1. Allow installing components when downstream (NVIDIA#2096)

    In the @acts-project we adopt an (admittedly somewhat unconventional)
    build system in which software A depends on B, and B depends on CCCL.
    The setup is that we want to install B into a prefix, and then try to
    build A against B. The problem arises because we are using CMake to
    dynamically fetch CCCL using the so-called "FetchContent" mechanism,
    which downloads CCCL and then adds it as a subdirectory.
    
    The core problem is that installing software B which has included CCCL
    does not actually install CCCL in the same prefix, so software A cannot
    then load software B as CCCL is not installed. The reason this happens
    is that CMakeLists.txt:28 (at the time of writing) returns from the
    CMake configuration stage early, and leaves the CUB, Thrust, and
    libcudacxx directories unincluded (see lines 70 to 72).
    
    Although this is, again, an unconventional and rare scenario, it should
    be easy to add support for this kind of build, and I hope the CCCL devs
    would agree that it might be worth doing. In this commit, I remove the
    early return and replace it with additional if-statements. This commit
    should leave any existing workflows completely untouched, but should
    make it easier to use CCCL in the way we do in @acts-project.
    stephenswat authored Aug 15, 2024
    Commit e423412
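    The FetchContent pattern the commit message describes looks roughly like this in project B's CMakeLists.txt (a sketch under assumptions: the project names are illustrative, and the `CCCL_ENABLE_INSTALL_RULES` option name is an assumption about CCCL's CMake options, not confirmed by this log):

    ```cmake
    # Project B fetches CCCL at configure time and is itself installed into a
    # prefix, so CCCL's install rules must also run for downstream project A
    # to find CCCL in that prefix.
    include(FetchContent)
    FetchContent_Declare(
      CCCL
      GIT_REPOSITORY https://github.com/NVIDIA/cccl.git
      GIT_TAG        main
    )
    # Assumed option: opt in to installing the fetched CCCL alongside B.
    set(CCCL_ENABLE_INSTALL_RULES ON)
    FetchContent_MakeAvailable(CCCL)
    ```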
  2. Commit 532ff47

Commits on Aug 16, 2024

  1. Fix and simplify <bit> (NVIDIA#2197)

    * Fix and simplify <bit>
    
    * Make logic for non-constant evaluation simpler in C++14 and greater in <bit>
    
    * Remove use of `std::` in `<bit>`
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    
    * Change bitops tests to prevent constant folding of runtime checks
    
    * Move bit and split implementation details from main header
    
    * Remove volatile from tests in bitops
    
    * Make Windows happy by using `unsigned long`
    
    * Work around being unable to use {} in c++ constexpr functions
    
    * Add a 'default to constexpr' interpretation of is_constant_evaluated for internal use in bitops
    
    * Make windows happy by reusing the default to constexpr hack
    
    * Make bitops tests definitely actually do runtime
    
    * Move <bit> fallbacks into relevant headers
    
    * Fix fallbacks being guarded by MSVC ifdef.
    
    * Keep the license
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    wmaxey and miscco authored Aug 16, 2024
    Commit 16d4fd3
  2. Commit fed3ec1
  3. Make cuda::std::min constexpr in C++11 (NVIDIA#2249)

    This should fix our rmm builds
    miscco authored Aug 16, 2024
    Commit 4a5dcc4
  4. Commit ba9e9bb

Commits on Aug 19, 2024

  1. Workaround GCC 13 issue with empty histogram decoder op (NVIDIA#2252)

    * Workaround GCC 13 issue
    
    * Update cub/cub/device/dispatch/dispatch_histogram.cuh
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    bernhardmgruber and miscco authored Aug 19, 2024
    Commit 51c1b22

Commits on Aug 20, 2024

  1. Commit da9b7dd
  2. Commit f871aeb
  3. Add tests for transform_iterator's reference type (NVIDIA#2221)

    * Fix typo
    * Add tests for transform_iterator's reference type
    bernhardmgruber authored Aug 20, 2024
    Commit 38d5787
  4. Small tuning script output improvements (NVIDIA#2262)

    * Report mismatched values in first_val
    * Improve output while search.py is running
    bernhardmgruber authored Aug 20, 2024
    Commit c92e8d4
  5. Fix Thrust::vector ctor selection for int,int (NVIDIA#2261)

    thrust::device_vector<int> v(5, 10) should create a vector with 5 integers of value 10, and not attempt the iterator pair constructor.
    bernhardmgruber authored Aug 20, 2024
    Commit 7bec0ce
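    The same overload-selection pitfall exists in the standard library, so `std::vector` can stand in for `thrust::device_vector` to illustrate the behavior the fix guarantees:

    ```cpp
    #include <cassert>
    #include <vector>

    // v(5, 10) must select the fill constructor (5 elements of value 10),
    // not the iterator-pair constructor, even though (int, int) could
    // syntactically look like an iterator pair. std::vector is used here as
    // a host-side stand-in for thrust::device_vector.
    std::vector<int> make_filled()
    {
      std::vector<int> v(5, 10);
      return v;
    }

    int main()
    {
      std::vector<int> v = make_filled();
      assert(v.size() == 5);
      for (int x : v)
        assert(x == 10);
      return 0;
    }
    ```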

Commits on Aug 21, 2024

  1. Adds support for large number of items to DeviceScan (NVIDIA#2171)

    * make DeviceScan offset type a template parameter
    
    * updates tests to use device interface
    
    * moves thrust scan to unsigned offset types
    
    * adjusts benchmarks to account for used offset types
    
    * uses dynamic dispatch to unsigned type
    
    * adds tparam docs for NumItemsT
    
    * fixes warning about different signedness comparison
    
    * adds check for negative num_items in thrust::scan
    
    * fixes unused param in is_negative
    elstehle authored Aug 21, 2024
    Commit 06e334f
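    The commit list mentions dispatching to an unsigned offset type and rejecting negative `num_items`. A hedged host-side sketch of that selection logic (`is_negative` appears in the commit log; `offset_bits_for` is an illustrative name, not CUB's actual code):

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Reject negative counts up front, then dispatch to a 32-bit offset type
    // when the item count fits, falling back to 64 bits for large problems.
    inline bool is_negative(long long num_items)
    {
      return num_items < 0;
    }

    inline int offset_bits_for(long long num_items)
    {
      assert(!is_negative(num_items));
      return num_items <= UINT32_MAX ? 32 : 64;
    }

    int main()
    {
      assert(offset_bits_for(1000) == 32);
      assert(offset_bits_for(5000000000LL) == 64); // needs a 64-bit offset
      return 0;
    }
    ```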
  2. Use/Test radix sort for int128, half, bfloat16 in Thrust (NVIDIA#2168)

    int128 was already working but not covered by a test.
    bernhardmgruber authored Aug 21, 2024
    Commit 1e1af8d
  3. Implement C API for device reduction (NVIDIA#2256)

    * Implement C device reduce
    
    * Format
    
    * Fix device-specific module loading
    
    * Don't need a context at build step now
    
    * Address review feedback
    gevtushenko authored Aug 21, 2024
    Commit 5a4881b
  4. Move cooperative module (NVIDIA#2269)

    * Move cuda/cooperative to cuda_cooperative
    
    * Update paths to cooperative module
    
    * Fix literal include paths
    gevtushenko authored Aug 21, 2024
    Commit 2c1080d
  5. Move compiler version macros into libcu++ (NVIDIA#2250)

    * Move compiler version macros into libcu++
    * Drop some pre-C++11 code paths around thrust::complex
    bernhardmgruber authored Aug 21, 2024
    Commit 529f910

Commits on Aug 23, 2024

  1. Commit d62e979

Commits on Aug 25, 2024

  1. Adds thrust::tabulate_output_iterator (NVIDIA#2282)

    * adds tabulate output iterator
    
    * uses cccl exec space macros
    
    * addresses review comments
    
    * fixes documentation and example
    
    * moves to using alias template instead of member type
    elstehle authored Aug 25, 2024
    Commit 0d0d2d3
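    A tabulate output iterator invokes a user functor with the output index and the assigned value instead of storing anything. A simplified host-side sketch of the idea (not Thrust's implementation, which models a full iterator):

    ```cpp
    #include <cassert>
    #include <utility>
    #include <vector>

    // Writing through the iterator calls f(index, value); nothing is stored
    // in the iterator itself.
    template <class F>
    struct tabulate_output
    {
      F f;
      long index = 0;

      struct proxy
      {
        F& f;
        long i;
        template <class T>
        void operator=(T const& value) { f(i, value); }
      };

      proxy operator*() { return proxy{f, index}; }
      tabulate_output& operator++() { ++index; return *this; }
    };

    template <class F>
    tabulate_output<F> make_tabulate_output(F f)
    {
      return {std::move(f)};
    }

    int main()
    {
      std::vector<int> sink(3, 0);
      auto out = make_tabulate_output([&](long i, int v) { sink[i] = 2 * v; });
      *out = 1; ++out;
      *out = 2; ++out;
      *out = 3;
      assert(sink[0] == 2 && sink[1] == 4 && sink[2] == 6);
      return 0;
    }
    ```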
  2. Commit a15adf3

Commits on Aug 26, 2024

  1. Flatten forwarding headers (NVIDIA#2284)

    * Flatten `<cuda/std/utility>`
    
    We do not need the indirection anymore
    
    * Flatten <cuda/std/iterator>
    
    We do not need the indirection anymore
    
    * Flatten `<cuda/std/expected>`
    
    * Flatten `<cuda/std/mdspan>`
    
    * Flatten `<cuda/std/ranges>`
    
    * Flatten `<cuda/std/__new_>`
    
    * Flatten `<cuda/std/cassert>`
    
    * Flatten `<cuda/std/initializer_list>`
    
    * Cleanup `<cuda/functional>` and `<cuda/std/functional>`
    
    Those headers don't need any of the deep nesting we had, so clean them up and simplify their dependencies
    
    * Flatten `<cuda/std/type_traits>`
    
    * Flatten `<cuda/std/__memory_>`
    
    * Flatten `<cuda/std/concepts>`
    
    * Drop unused `libcudacxx/include/cuda/std/detail/libcxx/include/ccomplex`
    
    * Flatten `<cuda/std/cfloat>`
    
    * Flatten `<cuda/std/version>`
    
    * Fix formatting issues?
    
    * Add missing include to `thrust/functional.h`
    
    Also fix the cyclic dependency with actor.h
    
    * Suppress MSVC warning
    miscco authored Aug 26, 2024
    Commit c1c1d96
  2. 2270 static compute capabilities queries (NVIDIA#2271)

    * Architecture traits first draft
    
    * Comment
    
    * Add missing sm 80 and sm 60 arches
    
    * Add missing sm 90
    
    * Move to 100*major+10*minor format and remove template arch_traits getter
    
    * Add Ada and cleanup some things
    
    * Move includes after system header pragma
    
    ---------
    
    Co-authored-by: Elias Stehle <[email protected]>
    Co-authored-by: Bernhard Manfred Gruber <[email protected]>
    Co-authored-by: Georgii Evtushenko <[email protected]>
    4 people authored Aug 26, 2024
    Commit 03247ab
  3. Commit 9d4c3a8

Commits on Aug 27, 2024

  1. Implement any_resource, an owning wrapper around a memory resource (NVIDIA#2266)
    
    * Implement `any_resource` an owning wrapper around any resource
    
    Addresses NVIDIA#1426
    
    * Continue development of @miscco's `any_resource`
    
    * address review feedback
    
    * [pre-commit.ci] auto code formatting
    
    * mark all deallocation functions as `noexcept`
    
    * fix some test failures
    
    * more tests and bug fixes
    
    * fix more build breaks
    
    * attempt to fix the cudax docs build
    
    * exclude more symbols from the cudax docs
    
    * more portability fixes and doxygen tweaks
    
    * once more with feeling
    
    * getting pretty close now
    
    * fix broken test
    
    * deduplicate `basic_any_resource` constructors to satisfy doxygen
    
    * [pre-commit.ci] auto code formatting
    
    * don't use `if constexpr` when compiling as c++14
    
    * more fixes for doxygen and c++14
    
    * back out a questionable addition of `noexcept`
    
    * mollify msvc
    
    * accommodate integer size differences on msvc
    
    * eliminate shadow warning treated as error
    
    * handle duplicate properties without triggering compiler warnings
    
    * [pre-commit.ci] auto code formatting
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    ericniebler and miscco authored Aug 27, 2024
    Commit f53e725
  2. Commit e8939e9
  3. use NV_IF_TARGET to conditionally compile CUDAX tests (NVIDIA#2297)

    * use `NV_IF_TARGET` to conditionally compile CUDAX tests
    
    * [pre-commit.ci] auto code formatting
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    ericniebler and pre-commit-ci[bot] authored Aug 27, 2024
    Commit 92e006b
  4. Make for_each compatible with NVRTC (NVIDIA#2288)

    * Make for_each compatible with NVRTC
    
    * Whitespace fixup.
    
    * Apply suggestion to improve includes.
    wmaxey authored Aug 27, 2024
    Commit f80972b
  5. refactor cmake so more cudax samples can be easily added (NVIDIA#2296)

    * refactor cmake so more cudax samples can be easily added
    
    * [pre-commit.ci] auto code formatting
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    ericniebler and pre-commit-ci[bot] authored Aug 27, 2024
    Commit a5b0a23
  6. Commit dd90bed
  7. Implement std::bit_cast (NVIDIA#2258)

    * Implement `std::bit_cast`
    
    This backport C++20 `std::bit_cast` to be available in all standard modes.
    
    As this requires compiler builtin support, we have a non-constexpr workaround with the usual memcpy implementation.
    
    Fixes NVIDIA#2257
    
    * Add additional constraint in the fallback mode
    
    * Use bit_cast in cub
    
    * Formatting fix?
    
    * Fix typo
    miscco authored Aug 27, 2024
    Commit 0a1cddb
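    The "usual memcpy implementation" the commit mentions as the non-constexpr fallback looks like this (a sketch of the well-known technique, not libcu++'s actual code):

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <cstring>
    #include <type_traits>

    // Reinterpret the object representation of src as a To without invoking
    // undefined behavior: memcpy between trivially copyable types of equal size.
    template <class To, class From>
    To bit_cast_fallback(const From& src)
    {
      static_assert(sizeof(To) == sizeof(From), "sizes must match");
      static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
      static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
      To dst;
      std::memcpy(&dst, &src, sizeof(To));
      return dst;
    }

    int main()
    {
      // IEEE-754: 1.0f has the bit pattern 0x3F800000.
      assert(bit_cast_fallback<std::uint32_t>(1.0f) == 0x3F800000u);
      assert(bit_cast_fallback<float>(std::uint32_t{0x3F800000u}) == 1.0f);
      return 0;
    }
    ```

    Unlike the compiler-builtin path, this fallback cannot run in a constant expression, which is why the commit notes builtin support is required for the constexpr case.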

Commits on Aug 28, 2024

  1. Cleanup the <cuda/std/bit> header (NVIDIA#2299)

    * Move enum `endian` to its own file
    
    * Move `std::rotl` and `std::rotr` to their own file
    
    * Move `std::has_single_bit` to its own file
    
    * Move `countr_{one, zero}` to its own file
    
    * Move `countl_{one, zero}` to their own file
    
    * Move `bit_ceil`, `bit_floor` and `bit_width` to their own file
    
    * Cleanup the `<bit>` header
    miscco authored Aug 28, 2024
    Commit 490a20f
  2. change cudax::uninitialized_buffer to own its memory resource with `cudax::any_resource` (NVIDIA#2293)
    
    * Implement `any_resource` an owning wrapper around any resource
    
    Addresses NVIDIA#1426
    
    * Continue development of @miscco's `any_resource`
    
    * address review feedback
    
    * [pre-commit.ci] auto code formatting
    
    * mark all deallocation functions as `noexcept`
    
    * fix some test failures
    
    * more tests and bug fixes
    
    * fix more build breaks
    
    * attempt to fix the cudax docs build
    
    * exclude more symbols from the cudax docs
    
    * more portability fixes and doxygen tweaks
    
    * once more with feeling
    
    * getting pretty close now
    
    * fix broken test
    
    * deduplicate `basic_any_resource` constructors to satisfy doxygen
    
    * [pre-commit.ci] auto code formatting
    
    * don't use `if constexpr` when compiling as c++14
    
    * more fixes for doxygen and c++14
    
    * back out a questionable addition of `noexcept`
    
    * mollify msvc
    
    * accommodate integer size differences on msvc
    
    * eliminate shadow warning treated as error
    
    * handle duplicate properties without triggering compiler warnings
    
    * change `uninitialized_buffer` to own its memory resource using `any_resource`
    
    * Use fully qualified name
    
    * Drop `__host__ __device__` from uninitialized_buffer
    
    * Revert "Drop `__host__ __device__` from uninitialized_buffer"
    
    This reverts commit 5115b08.
    
    * Just do the cursed thing
    
    * Add missing include
    
    * Adopt the doc string
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    Co-authored-by: anon <users.noreply.github.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    3 people authored Aug 28, 2024
    Commit 198208a
  3. Documentation typos (NVIDIA#2302)

    * add segmented [radix] sort tests for 64-bit indices
    
    * Add test for device segmented sort pairs with 64-bit indices
    
    * Add a macro guard to protect tests that are not supported yet
    
    * Fix documentation typos
    
    * Remove unrelated files
    fbusato authored Aug 28, 2024
    Commit ec5bd08
  4. Add thrust::inclusive_scan with init_value support (NVIDIA#1940)

    * Add thrust::inclusive_scan with init value sequential
    
    * Add thrust::inclusive_scan cuda par with init value
    
    * Add thrust::async::incluisve_scan with init value
    
    * Add thrust::inclusive_scan tbb with init value
    
    * Handle reviews
    
    * Consolidate init overloads into a single overload that accepts both init and binary_op
    
    * Fix formatting issues
    
    * Add cuda::std::accumulator_t and use it for value_type in scan algorithms
    
    * Redo Bernhard's work and consolidate the two tbb::inclusive_scan bodies
    
    * Handle final reviews
    
    * Replace cub::accumulator_t with cuda::std::__accumulator_t
    gonidelis authored Aug 28, 2024
    Commit e311e89
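    C++17's `std::inclusive_scan` already provides the init+binary_op overload this commit adds to Thrust, so it can illustrate the semantics on the host: the first output element already folds in the initial value.

    ```cpp
    #include <cassert>
    #include <functional>
    #include <numeric>
    #include <vector>

    int main()
    {
      std::vector<int> in{1, 2, 3, 4};
      std::vector<int> out(in.size());
      // Inclusive scan with init = 10: out[i] = 10 + in[0] + ... + in[i].
      std::inclusive_scan(in.begin(), in.end(), out.begin(), std::plus<>{}, 10);
      assert(out == (std::vector<int>{11, 13, 16, 20}));
      return 0;
    }
    ```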
  5. Commit 942f59f
  6. Add documentation for any_resource (NVIDIA#2309)

    * Add documentation for `any_resource`
    
    * Apply suggestions from code review
    
    Co-authored-by: Bernhard Manfred Gruber <[email protected]>
    
    ---------
    
    Co-authored-by: Bernhard Manfred Gruber <[email protected]>
    miscco and bernhardmgruber authored Aug 28, 2024
    Commit 7d4be26

Commits on Aug 29, 2024

  1. Implement P0843 inplace_vector (NVIDIA#1936)

    * Implement P0843 `inplace_vector`
    
    This implements `inplace_vector`, a resizable container with a fixed capacity that stores its elements in a local array.
    
    Because everything is local, we are more or less safe regarding host/device issues, as long as users do not pass around references to it.
    
    The exception guarantees are not 100% clear yet, so I implemented them on a best effort basis. We might need to revisit what we guarantee in case of an exception.
    miscco authored Aug 29, 2024
    Commit eb87e56
  2. Cleanup __config and unify most visibility macros (NVIDIA#2285)

    * Unify definition of `_LIBCUDACXX_TYPE_VIS`
    
    * Unify definition of `_LIBCUDACXX_HIDDEN`
    
    * Unify definition of `_LIBCUDACXX_TEMPLATE_VIS`
    
    * Unify definition of `_LIBCUDACXX_ENUM_VIS`
    
    * Drop definition of `_LIBCUDACXX_EXCEPTION_ABI`
    
    * Drop definition of `_LIBCUDACXX_DLL_VIS`
    
    * Drop `_LIBCUDACXX_METHOD_TEMPLATE_IMPLICIT_INSTANTIATION_VIS`
    
    * Drop `_LIBCUDACXX_EXPORTED_FROM_ABI` in favor of `_CCCL_VISIBILITY_DEFAULT`
    
    * Drop `_LIBCUDACXX_CLASS_TEMPLATE_INSTANTIATION_VIS`
    
    * Drop `_LIBCUDACXX_EXTERN_TEMPLATE_TYPE_VIS`
    
    * Drop `_LIBCUDACXX_CRT_FUNC`
    
    * Drop `_LIBCUDACXX_OVERRIDABLE_FUNC_VIS` in favor of `_CCCL_HOST_DEVICE`
    
    * Drop _LIBCUDACXX_TYPE_VIS in favor of _CCCL_TYPE_VISIBILITY_DEFAULT
    
    * Drop `_LIBCUDACXX_HIDDEN` in favor of `_CCCL_VISIBILITY_HIDDEN`
    
    * Drop `_LIBCUDACXX_ENUM_VIS` in favor of `_CCCL_TYPE_VISIBILITY_DEFAULT`
    
    * Drop `_LIBCUDACXX_TEMPLATE_VIS` in favor of `_CCCL_TYPE_VISIBILITY_DEFAULT`
    
    * Drop a bunch of unused defines that we inherited from libc++
    
    * Drop `_LIBCUDACXX_ALWAYS_INLINE` in favor of `_CCCL_ALWAYS_INLINE`
    miscco authored Aug 29, 2024
    Commit 10b0d2b
  3. Add a fast, low memory "limited" mode to CUB testing. (NVIDIA#2317)

    * Add CCCL_SEED_COUNT_OVERRIDE env var.
    
    This overrides the number of test sections generated by CUB_SEED.
    It is intended to be used for limiting to a single case for sanitizer runs, etc,
    but may be useful for fuzzing as well.
    
    * Add new env opts for CUB's check allocators.
    
    - `CCCL_DEVICE_MEMORY_LIMIT`: Limit the total memory allocations to this number of bytes.
    - `CCCL_DEBUG_CHECKED_ALLOC_FAILURES`: Print info about each checked alloc failure prior to throwing exceptions.
    
    * Add `limited` job for CUB.
    
    - Forces CUB_SEED to only generate a single test case, ignoring requested number of seeds.
    - Limits the total global device memory usage to 8 GiB.
    - Prints allocation details when checked allocator fails.
    
    * Add error checking to build scripts.
    
    * Add a `-configure` option to CI scripts that just configures the build and exits.
    
    * Add std for new cub job.
    
    This way it will reuse the same build job as the other CUB tests.
    alliepiper authored Aug 29, 2024
    Commit 11fc50b
  4. [CUDAX] Add event_ref::is_done() and update event tests (NVIDIA#2304)

    * Add event_ref.is_done() and update event tests
    
    * Update cudax/include/cuda/experimental/__event/event_ref.cuh
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    pciolkosz and miscco authored Aug 29, 2024
    Commit d862315
  5. Commit e42d7b7
  6. Commit a7837d3

Commits on Aug 30, 2024

  1. Commit 16096d4
  2. [CUDAX] Add compute_capability device attribute and handle arch_traits for future architectures (NVIDIA#2328)
    
    * Add combined compute capability query
    
    * Store arch traits in device object
    
    * Craft arch_traits for future architectures
    
    * Add missing return
    
    * Add missing noexcept
    
    * Add a comment describing compute_capability_t
    pciolkosz authored Aug 30, 2024
    Commit a9fa9a1
  3. Disable exec checks on ranges CPOs (NVIDIA#2331)

    * Disable exec checks on ranges CPOs
    
    We need those for the cudax containers coming in
    miscco authored Aug 30, 2024
    Commit 95c6ba9
  4. Enable exceptions by default (NVIDIA#2329)

    * Use a negative error code for exit in terminate
    
    * Add a global CCCL configuration for disabling exceptions
    
    * Make the terminate test a runfail test
    
    We return a non-0 return code so...
    
    * We cannot fail a test that does not run
    miscco authored Aug 30, 2024
    Commit 206e745
  5. Make the thrust dispatch mechanisms configurable (NVIDIA#2310)

    * Make the thrust dispatch mechanisms configurable
    
    The current dispatch mechanism trades compile time and binary size for performance and flexibility.
    
    Allow users to tune that depending on their needs
    
    Co-authored-by: Jake Hemstad <[email protected]>
    miscco and jrhemstad authored Aug 30, 2024
    Commit 89702de
  6. Commit a7996f0

Commits on Aug 31, 2024

  1. Compiler version improvements (NVIDIA#2316)

    * Remove Intel LLVM references
    * Remove IBM/XLC/PowerPC references
    * Add NVHPC version
    fbusato authored Aug 31, 2024
    Commit bb6c7b1
  2. Fix hardcoding __THRUST_HOST_SYSTEM_NAMESPACE to cpp (NVIDIA#2341)

    This change was erroneously introduced in 91b78d8
    
    Fixes: NVIDIA#2098
    bernhardmgruber authored Aug 31, 2024
    Commit 0a40182

Commits on Sep 2, 2024

  1. Improvements to the Cuda Core C library infrastructure (NVIDIA#2336)

    * Move towards libfmt, to allow different host compilers
    
    * Try and filter changes to Cuda C library
    
    * Update ci/inspect_changes.sh
    
    Co-authored-by: Leo Fang <[email protected]>
    
    * Revert "Move towards libfmt, to allow different host compilers"
    
    This reverts commit 3bd8d95.
    
    ---------
    
    Co-authored-by: Leo Fang <[email protected]>
    miscco and leofang authored Sep 2, 2024
    Commit 709ddec

Commits on Sep 3, 2024

  1. Commit 498251c
  2. [CUDAX] make uninitialized_buffer usable with launch (NVIDIA#2342)

    * make `cudax::uninitialized_buffer` usable with `cudax::launch`
    
    * test passing a `const` `uninitialized_buffer` to `launch`
    ericniebler authored Sep 3, 2024
    Commit c6b777b
  3. Commit 5c6b6df
  4. Update Memory Model docs for HMM (NVIDIA#2272)

    * Update Memory Model docs for HMM
    
    * Document p2p2NativeAtomicSupported
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    gonzalobg and miscco authored Sep 3, 2024
    Commit 4297b07
  5. Update CONTRIBUTING.md

    jrhemstad authored Sep 3, 2024
    Commit 457e4d7
  6. Harden thrust algorithms against evil iterators that overload `operator,` (NVIDIA#2349)
    
    We need to guard against such iterators in libcu++, so our tests conventionally contain iterators that delete `operator,`. To allow using thrust with such iterators, we need to add the void casts.
    miscco authored Sep 3, 2024
    Commit 6b76188
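    A minimal illustration of the guard technique: a type that poisons `operator,`, and a void cast that forces the built-in comma operator instead (the `Evil`/`advance_twice` names are illustrative, not libcu++'s test types):

    ```cpp
    #include <cassert>

    // A type that poisons the comma operator, as libcu++ test iterators do.
    struct Evil
    {
      int hits = 0;
      Evil& operator++() { ++hits; return *this; }

      template <class T>
      void operator,(const T&) = delete; // a bare "++e, ++e" would not compile
    };

    Evil advance_twice(Evil e)
    {
      // Casting each operand to void sidesteps any user-provided operator,
      // and selects the built-in comma operator.
      (void) ++e, (void) ++e;
      return e;
    }

    int main()
    {
      assert(advance_twice(Evil{}).hits == 2);
      return 0;
    }
    ```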
  7. Avoid circular concept definition with memory resources (NVIDIA#2351)

    We cannot constrain the hidden friend comparison functions, because that would lead to constraint recursion in the `resource` concept.
    
    However, we actually do not need to do that, because we can rely on C++20 operator rewriting to avoid the hidden friend altogether. In that case the non-template operator== takes precedence and all is fine.
    miscco authored Sep 3, 2024
    Commit 707ee73
  8. add IWYU export pragma on config headers (NVIDIA#2352)

    When working in the devcontainers, the clangd/clang-format integration causes extra #includes to be added automatically. They get added because clang's "include what you use" tool (IWYU) doesn't consider transitive includes when determining what headers need to be included.
    
    This PR adds the export IWYU pragma to the #includes in CCCL's config headers so that a #include of <cuda/__cccl_config> or <cuda/std/detail/__config> will behave as if all of the transitive headers were included directly. This should be enough to keep IWYU from automatically adding headers it shouldn't.
    ericniebler authored Sep 3, 2024
    Commit a154e7b
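For reference, the IWYU export pragma is spelled as a trailing comment on the `#include`; a hypothetical config header (file names illustrative, not CCCL's actual layout) would look like:

```cpp
// cuda/__cccl_config (illustrative layout)
#include "__cccl/compiler.h"   // IWYU pragma: export
#include "__cccl/visibility.h" // IWYU pragma: export
// IWYU now treats everything these headers declare as provided by this header.
```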

Commits on Sep 4, 2024

  1. Add cuda_parallel to CI. (NVIDIA#2338)

    * Add python cuda_parallel to CI.
    * Fix cuda.parallel installation order
    * WAR occupancy/V100 failure
    * Update cuda_cooperative readme
    
    ---------
    
    Co-authored-by: Georgy Evtushenko <[email protected]>
    alliepiper and gevtushenko authored Sep 4, 2024
    Commit 1e9125e
  2. [CUDAX] Branch out an experimental version of stream_ref (NVIDIA#2343)

    * Branch out experimental version of stream_ref
    
    * Add tests for the experimental part of stream_ref
    
    * Move inequality check
    
    * typo
    
    * Remove not needed using declaration
    
    * Add a TODO to remove NULL stream_ref
    
    * Remove TODO and remove NULL stream ref constructor
    
    * move runtime api include after the system header decl
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    pciolkosz and miscco authored Sep 4, 2024
    Commit 0251ae4
  3. Improve visibility macros for libcu++ (NVIDIA#2337)

    * Refactor visibility handling for libcu++
    
    We always want to apply `_CCCL_VISIBILITY_HIDDEN` to effectively all function definitions. This has been a general thorn in our side and also introduced a considerable amount of divergence from libc++.
    
    This moves the visibility definition into CCCL and then applies the libcu++ specific extras
    
    * Use `_CCCL_HIDE_FROM_ABI` for defaulted SMF
    
    * Drop `__MDSPAN_INLINE_FUNCTION` in favor of `_LIBCUDACXX_HIDE_FROM_ABI`
    
    * Work around nvcc-11.1 limitation
    
    * nvcc really does not like always inline
    
    * Placate ICC
    
    * Move `_LIBCUDACXX_HIDE_FROM_ABI` into visibility.h
    
    * Suppress visibility issues with ICC
    miscco authored Sep 4, 2024
    Commit dae826b
  4. Add missing cuKernelGetFunction call to reduce (NVIDIA#2355)

    * Add missing cuKernelGetFunction call to reduce
    
    * Fix format
    
    * Move CUfunction getter just before occupancy query
    
    * fix format again
    pciolkosz authored Sep 4, 2024
    Commit dcb7d51
  5. Commit 046a761

Commits on Sep 5, 2024

  1. fix the cudax vector_add sample (NVIDIA#2372)

    The cudax `vector_add` sample has not compiled since [0251ae4]. It was broken by PR NVIDIA#2343, which added a type `::cuda::experimental::stream_ref` distinct from `::cuda::stream_ref`. All unqualified mentions of `stream_ref` within the `cuda::experimental` namespace were made to refer to a different type. This causes problems in `cudax/samples/vector_add/vector.cuh`, which has several unqualified uses of `stream_ref`.
    ericniebler authored Sep 5, 2024
    Commit 3876dcc
  2. Add -Wmissing-field-initializers to cudax (NVIDIA#2373)

    Change cudaLaunchConfig_t init to {}
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    pciolkosz and miscco authored Sep 5, 2024
    Commit af695d0
  3. Commit 05e019a

Commits on Sep 6, 2024

  1. Adds benchmarks for DeviceSelect::Unique (NVIDIA#2359)

    * adds benchmarks for DeviceSelect::Unique
    
    * Fix typo
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    elstehle and miscco authored Sep 6, 2024
    Commit e0dad56
  2. CUB - Enable DPX Reduction (NVIDIA#2286)

    Enable Hopper+ DPX (SIMD) reduction for `uint16_t/int16_t` data types and `Min/Max/Sum` operators
    fbusato authored Sep 6, 2024
    Commit 3adc92a
  3. Commit 4a32b1c
  4. Add thrust::transform_inclusive_scan with init value (NVIDIA#2326)

    * Add thrust::transform_inclusive_scan with init value implementations
    
    * Add tests for thrust::transform_inclusive_scan with init
    
    * Add more tests and rebase on bug fix from thrust::inclusive_scan
    
    * Add docs
    
    * Use __accumulator_t
    
    * Fix thrust test readability with initializer_list and docs indentation
    
    * Fix docs bugs and use correct accumulator and intermediate result types
    gonidelis authored Sep 6, 2024
    Commit 5647255
  5. Widen histogram agent constructor to more types (NVIDIA#2380)

    This allows accepting more data types beyond arrays of exact static sizes.
    No SASS changes on CUB device histogram test with CTK 12.6.
    
    Fixes NVIDIA#1877 for AgentHistogram
    bernhardmgruber authored Sep 6, 2024
    Commit fcf7c91
  6. Commit 07fef97

Commits on Sep 8, 2024

  1. Add cub::DeviceTransform (NVIDIA#2086)

    Including benchmarks based on BabelStream
    
    Co-authored-by: Georgii Evtushenko <[email protected]>
    bernhardmgruber and gevtushenko authored Sep 8, 2024
    Commit 71b9f98

Commits on Sep 9, 2024

  1. Update toolkit to CTK 12.6 (NVIDIA#2348)

    * Update toolkit to CTK 12.6
    
    12.6 was released some time ago, so we should update our containers accordingly
    
    Co-authored-by: Allison Piper <[email protected]>
    miscco and alliepiper authored Sep 9, 2024
    Commit 371a434
  2. implement make_integer_sequence in terms of intrinsics whenever pos…

    …sible (NVIDIA#2384)
    
    * use the `__make_integer_seq` intrinsic on MSVC when available
    
    * use the `__integer_pack` builtin to implement `make_integer_sequence` when possible
    
    * define and use a `_LIBCUDACXX_HAS_INTEGER_PACK` config macro
    ericniebler authored Sep 9, 2024
    Commit ee9b856

Commits on Sep 10, 2024

  1. Implement cuda::mr::cuda_async_memory_resource (NVIDIA#1637)

    This implements a wrapper around a `cudaMemPool_t`
    
    Fixes NVIDIA#1514
    miscco authored Sep 10, 2024
    Commit d5492d5

Commits on Sep 11, 2024

  1. Drop implementation of thrust::pair and thrust::tuple (NVIDIA#2395)

    * Drop implementation of `thrust::pair` and `thrust::tuple`
    
    We previously moved them back to proper class definitions, as using alias declarations broke CTAD.
    
    Thanks to @bernhardmgruber who realized that instead of making them an alias we can just pull them in and be done with it.
    
    Co-authored-by: Bernhard Manfred Gruber <[email protected]>
    miscco and bernhardmgruber authored Sep 11, 2024
    Commit e7ade77
  2. Pull out _LIBCUDACXX_UNREACHABLE into its own file (NVIDIA#2399)

    * Pull out `_LIBCUDACXX_UNREACHABLE` into its own file
    
    Also make it available globally
    
    * Drop more uses of `cstdlib`
    miscco authored Sep 11, 2024
    Commit 1c422f2

Commits on Sep 12, 2024

  1. Share common compiler flags in new CCCL-level targets. (NVIDIA#2386)

    * Share common compiler flags in new CCCL-level targets.
    
    * fix up some warnings from cudax/
    
    * Avoid deprecation warning from allocator_traits
    
    * Silence conversion warning in tuple_leaf
    
    * Suppress conversion warning in complex test
    
    * Fix more conversion warnings
    
    * Silence warning about signedness of unary minus
    
    * Let doxygen ignore warning suppression
    
    * Suppress a variety of warnings on MSVC.
    
    * Fix unused parameter warning.
    
    * Suppress reordering warnings on ICC, too.
    
    * More msvc taming.
    
    * More warnings.
    
    * More MSVC narrowing warning fixes.
    
    * Re-enable global suppression of MSVC warnings.
    
    There are hundreds of these, and most of them appear to be in the unit tests, not the actual Thrust headers.
    It's turning into a rabbit hole and dragging beyond the scope of the current PR.
    Filed NVIDIA#2409 to track the removal of these suppressions.
    
    ---------
    
    Co-authored-by: Eric Niebler <[email protected]>
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    3 people authored Sep 12, 2024
    Commit 1fe25ed
  2. Commit cf21a40

Commits on Sep 16, 2024

  1. Commit 684cf8e
  2. Drop thrust's diagnostic suppression warnings (NVIDIA#2392)

    * Drop thrust's diagnostic suppression warnings
    
    We have global suppression warnings for CCCL, so use them
    
    * Avoid warning of possible loss of data when casting to index type
    miscco authored Sep 16, 2024
    Commit 4088134

Commits on Sep 17, 2024

  1. [PoC]: Implement cuda::experimental::uninitialized_async_buffer (NV…

    …IDIA#1854)
    
    * Implement `cuda::uninitialized_async_buffer`
    
    This uninitialized buffer provides a stream-ordered allocation of N elements of type T, utilizing a `cuda::mr::async_resource` to allocate the storage.
    
    The buffer takes care of alignment and deallocation of the storage. The user is required to ensure that the lifetime of the memory resource exceeds the lifetime of the buffer.
    
    Co-authored-by: Mark Harris <[email protected]>
    miscco and harrism authored Sep 17, 2024
    Commit e3c2e2b

Commits on Sep 18, 2024

  1. Fix thrust package to work with newer FindOpenMP.cmake. (NVIDIA#2421)

    The FindOpenMP module shipped with CMake started
    (unnecessarily...) including the SHELL: prefix on its `-fopenmp` flag.
    alliepiper authored Sep 18, 2024
    Commit 8ced877
  2. Introduce cccl_configure_target cmake function. (NVIDIA#2388)

    * Introduce `cccl_configure_target` cmake function.
    
    Currently this encapsulates common operations such as setting the
    CXX/CUDA standard dialect and binary output locations.
    
    * Update CI scripts to prevent unsupported cudax/msvc/c++17 config.
    alliepiper authored Sep 18, 2024
    Commit 8f27fba
  3. Fix sccache errors in RAPIDS builds (NVIDIA#2417)

    * set .aws/{config,credentials} to read-only, enable bash xtrace, and trap exit to print error logs [skip-matrix] [skip-vdc] [skip-docs]
    
    * change how rapids-build-utils stop and restart sccache [skip-matrix] [skip-vdc] [skip-docs]
    
    * try using sccache v0.8.1 [skip-matrix] [skip-vdc] [skip-docs]
    
    * try killing all sccache processes [skip-matrix] [skip-vdc] [skip-docs]
    
    * try killing all sccache processes first [skip-matrix] [skip-vdc] [skip-docs]
    
    * check .aws dir exists [skip-matrix] [skip-vdc] [skip-docs]
    
    * don't stop the sccache server before running the build commands [skip-matrix] [skip-vdc] [skip-docs]
    
    * debug why sccache server won't start [skip-matrix] [skip-vdc] [skip-docs]
    
    * define the creds as envvars instead of a file mount [skip-matrix] [skip-vdc] [skip-docs]
    
    * set SCCACHE_IDLE_TIMEOUT=0 in CI
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    trxcllnt and miscco authored Sep 18, 2024
    Commit 2496571

Commits on Sep 19, 2024

  1. Replace CUDA C++ Core Libraries with CUDA Core Compute Libraries

    …(only in README.md). (NVIDIA#2424)
    
    * Replace `CUDA C++ Core Libraries` with `CUDA Core Compute Libraries`.
    
    * Remove a couple `C++`, based on suggestion by @miscco
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    rwgk and miscco authored Sep 19, 2024
    Commit 52a967f
  2. Commit d191102
  3. uninitialized_buffer::get_resource returns a ref to an `any_resourc…

    …e` that can be copied (NVIDIA#2431)
    
    * `uninitialized_buffer::get_resource` returns a ref to an `any_resource` that can be copied
    
    * Also update `uninitialized_async_buffer`
    
    * Fix doc string
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    ericniebler and miscco authored Sep 19, 2024
    Commit 445fd71
  4. Refactor cuda::ceil_div to take two different types (NVIDIA#2376)

    * Rework `cuda::ceil_div` to take different types
    
    This replaces `cub::DivideAndRoundUp`
    miscco authored Sep 19, 2024
    Commit b07f036
  5. Reduce PR testing matrix. (NVIDIA#2436)

    * Remove file annotation from verbose matrix warnings.
    
    * Allow 'min', 'max', 'minmax' values for matrix `std` tags.
    
    * Error when no supported stds available.
    
    * Reduce PR testing matrix.
    
    1. Temporarily remove current nightly matrix pending NVKS bringup.
    2. Move current per-PR matrix to nightly.
    3. Reduce the number of jobs in the PR matrix while maintaining decent coverage.
    
    Before: (total jobs: 437)
    |  320 | `linux-amd64-cpu16`
    |   66 | `linux-amd64-gpu-v100-latest-1`
    |   28 | `linux-arm64-cpu16`
    |   23 | `windows-amd64-cpu16`
    
    After (total jobs: 183)
    |  126 | `linux-amd64-cpu16`
    |   26 | `linux-amd64-gpu-v100-latest-1`
    |   21 | `windows-amd64-cpu16`
    |   10 | `linux-arm64-cpu16`
    
    * Restore old build matrix.
    alliepiper authored Sep 19, 2024
    Commit ee94bb9
  6. Implement cudax::shared_resource (NVIDIA#2398)

    * Implement `cudax::shared_resource`
    
    We currently have two basic building blocks around memory resources, `any_resource` and `resource_ref`.
    
    However, while they make owning and sharing resources much easier, we can still run into lifetime issues.
    
    If a user wants to pass a resource into a library function that might exceed the lifetime of the resource, they would need to move it into an any_resource.
    
    However, they also might want to share that resource among multiple functions, e.g. a pool allocator. We need a way to properly share a resource in those circumstances.
    
    Enter `shared_resource`. Rather than storing an `any_resource`, this holds a `shared_ptr<any_resource>`. With that we can happily copy and move it around without touching the stored resource.
    
    Co-authored-by: Eric Niebler <[email protected]>
    Co-authored-by: Mark Harris <[email protected]>
    3 people authored Sep 19, 2024
    Commit 7bd04ad
  7. Increase the libcu++ timeout (NVIDIA#2435)

    * Increase the libcu++ timeout
    
    We are frequently running into the current test duration limit of 01:20, so add another 20 minutes to the timeout.
    
    ---------
    
    Co-authored-by: Allison Piper <[email protected]>
    miscco and alliepiper authored Sep 19, 2024
    Commit 5e14128
  8. Move c/include/cccl/*.h files to c/include/cccl/c/*.h (NVIDIA#2428)

    * Move c/include/cccl/*.h files to c/include/cccl/c/*.h
    
    * Change `#warning` to `#error` (to improve the user experience).
    
    * Add comments to preprocessor conditionals.
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    
    * Add comments to preprocessor conditionals.
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    
    * Add comment to preprocessor conditional.
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    Co-authored-by: Allison Piper <[email protected]>
    3 people authored Sep 19, 2024
    Commit 2fe09c8
  9. Make any_resource emplacable (NVIDIA#2425)

    * Rename `async_any_resource` to `any_async_resource`
    
    * Add a way of constructing an `any_{async_}resource` from a set of arguments and a tag type
    
    ---------
    
    Co-authored-by: Allison Piper <[email protected]>
    miscco and alliepiper authored Sep 19, 2024
    Commit 8b2bf13
  10. Fix issues with __host__ and __device__ definitions (NVIDIA#2413)

    * Fix issues with `__host__` and `__device__` definitions
    
    We currently only checked whether we are compiling with a CUDA compiler, but not whether we are actually compiling in CUDA mode.
    
    That meant that certain macros weren't properly defined.
    
    * Simplify and account for nvhpc
    miscco authored Sep 19, 2024
    Commit 28888eb

Commits on Sep 20, 2024

  1. Make bit_cast play nice with extended floating point types (NVIDIA#…

    …2434)
    
    * Move `__is_nvbf16` and `__is_nvfp16` to their own file
    
    * Make `bit_cast` play nice with extended floating point types
    miscco authored Sep 20, 2024
    Commit 31c3eb9
  2. Do not include our own string.h file (NVIDIA#2444)

    That might conflict with the host standard library
    miscco authored Sep 20, 2024
    Commit 92bc4ac
  3. Commit 9641b7e

Commits on Sep 24, 2024

  1. Remove a ton of lines in thrust tests (NVIDIA#2356)

    * Convert redundant assignments to initializer lists in thrust tests
    
    * done :)
    gonidelis authored Sep 24, 2024
    Commit aa1458d
  2. [CUDAX] Add placeholder green context type and logical device that ca…

    …n hold both a green ctx and a device (NVIDIA#2446)
    pciolkosz authored Sep 24, 2024
    Commit 6fd1e5c
  3. Fix typo in CCCLBuildCompilerTargets.cmake (NVIDIA#2453)

    * Fix typo in CCCLBuildCompilerTargets.cmake
    alliepiper authored Sep 24, 2024
    Commit 0f0fdc2

Commits on Sep 25, 2024

  1. This drops the duplicated definition of _CCCL_NO_SYSTEM_HEADER from…

    … thrust as that is now in the common compile targets (NVIDIA#2450)
    miscco authored Sep 25, 2024
    Commit 17e0c83
  2. Consolidate packages and install rules (NVIDIA#2456)

    * Migrate CMake packages to central location.
    
    * Remove obsolete infra tests.
    
    These tests check that consumers can
    
    add_subdirectory(cccl/cub)
    add_subdirectory(cccl/thrust)
    
    to add Thrust or CUB to their build.
    This hasn't been sensible or supported since we migrated to the monorepo.
    
    Now that we're moving shared CMake infra to the CCCL, these tests are finally actually failing.
    
    * Migrate install rules to top-level, use `cccl_generate_install_rules` helper to reduce redundancy.
    
    * Remove libcudacxx export tests.
    
    These tests are redundant with CCCL-level tests, and
    no longer function now that install rules are centralized
    at the top level.
    
    * Make std arg for test_thrust.ps1 optional.
    
    * Ensure that cudax::Thrust target is only created once.
    alliepiper authored Sep 25, 2024
    Commit 2cbf40b

Commits on Sep 26, 2024

  1. Prune CUB's ChainedPolicy by __CUDA_ARCH_LIST__ (NVIDIA#2154)

    * Prune CUB's ChainedPolicy by __CUDA_ARCH_LIST__
    
    Co-authored-by: Elias Stehle <[email protected]>
    
    * Workaround GCC 13 error:
    
    ```
    /home/coder/cccl/thrust/thrust/cmake/../../thrust/iterator/detail/transform_input_output_iterator.inl:68:9: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
         68 |     *io = output_function(x);
            |     ~~~~^~~~~~~~~~~~~~~~~~~~~
    ```
    
    * Apply feedback from elstehle
    
    * Apply suggestion from elstehle
    
    * returns cuda error if ptx and arch list mismatch
    
    * adds comment on ptxversion requirements
    
    ---------
    
    Co-authored-by: Elias Stehle <[email protected]>
    bernhardmgruber and elstehle authored Sep 26, 2024
    Commit bda69fd
  2. Commit cc01ce7
  3. Add CCCL_ENABLE_WERROR flag. (NVIDIA#2463)

    * Add CCCL_ENABLE_WERROR flag.
    
    Also move a stray compiler definition to the compiler interface targets.
    
    * Fix typo
    alliepiper authored Sep 26, 2024
    Commit 99fb4b4
  4. Commit 5d45850

Commits on Sep 27, 2024

  1. Propagate compiler flags down to libcu++ LIT tests (NVIDIA#2420)

    * Propagate compiler flags down to libcu++ LIT tests
    
    In order to run libcu++ tests with clang, lit has to use the correct compiler settings.
    It currently appears to rely on clang automatically finding and using
    the default CUDA installation, but that's not necessarily the CUDA that
    the build itself may have been configured to use.
    
    Clang builds may also require building them with the host libc++, instead of the default libstdc++.
    That requires propagation of the top-level CFLAGS specified by the user.
    Artem-B authored Sep 27, 2024
    Commit 0e09815

Commits on Sep 28, 2024

  1. Drop remaining uses of _LIBCUDACXX_COMPILER_* (NVIDIA#2467)

    This was breaking some other PR and we never use them anymore
    miscco authored Sep 28, 2024
    Commit 467a44d
  2. Avoid C++17 extension in c++11 tests (NVIDIA#2469)

    Using `typename` in a template template parameter is a C++17 extension, and clang warns about it when acting as a CUDA compiler
    miscco authored Sep 28, 2024
    Commit 7c668e8
  3. Add span to example and templated block size (NVIDIA#2470)

    * add span to example and template block size
    Kh4ster authored Sep 28, 2024
    Commit e3800d7
  4. Commit 94e4e75

Commits on Sep 30, 2024

  1. removes superfluous template keyword that triggers warnings/errors w…

    …ith clang-19 (NVIDIA#2482)
    
    error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
      349 |     IterateThreadStore<0, VOLATILE_MULTIPLE>::template Dereference(
    andrewcorrigan authored Sep 30, 2024
    Commit 242bcce
  2. Improve build times in several heavyweight libcudacxx tests. (NVIDIA#…

    …2478)
    
    * Split heavy heterogeneous atomic tests
    
    * Split integral atomics tests
    
    * Split heavy alg.copy test
    wmaxey authored Sep 30, 2024
    Commit 653e546
  3. Drop __availability header (NVIDIA#2484)

    We always unconditionally define no availability anyway and we do not really care about the macOS host standard library
    miscco authored Sep 30, 2024
    Commit 0521015
  4. Replace a few more instances of CUDA C++ Core Libraries with CUDA C…

    …ore Compute Libraries`. (NVIDIA#2447)
    
    * Do not include our own string.h file (NVIDIA#2444)
    
    That might conflict with the host standard library
    
    * Programmatically (`sed`) replace `CUDA C++ Core Libraries` with `CUDA Core Compute Libraries`.
    
    NO MANUAL CHANGES.
    
    Command used:
    
    ```
    for fn in `git grep 'CUDA C++ Core Libraries' | cut -d':' -f1 | uniq`; do sed -i .bak 's/CUDA C++ Core Libraries/CUDA Core Compute Libraries/g' $fn; done
    ```
    
    * Extend `=======================` line in .rst file to match new name.
    
    * Revert changes in all .cuh, .cu, .h, .cpp files.
    
    Rationale: These files actually have a C++-specific license. It's left for later to change the license and the notices together.
    
    * Revert changes in docs/ subdirectory.
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    rwgk and miscco authored Sep 30, 2024
    Commit 725954c

Commits on Oct 1, 2024

  1. Fix common_type specialization for extended floating point types (N…

    …VIDIA#2483)
    
    * Fix `common_type` specialization for extended floating point types
    
    The machinery we had in place was not really suited to specialize `common_type` because it would take precedence over the actual implementation of `common_type`.
    
    In that case, we only specialized `common_type<__half, __half>` but not `common_type<__half, __half&>` and so on.
    
    This shows how brittle the whole thing is and that it is not extensible.
    
    Rather than putting another bandaid over it, add a proper 5th step in the common_type detection that properly treats combinations of an extended floating point type with an arithmetic type.
    
    Allowing arithmetic types is necessary to keep machinery like `pow(__half, 2)` working.
    
    Fixes [BUG]: `is_common_type`  trait is broken when mixing rvalue references NVIDIA#2419
    
    * Work around MSVC declval bug
    miscco authored Oct 1, 2024
    Commit 81d05bb
  2. Commit 808f9c2
  3. Commit 57b9899
  4. Disable system header for narrowing conversion check (NVIDIA#2465)

    There is an incredible compiler bug reported in nvbug4867473 where the use of a system header changes the way some types are instantiated.
    
    The culprit seems to be that within a system header the compiler accepts narrowing conversions that it should not accept
    
    Work around it by moving __is_non_narrowing_convertible to its own header that is included before we define the system header machinery
    miscco authored Oct 1, 2024
    Commit 190099c

Commits on Oct 2, 2024

  1. Require resources to always provide at least one execution space prop…

    …erty (NVIDIA#2489)
    
    Currently, we implicitly assume that any resource without an execution space property is host accessible.
    
    However, that is not a good design, as it is a source of surprise and creates numerous challenges with proper type matching down the road.
    
    So rather than implicitly assuming that something is host accessible, we require the user to always provide at least one execution space property.
    miscco authored Oct 2, 2024
    59ad103
  2. Rework builtin handling (NVIDIA#2461)

    * Move builtin detection to its own file
    
    * Try to reenable more builtins
    
    * Address review comments
    miscco authored Oct 2, 2024
    e4f48cf
  3. Disable execution checks for std::equal (NVIDIA#2491)

    This is used in the `cudax::vector` PR and is the only dependency change to libcu++, which blows up the CI.
    miscco authored Oct 2, 2024
    ee3bd53
  4. replace _CCCL_ALWAYS_INLINE with _CCCL_FORCEINLINE (NVIDIA#2439)

    * replace `_CCCL_ALWAYS_INLINE` with `_CCCL_FORCEINLINE`
    
    * Avoid GCC optimizer doing things
    
    * try avoiding the gcc optimizer bug a different way
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    ericniebler and miscco authored Oct 2, 2024
    0589775
  5. 25c57f8
  6. 10769b4
  7. Drop badly named _One_of concept (NVIDIA#2490)

    * Drop badly named `_One_of` concept
    
    First, it goes against the naming convention we have for both files and concepts.
    
    Furthermore, we should use the new type-set facility instead.
    
    * Rename to `_is_included_in`
    
    * Fix concept issue
    miscco authored Oct 2, 2024
    5e139af

Commits on Oct 3, 2024

  1. Unify assert handling in cccl (NVIDIA#2382)

    We currently do not have proper assertions within CCCL.
    
    There are different approaches in CUB, Thrust, and libcu++, some of which are completely broken.
    
    This tries to rework the assertion handlers so that they work uniformly everywhere and can be selectively enabled.
    miscco authored Oct 3, 2024
    3eee9b2
  2. Reduce scope of Thrust linkage in cudax. (NVIDIA#2496)

    It's only used for tests, it doesn't need to be linked to all cudax targets.
    alliepiper authored Oct 3, 2024
    bb001b7
  3. Centralize CPM logic. (NVIDIA#2495)

    * Abstract and consolidate CPM calls.
    
    * Update CPM used in example projects.
    alliepiper authored Oct 3, 2024
    a0ec74c
  4. c15546a
  5. Refactor away per-project TOPLEVEL flags. (NVIDIA#2498)

    * Refactor away per-project TOPLEVEL flags.
    
    These will never be toplevel projects ever again now that we're a monorepo.
    They're redundant with the `CCCL_ENABLE_<proj>` flags.
    
    * Disable cudax when unstable is off.
    alliepiper authored Oct 3, 2024
    1cfe171

Commits on Oct 4, 2024

  1. [FEA]: Validate cuda.parallel type matching in build and execution (NVIDIA#2429)
    
    * Brute force experiment: Which tests fail after adding an `assert False`?
    
    * Do not include our own string.h file (NVIDIA#2444)
    
    That might conflict with the host standard library
    
    * Add `_dtype_validation()` in python/cuda_parallel/cuda/parallel/experimental/__init__.py and fix bug in python/cuda_parallel/tests/test_reduce_api.py
    
    * Add `test_device_reduce_dtype_mismatch()`.  Capture `dtype`s only in ctor (not entire arrays).
    
    * Undo change in .gitignore
    
    * Move `min_op()` back into `test_device_reduce_success()` to unbreak sphinx documentation. Also fix existing typo.
    
    * Move `test_device_reduce_dtype_mismatch()` from test_reduce_api.py to test_reduce.py
    
    * Add TODO POINTER vs ITERATOR
    
    ---------
    
    Co-authored-by: Michael Schellenberger Costa <[email protected]>
    rwgk and miscco authored Oct 4, 2024
    e8d57c3
  2. 583567b

Commits on Oct 5, 2024

  1. c86caca

Commits on Oct 7, 2024

  1. 8aaeb29

Commits on Oct 8, 2024

  1. Add ForEachN from CUB to cccl/c. (NVIDIA#2378)

    * Work in progress
    
    * Initial draft of exposing CUB::for in cccl/c.
    
    * Add a fairly cursed storage abstraction for kernel arguments.
    
    * Fix cccl/c include path.
    
    * Commit WIP
    
    * Make reduce inline functions static.
    
    * Initial draft of iterator support for C device for
    
    * Remove small vector (for now)
    
    * Fixup and remove debug build option from for.cu.
    
    * Disable iterator support in foreach (for now)
    
    * whitespace fixup
    
    * Restore LDL/STL call ABI checks that were commented out earlier.
    
    Co-authored-by: Georgii Evtushenko <[email protected]>
    
    * Fix missing context push, Fix terminating instead of returning error.
    
    * alignof instead of signof, use `value_t` for value type of kernel.
    
    * Fix missing header in for.cu
    
    ---------
    
    Co-authored-by: Georgii Evtushenko <[email protected]>
    wmaxey and gevtushenko authored Oct 8, 2024
    ee5dd3e
  2. Adds support for large number of items in DeviceSelect and `DevicePartition` (NVIDIA#2400)
    
    * adds streaming selection and partition
    
    * ensures policy lookup uses per-partition offset type
    
    * mitigates perf degradation on select
    
    * makes device interfaces use i64 num_items
    
    * updates select::if large num_items tests
    
    * fixes syntax
    
    * adds tests for large num_items for select::flagged
    
    * adds tests for large num_items for partition::if
    
    * adds tests for large num_items for partition::flagged
    
    * fixes redundant definition
    
    * fixes implicit conversion
    
    * fixes f32 select::if perf regression
    
    * fixes perf regression for partition
    
    * fix feature macro
    
    * fixes feature macro
    
    * fixes feature macro
    
    * silences msvc constant conditional warning
    
    * add support for streaming ctx dummy for partition with small offset types
    
    * removes superfluous template parameter
    
    * adds test for different offset types for partition::if & ::flagged
    
    * adds tests and support for streaming select::unique
    
    * fixes msvc warning
    
    * fixes perf for partition
    
    * fixes format
    
    * fixes mixup for partition perf fix
    
    * fixes syntax
    
    * fixes partition::flagged perf
    
    * fixes perf for partition::flagged
    
    * switches unique to always use i64 offset types
    
    * adds benchmark for partition with distinct iterators
    
    * resolves merge conflicts
    
    * fixes merge conflict
    
    * makes sass identical to main for i32 partition
    
    * updates thrust copy_if to always use i64 offset types
    
    * fixes formatting
    
    * minor style improvements
    
    * addresses review comments
    
    * fixes conditional type usage
    
    * makes tests on empty input more robust
    
    * restores empty problem behaviour
    
    * adds comment on const ref
    elstehle authored Oct 8, 2024
    16f9a1a
  3. Adds support for large number of items to `DeviceScan::*ByKey` family of algorithms (NVIDIA#2477)
    
    * experimenting with bool tile state
    
    * fixes perf regression from different tile state
    
    * fixes support for large offset types
    
    * adapts interface for scanbykey
    
    * adds tests for large number of items for scanbykey
    
    * fixes naming
    
    * makes thrust scan_by_key use unsigned offset types
    
    * moves scan_by_key_op to detail ns
    elstehle authored Oct 8, 2024
    951c822

Commits on Oct 9, 2024

  1. Integrate c/parallel with CCCL build system and CI. (NVIDIA#2514)

    Integrate c/parallel into CCCL, setup CI, etc.
    alliepiper authored Oct 9, 2024
    e149e86
  2. cbb0edd
  3. Refactor include paths to match cudax conventions.

    - Global instead of local
    - Use `cuda/experimental`
    - s/.h$/.cuh$/g
    alliepiper committed Oct 9, 2024
    afa153d
  4. 4b2cf18
  5. Split STF headers into a separate headertest unit.

    These currently require different compile options from the rest of CudaX.
    alliepiper committed Oct 9, 2024
    450136e
  6. Fix -Wreorder warnings.

    alliepiper committed Oct 9, 2024
    09213f6
  7. c587b36
  8. 2030832
  9. Add missing includes.

    alliepiper committed Oct 9, 2024
    7a2a842
  10. 71196f1
  11. Fix standalone compilation of logical_data.cuh.

    These functions are used in both
    logical_data.cuh and acquire_release.cuh.
    Move them to where they'll be visible to both.
    alliepiper committed Oct 9, 2024
    ebc205a
  12. 750db80
  13. 5c55fef

Commits on Oct 10, 2024

  1. 03d0a33