Cudastf #1

caugonnet · 2024-10-10T13:21:59Z

Description

closes
Fix an unused warning due to a constexpr
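Below is a minimal sketch of this kind of warning. It assumes the variable is read in only one branch of an `if constexpr`. `scaled` is a hypothetical function, and the actual change may use a project macro instead of C++17's `[[maybe_unused]]`.

```cpp
// Hypothetical example: `scale` is only read when UseScale is true. In the
// UseScale == false instantiation that branch is discarded, so GCC or Clang
// may report `scale` as unused. [[maybe_unused]] (C++17) silences the warning
// without changing behavior.
template <bool UseScale>
constexpr int scaled(int x)
{
  [[maybe_unused]] const int scale = 4; // read only in the constexpr-true branch
  if constexpr (UseScale)
  {
    return x * scale;
  }
  else
  {
    return x;
  }
}
```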

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

We currently employ a workaround against an nvc++ bug where `__ELF__` is not properly defined. However, we should not define that macro if it is already present. Fixes [BUG]: incompatible redefinition of macro "__ELF__" with NVC++ host compiler NVIDIA#1995

…+ interface. (NVIDIA#2017)

* Wrapper types for `cudaEvent_t` that provide a modern C++ interface.
* `cuda::experimental::event_ref` is a non-owning wrapper around a `cudaEvent_t`.
* `cuda::experimental::event` is an owning wrapper around a `cudaEvent_t`.
* `cuda::experimental::timed_event` is a `cuda::experimental::event` that also records the time at which it was recorded.
* apparently `__event` is a word of power for msvc
* represent the elapsed time between two events with nanoseconds instead of microseconds: according to the CUDA docs for `cudaEventElapsedTime`, the elapsed time has sub-microsecond resolution, so it is more appropriate to represent it in nanoseconds.
* prune unused headers, switch to rst-friendly doxygen comment style
* add class synopsis comments
* construct with a stream_ref and record the event on construction
* review feedback
* tests for `cudax::event` and `cudax::timed_event`
* change `event_ref::wait` to use `cudaEventSynchronize`
* Use a struct for windows instead
* Do not include superfluous config header
* Add clang-format rule for cudax
* Spell `cudax_add_catch2_test` correctly
* Fix formatting issues

---------

Co-authored-by: Michael Schellenberger Costa <[email protected]>
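Below is a minimal sketch of the unit choice described above. It relies only on the documented behavior of `cudaEventElapsedTime`, which returns a `float` count of milliseconds. `elapsed_from_ms` is a hypothetical host-side helper and is not part of the cudax API.

```cpp
#include <chrono>

// Hypothetical helper, not the cudax API. cudaEventElapsedTime reports a
// float count of milliseconds with sub-microsecond resolution. Converting to
// std::chrono::nanoseconds keeps that resolution, while an integral
// microsecond count would truncate it.
constexpr std::chrono::nanoseconds elapsed_from_ms(float ms)
{
  // duration<float, milli> models the raw value returned by the CUDA API.
  return std::chrono::duration_cast<std::chrono::nanoseconds>(
    std::chrono::duration<float, std::milli>(ms));
}
```

For example, a reading of 0.0015 ms (1.5 µs) becomes 1500 ns here. `duration_cast<std::chrono::microseconds>` would truncate the same reading to 1 µs.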

We currently do not have proper assertions within CCCL. CUB, Thrust, and libcu++ each take a different approach, and some of those approaches are completely broken. This reworks the assertion handlers so that they work uniformly everywhere and can be selectively enabled.

…VIDIA#2429)

* Brute force experiment: Which tests fail after adding an `assert False`?
* Do not include our own string.h file (NVIDIA#2444). That might conflict with the host standard library
* Add `_dtype_validation()` in python/cuda_parallel/cuda/parallel/experimental/__init__.py and fix bug in python/cuda_parallel/tests/test_reduce_api.py
* Add `test_device_reduce_dtype_mismatch()`. Capture `dtype`s only in ctor (not entire arrays).
* Undo change in .gitignore
* Move `min_op()` back into `test_device_reduce_success()` to unbreak sphinx documentation. Also fix existing typo.
* Move `test_device_reduce_dtype_mismatch()` from test_reduce_api.py to test_reduce.py
* Add TODO POINTER vs ITERATOR

---------

Co-authored-by: Michael Schellenberger Costa <[email protected]>

* Work in progress
* Initial draft of exposing CUB::for in cccl/c.
* Add a fairly cursed storage abstraction for kernel arguments.
* Fix cccl/c include path.
* Commit WIP
* Make reduce inline functions static.
* Initial draft of iterator support for C device for
* Remove small vector (for now)
* Fixup and remove debug build option from for.cu.
* Disable iterator support in foreach (for now)
* whitespace fixup
* Restore LDL/STL call ABI checks that were commented out earlier. Co-authored-by: Georgii Evtushenko <[email protected]>
* Fix missing context push, Fix terminating instead of returning error.
* alignof instead of signof, use `value_t` for value type of kernel.
* Fix missing header in for.cu

---------

Co-authored-by: Georgii Evtushenko <[email protected]>

…artition` (NVIDIA#2400)

* adds streaming selection and partition
* ensures policy lookup uses per-partition offset type
* mitigates perf degradation on select
* makes device interfaces use i64 num_items
* updates select::if large num_items tests
* fixes syntax
* adds tests for large num_items for select::flagged
* adds tests for large num_items for partition::if
* adds tests for large num_items for partition::flagged
* fixes redundant definition
* fixes implicit conversion
* fixes f32 select::if perf regression
* fixes perf regression for partition
* fix feature macro
* fixes feature macro
* fixes feature macro
* silences msvc constant conditional warning
* add support for streaming ctx dummy for partition with small offset types
* removes superfluous template parameter
* adds test for different offset types for partition::if & ::flagged
* adds tests and support for streaming select::unique
* fixes msvc warning
* fixes perf for partition
* fixes format
* fixes mixup for partition perf fix
* fixes syntax
* fixes partition::flagged perf
* fixes perf for partition::flagged
* switches unique to always use i64 offset types
* adds benchmark for partition with distinct iterators
* resolves merge conflicts
* fixes merge conflict
* makes sass identical to main for i32 partition
* updates thrust copy_if to always use i64 offset types
* fixes formatting
* minor style improvements
* addresses review comments
* fixes conditional type usage
* makes tests on empty input more robust
* restores empty problem behaviour
* adds comment on const ref
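The bullets above describe handling a 64-bit `num_items` by streaming over partitions. Below is a minimal host-side sketch of that partitioning idea. It assumes each partition must fit a 32-bit offset type. `for_each_partition` is hypothetical and is not CUB's implementation, which has device-side concerns this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sketch, not CUB's implementation: a problem with a 64-bit item
// count is processed as a sequence of partitions whose sizes fit a 32-bit
// offset type. `fn(offset, size)` receives the 64-bit global offset of each
// partition and its 32-bit size. Returns the number of partitions visited.
template <class Fn>
std::int64_t for_each_partition(std::int64_t num_items, std::int64_t max_partition, Fn fn)
{
  // Each partition size must be representable as a positive 32-bit offset.
  assert(max_partition > 0 && max_partition <= std::numeric_limits<std::int32_t>::max());
  std::int64_t partitions = 0;
  for (std::int64_t offset = 0; offset < num_items; offset += max_partition)
  {
    // The last partition may be shorter than max_partition.
    const auto size = static_cast<std::int32_t>(std::min(max_partition, num_items - offset));
    fn(offset, size);
    ++partitions;
  }
  return partitions;
}
```

With 2^32 items and partitions capped at `INT32_MAX`, this visits three partitions. The last one holds only two items.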

… of algorithms (NVIDIA#2477)

* experimenting with bool tile state
* fixes perf regression from different tile state
* fixes support for large offset types
* adapts interface for scanbykey
* adds tests for large number of items for scanbykey
* fixes naming
* makes thrust scan_by_key use unsigned offset types
* moves scan_by_key_op to detail ns

gonidelis and others added 30 commits July 17, 2024 12:20

Fix BWUtil report on early exit (NVIDIA#1994)

cc316db

Use libcu++ void_t everywhere (NVIDIA#1977)

72f77c1

* Make libcu++ void_t available in C++11
* Replace uses of CUB and Thrust's void_t by libcu++
* Deprecate Thrust's void_t
* Deprecate libcu++'s __void_t
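Below is a sketch of why a C++11 `void_t` needs care. It uses the usual workaround for CWG issue 1558: on older compilers, unused alias template arguments may not take part in SFINAE. The `sketch` namespace and the `has_value_type` trait are hypothetical and are not libcu++'s definitions.

```cpp
#include <type_traits>

namespace sketch
{
// Routing the pack through a class template guarantees that every argument is
// used, so substitution failures in Ts... are not silently ignored.
template <class...>
struct make_void
{
  typedef void type;
};

template <class... Ts>
using void_t = typename make_void<Ts...>::type;

// Detection idiom: does T have a nested `value_type`?
template <class T, class = void>
struct has_value_type : std::false_type
{};

template <class T>
struct has_value_type<T, void_t<typename T::value_type>> : std::true_type
{};
} // namespace sketch
```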

Drop zipped_binary_op (NVIDIA#1988)

be91914

Clarify PtxVersion and SmVersion (NVIDIA#2004)

64e7a06

Refactor CUB util_device (NVIDIA#1948)

87d0849

fix some typos in <cuda/stream_ref> (NVIDIA#2003)

92b4b0b

Add CI slack notifications. (NVIDIA#1961)

56d99db

* Fix divide by zero in `parse-job-times.py` * Add CI slack notifications.

Allow nightly workflow to be manually invoked. (NVIDIA#2007)

fc457b4

[skip-rapids][skip-vdc][skip-matrix][skip-docs]

Need to use a different approach to reuse secrets in reusable workflows vs. actions. (NVIDIA#2008)

eb62dc6

[skip-matrix][skip-vdc][skip-docs][skip-rapids]

Enable RAPIDS builds for manually dispatched workflows. (NVIDIA#2009)

97e699f

[skip-rapids][skip-vdc][skip-matrix][skip-docs]

Clean up complex.inl (NVIDIA#1655)

2ff83a2

Add github token to nightly workflow-results action. (NVIDIA#2012)

8a5e56a

This is needed to fetch the runtime info from the GHA API. This only modifies the nightly workflow, the PR tests are unaffected: [skip-matrix][skip-vdc][skip-rapids][skip-docs]

Remove obsolete build system glue from the Thrust/CUB submodule structure. (NVIDIA#2016)

e5fcebe

Benchmark thrust::copy with non-trivially relocatable type (NVIDIA#1989)

496d88d

Make bool_constant available in C++11 (NVIDIA#1997)

e61bafe
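`std::bool_constant` was only added in C++17. A C++11 equivalent can be written as a one-line alias template. This sketch lives in a hypothetical `sketch` namespace and is not necessarily libcu++'s exact spelling.

```cpp
#include <type_traits>

namespace sketch
{
// C++11-compatible equivalent of C++17's std::bool_constant.
template <bool B>
using bool_constant = std::integral_constant<bool, B>;
} // namespace sketch
```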

Spell value initialization where used by thrust vectors (NVIDIA#1990)

b8116c3

Co-authored-by: Georgii Evtushenko <[email protected]>

Port thrust::merge[_by_key] to CUB (NVIDIA#1817)

8635429

* Refactor thrust/CUB merge
* Port thrust::merge[_by_key] to cub::DeviceMerge

Fixes NVIDIA#1763

Co-authored-by: Georgii Evtushenko <[email protected]>

Simplify some pointer traits (NVIDIA#2020)

53fe08f

Simplify test data setup (NVIDIA#2023)

18cd90f

Add tests to ensure that we properly propagate common_type for complex types (NVIDIA#2025)

f6d3d0b

Addresses NVIDIA#2013

Update Thrust CMake README to use CCCL repo. (NVIDIA#2026)

a69c8ac

Include container toolkit in manual prereqs (NVIDIA#2064)

82a3ed0

Avoid ADL issues with thrust::distance (NVIDIA#2053)

46759c5

We almost always pull in `cuda::std::distance`. Inside a Thrust algorithm, an unqualified call would then be ambiguous with `thrust::distance` whenever the argument type brings namespace `cuda::std` into argument-dependent lookup.
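Below is a self-contained illustration of the ambiguity. The hypothetical namespaces `lib_a` and `lib_b` stand in for Thrust and `cuda::std`. The names and the qualified-call fix are illustrative and are not the exact change in this PR.

```cpp
// Hypothetical namespaces standing in for cuda::std and thrust: both declare an
// unconstrained `distance` function template.
namespace lib_b
{
// An iterator type associated with lib_b, so ADL looks into lib_b.
struct iter
{
  const int* p;
};

constexpr long operator-(iter last, iter first)
{
  return static_cast<long>(last.p - first.p);
}

template <class It>
constexpr long distance(It first, It last)
{
  return last - first;
}
} // namespace lib_b

namespace lib_a
{
template <class It>
constexpr long distance(It first, It last)
{
  return last - first;
}

template <class It>
constexpr long count_items(It first, It last)
{
  // An unqualified `distance(first, last)` would find lib_a::distance by
  // ordinary lookup and lib_b::distance by ADL. Neither template is more
  // specialized, so the call would be ambiguous. Qualifying the name
  // suppresses ADL and selects lib_a::distance.
  return lib_a::distance(first, last);
}
} // namespace lib_a
```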

Simplify thrust::detail::wrapped_function (NVIDIA#2019)

e25344c

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Add a test for Thrust scan with non-commutative op (NVIDIA#2024)

5ba23b6

* Add a test for Thrust scan with non-commutative op * Fix printing mismatching sequences of non-addable types before C++17 in Thrust unit tests
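Below is a host-side sketch of what a non-commutative scan test guards against. It uses string concatenation as the operator, which is associative but not commutative. `concat_scan` is hypothetical and uses `std::partial_sum` as a sequential reference rather than Thrust.

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical reference helper: an inclusive scan whose operator, string
// concatenation, is associative but not commutative. A correct scan must keep
// the running prefix on the left, so swapped operands would show up as "ba"
// instead of "ab". std::partial_sum is a sequential host reference here.
inline std::vector<std::string> concat_scan(const std::vector<std::string>& in)
{
  std::vector<std::string> out(in.size());
  std::partial_sum(in.begin(), in.end(), out.begin(),
                   [](const std::string& prefix, const std::string& x) { return prefix + x; });
  return out;
}
```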

Update memory_resource docs (NVIDIA#1883)

30eaa9c

Move the docs to rst format and also fix some minor issues in the documentation Co-authored-by: Bernhard Manfred Gruber <[email protected]>

Temporarily switch nightly H100 CI to build-only. (NVIDIA#2060)

04db77a

These are on a testing pool and the machines are currently experiencing infra failures.

Do not rely on conversions between float and extended floating point types (NVIDIA#2046)

1797742

The issue we have is that our tests rely extensively on those conversions, which makes it incredibly painful to test

miscco and others added 29 commits October 2, 2024 20:13

Drop 2 relative includes that snuck in (NVIDIA#2492)

25c57f8

re-express the __tupl::__apply member to make nvc++ happy (NVIDIA#2493)

10769b4

Drop badly named _One_of concept (NVIDIA#2490)

5e139af

* Drop badly named `_One_of` concept. First, it goes against the naming convention we have for both files and concepts. Furthermore, we should rather use the new type set facility.
* Rename to `_is_included_in`
* Fix concept issue
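Below is a minimal C++11 sketch of a type-set membership trait in the spirit of `_is_included_in`. The `sketch::is_included_in` name and its recursive definition are assumptions, and libcu++'s real facility may be implemented differently.

```cpp
#include <type_traits>

namespace sketch
{
// Primary template, selected only when the type list is empty: nothing matches.
template <class T, class... Ts>
struct is_included_in : std::false_type
{};

// Recursive case: T is included if it matches the head or is in the tail.
template <class T, class Head, class... Tail>
struct is_included_in<T, Head, Tail...>
    : std::conditional<std::is_same<T, Head>::value, std::true_type, is_included_in<T, Tail...>>::type
{};
} // namespace sketch
```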

Reduce scope of Thrust linkage in cudax. (NVIDIA#2496)

bb001b7

It's only used for tests, it doesn't need to be linked to all cudax targets.

Centralize CPM logic. (NVIDIA#2495)

a0ec74c

* Abstract and consolidate CPM calls. * Update CPM used in example projects.

Fix typo in presets. (NVIDIA#2497)

c15546a

Refactor away per-project TOPLEVEL flags. (NVIDIA#2498)

1cfe171

* Refactor away per-project TOPLEVEL flags. These will never be toplevel projects ever again now that we're a monorepo. They're redundant with the `CCCL_ENABLE_<proj>` flags. * Disable cudax when unstable is off.

avoid gcc optimizer bug by not force inlining part of `thrust::transform` (NVIDIA#2509)

583567b

Cleanup and modularize <cuda/std/barrier> (NVIDIA#2443)

c86caca

Consolidate header testing infra. (NVIDIA#2460)

8aaeb29

Integrate c/parallel with CCCL build system and CI. (NVIDIA#2514)

e149e86

Integrate c/parallel into CCCL, setup CI, etc.

Initial import and rename of STF headers.

cbb0edd

Refactor include paths to match cudax conventions.

afa153d

- Global instead of local - Use `cuda/experimental` - s/.h$/.cuh$/g

Apply CCCL clang-format to STF files.

4b2cf18

Split STF headers into a separate headertest unit.

450136e

These currently require different compile options from the rest of CudaX.

Fix -Wreorder warnings.

09213f6
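Below is a minimal illustration of what `-Wreorder` flags, using a hypothetical `buffer_view` type. Members are always initialized in declaration order. A mem-initializer list written in a different order is therefore misleading, and it can be wrong if one initializer reads another member.

```cpp
// Hypothetical type illustrating a -Wreorder fix.
struct buffer_view
{
  int capacity; // declared first, so always initialized first
  int size;     // declared second

  // Writing `: size(n), capacity(size * 2)` would trigger -Wreorder and,
  // worse, read `size` before it is initialized, because `capacity` is
  // initialized first. Listing members in declaration order and deriving
  // `capacity` from the parameter avoids both problems.
  constexpr explicit buffer_view(int n)
      : capacity(n * 2)
      , size(n)
  {}
};
```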

Fix -Wsign-compare warnings.

c587b36
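Below is a minimal illustration of a typical `-Wsign-compare` fix, using a hypothetical `index_in_range` helper. The STF changes themselves may resolve the warnings differently.

```cpp
#include <cstddef>

// Hypothetical helper illustrating a -Wsign-compare fix. Comparing `int i`
// with `std::size_t n` directly converts `i` to unsigned, so `-1 < n`
// evaluates to false even though -1 is smaller. Checking the sign first and
// then casting explicitly keeps the comparison correct and warning-free.
constexpr bool index_in_range(int i, std::size_t n)
{
  return i >= 0 && static_cast<std::size_t>(i) < n;
}
```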

s/I/Idx/g (Identifier I conflicts with complex.h system headers).

2030832

Add missing includes.

7a2a842

Add missing execution space annotations.

71196f1

Fix standalone compilation of logical_data.cuh.

ebc205a

These functions are used in both logical_data.cuh and acquire_release.cuh. Move them to where they'll be visible to both.

Limit no_device_stack pragma to NVHPC.

750db80

Temporarily exclude some failing headers from header testing.

5c55fef

Mark a variable as potentially unused (due to some constexpr condition)

03d0a33

caugonnet closed this Oct 10, 2024
