Cudastf #1

Closed
wants to merge 462 commits into from

Conversation

caugonnet

Description

closes
Fix an unused warning due to a constexpr

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

gonidelis and others added 30 commits July 17, 2024 12:20
* Make libcu++ void_t available in C++11
* Replace uses of CUB and Thrust's void_t by libcu++
* Deprecate Thrust's void_t
* Deprecate libcu++'s __void_t
* Fix divide by zero in `parse-job-times.py`
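The `void_t` commit above mentions making the alias usable in C++11. As a minimal sketch of one well-known way to write a C++11-compatible `void_t` (not the libcu++ implementation, which is not shown here), the alias can be routed through a helper struct so that ill-formed arguments reliably trigger SFINAE. The names `sketch::make_void`, `sketch::void_t`, and `sketch::has_value_type` are hypothetical.

```cpp
#include <type_traits>
#include <vector>

namespace sketch {

// C++11-friendly void_t: going through a helper struct makes the pack a real
// template argument, so a substitution failure in Ts... discards the candidate.
template <class...>
struct make_void {
  using type = void;
};

template <class... Ts>
using void_t = typename make_void<Ts...>::type;

// Detection idiom: the primary template is the fallback.
template <class T, class = void>
struct has_value_type : std::false_type {};

// Selected only when `typename T::value_type` is well-formed.
template <class T>
struct has_value_type<T, void_t<typename T::value_type>> : std::true_type {};

} // namespace sketch

static_assert(sketch::has_value_type<std::vector<int>>::value, "vector has value_type");
static_assert(!sketch::has_value_type<int>::value, "int has no value_type");
```

The helper-struct form is chosen because some C++11-era compilers ignored unused arguments of a plain `template <class...> using void_t = void;` alias.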

* Add CI slack notifications.
[skip-rapids][skip-vdc][skip-matrix][skip-docs]
…ws vs. actions. (NVIDIA#2008)

[skip-matrix][skip-vdc][skip-docs][skip-rapids]
[skip-rapids][skip-vdc][skip-matrix][skip-docs]
This is needed to fetch the runtime info from the GHA API.

This only modifies the nightly workflow, the PR tests are unaffected:
[skip-matrix][skip-vdc][skip-rapids][skip-docs]
We currently employ a workaround against an nvc++ bug where `__ELF__` is not properly defined.

However, we should not define that macro if it is already present.

Fixes [BUG]: incompatible redefinition of macro "__ELF__" with NVC++ host compiler NVIDIA#1995
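The fix described above amounts to guarding a workaround definition so that it only applies when the macro is missing. A minimal sketch of that guard follows, using the hypothetical macro `SKETCH_ELF` in place of `__ELF__` so the example does not define a reserved identifier; the actual CCCL condition is not shown in this page.

```cpp
// Stand-in for a compiler that already predefines the macro.
// SKETCH_ELF is a hypothetical name used only for this illustration.
#define SKETCH_ELF 1

// Guarded workaround: define the macro only when it is missing, so a
// compiler that already provides it never sees a second, incompatible
// definition.
#ifndef SKETCH_ELF
#  define SKETCH_ELF
#endif

static_assert(SKETCH_ELF == 1, "the pre-existing definition is left untouched");
```

Without the `#ifndef`, the second `#define` would have a different replacement list from the first, which is the kind of incompatible redefinition the linked bug reports.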
* Refactor thrust/CUB merge
* Port thrust::merge[_by_key] to cub::DeviceMerge

Fixes NVIDIA#1763

Co-authored-by: Georgii Evtushenko <[email protected]>
We almost always pull in `cuda::std::distance`, which would be ambiguous with `thrust::distance` if the used type pulls in `cuda::std::` within a thrust algorithm
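The ambiguity described above comes from argument-dependent lookup: an unqualified call inside one namespace also considers functions from the namespace of the argument type. The sketch below reproduces the situation with two hypothetical namespaces, `stdlike` (standing in for `cuda::std`) and `algo` (standing in for `thrust`), and shows qualification as one common way to avoid it. Whether the commit used exactly this remedy is not stated here.

```cpp
namespace stdlike { // hypothetical stand-in for cuda::std

struct iter {
  int pos;
};

template <class It>
constexpr int distance(It first, It last) {
  return last.pos - first.pos;
}

} // namespace stdlike

namespace algo { // hypothetical stand-in for thrust

template <class It>
constexpr int distance(It first, It last) {
  return last.pos - first.pos;
}

template <class It>
constexpr int count_steps(It first, It last) {
  // An unqualified `distance(first, last)` would be ambiguous for
  // stdlike::iter: ordinary lookup finds algo::distance, ADL adds
  // stdlike::distance, and neither template is more specialized.
  // Qualifying the call disables ADL and picks algo::distance.
  return algo::distance(first, last);
}

} // namespace algo

static_assert(algo::count_steps(stdlike::iter{2}, stdlike::iter{7}) == 5,
              "qualified call resolves without ambiguity");
```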
* Add a test for Thrust scan with non-commutative op
* Fix printing mismatching sequences of non-addable types before C++17 in Thrust unit tests
Move the docs to rst format and also fix some minor issues in the documentation

Co-authored-by: Bernhard Manfred Gruber <[email protected]>
These are on a testing pool and the machines are currently experiencing infra failures.
…types (NVIDIA#2046)

The issue we have is that our tests rely extensively on those conversions, which makes it incredibly painful to test
…+ interface. (NVIDIA#2017)

* Wrapper types for `cudaEvent_t` that provide a modern C++ interface.

* `cuda::experimental::event_ref` is a non-owning wrapper around a
  `cudaEvent_t`.

* `cuda::experimental::event` is an owning wrapper around a `cudaEvent_t`.

* `cuda::experimental::timed_event` is a `cuda::experimental::event` that also
  records the time at which it was recorded.

* apparently `__event` is a word of power for msvc

* represent the elapsed time between two events with nanoseconds instead of microseconds

according to the CUDA docs for `cudaEventElapsedTime`, the elapsed time
has sub-microsecond resolution, so it is more appropriate to represent
it in nanoseconds.

* prune unused headers, switch to rst-friendly doxygen comment style

* add class synopsis comments

* construct with a stream_ref and record the event on construction

* review feedback

* tests for `cudax::event` and `cudax::timed_event`

* change `event_ref::wait` to use `cudaEventSynchronize`

* Use a struct for windows instead

* Do not include superfluous config header

* Add clang-format rule for cudax

* Spell `cudax_add_catch2_test` correctly

* Fix formatting issues

---------

Co-authored-by: Michael Schellenberger Costa <[email protected]>
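The event-wrapper commit above argues for nanoseconds because `cudaEventElapsedTime` has sub-microsecond resolution. As a rough host-only sketch of that reasoning (not the `cudax` implementation), the float millisecond value that `cudaEventElapsedTime` reports can be converted with `std::chrono`. The helper name `sketch::to_nanoseconds` is hypothetical.

```cpp
#include <chrono>

namespace sketch {

// Converts an elapsed time in float milliseconds, the unit reported by
// cudaEventElapsedTime, into integral nanoseconds.
constexpr std::chrono::nanoseconds to_nanoseconds(float elapsed_ms) {
  return std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::duration<double, std::milli>(elapsed_ms));
}

} // namespace sketch

// 2^-11 ms (about 0.49 microseconds) is exactly representable as a float.
static_assert(sketch::to_nanoseconds(0.00048828125f).count() == 488,
              "nanoseconds keep the sub-microsecond part");
static_assert(std::chrono::duration_cast<std::chrono::microseconds>(
                  sketch::to_nanoseconds(0.00048828125f))
                      .count() == 0,
              "microseconds would truncate it to zero");
```

The two assertions show the trade-off: an interval shorter than one microsecond survives as 488 ns but collapses to 0 when stored in whole microseconds.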
miscco and others added 29 commits October 2, 2024 20:13
* Drop badly named `_One_of` concept

First, it goes against the naming convention we have with both files and concepts.

Furthermore, we should rather use the new type set facility

* Rename to `_is_included_in`

* Fix concept issue
We currently do not have proper assertions within CCCL.

There are different approaches in CUB, Thrust, and libcu++, some of which are completely broken.

This tries to rework the assertion handlers so that they work uniformly everywhere and can be selectively enabled.
It's only used for tests, it doesn't need to be linked to all cudax targets.
* Abstract and consolidate CPM calls.

* Update CPM used in example projects.
* Refactor away per-project TOPLEVEL flags.

These will never be toplevel projects ever again now that we're a monorepo.
They're redundant with the `CCCL_ENABLE_<proj>` flags.

* Disable cudax when unstable is off.
…VIDIA#2429)

* Brute force experiment: Which tests fail after adding an `assert False`?

* Do not include our own string.h file (NVIDIA#2444)

That might conflict with the host standard library

* Add `_dtype_validation()` in python/cuda_parallel/cuda/parallel/experimental/__init__.py and fix bug in python/cuda_parallel/tests/test_reduce_api.py

* Add `test_device_reduce_dtype_mismatch()`.  Capture `dtype`s only in ctor (not entire arrays).

* Undo change in .gitignore

* Move `min_op()` back into `test_device_reduce_success()` to unbreak sphinx documentation. Also fix existing typo.

* Move `test_device_reduce_dtype_mismatch()` from test_reduce_api.py to test_reduce.py

* Add TODO POINTER vs ITERATOR

---------

Co-authored-by: Michael Schellenberger Costa <[email protected]>
* Work in progress

* Initial draft of exposing CUB::for in cccl/c.

* Add a fairly cursed storage abstraction for kernel arguments.

* Fix cccl/c include path.

* Commit WIP

* Make reduce inline functions static.

* Initial draft of iterator support for C device for

* Remove small vector (for now)

* Fixup and remove debug build option from for.cu.

* Disable iterator support in foreach (for now)

* whitespace fixup

* Restore LDL/STL call ABI checks that were commented out earlier.

Co-authored-by: Georgii Evtushenko <[email protected]>

* Fix missing context push, Fix terminating instead of returning error.

* alignof instead of sizeof, use `value_t` for value type of kernel.

* Fix missing header in for.cu

---------

Co-authored-by: Georgii Evtushenko <[email protected]>
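One item in the list above swaps in `alignof` when handling the kernel's value type. The commit does not show the affected code, so the following is only a generic illustration of why the two operators are not interchangeable when computing where an object may be placed in a byte buffer. The names `sketch::payload`, `sketch::align_up`, and `sketch::placement_offset` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

// Three 32-bit fields: 12 bytes in size, but only 4-byte alignment is needed.
struct payload {
  std::int32_t a, b, c;
};
static_assert(sizeof(payload) == 12 && alignof(payload) == 4,
              "layout assumed by this sketch");

// Rounds `offset` up to the next multiple of `alignment`.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
  return (offset + alignment - 1) / alignment * alignment;
}

// First offset at or after `offset` where a T may legally start.
template <class T>
constexpr std::size_t placement_offset(std::size_t offset) {
  return align_up(offset, alignof(T));
}

} // namespace sketch

static_assert(sketch::placement_offset<sketch::payload>(1) == 4,
              "alignof gives the tightest valid slot");
static_assert(sketch::align_up(1, sizeof(sketch::payload)) == 12,
              "using sizeof as the alignment picks a different offset");
```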
…artition` (NVIDIA#2400)

* adds streaming selection and partition

* ensures policy lookup uses per-partition offset type

* mitigates perf degradation on select

* makes device interfaces use i64 num_items

* updates select::if large num_items tests

* fixes syntax

* adds tests for large num_items for select::flagged

* adds tests for large num_items for partition::if

* adds tests for large num_items for partition::flagged

* fixes redundant definition

* fixes implicit conversion

* fixes f32 select::if perf regression

* fixes perf regression for partition

* fix feature macro

* fixes feature macro

* fixes feature macro

* silences msvc constant conditional warning

* add support for streaming ctx dummy for partition with small offset types

* removes superfluous template parameter

* adds test for different offset types for partition::if & ::flagged

* adds tests and support for streaming select::unique

* fixes msvc warning

* fixes perf for partition

* fixes format

* fixes mixup for partition perf fix

* fixes syntax

* fixes partition:flagged perf

* fixes perf for partition::flagged

* switches unique to always use i64 offset types

* adds benchmark for partition with distinct iterators

* resolves merge conflicts

* fixes merge conflict

* makes sass identical to main for i32 partition

* updates thrust copy_if to always use i64 offset types

* fixes formatting

* minor style improvements

* addresses review comments

* fixes conditional type usage

* makes tests on empty input more robust

* restores empty problem behaviour

* adds comment on const ref
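The list above combines a 64-bit `num_items` at the device interface with a per-partition offset type. A host-side sketch of that general idea follows, assuming the input is processed in consecutive partitions that each fit a 32-bit offset. It is not the CUB implementation, and `sketch::partition_range` and `sketch::split_into_partitions` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

namespace sketch {

// One partition: a 64-bit global start plus a size that fits in 32 bits.
struct partition_range {
  std::int64_t begin;
  std::int32_t size;
};

// Splits [0, num_items) into consecutive partitions of at most
// `max_partition` items, so each partition can use 32-bit offsets.
inline std::vector<partition_range> split_into_partitions(
    std::int64_t num_items,
    std::int32_t max_partition = std::numeric_limits<std::int32_t>::max()) {
  assert(max_partition > 0);
  std::vector<partition_range> partitions;
  std::int64_t begin = 0;
  while (begin < num_items) {
    const std::int64_t remaining = num_items - begin;
    const auto size = static_cast<std::int32_t>(
        std::min<std::int64_t>(remaining, max_partition));
    partitions.push_back(partition_range{begin, size});
    begin += size; // never exceeds num_items, so no 64-bit overflow
  }
  return partitions;
}

} // namespace sketch
```

For example, `sketch::split_into_partitions(10, 4)` yields three partitions starting at 0, 4, and 8, with sizes 4, 4, and 2.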
… of algorithms (NVIDIA#2477)

* experimenting with bool tile state

* fixes perf regression from different tile state

* fixes support for large offset types

* adapts interface for scanbykey

* adds tests for large number of items for scanbykey

* fixes naming

* makes thrust scan_by_key use unsigned offset types

* moves scan_by_key_op to detail ns
Integrate c/parallel into CCCL, setup CI, etc.
- Global instead of local
- Use `cuda/experimental`
- s/.h$/.cuh$/g
These currently require different compile options from the rest of CudaX.
These functions are used in both
logical_data.cuh and acquire_release.cuh.
Move them to where they'll be visible to both.
@caugonnet caugonnet closed this Oct 10, 2024