forked from NVIDIA/cccl
Cudastf #1
Closed
Conversation
* Make libcu++ `void_t` available in C++11
* Replace uses of CUB's and Thrust's `void_t` by libcu++
* Deprecate Thrust's `void_t`
* Deprecate libcu++'s `__void_t`
* Fix divide-by-zero in `parse-job-times.py`
* Add CI Slack notifications.
[skip-rapids][skip-vdc][skip-matrix][skip-docs]
…ws vs. actions. (NVIDIA#2008) [skip-matrix][skip-vdc][skip-docs][skip-rapids]
[skip-rapids][skip-vdc][skip-matrix][skip-docs]
This is needed to fetch the runtime info from the GHA API. It only modifies the nightly workflow; the PR tests are unaffected: [skip-matrix][skip-vdc][skip-rapids][skip-docs]
Co-authored-by: Georgii Evtushenko <[email protected]>
We currently employ a workaround against an nvc++ bug where `__ELF__` is not properly defined. However, we should not define that macro if it is already present. Fixes [BUG]: incompatible redefinition of macro "__ELF__" with NVC++ host compiler NVIDIA#1995
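A minimal sketch of the guarded-definition pattern described above: only supply the workaround macro when the compiler did not already predefine it. This is pattern-only; the real detection logic lives in CCCL's configuration headers.

```cpp
// Guarded workaround: define __ELF__ for nvc++ only when it is absent,
// so a host compiler that already defines it is never redefined.
#if defined(__NVCOMPILER) && !defined(__ELF__)
#  define __ELF__ 1
#endif

int main() { return 0; }
```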
* Refactor Thrust/CUB merge
* Port `thrust::merge[_by_key]` to `cub::DeviceMerge`

Fixes NVIDIA#1763
Co-authored-by: Georgii Evtushenko <[email protected]>
We almost always pull in `cuda::std::distance`, which becomes ambiguous with `thrust::distance` when the type in use pulls in `cuda::std::` within a Thrust algorithm.
Co-authored-by: Michael Schellenberger Costa <[email protected]>
* Add a test for Thrust scan with a non-commutative op
* Fix printing mismatching sequences of non-addable types before C++17 in Thrust unit tests
Move the docs to RST format and fix some minor issues in the documentation. Co-authored-by: Bernhard Manfred Gruber <[email protected]>
These are on a testing pool and the machines are currently experiencing infra failures.
…types (NVIDIA#2046) The issue is that our tests rely extensively on those conversions, which makes them incredibly painful to test.
…+ interface. (NVIDIA#2017)
* Wrapper types for `cudaEvent_t` that provide a modern C++ interface:
* `cuda::experimental::event_ref` is a non-owning wrapper around a `cudaEvent_t`.
* `cuda::experimental::event` is an owning wrapper around a `cudaEvent_t`.
* `cuda::experimental::timed_event` is a `cuda::experimental::event` that also records the time at which it was recorded.
* Apparently `__event` is a word of power for MSVC.
* Represent the elapsed time between two events with nanoseconds instead of microseconds: according to the CUDA docs for `cudaEventElapsedTime`, the elapsed time has sub-microsecond resolution, so it is more appropriate to represent it in nanoseconds.
* Prune unused headers; switch to RST-friendly Doxygen comment style.
* Add class synopsis comments.
* Construct with a `stream_ref` and record the event on construction.
* Address review feedback.
* Add tests for `cudax::event` and `cudax::timed_event`.
* Change `event_ref::wait` to use `cudaEventSynchronize`.
* Use a struct for Windows instead.
* Do not include superfluous config header.
* Add clang-format rule for cudax.
* Spell `cudax_add_catch2_test` correctly.
* Fix formatting issues.
---
Co-authored-by: Michael Schellenberger Costa <[email protected]>
* Drop the badly named `_One_of` concept. First, it goes against the naming convention we have for both files and concepts; furthermore, we should rather use the new type-set facility.
* Rename to `_is_included_in`.
* Fix a concept issue.
We currently do not have proper assertions within CCCL. There are different approaches in CUB, Thrust, and libcu++, some of which are completely broken. This reworks the assertion handlers so that they work uniformly everywhere and can be selectively enabled.
It's only used for tests; it doesn't need to be linked to all cudax targets.
* Abstract and consolidate CPM calls.
* Update CPM used in example projects.
* Refactor away per-project TOPLEVEL flags. These will never be top-level projects again now that we're a monorepo; they're redundant with the `CCCL_ENABLE_<proj>` flags.
* Disable cudax when unstable is off.
…VIDIA#2429)
* Brute-force experiment: which tests fail after adding an `assert False`?
* Do not include our own string.h file (NVIDIA#2444); it might conflict with the host standard library.
* Add `_dtype_validation()` in python/cuda_parallel/cuda/parallel/experimental/__init__.py and fix a bug in python/cuda_parallel/tests/test_reduce_api.py.
* Add `test_device_reduce_dtype_mismatch()`; capture `dtype`s only in the ctor (not entire arrays).
* Undo change in .gitignore.
* Move `min_op()` back into `test_device_reduce_success()` to unbreak the Sphinx documentation; also fix an existing typo.
* Move `test_device_reduce_dtype_mismatch()` from test_reduce_api.py to test_reduce.py.
* Add TODO: POINTER vs ITERATOR.
---
Co-authored-by: Michael Schellenberger Costa <[email protected]>
* Work in progress.
* Initial draft of exposing `CUB::for` in cccl/c.
* Add a fairly cursed storage abstraction for kernel arguments.
* Fix cccl/c include path.
* Commit WIP.
* Make reduce inline functions static.
* Initial draft of iterator support for the C device for.
* Remove small vector (for now).
* Fixup and remove debug build option from for.cu.
* Disable iterator support in foreach (for now).
* Whitespace fixup.
* Restore LDL/STL call ABI checks that were commented out earlier. Co-authored-by: Georgii Evtushenko <[email protected]>
* Fix missing context push; fix terminating instead of returning an error.
* `alignof` instead of `signof`; use `value_t` for the value type of the kernel.
* Fix missing header in for.cu.
---
Co-authored-by: Georgii Evtushenko <[email protected]>
…artition` (NVIDIA#2400)
* adds streaming selection and partition
* ensures policy lookup uses per-partition offset type
* mitigates perf degradation on select
* makes device interfaces use i64 num_items
* updates select::if large num_items tests
* fixes syntax
* adds tests for large num_items for select::flagged
* adds tests for large num_items for partition::if
* adds tests for large num_items for partition::flagged
* fixes redundant definition
* fixes implicit conversion
* fixes f32 select::if perf regression
* fixes perf regression for partition
* fixes feature macro
* silences msvc constant-conditional warning
* adds support for streaming ctx dummy for partition with small offset types
* removes superfluous template parameter
* adds test for different offset types for partition::if & ::flagged
* adds tests and support for streaming select::unique
* fixes msvc warning
* fixes perf for partition
* fixes format
* fixes mixup for partition perf fix
* fixes syntax
* fixes partition::flagged perf
* fixes perf for partition::flagged
* switches unique to always use i64 offset types
* adds benchmark for partition with distinct iterators
* resolves merge conflicts
* makes sass identical to main for i32 partition
* updates thrust copy_if to always use i64 offset types
* fixes formatting
* minor style improvements
* addresses review comments
* fixes conditional type usage
* makes tests on empty input more robust
* restores empty problem behaviour
* adds comment on const ref
… of algorithms (NVIDIA#2477)
* experimenting with bool tile state
* fixes perf regression from different tile state
* fixes support for large offset types
* adapts interface for scan_by_key
* adds tests for large number of items for scan_by_key
* fixes naming
* makes thrust scan_by_key use unsigned offset types
* moves scan_by_key_op to detail ns
Integrate c/parallel into CCCL, setup CI, etc.
- Global instead of local
- Use `cuda/experimental`
- s/.h$/.cuh$/g
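The `s/.h$/.cuh$/g` rename can be sketched as a small shell loop; the file names below are examples run in a scratch directory, not the real repo layout (which would use `git mv`).

```shell
# Rename every .h header to .cuh in a throwaway directory.
set -eu
tmp=$(mktemp -d)
touch "$tmp/logical_data.h" "$tmp/acquire_release.h"
for f in "$tmp"/*.h; do
  mv "$f" "${f%.h}.cuh"   # strip the .h suffix, append .cuh
done
ls "$tmp"
```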
These currently require different compile options from the rest of CudaX.
These functions are used in both logical_data.cuh and acquire_release.cuh. Move them to where they'll be visible to both.
Description
closes
Fix an unused-variable warning caused by a `constexpr` variable
Checklist