Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Host Implementation of Histogram APIs #1974

Open
wants to merge 40 commits into
base: main
Choose a base branch
from

Conversation

danhoeflinger
Copy link
Contributor

@danhoeflinger danhoeflinger commented Dec 18, 2024

Implementation of histogram APIs for host backends.

Implementations are provided for serial, tbb, and openMP backends. We add a generic __thread_enumerable_storage struct to add a generic thread local storage for our host backends. We use the new TLS (Thread local storage) with parallel_for to implement histogram. Testing is also added, and some minor adjustments are made to cmake.

Please see the RFC documentation / discussion here for more details.

Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
This reverts commit 8d61609.
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
@danhoeflinger danhoeflinger marked this pull request as ready for review December 30, 2024 20:59
@danhoeflinger danhoeflinger changed the title [Draft] Host Implementation of Histogram APIs Host Implementation of Histogram APIs Dec 30, 2024
@danhoeflinger danhoeflinger added this to the 2022.8.0 milestone Dec 30, 2024
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
include/oneapi/dpl/pstl/algorithm_impl.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/omp/util.h Outdated Show resolved Hide resolved
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
__construct_by_args(_P&&... __args) : __pack(std::forward<_P>(__args)...) {}

private:
std::tuple<_P...> __pack;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include <tuple> in this header?

std::unique_ptr<_StorageType>
construct() override
{
return std::move(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason std::move is needed here?

Comment on lines +211 to +218
for (; __j < __thread_specific_storage.size() && __count <= __i; ++__j)
{
// Only include storage from threads which have instantiated a storage object
if (__thread_specific_storage[__j])
{
__count++;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect in the general case that each thread constructs its own container? If so, would it be possible to add a check like the following before this? :

if (size() == __thread_specific_storage.size())
    return *__thread_specific_storage[__i];

If I understand correctly, each thread constructing its local storage would mean the ith id would be at index i. For a CPU with a large number of threads this may be more performant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants