Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FlatterScatter #271

Merged
merged 16 commits into from
Nov 26, 2024
Merged

Conversation

chillenzer
Copy link
Contributor

This PR adds amongst a number of modernisations the new FlatterScatter creation policy.

Copy link
Member

@psychocoderHPC psychocoderHPC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offline discussed catch2 should be fetched on the need to not increase the repository size to much.

The rest looks good! ONly one small question for the heap init part is open.

mallocMC::OOMPolicies::ReturnNull,
mallocMC::ReservePoolPolicies::AlpakaBuf<Acc>,
mallocMC::AlignmentPolicies::Shrink<ShrinkConfig>>;

ALPAKA_STATIC_ACC_MEM_GLOBAL int** arA;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this must be equal to https://github.com/alpaka-group/alpaka/blob/7e1b6ce24c5fadc9c58dd0d1dc59c022e96a087b/test/unit/mem/view/src/ViewStaticAccMem.cpp#L19

alpaka::DevGlobal is missing and the usage requires a .get() but it looks it runs and the DevGlobal is only required for SYCL. SO I am fine to keep it as it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened: #272 to track this.

using VecType = alpaka::Vec<Dim, Idx>;

auto poolView = alpaka::createView(dev, reinterpret_cast<char*>(pool), alpaka::Vec<Dim, Idx>(memsize));
alpaka::memset(queue, poolView, 0U);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: You mentioned last time offline that we use a kernel to initialize the heap but here the memset is still present. It is fine for this PR I would like to avoid that we use memset and a on device initialization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch! There is an init method for the heap but apparently it is unused. It's on my TODO list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #273 to track this.


auto workDivSingleThread
= alpaka::WorkDivMembers<Dim, Idx>{VecType::ones(), VecType::ones(), VecType::ones()};
alpaka::exec<TAcc>(queue, workDivSingleThread, FlatterScatterAlloc::InitKernel{}, heap, pool, memsize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like the kernel which is performing the on device init equal to the memset above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, damn! It's much worse! The above memset is completely redundant. -.-

ALPAKA_FN_INLINE ALPAKA_FN_ACC auto getAvailableMultiPages(auto const& /*acc*/, uint32_t const chunkSize) const
-> uint32_t
{
// TODO(lenz): This is not thread-safe!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not forget this todo and handle it in a future PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of a minor issue: The informational methods getAvailableSlots and its helpers provide information that cannot possibly be used in a thread-safe manner. So, calling them while other threads are write-accessing the same allocator is not a reasonable thing to do anyways. Mostly left it in for informational purposes but maybe I should drop the TODO(lenz) part. It is documented in the Usage.md document by the way.

TAcc const& acc,
uint32_t const index,
uint32_t const numBytes,
uint32_t& chunkSizeCache) -> bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only a note: this should be refactored in the future and return the value together with the current return value as tuple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #274 to track this.

@chillenzer
Copy link
Contributor Author

Okay. I've surgically removed the Catch2 commit and replaced it with a git submodule. I've removed the redundant memset in the init method and will certainly do the initialisation benchmark again! Other than that I've opened issues to track your good suggestions.

@chillenzer chillenzer linked an issue Nov 25, 2024 that may be closed by this pull request
@chillenzer chillenzer mentioned this pull request Nov 25, 2024
@psychocoderHPC psychocoderHPC merged commit 127b29f into alpaka-group:dev Nov 26, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update to Catch2v3 Weird test failures
2 participants