Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DFT][rocFFt] Address rocFFT failing tests #563

Merged
merged 7 commits into from
Sep 30, 2024

Conversation

s-Nick
Copy link
Contributor

@s-Nick s-Nick commented Aug 30, 2024

Description

This PR addresses two issues arose recently in rocFFT backend. Both issues are related to rocFFT internal changes.
One issue is fixed by small modification to oneMKL backend. The other issue cannot be fixed internally, but it was fixed by AMD and it will available in the future versions of rocFFT.
To avoid that users incur with the second issues a warning is emitted during the configuration and eventually an exception is thrown.

first issue -> ROCm/rocFFT#504
second issue -> ROCm/rocFFT#507

Fixes # (GitHub issue)

This PR address oneMKL issue #559

Checklist

All Submissions

Bug fixes

hjabird and others added 2 commits August 30, 2024 13:47
* rocFFT distributed with ROCm 6+ contains a bug meaning that the
strides cannot be set on a plan description twice.
* This commit works around this issue.
Due to rocFFt internal bug ROCm/rocFFT#507
Add cmake version checks based upon rocFFT version.
Add exception to deal with faulty cases for affected rocFFT version.
RocFFT versions taken from https://github.com/ROCm/rocFFT/blob/develop/CHANGELOG.md
@s-Nick s-Nick requested a review from Rbiessy August 30, 2024 13:32
@@ -39,6 +39,7 @@
#include "rocfft_handle.hpp"

#include <rocfft.h>
#include <rocfft-version.h>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This header is now needed to check rocFFT version if necessary

// 1.0.23 to 1.0.30. Those rocFFT version correspond to rocm version from
// 5.6.0 to 6.3.0.
// Link to rocFFT issue: https://github.com/ROCm/rocFFT/issues/507
if constexpr (rocfft_version_major == 1 && rocfft_version_minor == 0 &&
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if-statement could be converted to #if directive. I preferred to use this style instead, but I am open to change it

src/dft/backends/rocfft/CMakeLists.txt Outdated Show resolved Hide resolved
src/dft/backends/rocfft/commit.cpp Show resolved Hide resolved
src/dft/backends/rocfft/commit.cpp Outdated Show resolved Hide resolved
src/dft/backends/rocfft/commit.cpp Show resolved Hide resolved
// Link to rocFFT issue: https://github.com/ROCm/rocFFT/issues/507
if constexpr (rocfft_version_major == 1 && rocfft_version_minor == 0 &&
(rocfft_version_patch > 22 && rocfft_version_patch < 31)) {
if (dom == dft::domain::COMPLEX && dimensions > 2) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should allow in_place routine to run

Suggested change
if (dom == dft::domain::COMPLEX && dimensions > 2) {
if (dom == dft::domain::COMPLEX && config_values.placement == dft::config_value::NOT_INPLACE && dimensions > 2) {

Removing rocFFT version check during configurations.
Update checks on dft command for rocFFT version with known issue.
@Rbiessy Rbiessy requested a review from a team September 2, 2024 12:46
namespace oneapi {
namespace mkl {
namespace dft {
namespace oneapi::mkl::dft::detail {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the namespace in this file is not consistent in all backends.
With this change rocFFT, oneMKL CPU, oneMKL GPU have the oneapi::mkl::dft::detail namespace but cuFFT and portFFT have oneapi::mkl::dft namespace.
I know this is unrelated to the original issue, but since you changed one of them, could you please make them consistent across all backends?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review @lhuot
No problem, I updated them in aed1f63

Update namespace to make it consistent across all backends.
if constexpr (rocfft_version_major == 1 && rocfft_version_minor == 0 &&
(rocfft_version_patch > 22 && rocfft_version_patch < 31)) {
if (dom == dft::domain::COMPLEX &&
config_values.placement == dft::config_value::NOT_INPLACE && dimensions > 2) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we certain this is limited to dimensions > 2? Is a 2D case like
lengths = {A, B}
inStrides = {B 1}
outStrides = {1, A}
present in the test base?
Possibly relevant as well are 1D batch cases like
length = N;
batch = M;
inStrides = 1; inDistance = N;
outStrides = M; outDistance = 1;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review @raphael-egan
The 2D configuration above is not directly tested. If added, rocfft backend skips the tests with the following output:

Skipping test because: "oneMKL: DFT/commit: function is not implemented rocfft requires a stride to be divisible by all dimensions associated with smaller strides!"

If a configuration that respect inputs requirement is provided, tests are passed. Similar configurations are already in the test suite, so I don't think adding this test is a priority.

For 1D there isn't a test that reproduces what you suggested. I added it to check. It is computed correctly with the input above, although the outDistance is computed internally to perform the operation correctly. In any case, both configuration do not cause any issue with rocFFT.

Therefore, I wouldn't change the if-statement condition

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the exception is thrown by this project and not by rocFFT itself, I would recommend to add a comment like
// rocFFT's functional status for problems like cfoA:B:1xB:1:A is unknown as of [commit hash] (due to project's implementation preventing testing thereof)

Unless an explicit mention of that as a known limitation is found in rocFFT's documentation/release notes (I could not find any), it looks to me that the check triggering this exception to be thrown is present here to ensure consistency of subsequent stride-related implementation steps (sorting operations, in particular).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, added comment in e84a5b2

@s-Nick s-Nick merged commit 00a9af7 into uxlfoundation:develop Sep 30, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants