-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DFT][rocFFt] Address rocFFT failing tests #563
Conversation
* rocFFT distributed with ROCm 6+ contains a bug meaning that the strides cannot be set on a plan description twice. * This commit works around this issue.
Due to rocFFt internal bug ROCm/rocFFT#507 Add cmake version checks based upon rocFFT version. Add exception to deal with faulty cases for affected rocFFT version. RocFFT versions taken from https://github.com/ROCm/rocFFT/blob/develop/CHANGELOG.md
@@ -39,6 +39,7 @@ | |||
#include "rocfft_handle.hpp" | |||
|
|||
#include <rocfft.h> | |||
#include <rocfft-version.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This header is now needed to check rocFFT version if necessary
// 1.0.23 to 1.0.30. Those rocFFT version correspond to rocm version from | ||
// 5.6.0 to 6.3.0. | ||
// Link to rocFFT issue: https://github.com/ROCm/rocFFT/issues/507 | ||
if constexpr (rocfft_version_major == 1 && rocfft_version_minor == 0 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if-statement could be converted to #if
directive. I preferred to use this style instead, but I am open to change it
src/dft/backends/rocfft/commit.cpp
Outdated
// Link to rocFFT issue: https://github.com/ROCm/rocFFT/issues/507 | ||
if constexpr (rocfft_version_major == 1 && rocfft_version_minor == 0 && | ||
(rocfft_version_patch > 22 && rocfft_version_patch < 31)) { | ||
if (dom == dft::domain::COMPLEX && dimensions > 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should allow in_place
routine to run
if (dom == dft::domain::COMPLEX && dimensions > 2) { | |
if (dom == dft::domain::COMPLEX && config_values.placement == dft::config_value::NOT_INPLACE && dimensions > 2) { |
Removing rocFFT version check during configurations. Update checks on dft command for rocFFT version with known issue.
namespace oneapi { | ||
namespace mkl { | ||
namespace dft { | ||
namespace oneapi::mkl::dft::detail { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the namespace in this file is not consistent in all backends.
With this change rocFFT, oneMKL CPU, oneMKL GPU have the oneapi::mkl::dft::detail namespace but cuFFT and portFFT have oneapi::mkl::dft namespace.
I know this is unrelated to the original issue, but since you changed one of them, could you please make them consistent across all backends?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update namespace to make it consistent across all backends.
if constexpr (rocfft_version_major == 1 && rocfft_version_minor == 0 && | ||
(rocfft_version_patch > 22 && rocfft_version_patch < 31)) { | ||
if (dom == dft::domain::COMPLEX && | ||
config_values.placement == dft::config_value::NOT_INPLACE && dimensions > 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we certain this is limited to dimensions > 2
? Is a 2D case like
lengths = {A, B}
inStrides = {B 1}
outStrides = {1, A}
present in the test base?
Possibly relevant as well are 1D batch cases like
length = N;
batch = M;
inStrides = 1; inDistance = N;
outStrides = M; outDistance = 1;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review @raphael-egan
The 2D configuration above is not directly tested. If added, rocfft backend skips the tests with the following output:
Skipping test because: "oneMKL: DFT/commit: function is not implemented rocfft requires a stride to be divisible by all dimensions associated with smaller strides!"
If a configuration that respect inputs requirement is provided, tests are passed. Similar configurations are already in the test suite, so I don't think adding this test is a priority.
For 1D there isn't a test that reproduces what you suggested. I added it to check. It is computed correctly with the input above, although the outDistance
is computed internally to perform the operation correctly. In any case, both configuration do not cause any issue with rocFFT.
Therefore, I wouldn't change the if-statement condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the exception is thrown by this project and not by rocFFT itself, I would recommend to add a comment like
// rocFFT's functional status for problems like cfoA:B:1xB:1:A is unknown as of [commit hash] (due to project's implementation preventing testing thereof)
Unless an explicit mention of that as a known limitation is found in rocFFT's documentation/release notes (I could not find any), it looks to me that the check triggering this exception to be thrown is present here to ensure consistency of subsequent stride-related implementation steps (sorting operations, in particular).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, added comment in e84a5b2
Description
This PR addresses two issues arose recently in rocFFT backend. Both issues are related to rocFFT internal changes.
One issue is fixed by small modification to oneMKL backend. The other issue cannot be fixed internally, but it was fixed by AMD and it will available in the future versions of rocFFT.
To avoid that users incur with the second issues a warning is emitted during the configuration and eventually an exception is thrown.
first issue -> ROCm/rocFFT#504
second issue -> ROCm/rocFFT#507
Fixes # (GitHub issue)
This PR address oneMKL issue #559
Checklist
All Submissions
Do all unit tests pass locally? Attach a log.
older rocm version(5.4.3) -> rocm5_4_3_full_log.txt
current release -> rocm6_1_fix_log.txt
rocFFT develop branch -> develop_fix_log.txt
Have you formatted the code using clang-format?
Bug fixes
GitHub issue or in this PR)?
issue [ROCFFT] RocFFT fails tests when using ROCm 6.0 or later #559