Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix RAII issue by introducing wrapper classes for backend plans #208

Merged
merged 66 commits into from
Jan 7, 2025

Conversation

yasahi-hpc
Copy link
Collaborator

@yasahi-hpc yasahi-hpc commented Dec 3, 2024

Resolves #197

In this PR, we try to resolve RAII issues by introducing a thin wrapper for backend fft plans.
Following modifications are made,

  • Introducing a KokkosFFT_FFTW_Types.hpp file including the FFTW plan wrapper class and common code bases related fftw
  • Introducing a wrapper class for automatic deallocation. Wrappers are added for FFTW, cufft, hipfft, and rocfft (Scoped***Plan)
    For CUDA, we have
struct ScopedCufftPlanType {
   cufftHandle m_plan;
   // Should we raise an error here, if it fails?
   ScopedCufftPlanType() { cufftCreate(&m_plan); }
   
   // Should we abort here, if it fails?
   ~ScopedCufftPlanType() { cufftDestroy(m_plan); }

   cufftHandle &plan() { return m_plan; }
};
  • Info and Buffer that are required only for rocfft are moved inside rocfft helper class
  • Add missing cleanup_threads() in the destructor of FFTW helper class
  • Remove destroy_plan_and_info since it is automatically done in destructors of helper classes

Questions

  1. We are creating plans inside a constructor for cufft and hipfft. Should we raise assertions in constructor if it fails?
  2. What should we do if destroy functions in destructor fails. Should we abort?

@yasahi-hpc yasahi-hpc self-assigned this Dec 3, 2024
@yasahi-hpc yasahi-hpc added bug Something isn't working enhancement New feature or request cleanup labels Dec 3, 2024
@yasahi-hpc yasahi-hpc changed the title Fix raii issue Fix RAII issue by introducing wrapper classes for backend plans Dec 3, 2024
@yasahi-hpc yasahi-hpc marked this pull request as draft December 3, 2024 15:04
@yasahi-hpc yasahi-hpc marked this pull request as ready for review December 4, 2024 19:30
@tpadioleau
Copy link
Member

I would go a step further and make these RAII classes unique-owners. That implies to disallow shallow copy either by simply deleting copy constructor and assignment or make them allocate and copy a new plan if backends offer such a copy function. We can keep the move semantics but we have to keep track if the current object must call cufftDestroy in the destructor. If we delete both the copy and move semantics that gives:

class ScopedCufftPlanType {
 private:
  cufftHandle m_plan;

 public:
  ScopedCufftPlanType() {
    cufftResult cufft_rt = cufftCreate(&m_plan);
    KOKKOSFFT_THROW_IF(cufft_rt != CUFFT_SUCCESS, "cufftCreate failed");
  }

  ScopedCufftPlanType(ScopedCufftPlanType const& rhs) = delete;

  ScopedCufftPlanType(ScopedCufftPlanType&& rhs) = delete;

  ~ScopedCufftPlanType() noexcept {
    cufftResult cufft_rt = cufftDestroy(m_plan);
    if (cufft_rt != CUFFT_SUCCESS) Kokkos::abort("cufftDestroy failed");
  }

  ScopedCufftPlanType& operator=(ScopedCufftPlanType const& rhs) = delete;

  ScopedCufftPlanType& operator=(ScopedCufftPlanType&& rhs) = delete;

  cufftHandle plan() const noexcept { return m_plan; }
};

The ScopedCufftPlanType::plan can return the backend handle by copy for CUDA, I don't know for the other backends.

I notice that we could also improve the error messages, I see that cufftCreate can return different error types https://docs.nvidia.com/cuda/cufft/index.html?highlight=cufftMakePlan1d#c.cufftCreate.

@yasahi-hpc
Copy link
Collaborator Author

I would go a step further and make these RAII classes unique-owners. That implies to disallow shallow copy either by simply deleting copy constructor and assignment or make them allocate and copy a new plan if backends offer such a copy function. We can keep the move semantics but we have to keep track if the current object must call cufftDestroy in the destructor.

I would prefer to disallow both copy and move, because it may be unsafe for some backends.

If we delete both the copy and move semantics that gives:

class ScopedCufftPlanType {
 private:
  cufftHandle m_plan;

 public:
  ScopedCufftPlanType() {
    cufftResult cufft_rt = cufftCreate(&m_plan);
    KOKKOSFFT_THROW_IF(cufft_rt != CUFFT_SUCCESS, "cufftCreate failed");
  }

  ScopedCufftPlanType(ScopedCufftPlanType const& rhs) = delete;

  ScopedCufftPlanType(ScopedCufftPlanType&& rhs) = delete;

  ~ScopedCufftPlanType() noexcept {
    cufftResult cufft_rt = cufftDestroy(m_plan);
    if (cufft_rt != CUFFT_SUCCESS) Kokkos::abort("cufftDestroy failed");
  }

  ScopedCufftPlanType& operator=(ScopedCufftPlanType const& rhs) = delete;

  ScopedCufftPlanType& operator=(ScopedCufftPlanType&& rhs) = delete;

  cufftHandle plan() const noexcept { return m_plan; }
};

The ScopedCufftPlanType::plan can return the backend handle by copy for CUDA, I don't know for the other backends.

Does it return a shallow copy of the cufftHandle? I could not find the details of copy constructor of cufftHandle. That should work for oneMKL backend, but I am not sure about the behavior of other backends

I notice that we could also improve the error messages, I see that cufftCreate can return different error types https://docs.nvidia.com/cuda/cufft/index.html?highlight=cufftMakePlan1d#c.cufftCreate.

That is true. We can improve the error messages

@tpadioleau
Copy link
Member

I would prefer to disallow both copy and move, because it may be unsafe for some backends.

Ok for me. I don't think there should be any problem with implementing the move semantics later. It requires us to only pass these objects by reference.

Does it return a shallow copy of the cufftHandle? I could not find the details of copy constructor of cufftHandle. That should work for oneMKL backend, but I am not sure about the behavior of other backends

I mean if we do

ScopedCufftPlanType plan1;
ScopedCufftPlanType plan2;
plan2 = plan1;

This compiles fine and plan2 will get the same cufftHandle than plan1. The plan2 will destroy correctly the handle but then plan1 will likely fail to do so.

@yasahi-hpc
Copy link
Collaborator Author

I would prefer to disallow both copy and move, because it may be unsafe for some backends.

Ok for me. I don't think there should be any problem with implementing the move semantics later. It requires us to only pass these objects by reference.

Does it return a shallow copy of the cufftHandle? I could not find the details of copy constructor of cufftHandle. That should work for oneMKL backend, but I am not sure about the behavior of other backends

I mean if we do

ScopedCufftPlanType plan1;
ScopedCufftPlanType plan2;
plan2 = plan1;

This compiles fine and plan2 will get the same cufftHandle than plan1. The plan2 will destroy correctly the handle but then plan1 will likely fail to do so.

That is right. This does not work in the current implementation. Yes, for safety, it is better to disallow copy

fft/src/KokkosFFT_Cuda_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_Cuda_transform.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_Cuda_transform.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_Cuda_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_FFTW_Types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_ROCM_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_ROCM_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_ROCM_plans.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_ROCM_plans.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_ROCM_plans.hpp Outdated Show resolved Hide resolved
@yasahi-hpc
Copy link
Collaborator Author

@tpadioleau I have fixed based on the discussion. Can I have another review?

fft/src/KokkosFFT_Cuda_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_HIP_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_HIP_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_Cuda_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_FFTW_Types.hpp Show resolved Hide resolved
fft/src/KokkosFFT_FFTW_Types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_ROCM_types.hpp Show resolved Hide resolved
fft/src/KokkosFFT_ROCM_types.hpp Show resolved Hide resolved
@tpadioleau
Copy link
Member

Why do we need the rocfft backend already ? For performance ?

@yasahi-hpc
Copy link
Collaborator Author

Why do we need the rocfft backend already ? For performance ?

For performance. I prefer rocfft over hipfft

@tpadioleau
Copy link
Member

For performance. I prefer rocfft over hipfft

I don't understand how we can explain that ?

@yasahi-hpc
Copy link
Collaborator Author

For performance. I prefer rocfft over hipfft

I don't understand how we can explain that ?

Me neither.
When I measured the 2D batched FFT with a reused plan, I measured a better performance with rocfft.
Even if the overhead is small, rocfft must outperform hipfft since hipfft just wraps rocfft

Copy link
Member

@tpadioleau tpadioleau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are almost there, just a concern with the FFTW backend and threads cleanup

fft/src/KokkosFFT_Cuda_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_FFTW_Types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_HIP_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_ROCM_types.hpp Show resolved Hide resolved
@yasahi-hpc yasahi-hpc requested a review from tpadioleau January 7, 2025 08:37
Copy link
Member

@tpadioleau tpadioleau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes and then we are good

fft/src/KokkosFFT_FFTW_Types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_ROCM_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_ROCM_types.hpp Show resolved Hide resolved
@yasahi-hpc
Copy link
Collaborator Author

@tpadioleau Thank you.
I have fixed accordingly.
If you feel good, I will merge this and start working on #217 and #218

@yasahi-hpc yasahi-hpc requested a review from tpadioleau January 7, 2025 10:41
Copy link
Member

@tpadioleau tpadioleau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for me, please can you add me as a git co-author for this PR ? As no suggestion was applied I will not appear at all.

Co-authored-by: Yuuichi Asahi <[email protected]>
Co-authored-by: Thomas Padioleau <[email protected]>
@yasahi-hpc yasahi-hpc merged commit b4b272a into kokkos:main Jan 7, 2025
16 checks passed
@yasahi-hpc yasahi-hpc deleted the fix-RAII-issue branch January 7, 2025 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cleanup enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RAII issue
2 participants