inline static members in Kokkos 4.0 class not persistent with CUDA backend #55
Comments
To reproduce, I would expect any CUDA kernel to fail when Kokkos::initialize() is called from pykokkos-base and a Kokkos kernel is called afterwards. I cannot reproduce this in Kokkos/C++-only code.
It seems like the inline static member behavior is different when Kokkos is compiled as a static versus a shared library. Because pybind11 requires PIC, generally one just compiles Kokkos as a shared library, so there are no problems when compiling pykokkos-base. That, however, leads to the behavior described above (with 4.0). When I compile Kokkos as static libraries with -fPIC, I am able to get Kokkos 4.0 to run on the CUDA backend. This is well over my compiling/C++ object lifetime/instruction unit pay grade, so I am not sure what to make of it. But at least it works.
Hm, interesting. @nliber do you have any idea what this could be? I think it is potentially the jitting of stuff where we would have inline static things inside header files. So if something gets recompiled and then relinked, it might cause issues? I wonder if this is fixable by having all inline static variables actually be static variables inside functions which are compiled inside the Kokkos library itself. I.e. for every
@kaschau do you feel you could take this experiment on, i.e. make a branch of Kokkos Core, go through all these variables, and see if we can get this fixed that way?
@crtrott I'm a C++ ignoramus, but I think I can give it a shot. I think just being able to prove that one variable (the tile size, for example) survives this way should be doable for me as a proof of concept.
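The suggestion above (replacing each inline static variable with a static variable inside a function compiled into the Kokkos library) can be sketched roughly as follows. This is a minimal illustration and not Kokkos code: `DeviceLimits`, `device_limits()`, and the setter/getter names are hypothetical, and the sketch assumes that in a real build only the declarations would live in the header while the definitions sit in a single `.cpp` file of the library.

```cpp
#include <cassert>

// Hypothetical stand-in for an internal settings struct; the names are
// illustrative and are not taken from the Kokkos source.
struct DeviceLimits {
  int max_threads_per_sm = 0;
};

// The state lives in a function-local static. If this function is defined
// (non-inline) in exactly one .cpp file of the library, callers reach the
// object through that one exported function instead of through a variable
// defined in a header.
DeviceLimits& device_limits() {
  static DeviceLimits instance;  // constructed once, on first call
  return instance;
}

// Writer side, e.g. called during initialization.
void set_max_threads_per_sm(int value) {
  assert(value >= 0);
  device_limits().max_threads_per_sm = value;
}

// Reader side, e.g. called when an execution policy is constructed.
int get_max_threads_per_sm() {
  return device_limits().max_threads_per_sm;
}
```

For the proof of concept, it would probably be enough to route one value (such as the tile size limit) through an accessor like this and check from Python whether the value written during initialization is still visible at the first kernel launch.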
@kaschau A bit of a shot in the dark but try setting this variable to OFF and rebuild pykokkos-base:
I suspect the reason you see this issue with shared libraries is that there is some symbol that exists in both the pykokkos-base library and the Kokkos library, and pykokkos-base is initializing its copy of the symbol instead of the one that exists in the Kokkos library. And when a static Kokkos library is used, these symbols get merged.
A potential starting place might be to use the
@jrmadsen Tried this, still had the same issue. I will take a look at
Commit that broke pybind11: kokkos/kokkos@1f048cf
And python itself
I was trying to figure out what is going on in my case, and something very odd is happening: if I look at the addresses of this variable in here and here, they are different.
Everything starts to work properly
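The address comparison described above can be done with a small diagnostic along these lines. This is only a sketch under stated assumptions: `Settings` and `max_threads` are hypothetical stand-ins for the real class and member, and the idea is to compile one copy of the reporting function into each shared object (the Kokkos library and the Python extension module) and compare what they print.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical class standing in for one with a C++17 inline static member.
struct Settings {
  inline static int max_threads = 0;
};

// Returns the address of the member as seen by the binary that compiled
// this function. If two shared objects return different values, each of
// them is using its own copy of the variable.
std::uintptr_t settings_address() {
  return reinterpret_cast<std::uintptr_t>(&Settings::max_threads);
}

// Prints the address together with a caller-supplied label, for example
// the name of the library the call was compiled into.
void report_settings_address(const char* label) {
  std::printf("%s: &Settings::max_threads = %p\n", label,
              static_cast<void*>(&Settings::max_threads));
}
```

Within a single executable the address is always the same, so the check is only informative when the function is called from code built into different shared libraries.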
Kokkos 4.0 changed many class members set with `Kokkos::initialize()` to `inline static T` types. With this change it seems there is an issue with pybind11 and setting these members persistently when called from python.

Whenever using cuda, the `TileSizeProperties` attribute `maxThreads` is being set to zeros, and causes an abort at the first MDRange execution.

When `Kokkos::initialize()` is called (from python bound function), `cudaProp.maxThreadsPerMultiProcessor` (from here) reports 1024, however, by the time we get to the MDRange policy here, the `space.impl_internal_space_instance()->m_maxThreadsPerSM` is 0. This causes an abort at this check here.

I am only having an issue with CUDA, and it works fine with OpenMP and Serial backends. It has been consistent with every host/device compiler I have tried.
Primarily gcc 9.4.0/intel19.04 + CUDA 11.7