-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ROCm: use native CMake HIP support #5966
Conversation
Since this requires cmake 3.21, I think it would be good to add a |
@slaren Good point, adjusted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot test this, but the change looks good to me.
@slaren responding to the test request. Since I learned that the only way this works on Fedora is to pass Not setting anything for
If I pass
Shouldn't it first look in So basically... either I am not understanding how to use the new options or they do not work on Fedora. Could you please hint on how the build should be started? |
I was hoping that with these changes you wouldn't need to pass additional parameters to |
|
@GZGavinZhao but isn't this exactly what cause the segfault for me in #6031 ? When I was passing system's clang it crashed, when I was passing So, based on what you wrote above I tried:
As far as I can tell, if |
Ah sorry, |
Sorry my bad - I mispasted the
But then when I
I think Fedora has a slightly non standard setup at the moment, they use vanilla llvm, but add some AMD bits and pieces on top and I guess this is why compiling with |
Ah, so you should be calling To clarify,
|
Ok, passing the env vars to This was the line how it compiled:
Now, unfortunately the real issue here is that building it like that causes it to crash just as described in #6031 and with this PR applied I can not build it "the old way" anymore, which was needed to avoid the crash, so the "working" build on Fedora works on
Based on my tests I'd say that the PR would not make it possible to build a working version of |
Hmmm it's weird that they introduce different compiling results... My suspicion is that |
Here it is:
|
Ah okay, so Edit: if that works, then please try if |
By the way, now with |
Interesting. Does running |
Hmm, it does indeed. So now we reduced it all to just:
And here is the cmake otuput:
Only the warnings seem odd, I guess some hip-only flags are also passed to the non-hip build:
I wonder if there is a way for you to autodetect what to set in I'm off for tonight, so if you need more testing I'll be around tomorrow. |
Yes, in fact there is! When you get a chance, please try this: HIPCXX="$(hipconfig -l)/clang" cmake <cmake-options>
make If this works, I'm going to add this info to the documentation in this PR. |
OK now that's odd and I am not sure if I messed something up, now suddenly it can't find the ROCm directory again and is looking in I am sure I did an |
So it'd be:
|
Can you update the build instructions for HIP in the |
One note though, if I understood Fedora packagers correctly |
I think it is good (and important) to follow AMD's recommendations for building. If Fedora wants to deviate from this process, they should deal with the problems that it causes. |
Isn't AMD's recommendation exactly this - to build everything exclusively with their llvm version or am I mistaken? Afaik the whole ROCm suite is also supposed to be built like that? |
I don't see that in the documentation linked here (https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake). It may have been that way in the past to deal with poor support in the build tools, but that no longer seems to be the case. I don't see any reason to compile source files without HIP code with the HIP compiler. |
I will. Sorry for the delay, got a bit busy yesterday. I'll update the docs later today. |
Fallback added, docs added, and added a CI check for HIP builds. I still need to test on Windows though. |
Bruh the docker image for ROCm is so big the runner went out of space. Used a smaller ROCm image and we install dependencies by ourselves. |
Have you been able to test this on Windows? |
I tried
That file doesn't seem to be included in the HIP 5.5 SDK. Maybe change the regex in CMakeLists.txt to something like After making the regex change this worked for me, getting the old build behaviour:
After installing the 5.7 SDK and replacing the 5.5 folder in my
So proper Windows support will need to wait for a fixed release from AMD. |
Sorry, I've been a bit busy. I will update the hipcc detection and give Windows ROCm a try tonight. It's weird that 5.7 fails on that specific error, maybe some path needs to be passed manually... |
a6428e6
to
e067689
Compare
ROCm 5.7 SDK for Windows 10 & 11 is indeed broken and not fixable on our side, because 1) a hard-coded path sneaked into their CMake files that broke file search and 2) the file to search for ( |
Yes it was totally broken that why I follow your request and crying AMD to fix it 3 months ago :)) |
Windows CI have just been killing me. I don't get how the folks at ollama managed to get it working with the 5.7 SDK. If I still can't figure it out by the end of today I'm going to switch back to the 5.5 SDK that will hopefully make the process easier. |
Only CMakeLists.txt.
Only CMakeLists.txt.
Only CMakeLists.txt.
Only CMakeLists.txt.
dafc19a
to
ea7e38b
Compare
Hi, sorry for the delay. It seems like native CMake HIP support for Windows is just not ready yet (ROCm/ROCm#2996), so if on Windows we fallback to the old behavior. Windows CI is added and passing. Also, I don't think this should be a https://github.com/ggerganov/llama.cpp/labels/review%20complexity%20%3A%20high. We're just switching to a better supported way to compile HIP, and the actual changes is only about 20 lines of CMake. The rest are mainly for CI testing. |
Supercedes ggerganov#4024 and ggerganov#4813. CMake's native HIP support has become the recommended way to add HIP code into a project (see [here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)). This PR makes the following changes: 1. The environment variable `HIPCXX` or CMake option `CMAKE_HIP_COMPILER` should be used to specify the HIP compiler. Notably this shouldn't be `hipcc`, but ROCm's clang, which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously this was control by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`. Note that since native CMake HIP support is not yet available on Windows, on Windows we fall back to the old behavior. 2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the GPU architectures to build for. Previously this was controled by `GPU_TARGETS`. 3. Updated the Nix recipe to account for these new changes. 4. The GPU targets to build against in the Nix recipe is now consistent with the supported GPU targets in nixpkgs. 5. Added CI checks for HIP on both Linux and Windows. On Linux, we test both the new and old behavior. The most important part about this PR is the separation of the HIP compiler and the C/C++ compiler. This allows users to choose a different C/C++ compiler if desired, compared to the current situation where when building for ROCm support, everything must be compiled with ROCm's clang. ~~Makefile is unchanged. Please let me know if we want to be consistent on variables' naming because Makefile still uses `GPU_TARGETS` to control architectures to build for, but I feel like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're calling `make`.~~ Makefile used `GPU_TARGETS` but the README says to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of `GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`. Thanks to the suggestion of @jin-eld, to maintain backwards compatibility (and not break too many downstream users' builds), if `CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using the original behavior and emit a warning that recommends switching to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but `CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS` to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new HIP support. Signed-off-by: Gavin Zhao <[email protected]>
Supercedes #4024 and #4813.
CMake's native HIP support has become the recommended way to add HIP code into a project (see here). This PR makes the following changes:
The environment variable
HIPCXX
or CMake optionCMAKE_HIP_COMPILER
should be used to specify the HIP compiler. Notably this shouldn't behipcc
, but ROCm's clang, which usually resides in$ROCM_PATH/llvm/bin/clang
. Previously this was control byCMAKE_C_COMPILER
andCMAKE_CXX_COMPILER
.CMake option
CMAKE_HIP_ARCHITECTURES
is used to control the GPU architectures to build for. Previously this was controled byGPU_TARGETS
.Updated the Nix recipe to account for these new changes.
The GPU targets to build against in the Nix recipe is now consistent with the supported GPU targets in nixpkgs.
Added CI checks for HIP on both Linux and Windows. On Linux, both the new and old behavior are tested. On Windows, only the old behavior is tested. Please see the end of the PR description for the rationale.
The most important part about this PR is the separation of the HIP compiler and the C/C++ compiler. This allows users to choose a different C/C++ compiler if desired, compared to the current situation where when building for ROCm support, everything must be compiled with ROCm's clang.
Makefile is unchanged. Please let me know if we want to be consistent on variables' naming because Makefile still usesMakefile usedGPU_TARGETS
to control architectures to build for, but I feel like settingCMAKE_HIP_ARCHITECTURES
is a bit awkward when you're callingmake
.GPU_TARGETS
but the README says to useAMDGPU_TARGETS
. For consistency with CMake, all usage ofGPU_TARGETS
in Makefile has been updated toAMDGPU_TARGETS
.Thanks to the suggestion of @jin-eld, to maintain backwards compatibility (and not break too many downstream users' builds), if
CMAKE_CXX_COMPILER
ends withhipcc
, then we still compile using the original behavior and emit a warning that recommends switching to the new HIP support. Similarly, ifAMDGPU_TARGETS
is set butCMAKE_HIP_ARCHITECTURES
is not, then we forwardAMDGPU_TARGETS
toCMAKE_HIP_ARCHITECTURES
to ease the transition to the new HIP support.Since native CMake HIP support is still problematic on Windows (ROCm/ROCm#2996), on Windows we still compile using the original behavior.