rocmPackages: 6.0.2 -> 6.3.3, and various ROCm build fixes and new packages #367695
base: staging
Conversation
Force-pushed from 76e05f1 to c2abb37
Not knowing much about the ROCm stack, mainly here as I am using btop with ROCm support enabled. I receive the following error when compiling:

@Shawn8901 Should be fixed now; it was broken when I first opened the PR.
One thing that I've wanted to do for a long time is to completely remove the ROCm LLVM as a stdenv, which should solve a lot of these weird compilation errors. Solus's ROCm stack does this, and we compile all non-HIP code with GCC. In this way, the entire ROCm LLVM can be compacted into a single derivation and the complexity of packaging/updating ROCm LLVM is drastically reduced. That is, you should be able to use just the default stdenv with GCC to compile non-HIP code and tell CMake/HIPCC to use the ROCm LLVM only when compiling HIP code. You can achieve this entirely through environment variables. It doesn't make sense that, just because a portion of the codebase contains HIP code, all C/C++ in the codebase needs to be compiled with ROCm LLVM's C compiler.
Ok, I just noticed the "Contemplate trying to make a normal Nix style CC wrapper work again" section, so it seems like you've already experienced the pain of the ROCm LLVM 😅 I will give my idea a try in the next few days and get back.
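A minimal sketch of the idea above, assuming an /opt/rocm style layout (the paths are illustrative, not what this PR does): let the default GCC stdenv compile ordinary C/C++, and point only HIP compilation at ROCm's clang via environment variables.

```shell
# Sketch only: /opt/rocm paths are an assumption about the install prefix.
export CC=gcc          # non-HIP C code uses the default stdenv compiler
export CXX=g++         # non-HIP C++ likewise
# CMake (>= 3.21) reads HIPCXX to pick the compiler for the HIP language.
export HIPCXX=/opt/rocm/llvm/bin/clang++
# hipcc itself can be redirected with HIP_CLANG_PATH if a project calls it directly.
export HIP_CLANG_PATH=/opt/rocm/llvm/bin
echo "HIP compiler: $HIPCXX"
```

With these set, a plain `cmake -S . -B build` of a mixed C++/HIP project should use GCC for `.cpp` files and ROCm's clang++ only for sources declared as HIP.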
It looks like upstream is moving away from a separate hipcc and toward using clang directly now. Maintaining a separate HIP-only compiler might require maintaining significant CMake-file patches to get it used, but if you can work out a way to do this that isn't maintenance hell, that's great.
Please see https://github.com/GZGavinZhao/rocm-llvm-project/commits/solus-rocm-6.2.x for the patches and https://lists.debian.org/debian-ai/2024/12/msg00042.html for more details. I hope they apply cleanly on v6.3, but if not, I think the changes are easy enough to rewrite manually. If you need patches for other components, please see
Solus does this and we didn't have to use any patches. Most of the work was figuring out the environment variables to tell CMake and/or HIPCC what our intended HIP compiler is. The only thing I'm worried about is locating sysroots due to the non-standard installation prefix, but other than that Solus's experience shows that this is definitely doable.
Force-pushed from 22f00e1 to c05b8cb
This should be testable (including torch) if you have an Instinct MI50 or newer, Radeon VII, W/RX 6800 or similar, or W/RX 78/900 or similar. Cards which relied on the ISA compat patches, like iGPUs or 66/7xx, aren't going to work; I haven't applied @GZGavinZhao's updated patches yet. If you're going to try to test it, you want >200G of free space so hipblaslt can spew an unfathomable amount of asm into the build temp directory, and a lot of cores, or you'll be here all month.
I'm having trouble getting this to build. I get Failed Tests (2): during the triton-llvm-19.1.0-rc1 test phase. RX 6800 XT, x86_64-linux on NixOS. No overlays or config or anything, just trying to create a devshell with python312Packages.torch. I can post a (nearly) minimal reproducible flake:

@henryrgithub Your test flake is using non-ROCm torch, which is broken in master with the same error.
If you use
Recommend updating inputs before starting a build of torchWithRocm; just fixed some issues.
```nix
};

rocm-opencl-runtime = symlinkJoin {
  name = "rocm-opencl-runtime-meta";
```

```nix
# FIXME: we have compressed code objects now, may be able to skip two stages?
```
CCOB (compressed code object bundle) is on by default for composable_kernel for ROCm v6.2+, so yes, no need for two-stage builds now.
I tried to use this PR via overlay, but got a collision when building ollama.
Overall, though, it seems to be building without errors; when not using ollama I was able to rebuild my system without errors.
I got a separate HIP compiler working and successfully compiled
ROCm's standard toolchain is clang + GNU libs, including libstdc++. There are a few packages which don't compile with
The collision error is because I added a /llvm link to clr for use by other ROCm packages, and ollama also already adds one to its ROCm env internally; it can be resolved by dropping the /llvm link from ollama.
Added exact GPU targets as pkgs.rocmPackages_6.gfx908, gfx1030, etc.
JFYI, I've just built my system from your latest commit (without rocmSupport = true, that takes ages; just ROCm itself and ollama). Finally it works and I can schedule models across several GPUs without resorting to Docker. Thank you. ZLUDA support would be very nice to have; it makes many things much easier. If you are open to donations, I would be happy to chip in.
I repaired exo to properly detect AMD last night; however, tinygrad needs
I was unable to rebuild tinygrad with rocmSupport because PyTorch is marked as broken (overriding fails building magma-2.9.0). This morning I repointed my nixpkgs to your branch to attempt to build against it; it spent a very long time compiling on my 13900K (w/ 6900XT) before eventually dying on
Curious how you got composable_kernel to build successfully on your setup? Also, I want to thank you for your Herculean efforts on this PR! 🍻
It needs all the RAM. There's supposed to be code to limit the parallelism if you don't have enough, but I guess it isn't tuned right.
I confirm that I was able to successfully build

@LunNova maybe let's mark this ready for review and I'll merge? We can still do further fixes in follow-up fixups; I am just afraid we will run into merge conflicts again if we do not merge this PR soon.
🤣... now that you mention it, I did notice the system using over 80GB of my 128GB of memory in btop at one point. I didn't see how high it got before it finally died; if I have another go I'll try to track that. Edit: I'm rerunning. I think I might have misread available for in use; 78GB available now with 43 in use.
I've gotten it to build a couple of times on 64GB with Nix limited to one build at a time; with nothing else significant running, the lowest available RAM I saw was 4GB IIRC. I made a little Python script just to track it 😬. Still haven't gotten models to cooperate with it, but things have been moving fast. Hope it gets merged soon; it's been fun following the progress!
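The tracking script mentioned above wasn't posted; a tiny stand-in (a sketch, with an illustrative sample count and interval) is to poll MemAvailable from /proc/meminfo during the build and keep the minimum:

```shell
# Record the lowest MemAvailable seen over a few samples (Linux only).
lowest=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
for _ in 1 2 3; do
  sleep 1
  cur=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  if [ "$cur" -lt "$lowest" ]; then lowest=$cur; fi
done
echo "lowest MemAvailable: ${lowest} kB"
```

Run it in a second terminal while the build is going; in practice you would raise the sample count to cover the whole build.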
Is there anyone we need to give heads up to in terms of hydra resource usage for the slow core and memory hungry builds? |
Good point. Maybe this should be targeted at staging instead of master? |
Use zram swap, and limit the job count and core count in Nix settings; the higher the parallelism, the more RAM it needs. Use remote builders to speed things up. In my experiments, a 128GB machine should be limited to 24-28 cores with zram on in order to be able to finish the build. On how high it might get: while running with 64 cores it ate 230GB. This might also help (add your VRAM as swap; it's very efficient):
There is a neat calculator on this page: https://www.foroelectro.net/english-f34/using-excess-video-memory-as-swap-in-linux-t497.html but essentially you just have to sum two numbers. Source: https://wiki.archlinux.org/title/Swap_on_video_RAM Test your swap before you build, otherwise there might be nasty surprises. Turn off swap on your NVMe/SSD; if you leave it on, that almost guarantees a 12309-ish freeze.
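The job and core limits mentioned above go in nix.conf; the numbers below are the commenter's own suggestion for a 128GB machine with zram, not universal values:

```
# /etc/nix/nix.conf (illustrative; tune to your RAM)
max-jobs = 1
cores = 24
```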
```nix
    url = "https://github.com/GZGavinZhao/MIOpen/commit/416088b534618bd669a765afce59cfc7197064c1.patch";
    hash = "sha256-OwONCA68y8s2GqtQj+OtotXwUXQ5jM8tpeM92iaD4MU=";
  })
  # (fetchpatch {
```
Missed this arch compat patch. Might cause missing kernels for some ops on GPUs that aren't officially supported, like iGPUs.
If I build with
miopen fails in 54aead4:
Includes patch suggested by @shuni64 which fixes half-precision ABI issues
Includes hipblaslt compression patch
Includes configurable hipblaslt support in rocblas
rocmPackages_6.hipblaslt: respect NIX_BUILD_CORES in tensilelite
rocmPackages_6.hipblas: propagate hipblas-common
rocmPackages_6.clr: avoid confusion with hipClangPath
Co-authored-by: Gavin Zhao <[email protected]>
rocmPackages.hsa-amd-aqlprofile-bin: 6.3.0 -> 6.3.3
rocmPackages.composable_kernel: unsplit build and clean up
  remove unused disable-amdgpu-inline.patch
  update documentation for build flags
  composable_kernel's library files are compressed now, so no need to apply our own compression
Co-authored-by: Gavin Zhao <[email protected]>
rocmPackages.llvm: comment out -orig attrs and document usage
rocmPackages.amdsmi: fix build
rocmPackages_6.rocm-docs-core: fix build
rocmPackages.rpp-opencl: mark broken
rocmPackages.rocgdb: fix build
rocmPackages.migraphx: fix build
rocmPackages.llvm: remove unneeded duplicate patch
rocmPackages.rdc: fix build
Signed-off-by: Gavin Zhao <[email protected]>
Fixes the build against rocmPackages_6. Can't easily apply the PR as a patch because we rely on the tarball with pregenerated hipified files, so a fetchpatch of the PR will apply cleanly but fail to build.
@pshirshov Should be fixed; marked CK as broken when it has no gfx9 arches selected, and disabled MIOpen support for it under the same condition.
Fixes #337159
Fixes #383836
Fixes #379354
WIP bump to 6.3 for rocmPackages_6 package set.
Upstream PRs/issues Raised
TODO List
- and msgpack working for hipblaslt. 10GB derivation is not ok.
- Make use of working LLVM packages .override to simplify LLVM. overrideScope isn't present and is needed.
- Allow better build parallelism by creating -minimal versions of some of the huge packages built for no gfx arches. Too difficult, not doing in this PR.

Things done
- Tested using sandboxing in nix.conf (see Nix manual): sandbox = relaxed or sandbox = true
- Ran nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed; also see nixpkgs-review usage
- Tested basic functionality of all binary files (usually in ./result/bin/)

Add a 👍 reaction to pull requests you find important.