Testing GEOSgcm with M1 GNU #417
Additional issue. I can get GEOSgcm to build just fine, but it can't run. The issue seems to be something in the M1 GNU I built. I did this following how @fxcoudert (or someone??) did in the Homebrew recipe for GCC, but by hand since on my laptop, I'm not allowed to install in I built as per my modulefile for gcc 11.2.0, where I use the same @fxcoudert tarball as well as a patch that I see for arm64 in the brew recipe. The issue seems to be that the executable built by our CMake is not handling rpaths the same on Arm as it does on Intel. When
and if I
Now, let us compare that to what I see on my Intel Mac (on Big Sur):
So...not sure what to do. I do see some output in this Homebrew issue: like:
and I wonder if I need to do the same thing in my by-hand install of GCC? I never have to do it on my version of GCC on Intel on Big Sur, but that is both architecture and OS changing! A few swings around Google did bring up a configure flag called Sadly, I nuked my original build of GCC to try |
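For anyone wanting to compare embedded rpaths between the Arm and Intel builds, here is a minimal sketch of one way to pull the `LC_RPATH` entries out of `otool -l` style output. The function name `parse_rpaths` and the sample path are hypothetical, and the sketch assumes the usual `cmd LC_RPATH` / `path ... (offset N)` layout of the load-command listing; it is not what was actually run in this issue.

```python
import re


def parse_rpaths(otool_output: str) -> list[str]:
    """Return the LC_RPATH paths found in `otool -l` style text.

    Assumes each load command is listed as a `cmd <NAME>` line followed by
    its fields, and that LC_RPATH carries a `path <dir> (offset N)` field.
    """
    rpaths = []
    in_rpath = False
    for line in otool_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("cmd "):
            # A new load command starts; remember whether it is an rpath.
            in_rpath = stripped == "cmd LC_RPATH"
        elif in_rpath and stripped.startswith("path "):
            match = re.match(r"path (.+?) \(offset \d+\)$", stripped)
            if match:
                rpaths.append(match.group(1))
            in_rpath = False
    return rpaths


if __name__ == "__main__":
    # Hypothetical sample text, not output from the builds discussed here.
    sample = """Load command 13
          cmd LC_RPATH
      cmdsize 32
         path /hypothetical/gcc/lib (offset 12)
"""
    print(parse_rpaths(sample))
```

On a Mac, the text could come from something like `subprocess.run(["otool", "-l", "path/to/executable"], capture_output=True, text=True).stdout`, after which the lists from the two machines can be compared directly.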
OK. Can we start from which version of gcc you have built and how you have configured it? It should be entirely possible (on Arm64 or x86_64) [gcc-11.3-darwin-r0 or gcc-12.1-darwin-r0] to configure it to be installed in your home directory (or /Users/Shared), i.e. somewhere writable by a non-admin user. If you then use that compiler to build a dependent project, the compiler should embed the rpaths that it needs (in addition to any that are supplied by CMake or another build system). This is (probably) not an issue with GEOSgcm but more with the mechanisms being used to combine pieces of the build environments. |
Oooh. Interesting! I see from this comment, @iains has recommended using: https://github.com/iains/gcc-11-branch/releases/tag/gcc-11.3-darwin-r0 Well then, I have a task for tomorrow I do believe! Let's try building this! |
@iains I essentially echoed what Homebrew did as a first attempt. I grabbed the tarball they use:
and a patch:
and apply the patch. Then it's the usual way I build gcc:
It's not as fancy as Homebrew's method, but I decided to start from what I knew. As for the "not in |
That |
Okay. Per @iains to fix an issue with https://github.com/iains/gcc-11-branch This seems to have support for quad-precision, so we can stop caring about many of the PRs and issues above. However, I seem to now have an error with HDF5:
I've filed an issue (iains/gcc-11-branch#3) |
Welp, thanks to @iains in iains/gcc-11-branch#3 (comment), I can run GEOS on M1 with GCC 11.3. I'll use this space to talk about speed once I can get runs on my Intel laptop, M1 laptop, and discover at c12. I'll do C12 with and without ExtData just to be fair. |
Performance Results of GEOSgcm (6 hours, C12, 1x6, No ExtData)
Release
Aggressive
|
So, I'll fill out the table more tomorrow when I build Intel models on my Intel Mac. It looks like our Release flags for M1 are equivalent to Aggressive flags on Intel. One odd thing is that the M1 Mac crashes with our M1 Aggressive flags...which aren't that different from release. Here are essentially the flags:
Everything else is pretty much the same. I wonder which one is the one M1 does not like... |
Would larger final Tput be "better"? Could you try the aggressive with TBH I am not sure what we would do for |
I'd hazard that the loop unrolling heuristics might be geared to Intel? (they have the appearance of being experimentally-determined) .. presumably, there will be some similar sweet spot for M1. |
Ah. Yes. "Tput" = "throughput" (in model-days/wall-clock-day). I just abbreviated it because the table was already too wide :)
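To make the "Tput" definition concrete, here is a minimal sketch of the arithmetic, assuming throughput is simply simulated days divided by elapsed wall-clock days. The function name and the numbers in the example are hypothetical and are not taken from the tables in this issue.

```python
def throughput_days_per_day(simulated_hours: float, wall_seconds: float) -> float:
    """Model throughput in model-days per wall-clock day."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    simulated_days = simulated_hours / 24.0
    wall_days = wall_seconds / 86400.0
    return simulated_days / wall_days


if __name__ == "__main__":
    # Illustrative only: a 6-hour simulation finishing in 60 s of wall time.
    print(throughput_days_per_day(6, 60))
```

With this definition a larger number is better: the model covers more simulated time per day of wall-clock time.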
Sadly, moving to
(If -O2 is the cause, I'll eat my hat 😄 ) The GNU Aggressive flags we have are from Jerry DeLisle on the GCC list, so they were a bit above me. It will be interesting to figure out which flag(s) might be causing this. Though it is about 1 hour between each test thanks to big build! Though, honestly, our "Release" flags seem to have all the performance we might get anyway, so the easy answer might be to just code in CMake "if APPLE and ARM64, set Aggflag=Relflags". |
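The "if APPLE and ARM64, set Aggflag=Relflags" idea would live in CMake, but the selection logic itself can be sketched in Python. This is only an illustration: the function name is hypothetical, the flag lists are placeholders rather than the real GEOS flags, and it assumes Apple silicon reports `Darwin` and `arm64` through the `platform` module.

```python
import platform


def pick_aggressive_flags(release_flags, aggressive_flags, system=None, machine=None):
    """Return the flags to use for an Aggressive build.

    On macOS/arm64, fall back to the Release flags; elsewhere keep the
    Aggressive flags. `system` and `machine` default to the current host.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return list(release_flags)
    return list(aggressive_flags)


if __name__ == "__main__":
    # Placeholder flag lists, not the project's actual settings.
    release = ["-O2"]
    aggressive = ["-O2", "-funroll-loops"]
    print(pick_aggressive_flags(release, aggressive))
```

Passing `system` and `machine` explicitly makes it easy to check both branches without access to both kinds of machine.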
I also did a set of runs at C24 (so 4x the number of columns per process) as this is a bit closer to the per-process number-of-columns we run with. And the numbers are much the same. Good news is that GCC 11 M1 is roughly as good as Intel 2022 on Coffee Lake with Aggressive flags! Kudos to @iains and the GCC Team (and I suppose the Apple chip engineers)!
Performance Results of GEOSgcm (6 hours, C24, 1x6, No ExtData)
Release
Aggressive
|
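The "4x the number of columns per process" figure for C24 versus C12 can be checked with a small sketch. It assumes a cubed-sphere grid CN has 6 faces of N x N columns and that a "1x6" layout means 1 * 6 = 6 processes; the function name is hypothetical.

```python
def columns_per_process(n: int, nx: int, ny: int) -> float:
    """Columns per process for a CN cubed-sphere grid on an nx-by-ny layout.

    Assumes 6 faces of n*n columns and nx*ny processes in total.
    """
    total_columns = 6 * n * n
    return total_columns / (nx * ny)


if __name__ == "__main__":
    c12 = columns_per_process(12, 1, 6)  # 144.0
    c24 = columns_per_process(24, 1, 6)  # 576.0
    print(c12, c24, c24 / c12)
```

Under those assumptions each process handles 144 columns at C12 and 576 at C24, which gives the factor of 4.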
Unless one has a chocolate hat, always a dodgy statement :)
Well, obv. that's a reliable source - was that also for 'aarch64'?
I did some simplistic benchmarking in early days (on a DTK) using a fortran code - which suggested that the arm chip did very well against regular (non AVX512) chips ... but I need to dig that out and re-run on my cascade box. |
Oh, no. He had suggestions just for Intel. My only experience with aarch64 is with Graviton2 on AWS. And there I spent a few days figuring out this:
let things work with Release and Aggressive (and it was happy with the rest of our flags). I think I/we are just in pioneer space which is when you start to try all the flags and see what helps! Maybe once more people start using M1 + gfortran, I can steal/borrow the flags from OpenBenchmarking Fortran tests (though they usually just do |
|
Interesting. I tried doing: Oh well, at least I learned about I suppose my next work is to move on to GCC 12 and see if it has the same issues with our model on M1 as it does on Intel. |
did you have a chance to try the 12.1-pre-r1 version? edit: is the Intel issue compiler-related or something else? |
Not yet. I'm hoping to try building it today. I was pulled away to other work, but I have some time now to build compilers, etc. I'm finalizing my "get our flags right" setup first. Hopefully then GCC 12 will be plug-and-play!
This issue is...weird. For some reason, GCC 12 does not like GEOS and I don't know why. I recently ran 10 runs of C12 with GCC 11 and GCC 12. With GCC 11, all 10 ran successfully. With GCC 12, only 2 of 10 ran. The other eight died in the same vertical remapping call. But no flags or source code changed between these two runs; I just changed the compiler! And the crash is sporadic. One dies at 1000z, another at 0700z, another at 1330z... it's so random! I'm going to try doing a 10-run set at a higher resolution and see what I see. C12 is so coarse that I might just be exposing a weird setup issue, whereas C24 is sort of my "gold standard" low-res case. But maybe we're seeing some sort of memory corruption on our end that GCC 12 is just more sensitive to? |
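For sporadic failures like this, a small harness that repeats a run and counts clean exits makes the "N of 10" comparison easy to reproduce. The sketch below is hypothetical: `count_successes` is an illustrative name, and the demo command is a trivial Python invocation standing in for the real model run, which is not shown in this issue.

```python
import subprocess
import sys


def count_successes(command, runs):
    """Run `command` `runs` times and return how many exited with status 0."""
    successes = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            successes += 1
    return successes


if __name__ == "__main__":
    # Stand-in command; replace with the actual run script for a real test.
    demo = [sys.executable, "-c", "pass"]
    print(count_successes(demo, 10), "of 10 runs succeeded")
```

Running the same harness against builds from each compiler keeps everything else constant, so the pass counts can be compared directly.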
Well, same issue at C24. Not as "severe" as 70% of the runs succeeded, but the ones that failed failed on the same line:
with:
That is some boring math to die on. And we've never seen it die with our debugging flags with Intel or GCC before. Weird... |
yeah, probably what's more likely is that one of the array accesses is going wrong and then some inappropriate number is being pulled into the calculation. edit : well unless there's some possibility of underflow in the subtractions or overflow in the product. edit2: does this happen without the fancy loop unroll params? Is there any way to produce the assembly / .s or .i even (plus the flags used to compile) .. ? |
@iains Looks like your GCC 12.1 branch runs! Using a sample size of 1 run, it seems a bit slower than 11.3...but that could just be variability of the laptop (maybe it hit the efficiency cores a bit more?). One interesting thing is that it looks like C12 GCC 12.1 on M1 is stable. I ran it a few times and nothing went wiggy. So I thought I should look at the default Intel GNU flags:
I mean, westmere is ancient at this point and I think everything we might run on is Broadwell/Haswell at least (and with Intel Fortran we don't support anything below that because we run with core-avx2) so I decided, heck, let's try Does this tell me why GCC 12 doesn't seem to like GEOS anymore? No. But is it a valid workaround if it means not supporting a processor we don't use? I mean... 😄 |
heh. I have to confess that nothing leaps out as meaningful from those options - probably using -mtune=haswell too might be more reasonable. I do have (and use for testing) westmere and nehalem machines - but only used for testing older OS revs. good to read that M1 on 12.1 is better, although that is probably nothing to do with the Darwin port - we have pretty much the same code on 11.3 and 12.1 (and if I get to it in time on the upcoming 10.4). I've no way to judge what effect we might get with the efficiency cores ... |
Yeah. I might need to stroll through the Open MPI lists/repos and ask/annoy the gurus there (sorry in advance @jsquyres) and see if they have info about what happens if you do Heck, from an |
Final thought: With #426 we should get M1 compatibility. Huzzah! |
Let's do an update of the timings with the newer GNU flags and now that I have access to an M1 Max. These are all 1-day and no extdata, history, or checkpointing to interfere. I think the first take away is "M1 Max be nice". I mean, it's right there with Cascade Lake with Intel compilers on Aggressive optimizations. And I'm guessing the fine folks at GNU could probably figure out how to tune for the M1 given time. At the moment I'm mainly doing:
because "it works" but it's not tuned at all and Aggressive is pretty much just Release with the M1. It's now time to try fiddling around. The
Performance Results of GEOSgcm (1 day, C24, 1x6, No ExtData, No history, No checkpointing)
Release
Aggressive
|
Update: |
I'm going to open this as a tracking branch for GEOSgcm issues when running with GNU on M1.
Thanks to @iains, these are not needed anymore for M1 and GEOS.
pFUnit (Build fails when Real128 not available Goddard-Fortran-Ecosystem/pFUnit#338 (comment))
pFlogger (REAL128 issues with pFlogger Goddard-Fortran-Ecosystem/pFlogger#80)
GMAO_stoch (Allow for non-quad-precision systems GMAO_Shared#261)
At present the only real "bug" would be in ESMA_env where we need a slight change to g5_modules to make it not complain: