Update GH CI workflows #816

Merged: 32 commits, Dec 31, 2024

DavidHuber-NOAA
Collaborator

Description

This updates the workflows to account for apparent recent changes to the GH CI environment.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • New and existing tests pass with my changes

@CoryMartin-NOAA CoryMartin-NOAA self-requested a review December 11, 2024 16:10
Contributor

@RussTreadon-NOAA RussTreadon-NOAA left a comment

Didn't test but assume it works.

Approve.

@RussTreadon-NOAA
Contributor

Workflow tests are still failing

@DavidHuber-NOAA
Collaborator Author

This isn't working yet. There are still issues with both the GCC and Intel builds. I'm waiting for @AlexanderRichert-NOAA to get a look at this.

@AlexanderRichert-NOAA
Contributor

For the GCC/gsi failure, it looks like it's not finding libblas. In the most recent successful run of that workflow, it found /usr/lib/x86_64-linux-gnu/libblas.so, so I'm assuming this is what you want. I don't see it being explicitly installed anywhere, so maybe it was installed by default previously but got removed from the runner image? In any case, I would try sudo apt install libblas-dev and see if that does the trick.
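The check suggested above could be sketched as a small workflow step. This is only an illustration under assumptions: `has_libblas` and `LIBDIR` are hypothetical names, the default directory is the one quoted in the comment, and the install command is printed rather than executed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if any libblas shared object exists in the
# directory passed as the first argument.
has_libblas() {
  local libdir="$1"
  compgen -G "${libdir}/libblas.so*" > /dev/null
}

# Default to the location reported by the last successful run (assumption:
# the runner still uses this multiarch directory).
LIBDIR="${LIBDIR:-/usr/lib/x86_64-linux-gnu}"

if has_libblas "${LIBDIR}"; then
  echo "libblas found in ${LIBDIR}"
else
  echo "libblas missing from ${LIBDIR}; a workflow step could run:"
  echo "  sudo apt-get update && sudo apt-get install -y libblas-dev"
fi
```

Printing the command instead of running it keeps the sketch safe to run outside CI; in an actual workflow step the install would be executed directly.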

I'll see if I can narrow down which command is failing in the Intel/build job.

@AlexanderRichert-NOAA
Contributor

AlexanderRichert-NOAA commented Dec 11, 2024

The issue with the Intel one is your [[ ! -z "${...}" ]] line. I'm not sure of the exact issue, but in any case using if [[ ... ]]; then ... seems to fix it.
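The comment above leaves the root cause open. One possible explanation, not confirmed in this thread, is exit status: a bare `[[ ... ]] && command` list returns non-zero when the test is false, and GitHub Actions treats a bash step that ends with a non-zero status as failed. The sketch below compares the two forms; `EXTRA_FLAGS` is a hypothetical variable name, since the real expression is elided in the comment.

```shell
#!/usr/bin/env bash

# Hypothetical variable; deliberately empty to exercise the false branch.
EXTRA_FLAGS=""

# Form 1: bare test chained with &&. When the variable is empty, the test
# fails and the whole list returns that non-zero status.
short_circuit_form() {
  [[ ! -z "${EXTRA_FLAGS}" ]] && echo "flags: ${EXTRA_FLAGS}"
}

# Form 2: explicit if. With a false condition and no else branch, the
# compound command returns 0.
if_form() {
  if [[ -n "${EXTRA_FLAGS}" ]]; then
    echo "flags: ${EXTRA_FLAGS}"
  fi
}

# Capture each status without letting a non-zero value abort the script
# when it runs under `set -e`.
status_short=0
short_circuit_form || status_short=$?

status_if=0
if_form || status_if=$?

echo "short-circuit form status: ${status_short}"
echo "if form status: ${status_if}"
```

With the variable empty, the short-circuit form reports status 1 and the `if` form reports status 0, which is consistent with the `if` rewrite making the step pass.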

@DavidHuber-NOAA
Collaborator Author

Ah good calls, thanks @AlexanderRichert-NOAA! I'll give those a shot.

@DavidHuber-NOAA
Collaborator Author

@AlexanderRichert-NOAA I was able to get a bit further, but both builds are still failing.

For GCC, mpich (mpicc) is looking for gcc-10 in /usr/bin instead of looking in the spack environment. Yet earlier (during the setup stage), it appears that the environment is using /usr/bin/gcc-10, so I'm not sure why it cannot find it during the gsi stage. Do you have an idea of what is missing here?
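A diagnostic step along these lines might help narrow down where the compiler lookup diverges between stages. It is a sketch, not the workflow's actual code: `resolve_tool` is a hypothetical helper, and the step assumes it runs in the same shell environment as the failing gsi stage.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the path a command name resolves to on the
# current PATH, or a note when it cannot be found.
resolve_tool() {
  local tool="$1"
  command -v "${tool}" || echo "${tool}: not found on PATH"
}

resolve_tool gcc-10
resolve_tool mpicc

# MPICH's mpicc accepts -show, which prints the underlying compiler command
# line without compiling anything.
if command -v mpicc > /dev/null; then
  mpicc -show || echo "mpicc -show failed"
fi
```

Running the same step in both the setup and gsi stages would show whether PATH or the wrapper's underlying compiler changes between them.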

For Intel, it is failing to build snappy as there are undefined symbols (e.g. _Float32). I believe it is trying to get these symbols from GCC, but I'm not sure. Did spack-stack have to do anything special to build snappy with Intel compilers?

@DavidHuber-NOAA
Collaborator Author

@RussTreadon-NOAA It may be possible to set up an image on a cloud instance and try building via spack there, though I don't think the same Ubuntu image is available, so it wouldn't be a 1-to-1 representation.

@RussTreadon-NOAA
Contributor

@DavidHuber-NOAA: Several GSI-based repos contain the same workflow scripts and are encountering the same failure (e.g., GSI-utils PR #59). Are any non-GSI EMC repos running similar CI without experiencing the same problem? If so, can we apply their approach in the GSI repos?

@DavidHuber-NOAA
Collaborator Author

UFS_UTILS has a similar setup, and it looks like @AlexanderRichert-NOAA just fixed the Intel builds in ufs-community/UFS_UTILS#998, though it appears that he moved to the new LLVM compilers. I can see if a hybrid approach will work.

@RussTreadon-NOAA
Contributor

OK. Thank you @DavidHuber-NOAA ... but only work on this as your other tasks and priorities allow.

@RussTreadon-NOAA
Contributor

RussTreadon-NOAA commented Dec 18, 2024

@DavidHuber-NOAA : gcc passes but intel fails. The intel failure is with spack. I see ubuntu-ci-x86_64-intel.yaml in spack-stack/.github/workflows.

Can the spack-stack team help us resolve this problem?

@AlexanderRichert-NOAA
Contributor

I'm looking at this right now. Disabling blosc eliminates the snappy dependency, so that problem goes away, but now it's failing on building netcdf-c because it can't find hdf5... I'll let you know when I have a fix.

@RussTreadon-NOAA
Contributor

Thank you @AlexanderRichert-NOAA. Sorry that this has become such a time-consuming endeavor for @DavidHuber-NOAA and you.

@AlexanderRichert-NOAA
Contributor

@DavidHuber-NOAA please check out my version of your branch; it seems to be working with Intel classic (it looks like you made some more recent updates involving oneAPI, which I have not incorporated). The highlights are that it relocates /usr/local (I think this was messing up cmake), installs external blas/lapack, and gets rid of the c-blosc dependency for netcdf-c.

@DavidHuber-NOAA
Collaborator Author

Thanks @AlexanderRichert-NOAA I will give that a shot today.

@DavidHuber-NOAA
Collaborator Author

@AlexanderRichert-NOAA @RussTreadon-NOAA The CI builds appear to be working again! Marking this PR ready for review.

@DavidHuber-NOAA DavidHuber-NOAA marked this pull request as ready for review December 30, 2024 16:45
@RussTreadon-NOAA RussTreadon-NOAA self-requested a review December 30, 2024 17:22
Contributor

@RussTreadon-NOAA RussTreadon-NOAA left a comment

github CI passes.

Approve.

@RussTreadon-NOAA
Contributor

Thank you @CoryMartin-NOAA for the review and approval. @ShunLiu-NOAA and @hu5970, this PR is ready to be merged into develop. If you're OK with the PR, any of us can merge it into develop.

@RussTreadon-NOAA RussTreadon-NOAA merged commit 27c03e8 into NOAA-EMC:develop Dec 31, 2024
4 checks passed