Add CUDA support to software_layer #212
value = 'add_property("arch","gpu")'
cuda_version = 0
for dep in iter(ec_dict["dependencies"]):
    # Make CUDA a build dependency only (rpathing saves us from link errors)
This approach saves us from explicitly loading CUDA to run a CUDA-dependent package. That in turn lets us write an Lmod hook that blocks loading the CUDA module unless certain criteria are met (i.e., that the symlinks are unbroken).
):
    ec.log.info("[parse hook] Injecting gpu as Lmod arch property and envvar with CUDA version")
    key = "modluafooter"
    value = 'add_property("arch","gpu")'
This property lets us guard the loading of any GPU package via an Lmod hook (which can be overridden): unless the compat libraries are installed, you can't load GPU modules.
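The two review notes above (CUDA as a build-only dependency, and the `arch`/`gpu` property) can be sketched on a plain dictionary. This is only an illustration: `mark_cuda_build_only` is a hypothetical helper, and the assumption that dependencies are `(name, version)` tuples in a plain dict is made for the example and is not taken from the PR or from EasyBuild's API.

```python
def mark_cuda_build_only(ec_dict):
    """Hypothetical helper: move CUDA to the build dependencies and tag the module as GPU.

    Assumes ec_dict is a plain dict whose dependencies are (name, version) tuples.
    Returns the CUDA version found, or None if CUDA is not a dependency.
    """
    runtime_deps = []
    build_deps = list(ec_dict.get("builddependencies", []))
    cuda_version = None

    for dep in ec_dict.get("dependencies", []):
        name, version = dep[0], dep[1]
        if name == "CUDA":
            # CUDA is only needed at build time; rpath handles linking at run time
            cuda_version = version
            build_deps.append(dep)
        else:
            runtime_deps.append(dep)

    if cuda_version is not None:
        ec_dict["dependencies"] = runtime_deps
        ec_dict["builddependencies"] = build_deps
        # Append the Lmod property so that a hook can recognise GPU modules
        prop = 'add_property("arch","gpu")'
        footer = ec_dict.get("modluafooter", "")
        if prop not in footer:
            ec_dict["modluafooter"] = (footer + "\n" + prop).strip()

    return cuda_version
```

An Lmod hook could then refuse to load any module carrying this property unless the compat libraries are present.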
eb_hooks.py
Outdated
target = source.replace("versions", "host_injections")
os.remove(source)
# Using os.symlink requires the existence of the target directory, so we use os.system
os.system("ln %s %s" % (target, source))
We should be checking that these calls succeed.
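One way to make the replacement verifiable is sketched below. It assumes `os.symlink` is acceptable here (it creates the link even if the target does not exist yet), and `replace_with_symlink` is a hypothetical name, not something from the PR. The link is created under a temporary name and then renamed over the original, so the original file is not lost if link creation fails.

```python
import os


def replace_with_symlink(source, target):
    """Replace the file at `source` with a symlink pointing at `target`.

    Hypothetical helper. Returns True only if `source` ends up as a symlink
    to `target`; returns False (after reporting the error) otherwise.
    """
    tmp_link = source + ".tmp-symlink"
    try:
        if os.path.lexists(tmp_link):
            os.remove(tmp_link)
        # A dangling symlink is fine: the target may only appear later
        os.symlink(target, tmp_link)
        # Rename over the original so a failure above leaves it untouched
        os.replace(tmp_link, source)
    except OSError as err:
        print("Failed to replace %s with symlink to %s: %s" % (source, target, err))
        return False
    return os.path.islink(source) and os.readlink(source) == target
```

The caller can then decide whether a `False` result should abort the install or only be logged.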
This is working but requires an additional script to unbreak the symlinks in the CUDA installation (which is being extracted from #172).
This requires either easybuilders/easybuild-framework#4119 or installing the compat libraries so that the rpath check passes.
install_cuda_host_injections.sh
Outdated
else
    # The install is pretty fat, you need lots of space for download/unpack/install (~3*5GB), need to do a space check before we proceed
    avail_space=$(df --output=avail ${cuda_install_dir}/ | tail -n 1 | awk '{print $1}')
    if (( ${avail_space} < 16000000 )); then
We should allow this space check to be overridden.
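A minimal sketch of what an overridable check could look like, written in Python for illustration rather than as a patch to the shell script. The environment variable names `CUDA_SKIP_SPACE_CHECK` and `CUDA_MIN_FREE_KB` are hypothetical; only the 16000000 threshold comes from the snippet above, and it is treated as 1K blocks on the assumption that `df` uses its default block size.

```python
import os
import shutil

# Threshold from the script, assumed to be in 1K blocks
DEFAULT_MIN_FREE_KB = 16000000


def has_enough_space(path, environ=os.environ):
    """Return True if `path` has enough free space, honouring overrides.

    CUDA_SKIP_SPACE_CHECK=1 disables the check entirely and
    CUDA_MIN_FREE_KB replaces the default threshold (hypothetical names).
    """
    if environ.get("CUDA_SKIP_SPACE_CHECK") == "1":
        return True
    min_free_kb = int(environ.get("CUDA_MIN_FREE_KB", DEFAULT_MIN_FREE_KB))
    # shutil.disk_usage reports bytes available to the current user
    free_kb = shutil.disk_usage(path).free // 1024
    return free_kb >= min_free_kb
```

Passing the environment as a parameter keeps the function easy to exercise without touching the real process environment.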
install_cuda_host_injections.sh
Outdated
# always be in versions instead of host_injections and have symlinks pointing
# to host_injections for everything we're not allowed to ship
# (existence of easybuild subdir implies a successful install)
if [ -d ${cuda_install_dir}/software/CUDA/${install_cuda_version}/easybuild ]; then
The parent dir also needs to be writable.
This will allow us to log where creating directory structures under `host_injections` is breaking down.
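The writability check could be sketched as follows, assuming the goal is to report the first directory that would stop the tree under `host_injections` from being created. `first_blocking_dir` is a hypothetical helper name, and the sketch is in Python for illustration only.

```python
import os


def first_blocking_dir(path):
    """Return the existing ancestor that blocks creating `path`, or None.

    Walks up from `path` to the nearest existing ancestor. If that ancestor
    is a directory we can write to and traverse, the missing directories
    below it can be created and None is returned. Otherwise the offending
    path is returned so it can be logged.
    """
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    if os.path.isdir(current) and os.access(current, os.W_OK | os.X_OK):
        return None
    return current
```

Logging the returned path shows exactly which level of the directory structure is the problem.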
install_cuda_host_injections.sh
Outdated
    echo "CUDA software found! No need to install CUDA again, proceed with testing."
else
    # The install is pretty fat, you need lots of space for download/unpack/install (~3*5GB), need to do a space check before we proceed
    avail_space=$(df --output=avail ${cuda_install_dir}/ | tail -n 1 | awk '{print $1}')
This assumes that `${cuda_install_dir}` exists, which it may not.
In general this space check is a bit clumsy: it assumes everything happens in one location, but there are actually 3 (source, build, install), each of which takes ~5GB.
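A rough sketch of a per-location check that also copes with directories that do not exist yet. The function names and the idea of grouping locations by filesystem are assumptions made for illustration; the ~5GB per location figure is the estimate from the comment above.

```python
import os
import shutil


def nearest_existing(path):
    """Return the closest ancestor of `path` (or `path` itself) that exists."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def check_space(locations, need_bytes_each=5 * 1024**3):
    """Return the labels of locations whose filesystem is short on space.

    `locations` maps a label (e.g. "source", "build", "install") to a path.
    Locations on the same filesystem have their requirements added up,
    since they compete for the same free space.
    """
    needed = {}  # device id -> [bytes needed, path to probe, labels]
    for label, path in locations.items():
        probe = nearest_existing(path)
        dev = os.stat(probe).st_dev
        entry = needed.setdefault(dev, [0, probe, []])
        entry[0] += need_bytes_each
        entry[2].append(label)

    short = []
    for total, probe, labels in needed.values():
        if shutil.disk_usage(probe).free < total:
            short.extend(labels)
    return short
```

An empty result means every location has room; otherwise the returned labels say which of source, build, or install needs attention.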
@ocaisa Conflicts to fix. Shall we re-target this to the (new)
…ld/4.8.2 {2023.06}[system] EasyBuild V4.8.2
GPU support implemented with #434
This is part of a logical splitting of #172 to make it a bit more manageable
Requires #228