[SHTns] up to v3.7 and add CUDA support #9813

Status: Open. Wants to merge 69 commits into base: master. Showing changes from 67 commits.

Commits
2660c9c
start cuda shtns
fgerick Nov 18, 2024
7f60436
up SHTns v3.7 and compile cuda
fgerick Nov 19, 2024
becbf42
cuda not found on buildkite, change library path
fgerick Nov 19, 2024
e66689a
rm gencode flags, exports outside if block
fgerick Nov 19, 2024
c8e3b44
try without enable-shared configure
fgerick Nov 19, 2024
d044097
forgot -Xcompiler
fgerick Nov 19, 2024
e31f06c
try LDFLAGS to find cuda
fgerick Nov 19, 2024
c8afb81
change cuda lib path in configure
fgerick Nov 19, 2024
53eda2b
another try of compiling cuda
fgerick Nov 19, 2024
ee60a74
flag dont dlopen
fgerick Nov 19, 2024
4252219
yggdrasilpath
fgerick Nov 19, 2024
4575095
unlink and no cxx expand
fgerick Nov 19, 2024
382d6b4
remove skip audit, infinite loops
fgerick Nov 19, 2024
f5c342e
change deps, infinite loop
fgerick Nov 19, 2024
2ea398f
remove dont dlopen and use new augment block for cuda
fgerick Nov 19, 2024
b72fab7
move cuda softlink and retry dont dlopen with new augment
fgerick Nov 19, 2024
e80f47c
dlopen
fgerick Nov 19, 2024
35d55be
use only cuda.augment dont dlopen
fgerick Nov 20, 2024
daccf65
expand microarchitecture test
fgerick Nov 20, 2024
c9bf7a4
build everything, autofix solves issue
fgerick Nov 20, 2024
38f750b
again infinite loop, change microarchitecture build
fgerick Nov 20, 2024
2f01fc4
remove microarchitecture cuda
fgerick Nov 20, 2024
fffdec7
cannot build multiple cuda versions at once
fgerick Nov 20, 2024
32526e9
different build_tarballs
fgerick Nov 20, 2024
fe23804
another build_tarballs
fgerick Nov 20, 2024
ed3d40b
infinite loop again, revert
fgerick Nov 20, 2024
59e50b7
only one
fgerick Nov 20, 2024
0b0388c
another variant
fgerick Nov 20, 2024
9aa971a
dlopen
fgerick Nov 20, 2024
62decbc
cuda augment and skip
fgerick Nov 26, 2024
fe6d625
no preferred gcc and julia 1.10
fgerick Nov 26, 2024
46e620a
change cudapath
fgerick Nov 26, 2024
8ccf06d
no cuda
fgerick Nov 26, 2024
9f55efe
build cuda platforms
fgerick Nov 26, 2024
59f0875
should build platform to avoid infinite loop
fgerick Nov 26, 2024
ebedb2f
fancy toys
fgerick Nov 26, 2024
cfab789
fancy toys path
fgerick Nov 26, 2024
2203fdb
no softlink
fgerick Nov 26, 2024
eb65b0f
library linking
fgerick Nov 26, 2024
65331fa
build all
fgerick Nov 26, 2024
fbdde51
reverse platforms
fgerick Nov 26, 2024
8260c64
another try
fgerick Nov 26, 2024
7c6fe2b
typo
fgerick Nov 26, 2024
be55376
expand cuda platforms
fgerick Nov 27, 2024
adfdd45
only cuda
fgerick Nov 27, 2024
128ee4d
complicated build loop
fgerick Nov 27, 2024
bc73c7d
complicated build loop
fgerick Nov 27, 2024
583f1d8
complicated build loop v3
fgerick Nov 27, 2024
a6b9284
complicated build loop v4
fgerick Nov 27, 2024
91a84c5
complicated build loop v5
fgerick Nov 27, 2024
823401f
again different
fgerick Nov 27, 2024
fafd11a
all platforms
fgerick Nov 27, 2024
973b736
cuda none for all platforms
fgerick Nov 27, 2024
b238ee0
doesnt work
fgerick Nov 27, 2024
978e8c8
different
fgerick Nov 27, 2024
9f24e8e
different v2
fgerick Nov 27, 2024
58f442b
different v3
fgerick Nov 27, 2024
981801d
different v4
fgerick Nov 27, 2024
80f2661
dont expand
fgerick Nov 27, 2024
696349a
different v5
fgerick Nov 27, 2024
61bf0c0
different v6
fgerick Nov 27, 2024
0adfc6b
different v7
fgerick Nov 27, 2024
3edaf20
different v8
fgerick Nov 27, 2024
1fd943d
remove mtune=skylake for cc and change gencode for nvcc
fgerick Nov 27, 2024
b411251
don't build vor cuda v11.4
fgerick Nov 27, 2024
fc65c6c
don't build for cuda v11.4
fgerick Nov 27, 2024
a94217b
Merge branch 'master' of https://github.com/fgerick/Yggdrasil
fgerick Nov 27, 2024
8bf67d4
remove blank
fgerick Nov 29, 2024
817704e
version string
fgerick Nov 29, 2024
88 changes: 55 additions & 33 deletions S/SHTns/build_tarballs.jl
@@ -2,53 +2,66 @@
 # `julia build_tarballs.jl --help` to see a usage message.
 using BinaryBuilder, Pkg

-include(joinpath(@__DIR__, "..", "..", "platforms", "microarchitectures.jl"))
+YGGDRASILPATH = joinpath(@__DIR__, "..", "..")
+
+include(joinpath(YGGDRASILPATH, "fancy_toys.jl"))
+include(joinpath(YGGDRASILPATH, "platforms", "microarchitectures.jl"))
+include(joinpath(YGGDRASILPATH, "platforms", "cuda.jl"))

 name = "SHTns"
-version = v"3.6.6"
+version = v"3.7"
+version_string = version.patch == 0 ? string(version.major)*"."*string(version.minor) : string(version)

 # Collection of sources required to complete build (note to self: use `sha256sum` to generate the checksum from tarball)
 sources = [
-    ArchiveSource("https://gricad-gitlab.univ-grenoble-alpes.fr/schaeffn/shtns/-/archive/v$(version)/shtns-v$(version).tar.gz",
-                  "f060757ed6914c837cc2b251d370078e4c92b6894fef7aac189a9a1f5f1521a2")
+    ArchiveSource("https://gricad-gitlab.univ-grenoble-alpes.fr/schaeffn/shtns/-/archive/v$(version_string)/shtns-v$(version_string).tar.gz",
+                  "6c727ccc4d15d3170c3e20ad2b8a721c8b1fd838b1944c7d7e515a4fce43f75c")
 ]

 # Bash recipe for building across all platforms
 script = raw"""
 cd $WORKSPACE/srcdir/shtns*/
 export CFLAGS="-fPIC -O3" #only -fPIC produces slow code on linux x86 and MacOS x86 (maybe others)
+export LDFLAGS=""

 #remove lfftw3_omp library references, as FFTW_jll does not provide it
-sed -i -e 's/lfftw3_omp/lfftw3/' configure
+sed -i -e 's/lfftw3_omp/lfftw3/g' configure

+#remove mtune and gencode flags, replace by nvcc -arch=all (good?)
+sed -i -e '/-mtune=skylake/d' configure
+sed -i -e 's/nvcc -std=c++11 \$nvcc_gencode_flags/nvcc -Xcompiler -fPIC -std=c++11 -arch=all/' configure
+
+sed -i -e 's/lib64/lib/g' configure

-./configure --prefix=${prefix} --host=${target} --enable-openmp --enable-kernel-compiler=cc
-make -j${nproc}
-make install
+configure_args="--prefix=${prefix} --host=${target} --enable-openmp --enable-kernel-compiler=cc "
+link_flags="-lfftw3 -lm "
+
+if [[ -d "${prefix}/cuda" ]]; then
+    export CUDA_PATH="$prefix/cuda"
+    export PATH=$CUDA_PATH/bin:$PATH
+    LDFLAGS+="-L$CUDA_PATH/lib -L$CUDA_PATH/lib/stubs"
+    configure_args+="--enable-cuda"
+    link_flags+="-lcuda -lnvrtc -lcudart"
+fi
+
+./configure $configure_args
+make -j${nproc}
+rm *.a
 mkdir -p ${libdir}
-cc -fopenmp -shared -o "${libdir}/libshtns.${dlext}" *.o -lfftw3
-rm "${prefix}/lib/libshtns_omp.a"
+cc -fopenmp -shared $CFLAGS $LDFLAGS -o "${libdir}/libshtns.${dlext}" *.o $link_flags

 install_license LICENSE
 """

 # These are the platforms we will build for by default, unless further
 # platforms are passed in on the command line

-# Expand for microarchitectures on x86_64 (library doesn't have CPU dispatching)
-platforms = expand_microarchitectures(supported_platforms(), ["x86_64", "avx", "avx2", "avx512"])
-
-augment_platform_block = """
-    $(MicroArchitectures.augment)
-    function augment_platform!(platform::Platform)
-        # We augment only x86_64
-        @static if Sys.ARCH === :x86_64
-            augment_microarchitecture!(platform)
-        else
-            platform
-        end
-    end
-    """
+cpu_platforms = supported_platforms()
+cuda_platforms = CUDA.supported_platforms(; min_version=v"11.5") #v11.4 does not have -arch=all available
+
+filter!(p -> arch(p) != "aarch64", cuda_platforms) #doesn't work
+
+platforms = [cuda_platforms;cpu_platforms]

 # The products that we will ensure are always built
 products = [
@@ -57,15 +70,24 @@ products = [

 # Dependencies that must be installed before this package can be built
 dependencies = [
-    Dependency(PackageSpec(name="FFTW_jll", uuid="f5851436-0d7a-5f13-b9de-f02708fd171a")),
+    Dependency(PackageSpec(name="FFTW_jll")),
     # For OpenMP we use libomp from `LLVMOpenMP_jll` where we use LLVM as compiler (BSD
-    # systems), and libgomp from `CompilerSupportLibraries_jll` everywhere else.
-    Dependency(PackageSpec(name="CompilerSupportLibraries_jll", uuid="e66e0078-7015-5450-92f7-15fbd957f2ae"); platforms=filter(!Sys.isbsd, platforms)),
-    Dependency(PackageSpec(name="LLVMOpenMP_jll", uuid="1d63c593-3942-5779-bab2-d838dc0a180e"); platforms=filter(Sys.isbsd, platforms)),
+    # systems), and libgomp from `CompilerSupportLibraries_jll` everywhere else.
+    Dependency(PackageSpec(name="CompilerSupportLibraries_jll"); platforms=filter(!Sys.isbsd, platforms)),
+    Dependency(PackageSpec(name="LLVMOpenMP_jll"); platforms=filter(Sys.isbsd, platforms)),
 ]

-# Build the tarballs, and possibly a `build.jl` as well.
-build_tarballs(ARGS, name, version, sources, script, platforms, products, dependencies;
-               julia_compat="1.6",
-               preferred_gcc_version=v"10",
-               augment_platform_block)
+# Build the tarballs
+for platform in platforms
+    if Sys.islinux(platform) && (arch(platform) == "x86_64") && (libc(platform) == "glibc")
+        if !haskey(platform,"cuda")
+            platform["cuda"] = "none"
+        end
+    end
+    should_build_platform(triplet(platform)) || continue
+    build_tarballs(ARGS, name, version, sources, script, [platform], products, [dependencies; CUDA.required_dependencies(platform)];
+                   julia_compat = "1.6",
+                   lazy_artifacts=true,
+                   preferred_gcc_version = v"10",
+                   augment_platform_block = CUDA.augment, dont_dlopen=true, skip_audit=true)
+end
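
The bash recipe above builds its configure and linker arguments by appending to plain strings with `+=`, so the trailing spaces in the initial `configure_args` and `link_flags` values are what keep the CUDA fragments from fusing onto the previous flag. The sketch below reproduces only that assembly logic outside BinaryBuilder. It is an illustration under assumptions, not part of the PR: `assemble_flags` and `demo_root` are hypothetical names, a temporary directory stands in for `${prefix}`, `--host=${target}` is left out because `target` only exists inside the build sandbox, and nothing is configured or compiled.

```shell
#!/usr/bin/env bash
# Sketch of the recipe's flag assembly. `assemble_flags` and `demo_root` are
# hypothetical names; a temp directory stands in for BinaryBuilder's ${prefix}.
set -euo pipefail

assemble_flags() {
    local prefix="$1"
    # The trailing spaces matter: later fragments are appended with += and no separator.
    local configure_args="--prefix=${prefix} --enable-openmp --enable-kernel-compiler=cc "
    local link_flags="-lfftw3 -lm "
    local ldflags=""

    # Same test as the recipe: a cuda/ directory under the prefix enables the GPU build.
    if [[ -d "${prefix}/cuda" ]]; then
        ldflags+="-L${prefix}/cuda/lib -L${prefix}/cuda/lib/stubs"
        configure_args+="--enable-cuda"
        link_flags+="-lcuda -lnvrtc -lcudart"
    fi

    printf 'configure: %s\n' "${configure_args}"
    printf 'ldflags: %s\n' "${ldflags}"
    printf 'link: %s\n' "${link_flags}"
}

# One fake prefix without CUDA and one with a cuda/ subdirectory.
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/cpu" "${demo_root}/gpu/cuda"

cpu_out="$(assemble_flags "${demo_root}/cpu")"
gpu_out="$(assemble_flags "${demo_root}/gpu")"

printf '%s\n\n%s\n' "${cpu_out}" "${gpu_out}"
rm -rf "${demo_root}"
```

Running it prints the flag sets for a CPU-only prefix and for a prefix containing `cuda/`. Dropping the trailing space from either initial string would fuse the first appended fragment onto the preceding flag (for example `cc--enable-cuda`), which is the failure mode this pattern is prone to.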