TTG device tasks with coroutines #265

Merged
154 commits
31af820
Add support for HIP
devreal Aug 28, 2023
9b817f3
Bump PaRSEC version to current master
devreal Aug 28, 2023
30d7ee0
SPMM: move tiles in the non-cuda version
devreal Aug 28, 2023
81cca94
Fixes to make Clang happy
devreal Aug 30, 2023
975e878
Try to send from accelerator device directly
devreal Sep 6, 2023
2b7e67c
Fix cpu-only spmm compilation
devreal Sep 6, 2023
7faf783
Add have_hip_op to TT produced by make_tt
devreal Sep 22, 2023
a423e87
Add TTG_FORCE_DEVICE_COMM to force use of device-side comm buffers
devreal Sep 25, 2023
a7fe69a
Bump TA to support ROCM allocators
devreal Sep 25, 2023
6d73621
Fix device comm code
devreal Sep 25, 2023
88f8da9
SPMM: restore write-back to Eigen matrix for verification
devreal Sep 25, 2023
3ba4a4f
We need one cublas/hipblas context per device
devreal Sep 28, 2023
b570108
Add missing header to install
devreal Sep 28, 2023
5afb32b
Bump TA sha to pull in newer Umpire
devreal Sep 29, 2023
3aff11c
SPMM: Flush out PaRSEC data between measurements
devreal Sep 29, 2023
3fa8001
Minor fixes to device code
devreal Sep 29, 2023
4d60c3f
[cmake] bump MADNESS tag to pull in https://github.com/m-a-d-n-e-s-s/…
evaleev Sep 29, 2023
4f21ccd
Add missing device::sendk and sendv routines
devreal Sep 29, 2023
f91fa92
Add device-based chain task benchmark
devreal Oct 1, 2023
d1a6ef3
User needs to include any std:: serialization they use
therault Sep 6, 2023
7576321
Work in progress: adding CUDA version of POTRF task in POTRF example …
therault Sep 29, 2023
4a23f42
Further steps towards device-enabled POTRF
devreal Oct 1, 2023
57faac4
Add allocator to ttg::buffer
devreal Oct 2, 2023
194e93d
Add first full draft of device-enabled POTRF
devreal Oct 2, 2023
e41a4b1
POTRF: Use the right precompiler guard for CUDA and HIP
devreal Oct 2, 2023
18577f9
POTRF: make it run, still fails factorization
devreal Oct 2, 2023
b60577c
Fix release of data copies in buffer
devreal Oct 3, 2023
a2f4bbe
Enable POTRF on hip
devreal Oct 4, 2023
9654335
POTRF: add -n parameters to control number of iterations
devreal Oct 4, 2023
410dc23
Minor fixes to compile on Frontier
devreal Oct 4, 2023
5b81ac4
buffer: cannot capture this pointer for deleter and move, use raw poi…
devreal Oct 5, 2023
403b06b
POTRF: construct whole TTG each iteration
devreal Oct 5, 2023
1998655
Add TTG_ENABLE_ASAN CMake option
devreal Oct 5, 2023
548b65b
Chain benchmark: Disable wait_kernel co_yield
devreal Oct 6, 2023
fdf6e2e
PaRSEC: Increase max AM size to 1MB
devreal Oct 6, 2023
9d21225
Remove TTG cublas_helper, superseded by devblas helper in examples
devreal Oct 6, 2023
24155bc
[cmake] FindTBB can handle OneTBB
evaleev Oct 11, 2023
ee82565
[cmake] introduce TTG_HAVE_DEVICE
evaleev Oct 11, 2023
0993f79
[cmake] introduce TTG_HAVE_DEVICE used to conditionally compile *port…
evaleev Oct 11, 2023
366e60e
bspmm: support other reals ... depends on https://github.com/ValeevGr…
evaleev Oct 11, 2023
94f3a81
Merge branch 'ttg-device-support-master-coro-with-stream-tasks' into …
devreal Oct 11, 2023
fa4aedf
Merge pull request #7 from therault/potrf-cuda-wip
devreal Oct 11, 2023
148a3ad
Add debugging capabilities for POTRF benchmark
devreal Oct 30, 2023
57f914e
Cleanup devblas support in rocm support
devreal Oct 30, 2023
162a0a1
Lazily compute norm on MatrixTile
devreal Oct 30, 2023
9dbe648
Fix previous commits
devreal Oct 30, 2023
fdc3c95
Fix previous commit
devreal Oct 30, 2023
7200e1c
Fix bspmm-cuda after device and devblas helper changes
devreal Oct 30, 2023
9d15ba1
POTRF: make tile argument to dispatcher const
devreal Oct 31, 2023
166bdec
Multi-GPU: Play nice with the PaRSEC accelerator
devreal Nov 3, 2023
33a5a8d
Bump PaRSEC and MADNESS tags for proper tear-down
devreal Nov 3, 2023
f9375c8
Bump PaRSEC reference for TTG-specific fixes
devreal Nov 6, 2023
9903f25
Bump PaRSEC ref for reworked GPU backend
devreal Nov 15, 2023
e9271ff
Add a LEVEL_ZERO implementation for the GEMM kernel in spmm_cuda.cc
therault Nov 16, 2023
e20758a
Add level zero support and refactor device naming
devreal Nov 16, 2023
af6f684
Merge branch 'ttg-device-support-master-coro-with-stream-tasks' of gi…
devreal Nov 16, 2023
97d0bf6
Attempt at cmakery for level-zero
therault Nov 16, 2023
c2da1f5
Merge branch 'ttg-device-support-master-coro-with-stream-tasks' of gi…
therault Nov 16, 2023
6af1828
Missing declaration for level zero cmakery
therault Nov 16, 2023
0be1212
Agnostic async copy and non-device support
devreal Nov 16, 2023
a119828
Add missing parsec backend header file to CMake
devreal Nov 16, 2023
4f125ad
Add MKL dependency to spmm-lz
therault Nov 16, 2023
67434d1
Merge branch 'ttg-device-support-master-coro-with-stream-tasks' of gi…
therault Nov 16, 2023
03b54bb
Work on device_gpu_module instead of device_module
devreal Nov 16, 2023
a9e5959
Merge branch 'ttg-device-support-master-coro-with-stream-tasks' of gi…
devreal Nov 16, 2023
dde416c
Get sycl queue in spmm
devreal Nov 16, 2023
5569b57
Add parsec level-zero include
devreal Nov 16, 2023
86b92db
Return sycl queue as non-const
devreal Nov 16, 2023
5aac5c2
Use the right parsec device module types
devreal Nov 16, 2023
afeda46
Add inline in device.h
devreal Nov 16, 2023
d1fb80f
PaRSEC: Add have_level_zero_op()
devreal Nov 16, 2023
bfa3fde
Add missing have_level_zero_op_non_type_t
devreal Nov 16, 2023
68e4825
Fix bug in level zero detection of derived TT
devreal Nov 16, 2023
9afb6f8
Don't ever assume that 2 is the first device
devreal Nov 16, 2023
961b0d3
[cmake] cleanup FindOrFetchTiledArray.cmake
evaleev Nov 20, 2023
78876e5
[cmake] bump TA tag to pull in https://github.com/ValeevGroup/tiledar…
evaleev Nov 20, 2023
cf757ee
Proper use of device ID conversion functions
devreal Nov 27, 2023
d660b07
Bump parsec commit to rebased fork
devreal Nov 27, 2023
0ab8f47
There are no allocators for Level Zero in TiledArray yet
devreal Nov 27, 2023
fb5f111
Fix misplaced assert
devreal Nov 27, 2023
383d36d
ze: add the missing link-libraries to spmm
abouteiller Nov 27, 2023
f4b482a
Use std::size_t in parsec matrix wrapper
devreal Nov 27, 2023
0b1325e
MatrixTile: use std::size_t consistently
devreal Nov 27, 2023
1f3619a
Fun with signed integers in parsec matrix wrapper
devreal Nov 27, 2023
81c25e3
Fix is_dummy name collision in parsec_ttg_task_base_t
devreal Nov 29, 2023
bf821a5
Bump PaRSEC version to get first set of eviction fixes
devreal Dec 4, 2023
6727faf
task dummy() -> is_dummy()
devreal Dec 5, 2023
049297e
Fix usage of wrong device ID in buffer
devreal Dec 5, 2023
8146481
Bump PaRSEC tag for more memory oversubscription fixes
devreal Dec 5, 2023
b9727a4
Map blas/solver handles to streams, not threads and streams
devreal Dec 13, 2023
603af4e
Naming conventions: rename ttg::Buffer
devreal Dec 18, 2023
9afe0c0
Remove legacy device implementation files
devreal Dec 18, 2023
f424bcd
Put all device functions into ttg::device namespace
devreal Dec 18, 2023
3bf1e16
Add device-related fwd-decl to madness backend
devreal Dec 19, 2023
913f792
Include fwd.h instead of impl_selector.h
devreal Dec 20, 2023
7c6044a
Bump catch2 version to latest release
devreal Dec 20, 2023
ed055fb
Fix clang compile issues
devreal Dec 20, 2023
0c7d291
Forward keyT in make_tt overload
devreal Dec 20, 2023
b486a2f
Bump github CI Ubuntu to 22.04 to get boost 1.74 needed for TA
devreal Dec 20, 2023
555ae2f
Add stub implementations of ttvalue and buffer for madness
devreal Dec 20, 2023
5cf2502
More fixes to ttvalue and ptr
devreal Dec 20, 2023
220c038
More catch2 include fixes
devreal Dec 20, 2023
e9b3233
Fix ttvalue and buffer in parsec
devreal Dec 20, 2023
65ffd4c
Add derived_has_*_op() to madness backend
devreal Dec 20, 2023
e7bba1b
Allow for non-coro device tasks
devreal Dec 20, 2023
b653b9b
Re-enable disabled unit tests
devreal Dec 20, 2023
82a7b2a
Enable madness serialization for madness backend buffer
devreal Dec 21, 2023
0d64297
Only build device_coro test if device support is detected
devreal Dec 21, 2023
5f876e3
Fix possible use-after-free in TT::release
devreal Dec 21, 2023
6f645df
Use key directly in task_snprintf instead of locals
devreal Dec 21, 2023
ba6f7a1
Remove legacy devicemock example
devreal Dec 21, 2023
c899f07
Fix device chain benchmark
devreal Dec 21, 2023
bd454dd
Comment preliminary mra device code pieces
devreal Dec 21, 2023
ec67c6b
Gracefully handle missing TA allocators
devreal Dec 21, 2023
8ccab44
Use stock doxygen for release test
devreal Dec 21, 2023
beed36a
Fix const handling in reducer tasks
devreal Dec 21, 2023
f30f901
Only enable cuda op if cuda was detected
devreal Dec 21, 2023
ece8cc3
Fix streams unit test
devreal Dec 22, 2023
db4f736
Make sure new reducer target copies are marked mutable
devreal Dec 22, 2023
4547d8c
Add ttg/ttvalue.h to cmake
devreal Dec 22, 2023
f9900fe
Reset readers on reduced copy
devreal Dec 22, 2023
d42cb92
Fix keystr string name
devreal Jan 19, 2024
7b9c248
PaRSEC task: copy task locals to comply with profiling reqs
devreal Jan 29, 2024
641c81f
SPMM: Make writing C back into Eigen optional
devreal Jan 29, 2024
d584e06
SPMM: distribute A/B over all procs and throttle broadcasts
devreal Jan 29, 2024
a85c2c4
btas::Tensor: do not fill with zeroes when constructing (e.g., in zer…
evaleev Jan 30, 2024
7a5e2a5
Merge branch 'ttg-device-support-master-coro-with-stream-tasks' of gi…
devreal Jan 30, 2024
1f2a4fb
[cmake] bump MADNESS, BTAS, and TA tags to sync with latest; MADNESS …
evaleev Jan 31, 2024
1125f96
fixup
evaleev Jan 31, 2024
549ba94
introduce initialize_fpe to turn on/off floating-point exceptions
evaleev Jan 31, 2024
f05530f
can use bundled Boost.CallableTraits only if don't have real Boost an…
evaleev Feb 1, 2024
9765f75
bump MADNESS tag to fix up CI issues
evaleev Feb 1, 2024
b70444c
[ci] install libboost-random-dev to help TA build itself
evaleev Feb 1, 2024
1ffaf6a
implemented Read{A,B} that read A and B more intelligently
evaleev Feb 2, 2024
41cc814
Bump PaRSEC tag
devreal Feb 2, 2024
8993e49
Implement support for inlining of objects with SplitMetadata
devreal Feb 2, 2024
becdbf4
SPMM: Allow changing the work distribution
devreal Feb 2, 2024
3dad2c1
Read{A,B} can use MultipleK schedule
evaleev Feb 2, 2024
4416f3b
Enforce strict ordering of broadcasts in column/row
devreal Feb 2, 2024
8fd7be9
Merge branch 'ttg-device-support-master-coro-with-stream-tasks' of gi…
devreal Feb 2, 2024
59779c0
Bump PaRSEC sha
devreal Feb 2, 2024
94b2b75
Bump PaRSEC version for MPI backend instrumentation
devreal Feb 14, 2024
c06654f
Add missing <numeric> header for CI
devreal Feb 23, 2024
b8c2cc8
Order broadcast keys so that we send to myrank+1 first
devreal Feb 23, 2024
67b0fb3
Reverted ReadA/B additions
evaleev Feb 23, 2024
b741e78
[cleanup] Design-Device.md
evaleev Feb 23, 2024
ed270da
[dox] new vars config.in.h [skip ci]
evaleev Feb 23, 2024
12d1c1f
[dox] stub for device::Device
evaleev Feb 23, 2024
27c03f7
[dox] stub for ttg_madness::Buffer
evaleev Feb 23, 2024
fcbd2e8
[cleanup] device/task.h
evaleev Feb 23, 2024
9cd39b8
unit test informational logging interferes with catch_discover_tests,…
evaleev Feb 24, 2024
8104b49
pass required Boost version to FindOrFetchBoost
evaleev Feb 24, 2024
1826821
HIP LANGUAGE needs CMake 3.21 or later
evaleev Feb 24, 2024
56ed942
update INSTALL.md
evaleev Feb 24, 2024
32 changes: 7 additions & 25 deletions .github/workflows/cmake.yml
@@ -10,11 +10,11 @@ jobs:
fail-fast: false
matrix:
build_type : [ Release, Debug ]
os : [ macos-latest, ubuntu-20.04 ]
os : [ macos-latest, ubuntu-22.04 ]
include:
- os: ubuntu-20.04
cc: /usr/bin/gcc-10
cxx: /usr/bin/g++-10
- os: ubuntu-22.04
cc: /usr/bin/gcc-12
cxx: /usr/bin/g++-12
- os: macos-latest
cc: clang
cxx: clang++
@@ -48,37 +48,19 @@ jobs:
run: brew install ninja gcc@10 boost eigen open-mpi bison ccache

- name: Install prerequisites Ubuntu packages
if: ${{ matrix.os == 'ubuntu-20.04' }}
if: ${{ matrix.os == 'ubuntu-22.04' }}
run: |
wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | sudo tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null
sudo apt-add-repository "deb https://apt.kitware.com/ubuntu/ $(lsb_release -cs) main"
sudo apt-get update
sudo apt-get -y install ninja-build g++-10 liblapack-dev libboost-dev libboost-serialization-dev libeigen3-dev openmpi-bin libopenmpi-dev libtbb-dev ccache flex bison cmake
sudo apt-get -y install ninja-build g++-12 liblapack-dev libboost-dev libboost-serialization-dev libboost-random-dev libeigen3-dev openmpi-bin libopenmpi-dev libtbb-dev ccache flex bison cmake doxygen

- name: Create Build Environment
# Some projects don't allow in-source building, so create a separate build directory
# We'll use this as our working directory for all subsequent commands
run: |
cmake -E make_directory ${{github.workspace}}/build


- name: Install doxygen for Release test
if: ${{ matrix.os == 'ubuntu-20.04' }}
run: |
if [ "${{matrix.build_type}}" = "Release" ]; then
sudo apt-get -y install libclang1-9 libclang-cpp9 graphviz fonts-liberation
cd ${{github.workspace}}/build
# If we fail getting doxygen-${DOXYGEN_VERSION}.linux.bin.tar.gz from sourceforge,
# use EFV's gdrive mirror of 1.9.2 to work around the unreliable sourceforge
# the sharing link: https://drive.google.com/file/d/16GXpH4YOEUxGXQrXOKdAIibhdfzATY0d/view?usp=sharing
wget https://downloads.sourceforge.net/project/doxygen/rel-${DOXYGEN_VERSION}/doxygen-${DOXYGEN_VERSION}.linux.bin.tar.gz || wget -4 --no-check-certificate -O doxygen-${DOXYGEN_VERSION}.linux.bin.tar.gz "https://drive.google.com/uc?export=download&id=16GXpH4YOEUxGXQrXOKdAIibhdfzATY0d"
tar xzf ./doxygen-${DOXYGEN_VERSION}.linux.bin.tar.gz
export DOXYGEN_DIR=${{github.workspace}}/build/doxygen-${DOXYGEN_VERSION}
${DOXYGEN_DIR}/bin/doxygen --version
# doxygen should be in PATH in subsequent steps
echo "${DOXYGEN_DIR}/bin" >> $GITHUB_PATH
fi

- name: Prepare ccache timestamp
id: ccache_cache_timestamp
shell: cmake -P {0}
@@ -144,7 +126,7 @@ jobs:
cmake --build test_install_userexamples/build

- name: Build+Deploy Dox
if: ${{ matrix.os == 'ubuntu-20.04' && matrix.build_type == 'Release' && github.ref == 'refs/heads/master' }}
if: ${{ matrix.os == 'ubuntu-22.04' && matrix.build_type == 'Release' && github.ref == 'refs/heads/master' }}
working-directory: ${{github.workspace}}/build
shell: bash
run: |
75 changes: 73 additions & 2 deletions CMakeLists.txt
@@ -32,6 +32,8 @@ project(ttg

if (NOT DEFINED CMAKE_CXX_STANDARD)
set(CMAKE_CXX_STANDARD "20" CACHE STRING "The C++ standard")
elseif (${CMAKE_CXX_STANDARD} LESS 20)
message(FATAL_ERROR "TTG requires C++ compiler with C++20, but CMAKE_CXX_STANDARD is set to ${CMAKE_CXX_STANDARD}; bump up CMAKE_CXX_STANDARD to 20 or above")
endif()
if (NOT CMAKE_CXX_EXTENSIONS)
set(CMAKE_CXX_EXTENSIONS OFF CACHE BOOL "The C++ standard extensions allowed?")
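Review note: the new guard is a three-way decision (standard unset, standard too old, standard acceptable). A minimal shell sketch of the same logic follows, for illustration only; `resolve_cxx_standard` is a hypothetical helper name and is not part of TTG or of this PR.

```shell
# Hypothetical helper mirroring the CMAKE_CXX_STANDARD guard; not part of TTG.
# Prints the C++ standard that would be used, or fails if the request is below 20.
resolve_cxx_standard() {
  if [ -z "${1:-}" ]; then
    echo 20   # same default the CMake code caches when nothing is set
  elif [ "$1" -lt 20 ]; then
    echo "TTG requires C++20, but CMAKE_CXX_STANDARD is set to $1" >&2
    return 1
  else
    echo "$1"
  fi
}

resolve_cxx_standard 23   # prints 23
```

Like CMake's `LESS`, this is a purely numeric comparison, so a value of `98` would pass it.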
@@ -52,10 +54,13 @@ set(CMAKE_INSTALL_CMAKEDIR "lib/cmake/ttg"
########################################
option(TTG_PARSEC_USE_BOOST_SERIALIZATION "Whether to select Boost serialization methods in PaRSEC backend" ON)
option(TTG_ENABLE_CUDA "Whether TTG will look for CUDA" OFF)
option(TTG_ENABLE_HIP "Whether TTG will look for HIP" OFF)
option(TTG_ENABLE_LEVEL_ZERO "Whether TTG will look for Intel oneAPI Level Zero" OFF)
option(TTG_EXAMPLES "Whether to build examples" OFF)
option(TTG_ENABLE_ASAN "Whether to enable address sanitizer" OFF)

option(TTG_FETCH_BOOST "Whether to fetch+build Boost, if missing" OFF)
option(TTG_IGNORE_BUNDLED_EXTERNALS "Whether to skip installation and use of bundled external depenedencies (Boost.CallableTraits)" OFF)
option(TTG_IGNORE_BUNDLED_EXTERNALS "Whether to skip installation and use of bundled external dependencies (Boost.CallableTraits)" OFF)
option(TTG_ENABLE_TRACE "Whether to enable ttg::trace() output" OFF)
# See https://medium.com/@alasher/colored-c-compiler-output-with-ninja-clang-gcc-10bfe7f2b949
option (FORCE_COLORED_OUTPUT "Always produce ANSI-colored output (GNU/Clang only)." TRUE)
@@ -67,6 +72,12 @@ if (FORCE_COLORED_OUTPUT)
endif ()
endif (FORCE_COLORED_OUTPUT)

if (TTG_ENABLE_ASAN)
add_compile_options(-fsanitize=address)
add_link_options(-fsanitize=address)
endif (TTG_ENABLE_ASAN)

set(TTG_HIP_PLATFORM "__HIP_PLATFORM_AMD__" CACHE STRING "Which platform to use when compiling HIP-related code (default: __HIP_PLATFORM_AMD__)")
##########################
#### prerequisites
##########################
@@ -82,7 +93,7 @@ endif (BUILD_TESTING)
#### optional prerequisites
###########################
# Boost
include(FindOrFetchBoost)
include("${PROJECT_SOURCE_DIR}/cmake/modules/FindOrFetchBoost.cmake")
# Cereal
#include(FindOrFetchCereal)
# C++ coroutines
@@ -104,7 +115,53 @@ if (TTG_ENABLE_CUDA)
if (TARGET CUDA::cudart)
set(TTG_HAVE_CUDART True CACHE BOOL "TTG supports execution on CUDA devices")
endif()
endif(TTG_ENABLE_CUDA)

if (TTG_ENABLE_HIP)
# HIP LANGUAGE introduced in 3.21
cmake_minimum_required(VERSION 3.21)
include(CheckLanguage)
check_language(HIP)
if(CMAKE_HIP_COMPILER)
enable_language(HIP)
endif(CMAKE_HIP_COMPILER)
set(TTG_HAVE_HIP ${CMAKE_HIP_COMPILER} CACHE BOOL "True if TTG supports compiling .hip files")

find_package(hipblas)
if (TARGET roc::hipblas)
set(TTG_HAVE_HIPBLAS True CACHE BOOL "TTG detected support for hipBLAS")
endif()

find_package(hipsolver)
if (TARGET roc::hipsolver)
set(TTG_HAVE_HIPSOLVER True CACHE BOOL "TTG detected support for hipSolver")
endif()
add_compile_definitions(${TTG_HIP_PLATFORM})
endif(TTG_ENABLE_HIP)

if (TTG_ENABLE_LEVEL_ZERO)
find_package(level-zero)
set(TTG_HAVE_LEVEL_ZERO ${LEVEL_ZERO_FOUND} CACHE BOOL "True if TTG provides support for Intel Level Zero")
if(TTG_HAVE_LEVEL_ZERO)
include_directories("${LEVEL_ZERO_INCLUDE_DIR}/level_zero/")
find_package(DPCPP)
if(DPCPP_EXECUTABLE)
set(TTG_HAVE_DPCPP TRUE CACHE BOOL "True if TTG knows how to compile DPCPP code")
message(STATUS "Found Intel level-zero ${LEVEL_ZERO_VERSION} in -I${LEVEL_ZERO_INCLUDE_DIR} / -L${LEVEL_ZERO_LIBRARY_DIR}")
message(STATUS "Found dpcpp in ${DPCPP_EXECUTABLE}")

find_package(MKL)
else(DPCPP_EXECUTABLE)
set(TTG_HAVE_DPCPP FALSE CACHE BOOL "True if TTG knows how to compile DPCPP code")
endif(DPCPP_EXECUTABLE)
endif(TTG_HAVE_LEVEL_ZERO)
endif(TTG_ENABLE_LEVEL_ZERO)

set(_ttg_have_device FALSE)
if (TTG_HAVE_CUDA OR TTG_HAVE_HIP OR TTG_HAVE_LEVEL_ZERO)
set(_ttg_have_device TRUE)
endif()
set(TTG_HAVE_DEVICE ${_ttg_have_device} CACHE BOOL "True if TTG has support for any device programming model")

##########################
#### prerequisite runtimes
@@ -120,6 +177,20 @@ if (TARGET MADworld)
message(STATUS "MADNESS_FOUND=1")
endif(TARGET MADworld)

####################################################
#### Check for MPIX_Query_[cuda|rocm]_support
#### Open MPI provides mpi-ext.h for such extensions
#### so check for that first.
####################################################
find_package(MPI)
set(TTG_HAVE_MPI ${MPI_FOUND})
if (MPI_FOUND)
include(CheckIncludeFiles)
set(CMAKE_REQUIRED_INCLUDES ${MPI_C_INCLUDE_DIRS})
check_include_files("mpi-ext.h" TTG_HAVE_MPIEXT)
else(MPI_FOUND)
set(TTG_HAVE_MPIEXT FALSE)
endif(MPI_FOUND)
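
Review note: the `check_include_files("mpi-ext.h" ...)` probe amounts to asking whether the header is visible on the MPI include path. A rough way to ask the same question outside of CMake is sketched below; `has_mpi_ext` is a hypothetical helper, not part of TTG, and the directory passed to it is an assumption that will differ between systems.

```shell
# Hypothetical helper; not part of TTG.
# Succeeds if mpi-ext.h exists in any of the include directories given as arguments.
has_mpi_ext() {
  for dir in "$@"; do
    if [ -f "$dir/mpi-ext.h" ]; then
      return 0
    fi
  done
  return 1
}

# The directory below is only an example location.
if has_mpi_ext /usr/include/openmpi; then
  echo "mpi-ext.h found"
else
  echo "mpi-ext.h not found"
fi
```

Unlike the CMake check, this only tests for the file's presence; it does not try to compile against it.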

##########################
#### Examples
45 changes: 26 additions & 19 deletions INSTALL.md
@@ -13,12 +13,12 @@ $ cmake --build ttg/build --target install
TTG is usable only on POSIX systems.

## mandatory prerequisites
- [CMake](https://cmake.org/), version 3.14 or higher
- C++ compiler with support for the [C++17 standard](http://www.iso.org/standard/68564.html), or a more recent standard. This includes the following compilers:
- [GNU C++](https://gcc.gnu.org/), version 7.0 or higher
- [Clang](https://clang.llvm.org/), version 5 or higher
- [Apple Clang](https://en.wikipedia.org/wiki/Xcode), version 9.3 or higher
- [Intel C++ compiler](https://software.intel.com/en-us/c-compilers), version 19 or higher
- [CMake](https://cmake.org/), version 3.14 or higher; version 3.21 or higher is required to support execution on HIP/ROCm-capable devices.
- C++ compiler with support for the [C++20 standard](https://www.iso.org/standard/79358.html), or a more recent standard. This includes the following compilers:
- [GNU C++](https://gcc.gnu.org/), version 10.0 or higher; GCC is the only compiler that can be used for accelerator programming.
- [Clang](https://clang.llvm.org/), version 10 or higher
- [Apple Clang](https://en.wikipedia.org/wiki/Xcode), version 10.0 or higher
- [Intel C++ compiler](https://software.intel.com/en-us/c-compilers), version 2021.1 or higher
- one or more of the following runtimes:
- [PaRSEC](https://bitbucket.org/icldistcomp/parsec): this distributed-memory runtime is the primary runtime intended for high-performance implementation of TTG
- [MADNESS](https://github.com/m-a-d-n-e-s-s/madness): this distributed-memory runtime is to be used primarily for developmental purposes
@@ -27,12 +27,15 @@ While the list of prerequisites is short, note that the runtimes have many more
Also: it is _strongly_ recommended that the runtimes are built as parts of the TTG build process (this requires some of the optional prerequisites, listed below). This will make sure that the correct versions of the runtimes are used.

## optional prerequisites
- [Git](https://git-scm.com) 1.8 or later: needed to obtain the source code for PaRSEC or MADNESS runtimes
- [Boost](https://boost.org/) version 1.66 or later: needed to use TTG with classes serializable by the [Boost.Serialization](https://www.boost.org/doc/libs/master/libs/serialization/doc/index.html) library.
- The [Boost.Serialization](https://www.boost.org/doc/libs/master/libs/serialization/doc/index.html) library is not header-only, i.e., it must be compiled.
- If the Boost package is not detected TTG can download and build Boost as part of its build process; to do that configure TTG with the CMake cache variable `TTG_FETCH_BOOST` set to `ON` (e.g., by adding `-DTTG_FETCH_BOOST=ON` to the CMake executable command line)
- *Note to package maintainers*: TTG also requires Boost.CallableTraits; if Boost is not found or built, TTG installs and uses a bundled copy of Boost.CallableTraits. To avoid the installation and use of the bundled Boost.CallableTraits configure TTG with the CMake cache variable `TTG_IGNORE_BUNDLED_EXTERNALS` set to `ON`.
- ([Doxygen](http://www.doxygen.nl/), version 1.8.12 or later: needed for building documentation
- [Git](https://git-scm.com): needed to obtain the source code for any prerequisite built from source code as part of TTG, such as PaRSEC or MADNESS runtimes
- [Boost](https://boost.org/) version 1.81 or later. If the Boost package is not detected, TTG can download and build Boost as part of its build process, but this is NOT recommended; obtain Boost via the system or a third-party package manager instead. Experts may try to build Boost from source as part of TTG by configuring it with the CMake cache variable `TTG_FETCH_BOOST` set to `ON` (e.g., by adding `-DTTG_FETCH_BOOST=ON` to the CMake executable command line). The following primary Boost libraries/modules (and their transitive dependencies) are used:
- (required) Boost.CallableTraits: used to introspect generic callables given to `make_tt`. TTG has a bundled copy of `Boost.CallableTraits`, which is used and installed if Boost is neither found nor built from source. To avoid the installation and use of the bundled Boost.CallableTraits, configure TTG with the CMake cache variable `TTG_IGNORE_BUNDLED_EXTERNALS` set to `ON`.
- (optional) [Boost.Serialization](https://www.boost.org/doc/libs/master/libs/serialization/doc/index.html): needed to use TTG with classes serializable by the [Boost.Serialization](https://www.boost.org/doc/libs/master/libs/serialization/doc/index.html) library. Note that `Boost.Serialization` is not header-only, i.e., it must be compiled. This is only required if TTG is configured with CMake cache variable `TTG_PARSEC_USE_BOOST_SERIALIZATION` set to `ON`.
- [Doxygen](http://www.doxygen.nl/), version 1.8.12 or later: needed for building documentation.
- for execution on GPGPUs and other accelerators, the following are required:
- [CUDA compiler and runtime](https://developer.nvidia.com/cuda-zone) -- for execution on NVIDIA's CUDA-enabled accelerators. CUDA 11 or later is required.
- [HIP/ROCm compiler and runtime](https://rocm.docs.amd.com/) -- for execution on AMD's ROCm-enabled accelerators.
- [oneAPI DPC++/SYCL/LevelZero compiler and runtime](https://www.intel.com/content/www/us/en/developer/tools/oneapi/overview.html) -- for execution on Intel accelerators.

## transitive prerequisites

@@ -60,10 +63,14 @@ TTG includes several examples that may require additional prerequisites. These a

## useful cmake cache variables:

| Variable |Default | Description |
|--------------------------------|--------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `BUILD_TESTING` | `ON` | whether target `check-ttg` and its relatives will actually build and run unit tests |
| `TTG_EXAMPLES` | `OFF` | whether target `check-ttg` and its relatives will actually build and run examples; setting this to `ON` will cause detection of several optional prerequisites, and (if missing) building from source |
| `TTG_ENABLE_TRACE` | `OFF` | setting this to `ON` will enable the ability to instrument TTG code for tracing (see `ttg::trace()`, etc.); if this is set to `OFF`, `ttg::trace()` is a no-op |
| `TTG_FETCH_BOOST` | `OFF` | whether to download and build Boost automatically, if missing |
| `TTG_IGNORE_BUNDLED_EXTERNALS` | `OFF` | whether to install and use bundled external dependencies (currently, only Boost.CallableTraits) |
| Variable |Default | Description |
|--------------------------------------|--------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `TTG_ENABLE_CUDA` | `OFF` | whether to enable CUDA device support |
| `TTG_ENABLE_HIP` | `OFF` | whether to enable HIP/ROCm device support |
| `TTG_ENABLE_LEVEL_ZERO` | `OFF` | whether to enable Intel oneAPI Level Zero device support |
| `BUILD_TESTING` | `ON` | whether target `check-ttg` and its relatives will actually build and run unit tests |
| `TTG_EXAMPLES` | `OFF` | whether target `check-ttg` and its relatives will actually build and run examples; setting this to `ON` will cause detection of several optional prerequisites, and (if missing) building from source |
| `TTG_ENABLE_TRACE` | `OFF` | setting this to `ON` will enable the ability to instrument TTG code for tracing (see `ttg::trace()`, etc.); if this is set to `OFF`, `ttg::trace()` is a no-op |
| `TTG_PARSEC_USE_BOOST_SERIALIZATION` | `ON`               | whether to use Boost.Serialization for serialization in the PaRSEC backend; if this is set to `OFF`, PaRSEC backend will only be able to use trivially-copyable data types or, if MADNESS backend is available, MADNESS-serializable types. |
| `TTG_FETCH_BOOST` | `OFF` | whether to download and build Boost automatically, if missing |
| `TTG_IGNORE_BUNDLED_EXTERNALS`       | `OFF`              | whether to skip installation and use of bundled external dependencies (currently, only Boost.CallableTraits) |
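
Review note: the three new `TTG_ENABLE_*` rows in the table above are independent switches, so a configure line has to name each backend it wants. A small sketch of assembling those flags is given below; `ttg_device_flags` and its backend names (`cuda`, `hip`, `level_zero`) are hypothetical and chosen only for this illustration.

```shell
# Hypothetical helper; not part of TTG. Maps backend names to the
# TTG_ENABLE_* cache variables listed in the table above.
ttg_device_flags() {
  flags=""
  for backend in "$@"; do
    case "$backend" in
      cuda)       flag="-DTTG_ENABLE_CUDA=ON" ;;
      hip)        flag="-DTTG_ENABLE_HIP=ON" ;;
      level_zero) flag="-DTTG_ENABLE_LEVEL_ZERO=ON" ;;
      *) echo "unknown backend: $backend" >&2; return 1 ;;
    esac
    flags="${flags:+$flags }$flag"   # join flags with single spaces
  done
  echo "$flags"
}

ttg_device_flags cuda hip   # prints -DTTG_ENABLE_CUDA=ON -DTTG_ENABLE_HIP=ON
```

The output could then be spliced into a configure command, for example `cmake -S ttg -B ttg/build $(ttg_device_flags hip)`, where the `ttg` and `ttg/build` paths follow the build command quoted at the top of INSTALL.md.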
16 changes: 9 additions & 7 deletions cmake/modules/ExternalDependenciesVersions.cmake
@@ -1,11 +1,13 @@
# for each dependency track both current and previous id (the variable for the latter must contain PREVIOUS)
# to be able to auto-update them

# need Boost.CallableTraits (header only, part of Boost 1.66 released in Dec 2017) for wrap.h to work
set(TTG_TRACKED_BOOST_VERSION 1.66)
set(TTG_TRACKED_CATCH2_VERSION 2.13.1)
set(TTG_TRACKED_VG_CMAKE_KIT_TAG 7ea2d4d3f8854b9e417f297fd74d6fc49aa13fd5) # used to provide "real" FindOrFetchBoost
set(TTG_TRACKED_CATCH2_VERSION 3.5.0)
set(TTG_TRACKED_CEREAL_VERSION 1.3.0)
set(TTG_TRACKED_MADNESS_TAG 31d803325623de75371774feffb0270c796bea24)
set(TTG_TRACKED_PARSEC_TAG 9fc74b6f165605a133125d8a5b62cf55642c1907)
set(TTG_TRACKED_BTAS_TAG d73153ad9bc41a177e441ef04eceff7fab0c766d)
set(TTG_TRACKED_TILEDARRAY_TAG f6f5039b54ee5a1180106351cac05e0d33e14c62)
set(TTG_TRACKED_MADNESS_TAG 2eb3bcf0138127ee2dbc651f1aabd3e9b0def4e3)
set(TTG_TRACKED_PARSEC_TAG 0b3140f58ad9dc78a3d64da9fd73ecc7f443ece7)
set(TTG_TRACKED_BTAS_TAG 4e8f5233aa7881dccdfcc37ce07128833926d3c2)
set(TTG_TRACKED_TILEDARRAY_TAG 493c109379a1b64ddd5ef59f7e33b95633b68d73)

# need Boost.CallableTraits (header only, part of Boost 1.66 released in Dec 2017) for wrap.h to work
set(TTG_OLDEST_BOOST_VERSION 1.66)
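
Review note: `TTG_OLDEST_BOOST_VERSION` acts as a version floor. For a quick check outside of CMake, a version comparison can be sketched as below; `version_ge` is a hypothetical helper, not part of TTG, and it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS versions do).

```shell
# Hypothetical helper; not part of TTG. Succeeds when version $1 >= version $2.
# Assumes sort(1) supports -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Boost 1.74 is the version named in the CI commit message above.
if version_ge 1.74 1.66; then
  echo "Boost 1.74 satisfies the 1.66 floor"
fi
```

The same helper also covers the CMake requirement for HIP: `version_ge 3.21 3.14` succeeds, while `version_ge 3.14 3.21` fails.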