From 4bea5bd72a725110b0eb831338aef5d3f7f0e7ea Mon Sep 17 00:00:00 2001
From: Terry Cojean
Date: Mon, 6 Nov 2023 18:17:59 -0500
Subject: [PATCH 1/6] Add initial 1.7.0 changelog

---
 CHANGELOG.md | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index af4d3c06bb3..8e601c368c6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -11,6 +11,106 @@ git log --first-parent

 Please visit our wiki [Changelog](https://github.com/ginkgo-project/ginkgo/wiki/Changelog) for unreleased changes.

+## Version 1.7.0
+
+The Ginkgo team is proud to announce the new Ginkgo minor release 1.7.0. This release brings new features such as:
+- Complete GPU-resident sparse direct solvers feature set and interfaces,
+- Sparse batched iterative kernels support for Dense and ELL matrix types and the BiCGSTAB solver,
+- A new MC64 reordering,
+- MPI support for the SYCL backend,
+- Improved Cholesky factorization performance,
+- Improved ParILU(T)/ParIC(T) preconditioner convergence,
+and more!
+
+If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [github discussions](https://github.com/ginkgo-project/ginkgo/discussions).
+
+Supported systems and requirements:
++ For all platforms, CMake 3.16+
++ C++14 compliant compiler
++ Linux and macOS
+  + GCC: 5.5+
+  + clang: 3.9+
+  + Intel compiler: 2019+
+  + Apple Clang: 14.0 is tested. Earlier versions might also work.
+  + NVHPC: 22.7+
+  + Cray Compiler: 14.0.1+
+  + CUDA module: CMake 3.18+, and CUDA 10.0+ or NVHPC 22.7+
+  + HIP module: ROCm 4.5+
+  + DPC++ module: Intel OneAPI 2021.3+ with oneMKL and oneDPL. Set the CXX compiler to `dpcpp` or `icpx`.
+  + MPI: version 3.1+, ideally with GPUDirect support for best performance
++ Windows
+  + MinGW: GCC 5.5+
+  + Microsoft Visual Studio: VS 2019+
+  + CUDA module: CUDA 10.0+, Microsoft Visual Studio
+  + OpenMP module: MinGW.
+
+### Version support changes
+
++ CUDA 9.2 is no longer supported [#1382](https://github.com/ginkgo-project/ginkgo/pull/1382)
++ Ginkgo now requires CMake version 3.16 (and 3.18 for CUDA) [#1368](https://github.com/ginkgo-project/ginkgo/pull/1368)
+
+### Interface changes
+
++ `const` Factory parameters can no longer be modified through `with_*` functions, as this breaks const-correctness [#1336](https://github.com/ginkgo-project/ginkgo/pull/1336) [#1439](https://github.com/ginkgo-project/ginkgo/pull/1439)
+
+### Deprecations
+
++ The `device_reset` parameter of CUDA and HIP executors no longer has an effect, and its `allocation_mode` parameters have been deprecated in favor of the `Allocator` interface [#1315](https://github.com/ginkgo-project/ginkgo/pull/1315)
++ The CMake parameter `GINKGO_BUILD_DPCPP` has been deprecated in favor of `GINKGO_BUILD_SYCL`. [#1350](https://github.com/ginkgo-project/ginkgo/pull/1350)
++ The `gko::reorder::Rcm` interface has been deprecated in favor of `gko::experimental::reorder::Rcm` based on `Permutation` [#1418](https://github.com/ginkgo-project/ginkgo/pull/1418)
+
+### Added features
+
++ Adds a batch::BatchLinOp class that forms a base class for batched linear operators such as batched matrix formats, solver and preconditioners [#1379](https://github.com/ginkgo-project/ginkgo/pull/1379)
++ Adds a batch::MultiVector class that enables operations such as dot, norm, scale on batched vectors [#1371](https://github.com/ginkgo-project/ginkgo/pull/1371)
++ Adds a batch::Dense matrix format that stores batched dense matrices and provides gemv operations for these dense matrices. [#1413](https://github.com/ginkgo-project/ginkgo/pull/1413)
++ Adds a batch::Ell matrix format that stores batched Ell matrices and provides spmv operations for these batched Ell matrices. [#1416](https://github.com/ginkgo-project/ginkgo/pull/1416) [#1437](https://github.com/ginkgo-project/ginkgo/pull/1437)
++ Add a batch::Bicgstab solver (class, core, and reference kernels) that enables iterative solution of batched linear systems [#1438](https://github.com/ginkgo-project/ginkgo/pull/1438).
++ Add device kernels (CUDA, HIP, and DPCPP) for batch::Bicgstab solver. [#1443](https://github.com/ginkgo-project/ginkgo/pull/1443).
++ New MC64 reordering algorithm which optimizes the diagonal product or sum of a matrix by permuting the rows, and computes additional scaling factors for equilibriation [#1120](https://github.com/ginkgo-project/ginkgo/pull/1120)
++ New interface for (non-symmetric) permutation and scaled permutation of Dense and Csr matrices [#1415](https://github.com/ginkgo-project/ginkgo/pull/1415)
++ LU and Cholesky Factorizations can now be separated into their factors [#1432](https://github.com/ginkgo-project/ginkgo/pull/1432)
++ New symbolic LU factorization algorithm that is optimized for matrices with an almost-symmetric sparsity pattern [#1445](https://github.com/ginkgo-project/ginkgo/pull/1445)
++ Sorting kernels for SparsityCsr on all backends [#1343](https://github.com/ginkgo-project/ginkgo/pull/1343)
++ Allow passing pre-generated local solver as factory parameter for the distributed Schwarz preconditioner [#1426](https://github.com/ginkgo-project/ginkgo/pull/1426)
++ Add DPCPP kernels for Partition [#1034](https://github.com/ginkgo-project/ginkgo/pull/1034), and CSR's `check_diagonal_entries` and `add_scaled_identity` functionality [#1436](https://github.com/ginkgo-project/ginkgo/pull/1436)
++ Adds a helper function to create a partition based on either local sizes, or local ranges [#1227](https://github.com/ginkgo-project/ginkgo/pull/1227)
++ Add function to compute arithmetic mean of dense and distributed vectors [#1275](https://github.com/ginkgo-project/ginkgo/pull/1275)
++ Adds `icpx` compiler supports [#1350](https://github.com/ginkgo-project/ginkgo/pull/1350)
++ All backends can be built simultaneously [#1333](https://github.com/ginkgo-project/ginkgo/pull/1333)
++ Emits a CMake warning in downstream projects that use different compilers than the installed Ginkgo [#1372](https://github.com/ginkgo-project/ginkgo/pull/1372)
++ Reordering algorithms in sparse_blas benchmark [#1354](https://github.com/ginkgo-project/ginkgo/pull/1354)
++ Benchmarks gained an `-allocator` parameter to specify device allocators [#1385](https://github.com/ginkgo-project/ginkgo/pull/1385)
++ Benchmarks gained an `-input_matrix` parameter that initializes the input JSON based on the filename [#1387](https://github.com/ginkgo-project/ginkgo/pull/1387)
++ Benchmark inputs can now be reordered as a preprocessing step [#1408](https://github.com/ginkgo-project/ginkgo/pull/1408)
+
+
+### Improvements
+
++ Significantly improve Cholesky factorization performance [#1366](https://github.com/ginkgo-project/ginkgo/pull/1366)
++ Improve parallel build performance [#1378](https://github.com/ginkgo-project/ginkgo/pull/1378)
++ Allow constrained parallel test execution using CTest resources [#1373](https://github.com/ginkgo-project/ginkgo/pull/1373)
++ Use arithmetic type more inside mixed precision ELL [#1414](https://github.com/ginkgo-project/ginkgo/pull/1414)
++ Most factory parameters of factory type no longer need to be constructed explicitly via `.on(exec)` [#1336](https://github.com/ginkgo-project/ginkgo/pull/1336) [#1439](https://github.com/ginkgo-project/ginkgo/pull/1439)
++ Improve ParILU(T)/ParIC(T) convergence by using more appropriate atomic operations [#1434](https://github.com/ginkgo-project/ginkgo/pull/1434)
+
+### Fixes
+
++ Fix an over-allocation for OpenMP reductions [#1369](https://github.com/ginkgo-project/ginkgo/pull/1369)
++ Fix DPCPP's common-kernel reduction for empty input sizes [#1362](https://github.com/ginkgo-project/ginkgo/pull/1362)
++ Fix several typos in the API and documentation [#1348](https://github.com/ginkgo-project/ginkgo/pull/1348)
++ Fix inconsistent `Threads` between generations [#1388](https://github.com/ginkgo-project/ginkgo/pull/1388)
++ Fix benchmark median condition [#1398](https://github.com/ginkgo-project/ginkgo/pull/1398)
++ Fix HIP 5.6.0 compilation [#1411](https://github.com/ginkgo-project/ginkgo/pull/1411)
++ Fix missing destruction of rand_generator from cuda/hip [#1417](https://github.com/ginkgo-project/ginkgo/pull/1417)
++ Fix PAPI logger destruction order [#1419](https://github.com/ginkgo-project/ginkgo/pull/1419)
++ Fix TAU logger compilation [#1422](https://github.com/ginkgo-project/ginkgo/pull/1422)
++ Fix relative criterion to not iterate if the residual is already zero [#1079](https://github.com/ginkgo-project/ginkgo/pull/1079)
++ Fix memory_order invocations with C++20 changes [#1402](https://github.com/ginkgo-project/ginkgo/pull/1402)
++ Fix `check_diagonal_entries_exist` report correctly when only missing diagonal value in the last rows. [#1440](https://github.com/ginkgo-project/ginkgo/pull/1440)
++ Fix checking OpenMPI version in cross-compilation settings [#1446](https://github.com/ginkgo-project/ginkgo/pull/1446)
++ Fix false-positive deprecation warnings in Ginkgo, especially for the old Rcm (it doesn't emit deprecation warnings anymore as a result but is still considered deprecated) [#1444](https://github.com/ginkgo-project/ginkgo/pull/1444)
+
 ## Version 1.6.0

 The Ginkgo team is proud to announce the new Ginkgo minor release 1.6.0. This release brings new features such as:

From ada87238de6b86899bc139f9d288aebc785e1889 Mon Sep 17 00:00:00 2001
From: Terry Cojean
Date: Mon, 6 Nov 2023 20:02:18 -0500
Subject: [PATCH 2/6] Changelog updates

Co-authored-by: Pratik Nayak
Signed-off-by: Terry Cojean
---
 CHANGELOG.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8e601c368c6..6d451d5b1ec 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -15,10 +15,10 @@ Please visit our wiki [Changelog](https://github.com/ginkgo-project/ginkgo/wiki/

 The Ginkgo team is proud to announce the new Ginkgo minor release 1.7.0. This release brings new features such as:
 - Complete GPU-resident sparse direct solvers feature set and interfaces,
-- Sparse batched iterative kernels support for Dense and ELL matrix types and the BiCGSTAB solver,
+- Improved Cholesky factorization performance,
 - A new MC64 reordering,
+- Batched iterative solver support with the BiCGSTAB solver with batched Dense and ELL matrix types,
 - MPI support for the SYCL backend,
-- Improved Cholesky factorization performance,
 - Improved ParILU(T)/ParIC(T) preconditioner convergence,
 and more!

@@ -36,8 +36,8 @@ Supported systems and requirements:
   + Cray Compiler: 14.0.1+
   + CUDA module: CMake 3.18+, and CUDA 10.0+ or NVHPC 22.7+
   + HIP module: ROCm 4.5+
-  + DPC++ module: Intel OneAPI 2021.3+ with oneMKL and oneDPL. Set the CXX compiler to `dpcpp` or `icpx`.
-  + MPI: version 3.1+, ideally with GPUDirect support for best performance
+  + DPC++ module: Intel OneAPI 2022.1+ with oneMKL and oneDPL. Set the CXX compiler to `dpcpp` or `icpx`.
+  + MPI: standard version 3.1+, ideally GPU Aware, for best performance
 + Windows
   + MinGW: GCC 5.5+
   + Microsoft Visual Studio: VS 2019+

From c56663341d2098a5b08532b093fe065fb71a1681 Mon Sep 17 00:00:00 2001
From: Terry Cojean
Date: Mon, 6 Nov 2023 20:03:23 -0500
Subject: [PATCH 3/6] Changelog: recap previous deprecations

Signed-off-by: Terry Cojean
---
 CHANGELOG.md | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6d451d5b1ec..32732cc7838 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -53,12 +53,26 @@ Supported systems and requirements:

 + `const` Factory parameters can no longer be modified through `with_*` functions, as this breaks const-correctness [#1336](https://github.com/ginkgo-project/ginkgo/pull/1336) [#1439](https://github.com/ginkgo-project/ginkgo/pull/1439)

-### Deprecations
+### New Deprecations

 + The `device_reset` parameter of CUDA and HIP executors no longer has an effect, and its `allocation_mode` parameters have been deprecated in favor of the `Allocator` interface [#1315](https://github.com/ginkgo-project/ginkgo/pull/1315)
 + The CMake parameter `GINKGO_BUILD_DPCPP` has been deprecated in favor of `GINKGO_BUILD_SYCL`. [#1350](https://github.com/ginkgo-project/ginkgo/pull/1350)
 + The `gko::reorder::Rcm` interface has been deprecated in favor of `gko::experimental::reorder::Rcm` based on `Permutation` [#1418](https://github.com/ginkgo-project/ginkgo/pull/1418)

+### Summary of previous deprecations
++ The class `AmgxPgm` is deprecated in favor of `Pgm`.
++ `gko::lend()` is not necessary anymore.
++ Default constructors for the CSR `load_balance` and `automatical` strategies
++ The classes `RelativeResidualNorm` and `AbsoluteResidualNorm` are deprecated in favor of `ResidualNorm`.
++ The PolymorphicObject's move-semantic `copy_from` variant
++ The templated `SolverBase` class.
++ Logger constructors and create functions with the `executor` parameter.
++ The virtual, protected, Dense functions `compute_norm1_impl`, `add_scaled_impl`, etc.
++ Logger events for solvers and criterion without the additional `implicit_tau_sq` parameter.
++ The Permutation class' `permute_mask` functionality
++ The global `gko::solver::default_krylov_dim`, use instead `gko::solver::gmres_default_krylov_dim`.
++ Multiple functions with typos (`set_complex_subpsace()`, range functions such as `conj_operaton` etc).
+
 ### Added features

 + Adds a batch::BatchLinOp class that forms a base class for batched linear operators such as batched matrix formats, solver and preconditioners [#1379](https://github.com/ginkgo-project/ginkgo/pull/1379)

From 83c65d89db306a1a11a22005adc46530670a34aa Mon Sep 17 00:00:00 2001
From: Terry Cojean
Date: Wed, 8 Nov 2023 09:16:00 -0500
Subject: [PATCH 4/6] Remove lend from batch documentation

---
 include/ginkgo/core/base/batch_lin_op.hpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/ginkgo/core/base/batch_lin_op.hpp b/include/ginkgo/core/base/batch_lin_op.hpp
index 497936374a2..03f3a6dc6f4 100644
--- a/include/ginkgo/core/base/batch_lin_op.hpp
+++ b/include/ginkgo/core/base/batch_lin_op.hpp
@@ -215,7 +215,7 @@ class BatchLinOp : public EnableAbstractPolymorphicObject {
  * // create a batch linear operator which represents the solver
  * auto batch_cg = batch_cg_factory->generate(A);
  * // solve the system
- * batch_cg->apply(gko::lend(batch_b), gko::lend(batch_x));
+ * batch_cg->apply(batch_b, batch_x);
  * ```
  *
  * @ingroup BatchLinOp

From 541c47ea7aede93a1658a202e60105dadfe234ac Mon Sep 17 00:00:00 2001
From: Terry Cojean
Date: Thu, 9 Nov 2023 21:06:35 -0500
Subject: [PATCH 5/6] Changelog: update the verion requirement and ordering

---
 CHANGELOG.md | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 32732cc7838..e7239058991 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -34,19 +34,19 @@ Supported systems and requirements:
   + Apple Clang: 14.0 is tested. Earlier versions might also work.
   + NVHPC: 22.7+
   + Cray Compiler: 14.0.1+
-  + CUDA module: CMake 3.18+, and CUDA 10.0+ or NVHPC 22.7+
+  + CUDA module: CMake 3.18+, and CUDA 10.1+ or NVHPC 22.7+
   + HIP module: ROCm 4.5+
   + DPC++ module: Intel OneAPI 2022.1+ with oneMKL and oneDPL. Set the CXX compiler to `dpcpp` or `icpx`.
   + MPI: standard version 3.1+, ideally GPU Aware, for best performance
 + Windows
   + MinGW: GCC 5.5+
   + Microsoft Visual Studio: VS 2019+
-  + CUDA module: CUDA 10.0+, Microsoft Visual Studio
+  + CUDA module: CUDA 10.1+, Microsoft Visual Studio
   + OpenMP module: MinGW.

 ### Version support changes

-+ CUDA 9.2 is no longer supported [#1382](https://github.com/ginkgo-project/ginkgo/pull/1382)
++ CUDA 9.2 is no longer supported and 10.0 is untested [#1382](https://github.com/ginkgo-project/ginkgo/pull/1382)
 + Ginkgo now requires CMake version 3.16 (and 3.18 for CUDA) [#1368](https://github.com/ginkgo-project/ginkgo/pull/1368)

 ### Interface changes
@@ -55,23 +55,24 @@ Supported systems and requirements:

 ### New Deprecations

-+ The `device_reset` parameter of CUDA and HIP executors no longer has an effect, and its `allocation_mode` parameters have been deprecated in favor of the `Allocator` interface [#1315](https://github.com/ginkgo-project/ginkgo/pull/1315)
++ The `device_reset` parameter of CUDA and HIP executors no longer has an effect, and its `allocation_mode` parameters have been deprecated in favor of the `Allocator` interface. [#1315](https://github.com/ginkgo-project/ginkgo/pull/1315)
 + The CMake parameter `GINKGO_BUILD_DPCPP` has been deprecated in favor of `GINKGO_BUILD_SYCL`. [#1350](https://github.com/ginkgo-project/ginkgo/pull/1350)
-+ The `gko::reorder::Rcm` interface has been deprecated in favor of `gko::experimental::reorder::Rcm` based on `Permutation` [#1418](https://github.com/ginkgo-project/ginkgo/pull/1418)
++ The `gko::reorder::Rcm` interface has been deprecated in favor of `gko::experimental::reorder::Rcm` based on `Permutation`. [#1418](https://github.com/ginkgo-project/ginkgo/pull/1418)
++ The Permutation class' `permute_mask` functionality. [#1415](https://github.com/ginkgo-project/ginkgo/pull/1415)
++ Multiple functions with typos (`set_complex_subpsace()`, range functions such as `conj_operaton` etc). [#1348](https://github.com/ginkgo-project/ginkgo/pull/1348)

 ### Summary of previous deprecations
-+ The class `AmgxPgm` is deprecated in favor of `Pgm`.
 + `gko::lend()` is not necessary anymore.
-+ Default constructors for the CSR `load_balance` and `automatical` strategies
 + The classes `RelativeResidualNorm` and `AbsoluteResidualNorm` are deprecated in favor of `ResidualNorm`.
++ The class `AmgxPgm` is deprecated in favor of `Pgm`.
++ Default constructors for the CSR `load_balance` and `automatical` strategies
 + The PolymorphicObject's move-semantic `copy_from` variant
 + The templated `SolverBase` class.
++ The class `MachineTopology` is deprecated in favor of `machine_topology`.
 + Logger constructors and create functions with the `executor` parameter.
 + The virtual, protected, Dense functions `compute_norm1_impl`, `add_scaled_impl`, etc.
 + Logger events for solvers and criterion without the additional `implicit_tau_sq` parameter.
-+ The Permutation class' `permute_mask` functionality
 + The global `gko::solver::default_krylov_dim`, use instead `gko::solver::gmres_default_krylov_dim`.
-+ Multiple functions with typos (`set_complex_subpsace()`, range functions such as `conj_operaton` etc).

 ### Added features

From 5f7b9201a511de6b8f5076893e39441d5454e2eb Mon Sep 17 00:00:00 2001
From: Terry Cojean
Date: Fri, 10 Nov 2023 09:13:51 -0500
Subject: [PATCH 6/6] typo: fix OneAPI -> oneAPI

Co-authored-by: Tobias Ribizel
Signed-off-by: Terry Cojean
---
 CHANGELOG.md | 2 +-
 README.md    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e7239058991..90834b209dc 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -36,7 +36,7 @@ Supported systems and requirements:
   + Cray Compiler: 14.0.1+
   + CUDA module: CMake 3.18+, and CUDA 10.1+ or NVHPC 22.7+
   + HIP module: ROCm 4.5+
-  + DPC++ module: Intel OneAPI 2022.1+ with oneMKL and oneDPL. Set the CXX compiler to `dpcpp` or `icpx`.
+  + DPC++ module: Intel oneAPI 2022.1+ with oneMKL and oneDPL. Set the CXX compiler to `dpcpp` or `icpx`.
   + MPI: standard version 3.1+, ideally GPU Aware, for best performance
 + Windows
   + MinGW: GCC 5.5+
diff --git a/README.md b/README.md
index 48c7db0715f..df4bdb6c0b4 100644
--- a/README.md
+++ b/README.md
@@ -64,7 +64,7 @@ The Ginkgo HIP module has the following __additional__ requirements:

 The Ginkgo DPC++(SYCL) module has the following __additional__ requirements:

-* _OneAPI 2021.3+_
+* _oneAPI 2022.1+_
 * Set `dpcpp` or `icpx` as the `CMAKE_CXX_COMPILER`
 * `c++17` is used to compile Ginkgo
 * The following oneAPI packages should be available: