Skip to content

Latest commit

 

History

History
395 lines (347 loc) · 18 KB

CHANGELOG.md

File metadata and controls

395 lines (347 loc) · 18 KB

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog.

[0.7.0] - 2021-08-03

Compatibility Changes:

  • Visual Studio 2017 is no longer supported #1251
  • 32bit Windows is no longer supported #1251
  • CUDA 11.3 is now supported #1295
  • clang < 9 is no longer supported as CUDA compiler #1300
  • clang 11 is now supported #1310

Bug Fixes:

  • fixed ALPAKA_ACC_CPU_B_OMP2_T_SEQ_ENABLED being checked without being defined #1259

New Features:

  • when no specialization is provided by the user alpaka's math functions will now fall back to ADL to find a candidate #1248
  • the HIP back-end now supports callbacks #1269
  • added warp::shfl functionality #1273
  • added Front and Contains type list meta functions #1306

Misc:

  • alpaka's CMake build system now uses CMake's first-class CUDA support #1146
  • updated documentation for clang-format usage #1222
  • increased the static shared memory size to 47 KiB #1247
  • fixed table markup in README.md #1256
  • added example showcasing how to specialize kernels for particular back-ends #1271
  • removed section comments #1275
  • updated cheatsheet (added warp info, fixed names) #1281

Breaking Changes:

  • alpaka now requires CMake 3.18 or newer #1146
  • the CUDA and HIP back-ends no longer enable fast-math by default #1285
  • the CMake options ALPAKA_CUDA_FAST_MATH and ALPAKA_HIP_FAST_MATH have been replaced by ALPAKA_FAST_MATH #1289
  • the CMake options ALPAKA_CUDA_FTZ and ALPAKA_HIP_FTZ have been replaced by ALPAKA_FTZ #1289
  • the CMake option ALPAKA_CUDA_NVCC_SEPARABLE_COMPILATION has been replaced by the native CMake property CUDA_SEPARABLE_COMPILATION #1289
  • the CMake option ALPAKA_CUDA_NVCC_EXPT_EXTENDED_LAMBDA has been replaced by ALPAKA_CUDA_EXPT_EXTENDED_LAMBDA #1289

Test cases / CI:

  • enabled OpenMP back-ends for more Visual Studio builds #1219
  • fixed gh-pages #1230
  • added ICPC / ICC 2021.x to CI #1235
  • fixed deadlock in Ubuntu 20.04 container #1270
  • now CI-testing CMake 3.20 #1283

[0.6.1] - 2021-06-29

Compatibility Changes:

  • rework implementation of OpenMP schedule support #1279 #1309 #1313 #1341
    • alpaka::omp::Schedule is replaced by ompScheduleKind and ompScheduleChunkSize

Bug Fixes:

  • fix OpenMP 5 shared memory allocation #1254
  • fix static shared memory alignment #1282
  • fix BlockSharedMemStMemberImpl::getVarPtr for last var #1280
  • fix CPU static shared memory implementation #1258
  • unit tests: fix queue test #1266
  • fix CtxBlockOacc: SyncBlockThreads #1291
  • fix assert in DeclareSharedVar (OpenAcc) #1303
  • CMake CUDA: dev compile options not propagated #1294
  • example: fix warning (NVCC+OpenMP) #1307
  • TBB: Add missing header and fix integer namespace #1327
  • OpenAcc: TaskKernelOacc: copyin(all used local vars) #1342
  • port macOSX CI fix from #1283
  • CI: use ubuntu-18.04 for gcc-5 and gcc-6 builds #1252
  • CI: disable GCC 10.3 + NVCC tests #1302
  • CI: MSVC + nvcc workarounds and fixes #1332
  • CI: fix warp test #1339

Misc

  • add ALPAKA_ASSERT_OFFLOAD Macro #1260
  • document return value of empty() and isComplete() #1265
  • Prefer TBBConfig.cmake over FindTBB.cmake #1329

[0.6.0] - 2021-01-20

Compatibility Changes:

  • support for CUDA 11, 11.1, and 11.2 #1076 #1086 #1147 #1231
  • remove support for CUDA 11.0 with MSVC 2019 #1227
  • support for CMake 3.18.0 and 3.19.0 #1087 #1217
  • set minimal HIP version to 3.5 #1110
  • remove CMake HIP module shipped with alpaka #1189
  • set HIP-clang as default compiler for HIP #1113
  • support for NVCC + VS 2019 #1121
  • support for boost-1.74.0 #1142
  • explicitly require backends and do not enable them by default #1111
  • remove support for Xcode 11.1 #1206
  • support Xcode 11.21 - 12.2.0 #1206
  • update to Catch 2.13.3 #1215

Bug Fixes:

  • apply some clang-tidy fixes #1044
  • fix CUDA/HIP accelerator concept usage #1064
  • fix Intel compiler detection #1070
  • CMake: build type CXX flag not passed to nvcc #1073
  • work around Intel ICE (Internal Compiler Error) when using std::decay on empty template parameter packs #1074
  • BoostPredef.hpp: Add redefinition of BOOST_COMP_PGI #1082
  • fix min/max return type deduction #1085
  • CMake: fix boost fiber linking #1088
  • fix HIP-clang compile #1107
  • fix CUDA/HIP cmake flags #1152
  • fix error handling CUDA/HIP #1108
  • ALPAKA_DECAY_T: Fix Intel detection, Add PGI #1116
  • fix how to set HIP target architecture #1112
  • fix and improve block shared mem st member sanity checks #1128
  • HIP: remove copy device2device workaround #1188
  • pass native pointers to kernel instead of buffer objects #1193
  • fix bug in isPinned() and pin() #1196
  • fix marking of unit tests for concepts #1226

New Features:

  • add functions alpaka::atomicAnd et. al. as shortcuts to alpaka::atomicOp<alpaka::AtomicAnd> et. al. #1005
  • warp voting functions #1003 #1049 #1090 #1092
  • Sphinx Doc: Fix Doxygen integration on readthedocs #1042 #1093 #1151
  • add cheat sheet to the docs #1057 #1177
  • extend AccDevProps with shared memory size per block #1084
  • OpenMP 5 target offload backend #1126
  • OpenACC backend #1127
  • option to set OpenMP schedule for the Omp2Blocks backend #1223

Misc

  • tests for BufferSlicing #1024
  • use std::invoke_result_t instead of std::result_of_t when available #1047
  • simplify shared memory usage in tests #1075
  • remove boost::aligned_alloc #1094
  • add unit tests for work div #1095
  • change examples (except reduce) to use getValidWorkDiv #1104
  • example monte-carlo-integration #1106
  • invoke docker run only once instead of twice #1109
  • cpu/SysInfo.hpp: Add #else for cpuid; Add PGI #1119
  • Pgi std atomic workaround #1120
  • make BlockSharedMemDynMember::staticAllocBytes a function #1118
  • add IntrinsicFallback: basic fallback implementations #1122
  • allow ALPAKA_CXX_STANDARD to propagate to nvcc with MSVC 1920 and above #1130
  • add set kernel #1132
  • make Queue test generic to handle QueueGenericThreads* with different devices #1133
  • IdxBtOmp: Add GetIdx specialization for 1d #1140
  • test CMAKE_CXX_EXTENSIONS=OFF #1153
  • change block memory size back to be stored as 32 bit #1187
  • add comments to math function traits that explain valid argument range #1190
  • provide docker_retry #1191
  • add .clang-format file #1204
  • add CI check whether code is correctly formatted #1213
  • make test/common a CMake INTERFACE library #1228

Breaking changes:

The namespace structure of alpaka is now flattened. The script can help you to apply the changes to your code. The script only works if you used the full namespace alpaka::* for alpaka functions.

  • removed namespace alpaka::dev
  • removed namespace alpaka::pltf
  • renamed function alpaka::vec::cast to alpaka::castVec
  • renamed function alpaka::vec::reverse to alpaka::reverseVec
  • renamed function alpaka::vec::concat to alpaka::concatVec
  • removed namespace alpaka::vec
  • removed namespace alpaka::workdiv
  • removed namespace alpaka::acc
  • renamed functors alpaka::atomic::op::And et. al. to alpaka::AtomicAnd et. al. #1185
  • removed namespace alpaka::atomic::op
  • removed namespace alpaka::atomic
  • removed namespace alpaka::queue
  • removed namespace alpaka::idx
  • removed namespace alpaka::dim
  • removed namespace alpaka::kernel
  • removed namespace alpaka::wait
  • removed namespace alpaka::mem
  • removed namespace alpaka::offset
  • removed namespace alpaka::elem
  • removed namespace alpaka::intrinsic
  • renamed function alpaka::event::test to alpaka::isComplete
  • removed namespace alpaka::event
  • removed namespace alpaka::time
  • removed namespace alpaka::example
  • renamed function alpaka::alloc::alloc to alpaka::malloc
  • renamed function alpaka::buf::alloc to alpaka::allocBuf
  • removed namespace alpaka::alloc
  • removed namespace alpaka::buf
  • renamed function alpaka::view::set to alpaka::memset
  • renamed function alpaka::view::copy to alpaka::memcpy
  • removed namespace alpaka::view
  • removed namespace alpaka::block::shared::st
  • removed namespace alpaka::block::shared::dyn
  • removed namespace alpaka::block::sync
  • renamed function getMem to getDynSharedMem #1197
  • renamed function getVar to declareSharedVar #1197
  • renamed function freeMem to freeSharedVars #1197
  • renamed functors alpaka::block::op::LogicalAnd et. al. to alpaka::BlockAnd et. al.
  • removed namespace alpaka::block::op
  • removed namespace alpaka::block

[0.5.0] - 2020-06-26

Compatibility Changes:

  • the minimum required C++ version has been raised from C++11 to C++14 #900
  • drop support for CUDA 8.0 (does not support c++14)
  • drop support for gcc 4.9 (does not support c++14)
  • drop support for CMake versions lower than 3.15 (3.11, 3.12, 3.13 and 3.14)
  • raise minimum supported boost version from 1.62.0 to 1.65.1 #906
  • require HIP version to 3.3.0 #1006
  • drop HIP-hcc support #945

Bug Fixes:

  • fix CMake error #941
  • fix HIP math includes #947
  • fix: missing hipRand and rocRand library #948
  • fix VS 2017 CUDA builds #953
  • fix uninitialized pitch #963
  • fix windows CI builds #965
  • fix conversion warning in TinyMT #997

New Features:

  • add automated gh-pages deployment for branch develop #916
  • unify CUDA/HIP backend #928 #904 #950 #980 #981
  • add support for Visual Studio 2019 #949
  • simplify vector operator construction #977
  • example heat-equation #978
  • extend supported compiler combinations gcc-8+nvcc 10.1-10.2 #985
  • add support for CMake 3.17 #988
  • adds initial files for sphinx/rst and readthedocs. #990 #1017 #1048
  • add support for clang 10 #998
  • add popcount intrinsic #1004
  • emulate hip/cuda-Memcpy3D with a kernel #1014
  • simplify alpaka usage #1017

[0.4.0] - 2020-01-14

Compatibility Changes:

  • added support for CUDA 10.0, 10.1 and 10.2
  • dropped support for CUDA 7.0 and 7.5
  • added official support for Visual Studio 2017 on Windows with CUDA 10 (built on Travis CI instead of appveyor now)
  • added support for xcode10.2-11.3 (no official CUDA support yet)
  • added support for Ubuntu 18.04
  • added support for gcc 9
  • added support for clang 7.0, 8.0 and 9.0
  • dropped support for clang 3.5, 3.6, 3.7, 3.8 and 3.9
  • added support for CMake 3.13, 3.14, 3.15 and 3.16
  • dropped support for CMake 3.11.3 and lower, 3.11.4 is the lowest supported version
  • added support for Boost 1.69, 1.70 and 1.71
  • added support for usage of libc++ instead of libstdc++ for clang builds
  • removed dependency to Boost.MPL and BOOST_CURRENT_FUNCTION
  • replaced Boost.Test with Catch2 using an internal version of Catch2 by default but allowing to use an external one

Bug Fixes:

  • fixed some incorrect host/device function attributes
  • fixed warning about comparison unsigned < 0
  • There is no need to disable all other backends manually when using ALPAKA_ACC_GPU_CUDA_ONLY_MODE anymore
  • fixed static block shared memory of types with alignemnt higher than defaultAlignment
  • fixed race-condition in HIP/NVCC queue
  • fixed data races when a GPU updates host memory by aligning host memory buffers always to 4kib

New Features:

  • Added a new alpaka Logo!
  • the whole alpaka code has been relicensed to MPL2 and the examples to ISC
  • added ALPAKA_CXX_STANDARD CMake option which allows to select the C++ standard to be used
  • added ALPAKA_CUDA_NVCC_SEPARABLE_COMPILATION option to enable separable compilation for nvcc
  • added ALPAKA_CUDA_NVCC_EXPT_EXTENDED_LAMBDA and ALPAKA_CUDA_NVCC_EXPT_RELAXED_CONSTEXPR CMake options to enable/disable those nvcc options (they were always ON before)
  • added headers for standalone usage without CMake (alpaka/standalone/GpuCudaRt.h, ...) which set the backend defines
  • added experimental HIP back-end with using nvcc (HIP >= 1.5.1 required, latest rocRand). More on HIP setup: doc/markdown/user/implementation/mapping/HIP.md
  • added sincos math function implementations
  • allowed to copy and move construct ViewPlainPtr
  • added support for CUDA atomics using "unsigned long int"
  • added compile-time error for atomic CUDA ops which are not available due to sm restrictions
  • added explicit errors for unsupported types/operations for CUDA atomics
  • replaced usages of assert with ALPAKA_ASSERT
  • replaced BOOST_VERIFY by ALPAKA_CHECK and returned success from all test kernels
  • added alpaka::ignore_unused as replacement for boost::ignore_unused

Breaking changes:

  • renamed QueueAsync to QueueNonBlocking and QueueSync to QueueBlocking
  • renamed alpaka::size::Size to alpaka::idx::Idx, alpaka::size::SizeType to alpaka::idx::IdxType (and TSize to TIdx internally)
  • replaced ALPAKA_FN_ACC_NO_CUDA by ALPAKA_FN_HOST
  • replaced ALPAKA_FN_ACC_CUDA_ONLY by direct usage of device
  • renamed ALPAKA_STATIC_DEV_MEM_CONSTANT to ALPAKA_STATIC_ACC_MEM_CONSTANT and ALPAKA_STATIC_DEV_MEM_GLOBAL to ALPAKA_STATIC_ACC_MEM_GLOBAL
  • renamed alpaka::kernel::createTaskExec to alpaka::kernel::createTaskKernel
  • QueueCpuSync now correctly blocks when called from multiple threads
    • This broke some previous use-cases (e.g. usage within existing OpenMP parallel regions)
    • This use case can now be handled with the support for external CPU queues as can bee seen in the example QueueCpuOmp2CollectiveImpl
  • previously it was possible to have kernels return values even though they were always ignored. Now kernels are checked to always return void
  • renamed all files with *Stl suffix to *StdLib
  • renamed BOOST_ARCH_CUDA_DEVICE to BOOST_ARCH_PTX
  • executors have been renamed due to the upcoming standard C++ feature with a different meaning. All files within alpaka/exec/ have been moved to alpaka/kernel/ and the files and classes have been renamed from Exec* to TaskKernel*. This should not affect users of alpaka but will affect extensions.

[0.3.6] - 2020-01-06

Bug Fixes:

  • fix cuda stream race condition #850
  • fix: cuda exceptions #844
  • math/abs: Added trait specialisation for double. #862
  • alpaka/math Overloaded float specialization #837
  • Fixes name conflicts in alpaka math functions. #784

[0.3.5] - 2018-11-18

New Features:

  • used OpenMP atomics instead of critical sections

[0.3.4] - 2018-10-17

Compatibility Changes:

  • added support for boost-1.68.0
  • added support for CUDA 10
  • support for glibc < 2.18 (fix missing macros)
  • added checks for available OpenMP versions

Bug Fixes:

  • fixed empty(StreamCpuAsync) returning true even though the last task is still in progress
  • fixed integer overflows in case of int16_t being used as accelerator index type
  • made some throwing destructors not throwing to support clang 7
  • fixed broken alpaka::math::min for non-integral types

New Features:

  • added prepareForAsyncCopy which can be called to enable async copies for a specific buffer (if it is supported)
  • allowed to run alpaka OpenMP 2 block accelerated kernels within existing parallel region
  • added alpaka::ignore_unused which can be used in kernels

[0.3.3] - 2018-08-10

New Features:

  • added CPU random number generators based on std::random_device and TinyMT32
  • made TinyMT32 the default random number generator
  • added alpaka::ignore_unused

[0.3.2] - 2018-10-17

New Features:

  • Enhanced the compiler compatibility checks within the CMake scripts

Bugs Fixed:

  • fixed missing error in case of wrong OpenMP thread count being used by the runtime that was not triggered when not in debug mode
  • fixed CUDA driver API error handling
  • fixed CUDA memcpy and memset for zero sized buffers (division by zero)
  • fixed OpenMP 4 execution
  • fixed the VS2017 CUDA build (not officially supported)
  • fixed CUDA callback execution not waiting for the task to finish executing
  • fixed cudaOnly test being part of make test when cuda only mode is not enabled

Compatibility Changes:

  • added support for CUDA 9.2

[0.3.1] - 2018-06-11

New Features:

  • CMake: added option to control tests BUILD_TESTING
  • CMake: unified requirement of CMake 3.7.0+
  • CMake: used targets for Boost dependencies
  • CMake: made alpaka a pure interface library

Bugs Fixed:

  • fixed getDevCount documentation
  • fixed undefined define warnings
  • fixed self containing header check for CUDA

[0.3.0] - 2018-03-15

Bugs Fixed:

  • fixed multiple bugs where CPU streams/events could deadlock or behaved different than the native CUDA events
  • fixed a bug where the block synchronization of the Boost.Fiber backend crashed due to uninitialized variables

New Features / Enhancements:

  • added support for stream callbacks allowing to enqueue arbitrary host code using alpaka::stream::enqueue(stream, &{...});
  • added support for compiling for multiple architectures using e.g. ALPAKA_CUDA_ARCH="20;35"
  • added support for using host constexpr code within device code
  • enhanced the CUDA error handling
  • enhanced the documentation for mapping CUDA to alpaka

Compatibility Changes:

  • added support for CUDA 9.0 and 9.1
  • added support for CMake 3.9 and 3.10
  • removed support for CMake 3.6 and older
  • added support for boost-1.65.0
  • removed support for boost-1.61.0 and older
  • added support for gcc 7
  • added support for clang 4 and 5
  • removed support for VS2015

[0.2.0] - 2017-06-19

Compatibility fixes and small enhancements:

  • the documentation has been greatly enhanced
  • adds support for CUDA 8.0
  • adds support for CMake versions 3.6, 3.7 and 3.8
  • adds support for Boost 1.62, 1.63 and 1.64
  • adds support for clang-3.9
  • adds support for Visual Studio 2017
  • alpaka now compiles clean even with clang -Weverything
  • re-enabled the boost::fiber accelerator backend which was disabled in the last release

API changes:

  • mapIdx is moved from namespace alpaka::core to alpaka::idx
  • Vec is moved from namespace alpaka to alpaka::vec
  • vec::Vec is now allowed to be zero-dimensional (was previously forbidden)
  • added vec::concat
  • added element-wise operator< for vec::Vec which returns a vector of bool
  • CPU accelerators now support arbitrary dimensionality (both kernel execution as well as memory operations)
  • added support for syncBlockThreadsPredicate with block::sync::op::LogicalOr, block::sync::op::LogicalAnd and block::sync::op::Count
  • memory allocations are now aligned optimally for the underlying architecture (16 bit for SSE, 32 bit for AVX, 64 bit for AVX512) instead of 16 bit for all architectures in the previous release