Revisited & Fixed Half (fp16) data support #495

OuadiElfarouki · 2024-02-12T13:39:48Z

This PR updates and extends half data support in portBLAS and includes the following changes:

Half support is enabled with the CMake option BLAS_ENABLE_HALF and is only applied to the operators meant to support half according to the oneMKL spec (so far in this PR: axpy, scal and gemm).
Unit tests & benchmarks are extended to support mixed-precision comparison (reference BLAS libs only support float/double).
Extended unittests for axpy, scal, and gemm (+gemm_batched) using half.
Extended portblas, cublas & rocblas benchmarks for gemm (+gemm_batched).
Separated gemm configurations when using half data type for each TUNING_TARGET from the float/double configurations.
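
A minimal sketch of the kind of mixed-precision comparison mentioned in the list above, assuming the reference result is computed in float and the half result has been widened to float before comparing. The helper names `almost_equal_half` and `compare_vectors_half` and the tolerance values are hypothetical; they are not taken from `float_comparison.hpp`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// fp16 stores 10 explicit mantissa bits, so its machine epsilon is 2^-10.
constexpr float kHalfEpsilon = 0.0009765625f;

// Hypothetical helper: true if `computed` (a half result widened to float)
// matches `reference` (a float result) within an absolute or relative
// tolerance. The default tolerances are illustrative assumptions.
inline bool almost_equal_half(float computed, float reference,
                              float rel_tol = 2.0f * kHalfEpsilon,
                              float abs_tol = kHalfEpsilon) {
  assert(rel_tol >= 0.0f && abs_tol >= 0.0f);
  if (std::isnan(computed) || std::isnan(reference)) return false;
  const float diff = std::fabs(computed - reference);
  // The absolute check handles values close to zero.
  if (diff <= abs_tol) return true;
  // The relative check scales the tolerance with the larger magnitude.
  const float scale = std::fmax(std::fabs(computed), std::fabs(reference));
  return diff <= rel_tol * scale;
}

// Hypothetical helper: element-wise comparison of two result vectors.
inline bool compare_vectors_half(const std::vector<float>& computed,
                                 const std::vector<float>& reference) {
  if (computed.size() != reference.size()) return false;
  for (std::size_t i = 0; i < computed.size(); ++i) {
    if (!almost_equal_half(computed[i], reference[i])) return false;
  }
  return true;
}
```

In practice the tolerance for gemm would likely need to grow with the reduction dimension, since rounding error accumulates over each dot product; the fixed defaults here are only a starting point.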

Other notes:

Half precision support is disabled when targeting DEFAULT_CPU due to lack of fp16 support.
Some legacy gemm configurations for Intel GPU targets with sycl::half have been removed (they were not the result of tuning, but a temporary measure to reduce the number of generated kernels).

common/include/common/float_comparison.hpp

test/unittest/blas1/blas1_scal_test.cpp

benchmark/cublas/blas3/gemm.cpp

benchmark/rocblas/blas3/gemm.cpp

test/blas_test.hpp

Co-authored-by: pgorlani <[email protected]>

src/operations/blas3/gemm_common.hpp

src/interface/gemm_interface.hpp

src/interface/blas3/backend/amd_gpu.hpp

benchmark/cublas/utils.hpp

benchmark/rocblas/utils.hpp

common/include/common/common_utils.hpp

… half guards

OuadiElfarouki added 13 commits February 5, 2024 09:04

Updated half data support approach & initial enablement for some blas1…

9fb1f6d

… & gemm Ops

Enabled testing of half-data supporting operators & added relevant to…

b60ba9e

…ols for it

disabled HALF precision when targeting default CPU

b60f791

minor fixes

9744e12

Merge branch 'master' into half_data_revisited

da20e97

enabled benchmarks for fp16 supporting operators

c16db81

minor updates to mul_add for half/complex data

42fa2d0

Enabled fp16 benchmarking in cuBLAS

dfe2796

added half gemm config on nvidia gpu for improved perf

33ed262

minor fixes

13dce4e

enabled half data gemm benchmark on AMD GPU

ae2312e

Minor fixes to cublas & rocblas half-data benchmarks

4663f32

minor fix

6935da0

OuadiElfarouki requested review from pgorlani, s-Nick and muhammad-tanvir-1211 February 12, 2024 15:07

pgorlani reviewed Feb 13, 2024

common/include/common/float_comparison.hpp

pgorlani reviewed Feb 14, 2024

test/unittest/blas1/blas1_scal_test.cpp

OuadiElfarouki added 5 commits February 14, 2024 17:02

Removed unnecessary custom cast function to half

97a3310

Moved half type casting within reference blas (Review addressed)

293d690

Removed unnecessary half data branching in portblas benchmarks

a98091c

removed unnecessary half data branching/casting in cublas bench

72934b6

Cleaning half-related redundant changes

52b3e35