Extended Gemm interface to support mixed precision operations #500

OuadiElfarouki · 2024-02-29T13:17:20Z

Extend the Gemm operator interface to support mixed-precision operations by decoupling the element type of matrices A and B (element_in_t) from the element type of the output matrix C and the scalars alpha and beta (element_out_t).

Following the notation of oneMKL's spec for the Gemm API (https://spec.oneapi.io/versions/latest/elements/oneMKL/source/domains/blas/gemm.html#onemkl-blas-gemm), this PR allows (Ta==Tb) to be set independently of (Tc==Ts). As a first stage, this is enabled for Ta=Tb=sycl::half and Tc=Ts=float, so enabling half support also enables the mixed-precision (half, float) case for gemm.
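As a rough illustration of the decoupling described above, here is a minimal host-side sketch in standard C++. It is not portBLAS code: `reference_gemm` and `mixed_gemm_example` are hypothetical names, the layout is assumed row-major with no transposition, and `float`/`double` stand in for `sycl::half`/`float` so the example compiles without a SYCL toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference GEMM (row-major, no transposition):
//   C = alpha * A * B + beta * C
// element_in_t is the type of A and B (Ta == Tb); element_out_t is the
// type of C, alpha and beta (Tc == Ts).
template <typename element_in_t, typename element_out_t>
void reference_gemm(std::size_t m, std::size_t n, std::size_t k,
                    element_out_t alpha, const std::vector<element_in_t>& a,
                    const std::vector<element_in_t>& b, element_out_t beta,
                    std::vector<element_out_t>& c) {
  assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      element_out_t acc = 0;  // accumulate in the output precision
      for (std::size_t p = 0; p < k; ++p) {
        // Cast each input element to the output type before multiplying.
        acc += static_cast<element_out_t>(a[i * k + p]) *
               static_cast<element_out_t>(b[p * n + j]);
      }
      c[i * n + j] = alpha * acc + beta * c[i * n + j];
    }
  }
}

// Demonstration helper: float inputs (stand-in for half), double output
// (stand-in for float). Uses 2x2 matrices.
inline std::vector<double> mixed_gemm_example() {
  const std::vector<float> a = {1.f, 2.f, 3.f, 4.f};
  const std::vector<float> b = {5.f, 6.f, 7.f, 8.f};
  std::vector<double> c = {1.0, 1.0, 1.0, 1.0};
  reference_gemm<float, double>(2, 2, 2, 2.0, a, b, 0.5, c);
  return c;
}
```

Accumulating in element_out_t is a choice made for this sketch; the actual kernels in the PR may accumulate differently.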

Changes include:

Updating the different Gemm kernel implementations to account for the decoupled types.
Updating the CMake and kernel generation scripts to handle a pair of types instead of a single type in the gemm case.
Updating the unit tests to cover this feature in the Gemm case.

Note:
Following oneMKL's expected Gemm API, supporting bfloat16-float and std::int8_t-float would be straightforward afterwards, but the additional cases where Ta==Tb==Tc while Ts (alpha & beta) is separate will require additional decoupling and redesign work.

cmake/CmakeFunctionHelper.cmake

include/operations/blas3_trees.h

src/operations/blas3/gemm_common.hpp

src/operations/blas3/gemm_local_joint_matrix.hpp

src/sb_handle/portblas_handle.hpp

test/blas_test_macros.hpp

test/unittest/blas3/blas3_gemm_common.hpp

s-Nick

Thank you for your work @OuadiElfarouki.
There is some confusion between element_in_t and element_out_t because in several places element_t was not updated to one or the other. Can you please fix it? @hjabird highlighted many of these. In general, I agree that something like in_element_t and out_element_t would be more readable; please consider it.

src/operations/blas3/gemm_interleaved.hpp

s-Nick · 2024-03-27T11:21:58Z

+          if constexpr (is_half<element_in_t>::value ||
+                        !std::is_same_v<element_in_t, element_out_t>) {
+#pragma unroll
+            for (int v = 0; v < VectorSize; ++v) {


Can you elaborate why this loop is necessary?

@s-Nick AdaptiveCpp doesn't support the mad operation on a sycl::vec of sycl::half, so I had to break it down into its elements with this loop.
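To illustrate the element-wise fallback discussed in this thread, here is a small standard C++ sketch. It is an assumption-laden stand-in, not the kernel code: `std::array` replaces `sycl::vec`, `float`/`double` replace `sycl::half`/`float`, and `vector_mul_add` and `mul_add_example` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Hypothetical per-lane multiply-add: acc[v] += a[v] * b[v].
// When input and output types differ, each lane is cast to the output
// type first, mirroring the unrolled loop under discussion.
template <typename element_in_t, typename element_out_t,
          std::size_t VectorSize>
void vector_mul_add(const std::array<element_in_t, VectorSize>& a,
                    const std::array<element_in_t, VectorSize>& b,
                    std::array<element_out_t, VectorSize>& acc) {
  if constexpr (!std::is_same_v<element_in_t, element_out_t>) {
    // Mixed precision: process one element at a time with explicit casts.
    for (std::size_t v = 0; v < VectorSize; ++v) {
      acc[v] += static_cast<element_out_t>(a[v]) *
                static_cast<element_out_t>(b[v]);
    }
  } else {
    // Same type: a real kernel could issue one vector mad here;
    // std::array has no vector arithmetic, so this sketch loops as well.
    for (std::size_t v = 0; v < VectorSize; ++v) {
      acc[v] += a[v] * b[v];
    }
  }
}

// Demonstration helper: float inputs, double accumulator.
inline std::array<double, 4> mul_add_example() {
  const std::array<float, 4> a = {1.f, 2.f, 3.f, 4.f};
  const std::array<float, 4> b = {0.5f, 0.5f, 0.5f, 0.5f};
  std::array<double, 4> acc = {10.0, 10.0, 10.0, 10.0};
  vector_mul_add(a, b, acc);
  return acc;
}
```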

OuadiElfarouki · 2024-04-18T15:15:20Z

@s-Nick @Rbiessy @hjabird I've rebased and fixed some bugs in the current PR after recent commits. Feel free to take a look! Thanks.

Rbiessy

From a quick look I don't have any major concerns.

s-Nick

LGTM!

OuadiElfarouki and others added 30 commits February 5, 2024 09:04

Updated half data support approach & intial enablement for some blas1…

9fb1f6d

… & gemm Ops

Enabled testing of half-data supporting operators & added relevant to…

b60ba9e

…ols for it

disabled HALF precision when targetting default CPU

b60f791

minor fixes

9744e12

Merge branch 'master' into half_data_revisited

da20e97

enabled benchmarks for fp16 supporting operators

c16db81

minor updates to mul_add for half/complex data

42fa2d0

Enabled fp16 benchmarking in cuBLAS

dfe2796

added half gemm config on nvidia gpu for improved perf

33ed262

minor fixes

13dce4e

enabled half data gemm benchmark on AMD GPU

ae2312e

Minor fixes to cublas & rocblas half-data benchmarks

4663f32

minor fix

6935da0

Removed unecessary custom cast function to half

97a3310

Moved half type casting within reference blas (Review adressed)

293d690

Removed uncessary half data branching in portblas benchmarks

a98091c

removed unecessary half data branching/casting in cublas bench

72934b6

Cleaning half-related redundant changes

52b3e35

Update test/blas_test.hpp

23c8efd

Co-authored-by: pgorlani <[email protected]>

Update benchmark/cublas/blas3/gemm.cpp

feaf0fe

Co-authored-by: pgorlani <[email protected]>

Remove extra bracket.

e1804e0

removed unecessary half data branching/casting in rocBLAS bench

e535773

further cleaning & simplifications

a09aba5

Update benchmark/cublas/utils.hpp

d30d252

Co-authored-by: pgorlani <[email protected]>

Update benchmark/rocblas/utils.hpp

919bc19

Co-authored-by: pgorlani <[email protected]>

Update common/include/common/common_utils.hpp

b8a5be3

Co-authored-by: pgorlani <[email protected]>

further simplifications

af93a62

minor update to benchmark rand gen

d29372e

updated readme, disabled complex support by default and removed extra…

a8282b2

… haf guards

minor cmake fix for header only use-case (oneMKL in particular)

25d3d23

Fixed issues in unit-tests

155781d

OuadiElfarouki requested review from muhammad-tanvir-1211, aacostadiaz, pgorlani and s-Nick and removed request for aacostadiaz February 29, 2024 16:26

OuadiElfarouki added 6 commits March 4, 2024 10:38

Typo fixes

38fd71e

Merge branch 'master' into mixed_precision_gemm

db67f2b

minor fix for default cpu gemm config

852722e

re-enabled half support for default CPUs and fixed gemm non-local ker…

26970c0

…nels

Separated half gemm config for default CPUs

a2f489f

cast half to float within mul_add

3c150cf

hjabird reviewed Mar 26, 2024

s-Nick reviewed Mar 27, 2024

OuadiElfarouki and others added 6 commits March 27, 2024 14:20

Apply typo suggestions from code review

620ea94

Co-authored-by: HJA Bird <[email protected]>

removed non-necessary gemm template parameter element in/out

aff966c

addressed PR comments

c7aedc9

Merge branch 'master' into mixed_precision_gemm

d3a962a

removed const specifier causing errors when const data is enabled (gemm)

95f77c6

Merge branch 'master' into mixed_precision_gemm

8e2930b

OuadiElfarouki requested a review from Rbiessy April 18, 2024 15:13

OuadiElfarouki and others added 2 commits May 7, 2024 11:40

Merge branch 'master' into mixed_precision_gemm

d0a64b9

Merge remote-tracking branch 'upstream/master' into mixed_precision_gemm

c7ce902

Rbiessy approved these changes May 9, 2024

s-Nick approved these changes May 13, 2024

Merge branch 'master' of github.com:codeplaysoftware/portBLAS into mi…

ae8a874

…xed_precision_gemm

hjabird approved these changes May 13, 2024

s-Nick merged commit 3a3113a into codeplaysoftware:master May 15, 2024
3 checks passed
