[tune_gemm] Update the filter for LDS usage for stream-pipelineV2 (#661)
* Update the filter for LDS usage for stream-pipelineV2

* fix min --> max
zhanglx13 authored Nov 13, 2024
1 parent 086312b commit 279cfa7
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions python/perf-kernels/tools/tune_gemm/tune_gemm.py
@@ -143,8 +143,14 @@ def prune_configs(M, N, K, configs, elemBytes_a, elemBytes_b):
             continue
         # out of shared memory resource
         # TODO (zhanglx): This does not consider the LDS usage in the epilogue
-        LDS = BLOCK_SIZE_K * BLOCK_SIZE_M * elemBytes_a + BLOCK_SIZE_K * BLOCK_SIZE_N * elemBytes_b
-        LDS = LDS if not num_stages else LDS * num_stages
+        LDSA = BLOCK_SIZE_K * BLOCK_SIZE_M * elemBytes_a
+        LDSB = BLOCK_SIZE_K * BLOCK_SIZE_N * elemBytes_b
+        if num_stages <= 1:
+            # No pipeline, buffer A and buffer B can re-use each other
+            LDS = max(LDSA, LDSB)
+        else:
+            # Pipeline, we need (num_stages - 1) buffers for both A and B at the same time
+            LDS = (LDSA + LDSB) * (num_stages - 1)
         if LDS > 65536:
             continue
         # Skip small block sizes and num_warps for large gemm
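
To see what the change does to the pruning decision, the old and new estimates can be compared side by side. This is a minimal standalone sketch, not code from tune_gemm.py: the helper names `lds_bytes_old`, `lds_bytes_new`, and `fits_in_lds` are hypothetical, and the 256x256x64 tile with 2-byte elements is just an illustrative configuration. Only the formulas and the 65536-byte limit are taken from the diff.

```python
LDS_LIMIT_BYTES = 65536  # threshold used by the filter in the diff


def lds_bytes_old(block_m, block_n, block_k, elem_bytes_a, elem_bytes_b, num_stages):
    """Estimate before this commit: (A + B) tiles, multiplied by num_stages when nonzero."""
    lds = block_k * block_m * elem_bytes_a + block_k * block_n * elem_bytes_b
    return lds if not num_stages else lds * num_stages


def lds_bytes_new(block_m, block_n, block_k, elem_bytes_a, elem_bytes_b, num_stages):
    """Estimate after this commit."""
    lds_a = block_k * block_m * elem_bytes_a
    lds_b = block_k * block_n * elem_bytes_b
    if num_stages <= 1:
        # No pipelining: the A and B buffers can reuse the same space.
        return max(lds_a, lds_b)
    # Pipelining: (num_stages - 1) buffers are live for both A and B.
    return (lds_a + lds_b) * (num_stages - 1)


def fits_in_lds(lds_bytes):
    """Mirror of the filter: a config is pruned only when LDS > 65536."""
    return lds_bytes <= LDS_LIMIT_BYTES


if __name__ == "__main__":
    # Hypothetical config: 256x256x64 tile, 2-byte elements for A and B.
    for stages in (0, 1, 2, 3):
        old = lds_bytes_old(256, 256, 64, 2, 2, stages)
        new = lds_bytes_new(256, 256, 64, 2, 2, stages)
        print(f"num_stages={stages}: old={old} ({fits_in_lds(old)}), "
              f"new={new} ({fits_in_lds(new)})")
```

Under these assumptions, the two-stage configuration is estimated at 131072 bytes by the old formula and would be pruned, while the new formula estimates 65536 bytes, which is not greater than the limit and so survives the filter.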
