
Some Qs about implementation #8

Open
lylOwhd opened this issue Feb 21, 2024 · 1 comment

Comments


lylOwhd commented Feb 21, 2024

AMOS is an innovative approach that uses automated mapping generation and performance optimization to improve the utilization of emerging hardware units such as TensorCore. I have run into some implementation questions and would appreciate guidance.

  1. When computing the compute latency, the intrinsic latency is a fixed value that can be approximated from a hardware model. This latency is then multiplied by the trip counts of the sequential loops, i.e., the loops that are not bound to parallel cores. Why is this multiplication by the sequential trip counts necessary? (A small sketch of my reading of this model follows after this list.)
  2. Scheduling operations such as tiling and fusion usually happen before tensorization and produce parallel code, and different schedules can change the number of software iterations. How is this variation handled, and how effective is the mapping generation process in practice?
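
For concreteness, here is a minimal Python sketch of how I currently read the latency model in question 1; the function and parameter names (estimate_compute_latency, intrinsic_latency, sequential_trip_counts) are my own and not from the AMOS code:

from math import prod

def estimate_compute_latency(intrinsic_latency, sequential_trip_counts):
    # The intrinsic latency is a fixed per-invocation cost (approximated
    # from a hardware model); sequential loops execute their bodies one
    # after another, so their trip counts multiply that cost, whereas
    # loops bound to parallel cores do not contribute a factor.
    return intrinsic_latency * prod(sequential_trip_counts)

# Example: an intrinsic of fixed cost 1.0 issued inside sequential loops
# of sizes 8, 4, and 32 -> estimated latency 1024.0.
print(estimate_compute_latency(1.0, [8, 4, 32]))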
@KnowingNothing
Collaborator

  1. When there are not enough cores for parallel execution, some loops remain sequential. This is common in tensor computation.
  2. Mapping takes three steps: compute transform, scheduling, and tensorization. The compute transform rewrites the compute expressions according to the hardware intrinsic, scheduling mutates the loop structure, and tensorization replaces the innermost loops with the intrinsic. To make sure tensorization is not affected by scheduling, we perform a pre-tiling step during the compute transform that keeps a fixed number of iterations as the innermost loops (a small sketch at the end of this comment illustrates this). For example, a GEMM:
for i 
 for j 
  for k
   ...

will be transformed into

for io
 for jo
  for ko
   for ii in range 16
    for ji in range 16
     for ki in range 16
      ...

As for the efficacy, mapping generation is fast (several seconds) but performance tuning is slow (tens of minutes).
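
A minimal sketch of this idea, using numpy's matmul as a stand-in for the hardware intrinsic (the function names here are illustrative, not the AMOS API): the outer io/jo/ko loops are free for scheduling to split, fuse, or bind to parallel cores, while the fixed 16x16x16 innermost block is what tensorization replaces with the intrinsic.

import numpy as np

TI = TJ = TK = 16  # fixed innermost extents kept by the pre-tiling step

def mma_16x16x16(c_tile, a_tile, b_tile):
    # Stand-in for the hardware intrinsic (e.g. a TensorCore MMA);
    # tensorization replaces the innermost 16x16x16 loops with this call.
    c_tile += a_tile @ b_tile

def gemm_pre_tiled(A, B):
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    # Outer loops: scheduling (tiling, fusion, parallel binding) may
    # reorder or split these freely without touching the innermost block.
    for io in range(M // TI):
        for jo in range(N // TJ):
            for ko in range(K // TK):
                mma_16x16x16(
                    C[io*TI:(io+1)*TI, jo*TJ:(jo+1)*TJ],
                    A[io*TI:(io+1)*TI, ko*TK:(ko+1)*TK],
                    B[ko*TK:(ko+1)*TK, jo*TJ:(jo+1)*TJ],
                )
    return C

# Quick check against a reference matmul.
A = np.random.rand(64, 128).astype(np.float32)
B = np.random.rand(128, 32).astype(np.float32)
assert np.allclose(gemm_pre_tiled(A, B), A @ B, atol=1e-3)

Because the innermost 16x16x16 block is fixed by the compute transform, any re-scheduling of the outer loops leaves the tensorized region untouched.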
