A `noalias` related numerical error #250

newling · 2025-01-07T01:31:09Z

Background information (skippable)

We (iree-amd-aie) compile matmuls, batch matmuls, and matmul + elementwise 'kernels' of different sizes through Peano. We sometimes run out of program memory. To avoid this, we recently implemented an MLIR function-outlining pass, which replaces duplicated blocks of code with one-line calls to the outlined function. This greatly reduces the amount of program memory we use, but it causes severe performance degradation (slowdowns of 2x or more).

This degradation is a bit surprising. When we use microkernels we see good performance, and microkernels are invoked with the same kind of LLVM function call as the outlined functions (`tail call void @something`). This suggests that we are missing an optimization when we outline, or that something in the way we construct our functions is causing the slowdown.

At the LLVM level (.ll file), our outlined functions are lowered to signatures that look something like:

```llvm
define void @generic_matmul_0_outlined(ptr %0, ptr %1, ptr %2) {
  ; outlined common code for matmul (for example with m=n=k=32)
}
```

The above is for the matmul C = A@B, where %0 and %1 are pointers into A and B, and %2 is a pointer into C. Peano's `opt` (we use -O2 and a few other flags) transforms this further into:

```llvm
define void @generic_matmul_0_outlined(ptr nocapture readonly %0, ptr nocapture readonly %1, ptr nocapture %2) {
  ; optimized (unrolled, etc.) matmul
}
```

Note that the above function signature contains no alignment information and no aliasing information. I assume that `opt` couldn't deduce that C does not alias A or B. Could this missing information be causing the slowdown? In practice we know that %2 above is definitely not aliased to %0 or %1, so I tried manually adding the `noalias` attribute to the function signature (in an MLIR pass), resulting in:

```llvm
define void @generic_matmul_0_outlined(ptr nocapture readonly %0, ptr nocapture readonly %1, ptr noalias nocapture %2) {
  ; ...
}
```

and this fixes the problem: with the noalias attribute added, the performance with and without outlining is basically the same.
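The reason the attribute matters can be sketched in C. For a `restrict`-qualified pointer parameter, clang emits the LLVM `noalias` argument attribute. The sketch below is illustrative only and is not the outlined kernel itself. The names `matmul_may_alias` and `matmul_noalias`, the `DIM` constant, and the 32x32 row-major `float` layout are all assumptions. Without `restrict`, calling the kernel with C aliasing A is well-defined, and it gives a different answer than the out-of-place product. So the optimizer must reload A and B after every store to C. With `restrict` on C, such a call is undefined behavior, and that is what allows the optimizer to keep values in registers and reorder memory accesses.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout for illustration: 32x32 row-major float matrices, C = A @ B. */
#define DIM 32

/* No `restrict`: C may legally alias A or B, so after each store to C the
   compiler must assume A and B may have changed and reload them. */
void matmul_may_alias(const float *A, const float *B, float *C) {
    for (size_t i = 0; i < DIM; ++i)
        for (size_t j = 0; j < DIM; ++j) {
            C[i * DIM + j] = 0.0f;
            for (size_t k = 0; k < DIM; ++k)
                C[i * DIM + j] += A[i * DIM + k] * B[k * DIM + j];
        }
}

/* `restrict` on C only: clang lowers a restrict-qualified pointer parameter
   to the LLVM `noalias` argument attribute, mirroring the hand-edited
   signature. Passing a C that overlaps A or B is undefined behavior. */
void matmul_noalias(const float *A, const float *B, float *restrict C) {
    assert(C != A && C != B); /* catches only exact-same-buffer aliasing */
    for (size_t i = 0; i < DIM; ++i)
        for (size_t j = 0; j < DIM; ++j) {
            C[i * DIM + j] = 0.0f;
            for (size_t k = 0; k < DIM; ++k)
                C[i * DIM + j] += A[i * DIM + k] * B[k * DIM + j];
        }
}
```

In this sketch, `matmul_may_alias(X, B, X)` is legal and silently computes something other than X@B. `matmul_noalias(X, B, X)` is a contract violation. Adding `noalias` is therefore only sound when no caller ever passes overlapping buffers, which is the guarantee stated above for C versus A and B.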

Issue

For some 'unusual' matmuls and batch matmuls, adding the `noalias` attribute results in numerical errors. I can't see anything wrong with the IR (.ll or .opt.ll files). We only observe the numerical error when `opt` is run with -O1, -O2, or -O3; at -O0 there is no numerical error. The error at -O1 differs from the error at -O2/-O3, and -O2 and -O3 produce the same error. None of the shapes we currently care about have numerical issues, but we obviously want to be able to add `noalias` for all shapes (if this approach is sensible). One failing shape is a matmul with M=N=K=32 (i.e. A, B, and C are all 32x32 matrices).
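One way to quantify the error for a failing shape such as M=N=K=32 is to compare each output element against a double-precision reference and report the worst absolute difference. This is a hypothetical sketch. `matmul_ref` and `max_abs_error` are illustrative names, and the row-major `float` inputs are an assumption, because the issue does not state the element type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference: C = A @ B in double precision, row-major,
   A is m x k, B is k x n, C is m x n. */
void matmul_ref(const float *A, const float *B, double *C,
                size_t m, size_t n, size_t k) {
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += (double)A[i * k + p] * (double)B[p * n + j];
            C[i * n + j] = acc;
        }
}

/* Largest absolute difference between a device result and the reference. */
double max_abs_error(const float *got, const double *ref, size_t count) {
    assert(got != NULL && ref != NULL);
    double worst = 0.0;
    for (size_t i = 0; i < count; ++i) {
        double d = (double)got[i] - ref[i];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```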

I have attached the following files to help triangulate the problem:

| File | Notes |
|---|---|
| input.ll | The original IR for the function |
| input_no_alias.ll | Above, but with the `noalias` attribute added to the final operand (C) |
| input.opt0.ll | The IR after running `opt -O0` on input.ll. Numerically correct. |
| input.opt1.ll | The IR after running `opt -O1` on input.ll. Numerically correct. |
| input.opt2.ll | The IR after running `opt -O2` on input.ll. Numerically correct. |
| input_no_alias.opt0.ll | The IR after running `opt -O0` on input_no_alias.ll. Numerically correct. |
| input_no_alias.opt1.ll | The IR after running `opt -O1` on input_no_alias.ll. Numerically incorrect. |
| input_no_alias.opt2.ll | The IR after running `opt -O2` on input_no_alias.ll. Numerically incorrect. |

input.ll.txt
input_noalias.ll.txt
input.opt0.ll.txt
input.opt1.ll.txt
input.opt2.ll.txt
input_noalias.opt0.ll.txt
input_noalias.opt1.ll.txt
input_noalias.opt2.ll.txt

All files as zips:
opt_files.tar.gz
opt_files.zip

Some observations

The difference between input.opt1.ll and input_no_alias.opt1.ll is only the function signature:

```llvm
define void @generic_matmul_0_outlined(ptr nocapture readonly %0, ptr nocapture readonly %1, ptr nocapture %2) local_unnamed_addr #0 {
```

vs

```llvm
define void @generic_matmul_0_outlined(ptr nocapture readonly %0, ptr nocapture readonly %1, ptr noalias nocapture %2) local_unnamed_addr #0 {
```

Recall that input_no_alias.opt1.ll gives the numerical error, while input.opt1.ll does not. Presumably this means that Peano uses the `noalias` attribute after `opt` has run? (I'm not sure what Peano does with the optimized LLVM IR; pointers to where in the code to look would be helpful.)

At -O2, the difference between input.opt2.ll and input_no_alias.opt2.ll is larger: the body of the function with the `noalias` attribute is much shorter. This is presumably why performance is better at -O2 and -O3 with the `noalias` attribute added.

Questions

Is there anything obviously wrong with the IR, input_no_alias.ll?
Is there another, better way to manipulate the function signature of our outlined function, to bridge the performance gap with the inlined (non-outlined) version?
Where is the numerical error coming from?


newling · 2025-01-09T00:13:21Z

Using a newer version of Peano (aka llvm-aie) fixes this problem:

- wheel from September 2024: numerical error
- wheel from January 2025: no numerical error

newling mentioned this issue Jan 7, 2025

Make outlined function arguments non-aliasing nod-ai/iree-amd-aie#1006

Closed

newling closed this as completed Jan 9, 2025
