-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIE2] Fix pathological case where greedy reloads directly from a CR register #211
Conversation
const TargetRegisterInfo &TRI = *MRI.getTargetRegisterInfo(); | ||
|
||
// Clear previous cache, reserved registers can change for each function | ||
SuitableRCForRA.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can suggest that we don't need to clean this cache always (from the compile time perspective). You can encapsulate this map with a BitVector representing the current set or reserved regs in a standalone class/struct (you can access the reserved registers with BitVector getReservedRegs
from TRI).
We can have, in the end something like:
SuitableRCForRA.clearIfDiverges(TRI)
-> just clean if we have different sets here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
This is important as greedy feels free to fold a lot of operations into spill/reload instructions as long as the register class allows it In our case, we do not want to fold %0:ercr = LD_reload ... $crSat = COPY %0 into: $crSat = LD_reload ... as this makes generating spill code harder. We would need to bounce through a new GPR anyway.
12067ab
to
43bce7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
It turns out greedy does quite a lot of "operation folding" to get more compact spill code.
It will turn
into
This is problematic because we cannot directly reload into a
$crSat
register. I actually expected such folding to be illegal, because after all,$crSat
is a reserved register. And usually, reserved registers just mean "don't you dare optimize me". But it's also true that this register is part ofeRCR
(which it needs to be in for some of our copy-propagation extensions to kick in), so I guess what greedy does is fair game.I could change
const TargetRegisterClass *canFoldCopy(...)
to actually never fold into reserved registers, but that causes one X86 test to fail. At this point, I'd rather just constraineRCR
toeR
before greedy, because it's anyway what we want: the classes are equivalent in terms of allocatable registers, andeR
is easy to spill.No QoR impact, which is expected, but you never know.