Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit operand to a set of physical registers #107

Open
KarelPeeters opened this issue Jan 9, 2023 · 5 comments
Open

Limit operand to a set of physical registers #107

KarelPeeters opened this issue Jan 9, 2023 · 5 comments

Comments

@KarelPeeters
Copy link

KarelPeeters commented Jan 9, 2023

It seems like there are currently two options when expressing which registers an Operand can use:

  • use reg_def/reg_use/..., in this case the operand can be allocated to any physical register.
  • use reg_fixed_def/reg_fixed_use, in this case the operand can only be a single physical register.

This does not seem to cover the case where only of a limited set of physical registers can be used as an operand.

This is often the case on x86 (32-bit), where some of the general purpose registers don't expose their lower 8 bits as a pseudoregister. This means that setcc and other instructions dealing with a byte can only use the first 4 registers, not eg. esi.

My current workaround is to always use fixed operands for these instructions, but that is a bit too limiting. The other solution is to completely remove the additional registers from the MachineEnv, but that has even bigger drawbacks. Is there a better solution? It seems none of the ISAs supported by wasmtime have this issue, so I can't steal any ideas there.

@cfallin
Copy link
Member

cfallin commented Jan 17, 2023

This is an interesting requirement, and it would certainly be useful, I agree!

It seems technically feasible-ish, and the general solution would be to (i) modify the Requirement lattice to have a notion of register sets (with the appropriate meet function, namely set-intersection), that sublattice embedded in the overall lattice; and (ii) modify the traversal over candidate registers to visit only those in the set.

There may be some subtle forward-progress issues with that; I'm not sure and would have to think a bit more. Also there are potentially interactions with the way that multiple conflicting requirements on one vreg at one time are handled (the "multi-fixed-reg fixup" logic).

Anyway, I invite anyone to experiment with this; I don't have the bandwidth or motivating use-case (from things I'm tasked to work on) to be able to spend time on it unfortunately but I'm happy to point to the appropriate parts of the implementation and/or review code!

@KarelPeeters
Copy link
Author

One issue I came across when thinking about this some more is the size of Operand, which currently uses bit-packing to fit everything into 32 bits. If a limited set of registers is added as an option, that basically requires adding a PRegSet-sized amount of information, which is 128 bits. Maybe it's not that bad since we only need to support one register class, but it's still going to increase the size of Operand (and related types) by a lot.

Another solution is to only allow a predefined set of register sets to be used as defined in MachineEnv, which can then be referred to with a short index.

I'll think about this some more, and look into the internals of regalloc2 a bit so see how much work this feature ends up being.

@afonso360
Copy link
Contributor

Another use case for this would be supporting the RISCV C extension where instructions can only reference 8 registers (either integer or floating point) instead of the 32 available in the regular ISA.

@KarelPeeters
Copy link
Author

Actually this could completely replace register classes (and fix #47), just annotate each def/usage with the right set of physical registers. Of course that still requires figuring out the bit packing. And maybe it's not very efficient since this would artificially entangle the integer and float register allocation problems.

@cfallin
Copy link
Member

cfallin commented Jan 17, 2023

In the end it's possible we might need to grow Operand slightly, though we should be careful doing so. It might be possible to experiment with e.g. 40 bits by keeping separate arrays of u32 and u8 (this is extraordinarily ugly, yes, but we can wrap it in accessors... and it would lead to better code than a multiply-by-5 packed array indexing). One could put lesser-accessed bits in the u8 part, maybe. Then maybe we could have a notion of a fixed number of subsets of all registers, indexed by a few bits (4? 8?). This is functionally equivalent to the "overlapping register classes" idea that many other allocators have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants