Guest FPU state not accessible #3

blitz · 2013-02-05T15:51:43Z

Currently, FPU/SSE/AVX/... instructions in the guest cannot be emulated by the VMM, because NOVA does not export the vCPU FPU state.

udosteinberg · 2013-02-05T17:38:33Z

Would you prefer the FPU state being transferred to the VMM in the UTCB or in the FPU registers themselves?

blitz · 2013-02-05T18:19:19Z

Regarding dumping it into the UTCB: Would you use the FXSAVE/XSAVE layout and the userland would have to figure out the size of it by itself (CPUID)?

Leaving it in the FPU has the advantage that if we compile vancouver without FPU support, we can just leave it there until we need it or call FXSAVE/XSAVE ourselves to dump it to memory. This might be a more generic solution than dumping it into the UTCB. Together with an MTD_FPU flag to trigger this, I would lean towards this solution.
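For readers following the size question: below is a minimal sketch of the two memory layouts under discussion, assuming the 64-bit FXSAVE format and an XSAVE image in standard (non-compacted) form with only x87, SSE, and AVX enabled. The struct names and `fpu_image_size` are hypothetical helpers for illustration, not part of NOVA or vancouver. The FXSAVE image is always 512 bytes, while the XSAVE size depends on the enabled features and is what CPUID leaf 0xD (sub-leaf 0, EBX) reports, so the sketch takes that value as a parameter instead of querying the hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Legacy FXSAVE image in its 64-bit form; the architecture fixes it at 512 bytes.
struct FxsaveArea {
    std::uint16_t fcw;            // x87 control word
    std::uint16_t fsw;            // x87 status word
    std::uint8_t  ftw;            // abridged x87 tag word
    std::uint8_t  reserved0;
    std::uint16_t fop;            // last x87 opcode
    std::uint64_t fip;            // last x87 instruction pointer
    std::uint64_t fdp;            // last x87 data pointer
    std::uint32_t mxcsr;          // SSE control and status register
    std::uint32_t mxcsr_mask;
    std::uint8_t  st[8][16];      // ST0-ST7 / MM0-MM7, 16 bytes per slot
    std::uint8_t  xmm[16][16];    // XMM0-XMM15
    std::uint8_t  reserved1[96];
};
static_assert(sizeof(FxsaveArea) == 512, "FXSAVE image must be 512 bytes");
static_assert(offsetof(FxsaveArea, xmm) == 160, "XMM0 starts at byte 160");

// XSAVE header that follows the legacy region.
struct XsaveHeader {
    std::uint64_t xstate_bv;      // bitmap of state components present in the image
    std::uint64_t reserved[7];
};
static_assert(sizeof(XsaveHeader) == 64, "XSAVE header must be 64 bytes");

// Assumed configuration: x87 + SSE + AVX, standard format.
struct XsaveAreaAvx {
    FxsaveArea   legacy;
    XsaveHeader  header;
    std::uint8_t ymm_high[16][16]; // upper halves of YMM0-YMM15
};
static_assert(sizeof(XsaveAreaAvx) == 832, "x87+SSE+AVX image is 832 bytes");

// Hypothetical helper: cpuid_xsave_size is the value userland would read
// from CPUID leaf 0xD, sub-leaf 0, register EBX.
constexpr std::size_t fpu_image_size(bool have_xsave, std::size_t cpuid_xsave_size) {
    return have_xsave ? cpuid_xsave_size : sizeof(FxsaveArea);
}
```

A buffer used with the real instructions also has to be aligned: 16 bytes for FXSAVE and 64 bytes for XSAVE.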

Perhaps @vmmon has an opinion about that?

On a related note: It might be worthwhile to discard the current EC's FPU state on REPLY. Since GCC tends to spray SSE instructions all over the place nowadays, this could reduce FPU switching without hurting anyone. No, I haven't benchmarked this. :)

udosteinberg · 2013-02-05T22:09:11Z

> Regarding dumping it into the UTCB: Would you use the FXSAVE/XSAVE layout and the userland would have to figure out the size of it by itself (CPUID)?

The size could be included in the number of untyped items.
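To illustrate the idea of carrying the size in the untyped-item count: a minimal sketch with a toy message buffer. `ToyUtcb`, `kMaxWords`, and both functions are hypothetical; the real NOVA UTCB layout and capacity differ, and 64-bit message words are an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Word = std::uint64_t;                 // one untyped item (assumed 64-bit)
constexpr std::size_t kMaxWords = 256;      // hypothetical message capacity

struct ToyUtcb {
    std::size_t untyped;                    // number of untyped items in msg[]
    Word        msg[kMaxWords];
};

// Sender side: copy an FPU image into the message and record its length
// as the untyped-item count. Returns false if the image does not fit.
bool put_fpu_image(ToyUtcb& utcb, const void* image, std::size_t bytes) {
    const std::size_t words = (bytes + sizeof(Word) - 1) / sizeof(Word);
    if (words > kMaxWords) return false;
    std::memset(utcb.msg, 0, words * sizeof(Word));   // zero any tail padding
    std::memcpy(utcb.msg, image, bytes);
    utcb.untyped = words;
    return true;
}

// Receiver side: the image size follows from the item count, no CPUID needed.
std::size_t fpu_image_bytes(const ToyUtcb& utcb) {
    return utcb.untyped * sizeof(Word);
}
```

With these assumptions a 512-byte FXSAVE image arrives as 64 untyped items, and the receiver recovers the size by multiplying the count by the word size.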

I'd prefer for MTD_FPU to always either transmit the state in the UTCB or in the FPU registers, and not making both behaviors configurable in the portal, because otherwise there are just too many different combinations. I'm leaning towards passing the state live in the FPU, because that's cheapest from a performance perspective and if the VMM wants to use the FPU itself, it could always do an fxsave early in the portal function.

The other thing you're suggesting is a kind of MTD_DROP_FPU flag, that tells the hypervisor that you aren't interested in your FPU state anymore? For consistency reasons that would not only apply when you are the current FPU owner, but it would also deallocate any previously saved FPU state.
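The consistency point can be made concrete with a toy model of FPU ownership. This is only a sketch under assumed semantics: `DropEc`, `DropCpu`, and `drop_fpu` are hypothetical names, and the hypervisor's actual bookkeeping is not described in this thread.

```cpp
#include <cassert>
#include <memory>

struct FpuSaveArea { unsigned char bytes[512]; };

struct DropEc {
    std::unique_ptr<FpuSaveArea> saved;   // FPU state spilled to memory, if any
};

struct DropCpu {
    DropEc* fpu_owner = nullptr;          // EC whose state is live in the registers
};

// Hypothetical handling of a drop request: the EC gives up its FPU state
// whether that state is currently live in the registers or saved in memory.
void drop_fpu(DropCpu& cpu, DropEc& ec) {
    if (cpu.fpu_owner == &ec)
        cpu.fpu_owner = nullptr;          // live register contents are abandoned
    ec.saved.reset();                     // free any previously saved image
}
```

Handling both cases in one place means the caller gets the same result regardless of who happens to own the FPU at the time of the request.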

blitz · 2013-02-05T22:49:48Z

We are on the same page with having only One Way To Do It™. Is it possible to make the exception handler execute on the faulting thread's FPU state when MTD_FPU is set, in effect reducing the job of the hypervisor to some pointer bending? If that is the case, I am all for this solution. How do you want to handle the exception handler's own FPU state during this time? Is it just "shadowed" or destroyed? To be clear, I am perfectly fine with both semantics, just wanted to know.
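A minimal sketch of the "pointer bending" idea, assuming the "shadowed" semantics (the thread leaves open whether the handler's own state would be shadowed or destroyed). `FpuContext`, `SwapEc`, and both functions are hypothetical and only model the bookkeeping, not real register switching.

```cpp
#include <cassert>
#include <utility>

struct FpuContext { int owner_id; };      // stands in for a saved register image

struct SwapEc {
    FpuContext* fpu;                      // context this EC would run on
};

// On a portal call with MTD_FPU: the handler runs on the faulting EC's
// FPU context, while its own context is parked on the blocked EC.
void on_call_with_mtd_fpu(SwapEc& faulting, SwapEc& handler) {
    std::swap(faulting.fpu, handler.fpu);
}

// On reply: swap back, so any changes the handler made to the guest's
// FPU context become visible to the faulting EC again.
void on_reply(SwapEc& faulting, SwapEc& handler) {
    std::swap(faulting.fpu, handler.fpu);
}
```

Under the "destroyed" alternative, the call path would instead discard the handler's context and the reply path would leave the handler without FPU state.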

Regarding the point of dropping FPU state on reply, I'll open another issue for discussion, because it is not directly related to this one and is just confusing the discussion. Sorry.

vmmon · 2013-02-06T15:00:39Z

On Tue, Feb 05, 2013 at 02:09:13PM -0800, Udo Steinberg wrote:

> > Regarding dumping it into the UTCB: Would you use the FXSAVE/XSAVE
> > layout and the userland would have to figure out the size of it by
> > itself (CPUID)?
>
> The size could be included in the number of untyped items.
>
> I'd prefer for MTD_FPU to always either transmit the state in the UTCB or
> in the FPU registers, and not making both behaviors configurable in the
> portal, because otherwise there are just too many different combinations.

Transferring the state in the UTCB is a safe but probably slow solution. If
the state is in the FPU, an xsave is required. If the state is inactive, a
(faster) memcpy can be used. If a handler does not need the FPU state, it would
not specify the MTD_FPU flag in its portal.
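The two UTCB paths described above can be sketched as follows. This is a toy model, not hypervisor code: `ExportEc`, `ExportCpu`, and `export_fpu` are hypothetical, the image is fixed at 512 bytes for brevity, and a plain array copy stands in for the xsave instruction.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using FpuImage = std::array<std::uint8_t, 512>;

struct ExportEc {
    FpuImage saved{};                     // image in memory while the state is inactive
};

struct ExportCpu {
    FpuImage        regs{};               // stands in for the live FPU registers
    const ExportEc* owner = nullptr;      // EC whose state is live in regs
};

enum class ExportPath { SaveFromRegisters, CopyFromMemory };

// Fill the UTCB area with the guest's FPU state and report which path was taken.
ExportPath export_fpu(const ExportCpu& cpu, const ExportEc& guest, FpuImage& utcb_area) {
    if (cpu.owner == &guest) {
        utcb_area = cpu.regs;             // state is live: would be an xsave
        return ExportPath::SaveFromRegisters;
    }
    utcb_area = guest.saved;              // state is inactive: plain memory copy
    return ExportPath::CopyFromMemory;
}
```

A portal without MTD_FPU would skip this function entirely, which is the third case mentioned above.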

It is much faster if the FPU state can be transferred inline, because in the best
case the HV can just exchange the FPU context between sender and receiver.
However, the handler needs to make sure it does not need the FPU itself, neither
in its own code nor in any exception handler or library code it calls. The last
part is especially critical with any third-party code. Furthermore, solving this
problem by saving the (inactive) FPU on the respective portal entry is much slower
than doing the same inside the kernel, as one pays an exception and two xsaves
instead of a single memcpy.

Thus transferring the state inline is a nice optimization in the special case,
but slower and more complex in the general case. I would prefer the UTCB version,
especially since we do not have any use-case that needs fast FPU access, except
for improving IPC throughput by keeping the words in FPU registers...

blitz mentioned this issue Feb 5, 2013

FPU/SSE support TUD-OS/seoul#6

Open
