Having gotten hold of a box with a Zhaoxin KX-6580 CPU (Zhaoxin is a Chinese x86 CPU vendor, formed as a joint venture of VIA and the Shanghai municipal government; its designs are mostly a continuation of the VIA C3/C7/Nano series cores, aimed mainly at the Chinese market but starting to show up elsewhere), I decided to do a whole bunch of testing on its PadLock functionality - and in doing so, I've made a number of findings about various undocumented and underdocumented features. The ones most relevant for disassembly tools like, say, Zydis, so far appear to be:
The rep montmul instruction takes, much to my surprise, a mandatory 67h address size prefix in 64-bit mode (!!). This is observed by the sequence f3 0f a6 c0 consistently producing a #UD exception, while f3 67 0f a6 c0 does not. The issue appears to be that rep montmul takes a pointer in rSI to a data structure that contains 5 pointers to various buffers needed by the instruction. This data structure does not appear to have ever been updated to work with 64-bit pointers, so the 67h prefix is needed to force 32-bit addressing. This makes the instruction fairly inconvenient to set up, since the structure and all its buffers must reside in the bottom 4GB of virtual address space, but once that is done, the variant with the 67h prefix (and only that variant) executes a Montgomery multiply just fine.
The instruction encoding f3 0f a6 e0 is a seemingly undocumented instruction to accelerate SHA-512 hashing. In my testing, it appears to take the following arguments:
rCX = number of 128-byte blocks to hash
ES:rSI = pointer to source data
ES:rDI = pointer to a 64-byte digest to update
I haven't been able to find this instruction documented anywhere, but OpenSSL clearly knows about it (see https://github.com/openssl/openssl/blob/master/engines/asm/e_padlock-x86.pl , line 597), referring to it as rep xsha512. The instruction encoding f3 0f a6 d8 also appears to be an alias of this instruction.
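As a rough sketch of the calling convention listed above, the register values for a given input could be computed as follows. The helper name `xsha512_args` is hypothetical, and the requirement that the input length be a whole multiple of 128 bytes is an assumption that follows from rCX being a block count; whether the instruction performs any padding itself was not tested here.

```python
SHA512_BLOCK_BYTES = 128   # SHA-512 consumes its input in 128-byte blocks
SHA512_DIGEST_BYTES = 64   # the digest state is eight 64-bit words


def xsha512_args(src_addr: int, digest_addr: int, nbytes: int) -> dict:
    """Hypothetical helper: register values for f3 0f a6 e0 as observed above.

    Assumes the caller has already padded the message, so nbytes must be
    a whole number of 128-byte blocks.
    """
    if nbytes % SHA512_BLOCK_BYTES != 0:
        raise ValueError("input must be a whole number of 128-byte blocks")
    return {
        "rcx": nbytes // SHA512_BLOCK_BYTES,  # number of blocks to hash
        "rsi": src_addr,                      # ES:rSI -> source data
        "rdi": digest_addr,                   # ES:rDI -> 64-byte digest to update
    }
```

For example, `xsha512_args(0x1000, 0x2000, 256)` yields rCX = 2 for two blocks of input.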
The instruction encoding f3 0f a6 e8 is a Zhaoxin-specific "GMI" instruction: ccs_hash. This instruction is documented (https://github.com/ZXOpenSource/OpenSSL-ZX-GMI/blob/master/GMI%20User%20Manual%20V1.0.pdf - in Chinese, but pretty readable after a trip through Google Translate) to provide support for the Chinese SM3 hashing algorithm. In my testing, it also provides undocumented support for SHA-1/256/512, which can be selected by setting rBX to values in the range 0x10 to 0x15.
The instruction encoding f3 0f a7 f0 is another Zhaoxin-specific "GMI" instruction: ccs_encrypt. This instruction is documented to provide support for the Chinese SM4 encryption algorithm - it also provides undocumented support for AES-128/192/256 that can be obtained by setting rAX to values in the range 0x10 to 0x15.
The instruction encodings f3 0f a6 f0 and f3 0f a6 f8 are undocumented and I haven't been able to figure out what they might do. They produce a #GP exception for all sorts of arguments I've been trying to pass them, suggesting that they either expect a really odd input data format or are privileged instructions.
At least on this specific CPU, the xstore instruction accepts the repne prefix and treats it as a synonym for rep: f2 0f a7 c0 produces the same output as I would expect from rep xstore (f3 0f a7 c0). None of the other PadLock instructions accept this prefix (#UD). The instruction encoding f3 0f a7 f8 appears to be an alias of rep xstore; however, it doesn't accept repne.
From what I can find, all of the instructions in the PadLock space (0f a6 c0-ff and 0f a7 c0-ff) exhibit partial decode, where the bottom 3 bits of the last byte of the instruction are ignored - e.g. f3 0f a7 f7 is accepted as a valid instruction and behaves identically to f3 0f a7 f0.
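For a disassembler, the partial decode described above amounts to masking off the low 3 bits of the final byte before the table lookup. A minimal sketch, assuming a hypothetical table `PADLOCK_F3` that holds only the encodings named in this issue (not the full PadLock instruction set) and that only handles the plain 4-byte f3-prefixed form:

```python
# Keys are (second opcode byte, canonical last byte); names are taken from
# the findings in this issue. "unknown" marks encodings that only produced #GP.
PADLOCK_F3 = {
    (0xA6, 0xC0): "rep montmul",
    (0xA6, 0xD8): "rep xsha512 (alias)",
    (0xA6, 0xE0): "rep xsha512",
    (0xA6, 0xE8): "ccs_hash",
    (0xA6, 0xF0): "unknown",
    (0xA6, 0xF8): "unknown",
    (0xA7, 0xC0): "rep xstore",
    (0xA7, 0xF0): "ccs_encrypt",
    (0xA7, 0xF8): "rep xstore (alias)",
}


def canonical_last_byte(last: int) -> int:
    """Clear the bottom 3 bits, which the CPU appears to ignore."""
    return last & 0xF8


def lookup(encoding: bytes):
    """Return the name for an f3 0f a6/a7 xx encoding, or None if not listed."""
    if len(encoding) != 4 or encoding[0] != 0xF3 or encoding[1] != 0x0F:
        return None
    if encoding[2] not in (0xA6, 0xA7) or encoding[3] < 0xC0:
        return None
    return PADLOCK_F3.get((encoding[2], canonical_last_byte(encoding[3])))


print(lookup(bytes.fromhex("f3 0f a7 f7")))  # same entry as f3 0f a7 f0
```

Keying the table on the masked byte means each instruction needs one entry rather than eight.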
I've done a bit more testing, and made a few more minor findings:
rep montmul, in addition to lacking support for 64-bit addressing, also appears to lack support for 16-bit addressing. As such, the instruction requires the 67h address size override prefix in 16-bit mode (including real mode), or else it will #UD. Conversely, in 32-bit mode, the 67h prefix is not allowed and causes #UD if used.
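Taken together, the observations for the three modes are consistent with a single rule: rep montmul is accepted only when the effective address size is 32 bits. The sketch below encodes that reading. The function names are hypothetical, and the single-rule interpretation is an inference from the test results, not something documented.

```python
def effective_address_size(mode_bits: int, has_67: bool) -> int:
    """Standard x86 rule: the 67h prefix toggles the default address size."""
    if mode_bits == 64:
        return 32 if has_67 else 64
    if mode_bits == 32:
        return 16 if has_67 else 32
    if mode_bits == 16:
        return 32 if has_67 else 16
    raise ValueError("mode_bits must be 16, 32 or 64")


def montmul_is_accepted(mode_bits: int, has_67: bool) -> bool:
    """Assumed rule: anything other than 32-bit addressing raises #UD."""
    return effective_address_size(mode_bits, has_67) == 32


for mode in (16, 32, 64):
    for prefix in (False, True):
        result = "ok" if montmul_is_accepted(mode, prefix) else "#UD"
        print(f"{mode}-bit mode, 67h={prefix}: {result}")
```

A decoder could apply the same check to flag f3 0f a6 c0 without 67h as invalid in 16-bit and 64-bit code.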
rep montmul takes, in ES:ESI, a pointer to a data structure. Zydis currently reports this as a 4-byte memory operand; its actual accessed size (as measured by placing it next to an unmapped memory page) is 24 bytes.
Many of the PadLock instructions are officially documented as causing an Invalid Instruction Exception (#UD) if the operand size prefix 66h is used. This does not check out in my testing: I've been able to get every PadLock instruction to run with the 66h prefix, and it does not appear to have any discernible effect on the execution of any of them.