The paper gives a count of 27 XOR, 32 AND and 16 OR instructions on top of 16 circular and 32 logical shifts, where the ANDs, ORs and shifts come from 16 instances of the form `ROR(BYTE_ROR_n, m)` (per your C code in this repo).
In the Rust code, however, we use just 32 circular shifts, 2 for each `rotate_rows_and_columns_m_n` call.
Essentially, the outer `ROR` call is merged into the two logical shifts inside `BYTE_ROR_n`, which become circular shifts. Assuming the compiler converts `ROR` to a single rotate instruction, this saves one instruction per instance, so 16 instructions in total.
I'm not sure this would make much difference where there's a barrel shifter, but for the general case it may be worth reporting.
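To make the merge concrete, here is a minimal Rust sketch of the two forms. It is not the code from either repository: the names `splat`, `ror_of_byte_ror` and `merged` are hypothetical, and it assumes `n` is between 1 and 7 and `m` is a multiple of 8, so that the byte-periodic masks are unchanged by the outer rotation.

```rust
/// Repeats one byte across all four bytes of a 32-bit word.
/// (Hypothetical helper, introduced only for this sketch.)
const fn splat(b: u8) -> u32 {
    (b as u32) * 0x0101_0101
}

/// Reference form: rotate every byte right by `n` using two logical
/// shifts, then rotate the whole word right by `m`.
/// Cost: 2 logical shifts + 1 rotate + 2 AND + 1 OR.
/// Assumes 1 <= n <= 7.
fn ror_of_byte_ror(x: u32, n: u32, m: u32) -> u32 {
    let lo = splat(0xffu8 >> n); // keeps the low (8 - n) bits of each byte
    let byte_ror = ((x >> n) & lo) | ((x << (8 - n)) & !lo);
    byte_ror.rotate_right(m)
}

/// Merged form: the outer rotation is folded into the two inner shifts,
/// which become rotations. Cost: 2 rotates + 2 AND + 1 OR.
/// Assumes 1 <= n <= 7 and that `m` is a multiple of 8.
fn merged(x: u32, n: u32, m: u32) -> u32 {
    let lo = splat(0xffu8 >> n);
    // The bits where a rotate differs from a logical shift are exactly
    // the bits that the masks discard, so the results agree.
    (x.rotate_right(n + m) & lo) | (x.rotate_right((24 + n + m) % 32) & !lo)
}
```

Under those assumptions the two functions return the same value for every input, and the merged form uses one instruction fewer per call wherever a rotate is a single instruction.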
Indeed, one can use 2 circular shifts instead of 1 circular shift + 2 logical shifts.
As you suggest, it would have no impact on ARM because of the barrel shifter, but it might lead to improvements on other platforms.
On the other hand, it can also decrease performance if no rotate instruction is available, since each circular shift then has to be emulated with two logical shifts and an OR (as in the RV32I instruction set we considered in our paper for benchmarks on a RISC-V microcontroller).
Anyway I agree it is worth clearly mentioning it in the code/paper for the sake of completeness. I will take care of it.
Hi @aadomn,
I am recalling this observation in relation to your paper, specifically Figure 6 and the paragraph that follows it on page 8.