Implemented X-Wing (ref and avx2) in libjade #104

JoaoDiogoDuarte · 2024-02-11T22:37:02Z

Hi,

I coded a almost-reference X-Wing implementation, the hybrid KEM defined in [draft-connolly-cfrg-xwing-kem](https://datatracker.ietf.org/doc/draft-connolly-cfrg-xwing-kem/ and analysed in https://eprint.iacr.org/2024/039. The almost-reference is because X-Wing technically uses ML-KEM-768, but as it has not been standardized or implemented in libjade (at the time of writing), I utilised Kyber 768.

IMO, two things need to be done, which are:

Code review by the maintainers of libjade as I am relatively new to Jasmin and the code standards for libjade
I am not sure how to generate the checksums for the META.yml file as these are not (obviously) on SUPERCOP yet.
Decide if we wait adding X-Wing to llibjade before or after MLKEM is added to libjade.

tfaoliveira · 2024-02-12T01:58:35Z

Hi! thanks for the contribution!

Wrt testing, I ran a quick test and found some possible issues that I would like you to take a look at before I enable the CI:

libjade/test$ make distclean CI=1; make CI=1 FILTER=../src/crypto_kem/xwing/%; make CI=1 FILTER=../src/crypto_kem/xwing/% reporter

And there are the following 3 problems:

Execution status - checksumsmall: 
ERROR, 2, crypto_kem/xwing/amd64/ref/.ci/checksumsmall.out.error
Execution status - checksumbig: 
ERROR, 2, crypto_kem/xwing/amd64/ref/.ci/checksumbig.out.error
Execution status - memory: 
ERROR, 1, crypto_kem/xwing/amd64/ref/.ci/memory.out.error

Wrt the checksums executables:

libjade/test$ cat bin/crypto_kem/xwing/amd64/ref/.ci/checksumsmall.out.error
111
jade_kem_keypair - secret_key writes after output
libjade/test$ cat bin/crypto_kem/xwing/amd64/ref/.ci/checksumbig.out.error
111
jade_kem_keypair - secret_key writes after output

They don't finish the execution (and print a checksum, even if there is nothing to compare to) and return an error because there are memory safety issues with the current implementation (same tests as done in SUPERCOP). There might be additional information in file libjade/test/bin/crypto_kem/xwing/amd64/ref/memory.out, where the valgrind output can be checked and several errors are reporter.

A good check to do first is to see if the values in src/crypto_kem/xwing/amd64/ref/include/api.h are OK. If there is no issue, then the problem is likely somewhere in the implementation.

Wrt. producing SUPERCOP checksums with other implementations: if there any other implementation (the more the better) that we can use for this purpose (anything that can be linked to C code should be fine). This can be done, for instance, as in https://github.com/tfaoliveira/supercop/commits/dilithium/ or even inside libjade with a bit of patching.

After the testing issues are figured out, we think about code organization: I see that the paper is focused on MLKEM768, but if there are plans for MLKEM1024, one could consider that at this stage. Replacing Kyber by MLKEM should be fine and can be done later. Other x25519 implementations, such as mulx, might be worth trying.

Then, we can move on to code review/performance analysys/code size/etc; Thanks

JoaoDiogoDuarte · 2024-02-12T13:57:30Z

Thanks for the comments and suggestions! (btw nice formatting, I like the line separators, somehow I never thought of doing that for these kinds of things!)

In terms of the memory problem, the issue seems to be that I stupidly load from the stack a value of 134 bytes into a register that's only been malloced 32 bytes.

This complicates things a little as we cannot malloc in Jasmin and I do not want to break the KEM API by asking the user to give us a buffer pointer that has been malloced 134 bytes. This buffer would hold the input which we then pass to SHA3-256. The current SHA3-256 implementation seems to accept reg ptrs of 32 as input, and for any other sizes, it asks for a register as input, which is why I load the 134 bytes from the stack into a register.

The only solution I can think to solve this is rewriting SHA3-256 to accept stack arrays (well, reg ptrs) of 134 bytes.

My source of confusion in rewriting the SHA3-256 code (besides not being very familiar with the internals), is that for the implementation that accepts reg ptr u8[32] in, we have in lines 216-220:

for i=0 to KYBER_SYMBYTES/8 // KYBER_SYMBYTES/8 = 32/8 = 8
{
  t64 = in[u64 i];
  state[u64 i] = t64;
}

and I am not sure how to go about this since 134 does not divide by 8 and in the SHA3-512 implementation, we have have in line 292, state[u64 i] ^= t64; instead of state[u64 i] = t64;.

From what I understand we are loading this u8 reg ptr input into a u64 reg and then loading it into a stack array, but I am not entirely sure how to do this with a number that is not divisible by 8, and then if we also need to XOR the state for larger reg ptr u8[>32] in, as is done in the SHA3-512 implementation.

Any thoughts that may clarify this whole SHA3 problem and if maybe there is a way around?

In terms of the checksum I have a simple C implementation I can use/adapt to get the checksum values, I'll have a look. Some colleagues also have a Go and Rust implementation IIRC, so I can ask them if it's possible to link their implementation to C.

Lastly, I do not believe we have any plans for ML-KEM-1024, which may simplify code organization.

JoaoDiogoDuarte · 2024-02-12T15:53:10Z

I got the SHA3-256_134 implementation working.

Update: The memory issue is fixed, at least I do not get any errors*

*I always seem to get an error file in the memory test, even if I test the Kyber implementation, so it's a bit difficult to say for sure that it's fixed on my end. Even if I completely comment out all the main body in memory.c, I still get a return code of 1. Maybe it's my valgrind version? I still also get the checksum errors, but I will try to insert valid checksums and rerun the test.

JoaoDiogoDuarte · 2024-02-13T12:15:59Z

@tfaoliveira I quickly made a AVX2 implementation and I am trying to implement SHA3-256 with input reg ptr in[134] using AVX2, but I am having some difficulty with processing the input. For example, for the version that accepts a reg ptr in[32], we have the following code in lines 52-63:

state[0] = #VPBROADCAST_4u64(in[u64 0]);

for i=1 to KYBER_SYMBYTES/8
{
  t = in[u64 i];
  l = a_jagged_p[i];
  s_state[(int) l] = t;
}

l = a_jagged_p[KYBER_SYMBYTES/8];
l <<= 3;
s_state[u8 (int)l] = 0x06;

and since I do not fully understand why we are doing certain things, the best I can do is:

state[0] = #VPBROADCAST_4u64(in[u64 0]);

for i=1 to 16
{
 t = in[u64 i];
 l = a_jagged_p[i];
 s_state[(int) l] = t;
}


l = a_jagged_p[17];
l <<= 3;
s_state[u8 (int)l] = 0x06;

which compiles but is clearly wrong because I am not passing the last couple of bytes of the input into state. Do you know of any tips or tricks for this or who I can contact to help me with this, please?

Thanks!

JoaoDiogoDuarte · 2024-02-13T12:30:44Z

Also, sorry if I have not properly addressed the memory bug, the best I can do is guess because I cannot get memcheck to work on my distro as I get an annoying bug when testing the memory of any component of libjade.

Long annoying bug probably of no interest to you, just to demonstrate I am kind of shooting in the dark

  ==29133== Memcheck, a memory error detector
  ==29133== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
  ==29133== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
  ==29133== Command: ./bin/crypto_kem/xwing/amd64/ref/memory
  ==29133== Parent PID: 29132
  ==29133== 
  
  valgrind:  Fatal error at startup: a function redirection
  valgrind:  which is mandatory for this platform-tool combination
  valgrind:  cannot be set up.  Details of the redirection are:
  valgrind:  
  valgrind:  A must-be-redirected function
  valgrind:  whose name matches the pattern:      strlen
  valgrind:  in an object with soname matching:   ld-linux-x86-64.so.2
  valgrind:  was not found whilst processing
  valgrind:  symbols from the object with soname: ld-linux-x86-64.so.2
  valgrind:  
  valgrind:  Possible fixes: (1, short term): install glibc's debuginfo
  valgrind:  package on this machine.  (2, longer term): ask the packagers
  valgrind:  for your Linux distribution to please in future ship a non-
  valgrind:  stripped ld.so (or whatever the dynamic linker .so is called)
  valgrind:  that exports the above-named function using the standard
  valgrind:  calling conventions for this platform.  The package you need
  valgrind:  to install for fix (1) is called
  valgrind:  
  valgrind:    On Debian, Ubuntu:                 libc6-dbg
  valgrind:    On SuSE, openSuSE, Fedora, RHEL:   glibc-debuginfo
  valgrind:  
  valgrind:  Note that if you are debugging a 32 bit process on a
  valgrind:  64 bit system, you will need a corresponding 32 bit debuginfo
  valgrind:  package (e.g. libc6-dbg:i386).
  valgrind:  
  valgrind:  Cannot continue -- exiting now.  Sorry.

tfaoliveira · 2024-02-20T11:51:21Z

Hi, thanks for pushing.

My current view:

$ make distclean CI=1; make CI=1 FILTER=../src/crypto_kem/xwing/%; make CI=1 FILTER=../src/crypto_kem/xwing/% reporter

Execution status - checksumsmall: 
ERROR, 2, crypto_kem/xwing/amd64/avx2/.ci/checksumsmall.out.error
ERROR, 2, crypto_kem/xwing/amd64/ref/.ci/checksumsmall.out.error
Execution status - checksumbig: 
ERROR, 2, crypto_kem/xwing/amd64/avx2/.ci/checksumbig.out.error
ERROR, 2, crypto_kem/xwing/amd64/ref/.ci/checksumbig.out.error
Execution status - printparams: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/printparams.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/printparams.out.log
Execution status - functest: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/functest.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/functest.out.log
Execution status - safetyparams: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/safetyparams.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/safetyparams.out.log
Execution status - memory: 
ERROR, 1, crypto_kem/xwing/amd64/avx2/.ci/memory.out.error
ERROR, 1, crypto_kem/xwing/amd64/ref/.ci/memory.out.error

Next (I see that there are unused variables warnings, it would be nice to fixe those):

$ make CI=1 distclean; make JFLAGS=-g bin/crypto_kem/xwing/amd64/ref/checksumsmall.stdout
(...)
jade_kem_keypair - secret_key writes after output

$ make CI=1 distclean; make JFLAGS=-g bin/crypto_kem/xwing/amd64/ref/memory.out
$ cat bin/crypto_kem/xwing/amd64/ref/memory.out | grep "at 0x" | grep "jinc" | cut -d'?' -f4 | sort -u
(kem.jinc:31)
$ mv bin/crypto_kem/xwing/amd64/ref/memory.out bin/crypto_kem/xwing/amd64/ref/memory.txt # to attach here

Looking at line 31 from the corresponding file:

    for i=0 to X25519_PUBLICKEYBYTES {
        [skp + i] = [pkp + i];
    }

This might be missing a couple of (u8): (u8)[skp + i] = (u8)[pkp + i]; Please check the remaining code for this type of issue. Note: I usually prefer to use a temporary register to perform such copies, but we can check for this later in the review. Thanks

Note 2, until 4th of March I'm mostly offline as I'm on holiday, but I will drop by now and then

The valgrind output:
memory.txt

JoaoDiogoDuarte · 2024-02-20T13:33:22Z

Hi, thanks for the really helpful feedback and for the valgrind output! I've fixed that loop you mentioned and with a bit of embarrassment, I admit that I actually had the wrong value for the size of the ciphertext.......that is why the checksum was failing........................................I've fixed that.... (#dumbdumbtime).

I removed the unused variables r in the scalarmult.jinc file. Now the remaining unused files seem to be from Kyber.

After implementing the SHA3 function I was talking about, on my machine, after running:

make distclean CI=1; make CI=1 FILTER=../src/crypto_kem/xwing/%; make CI=1 FILTER=../src/crypto_kem/xwing/% reporter

I get:

Execution status - checksumsmall: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/checksumsmall.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/checksumsmall.out.log
Execution status - checksumbig: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/checksumbig.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/checksumbig.out.log
Execution status - printparams: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/printparams.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/printparams.out.log
Execution status - functest: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/functest.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/functest.out.log
Execution status - safetyparams: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/safetyparams.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/safetyparams.out.log
Execution status - memory: 
ERROR, 1, crypto_kem/xwing/amd64/avx2/.ci/memory.out.error
ERROR, 1, crypto_kem/xwing/amd64/ref/.ci/memory.out.error
make[1]: Leaving directory '/home/joao/Projects/libjade/test'
make reporter_checksums
make[1]: Entering directory '/home/joao/Projects/libjade/test'
./scripts/checksumsok
./../scripts/ci/reporter/jlog "Checksums status" test/bin checksum*.ok "1";
Checksums status: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/checksumbig.ok.log
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/checksumsmall.ok.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/checksumbig.ok.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/checksumsmall.ok.log

This is great! The memory error is still there simply because valgrind does not run on my machine, so maybe it's gone?

Have a good holiday!

tfaoliveira · 2024-02-21T22:33:19Z

Cool! I enabled the CI; Were you able to make some progress on getting the checksums from different implementations?

JoaoDiogoDuarte · 2024-02-21T22:42:42Z

The only implementations in C is the one the X-Wing team and I have been working on. I can try with this one!

The current checksums were obtained from running the checksumbig and checksumsmall scripts.

Besides that, there is one in Rust and this one in Go, but I genuinely do not know if I can use supercop and either languages.

tfaoliveira · 2024-02-21T22:44:14Z

The CI failure is related with the jasmin compiler that is being used (no support for spill), which is set in scripts/ci/config/jasmin

If the CI for this PR goes well: #106
Then current libjade is compatible with latest Jasmin
and you can, for instance, cherry-pick that commit

JoaoDiogoDuarte · 2024-02-21T22:53:01Z

All done!

tfaoliveira · 2024-02-21T23:13:41Z

The only implementations in C is the one the X-Wing team and I have been working on. I can try with this one!

Cool!

The current checksums were obtained from running the checksumbig and checksumsmall scripts.

Besides that, there is one in Rust and this one in Go, but I genuinely do not know if I can use supercop and either languages.

This can also be done in libjade (but not to be pushed to main):

create a directory for the implementation (just as you did for the ref/ and avx2/ xwing)
likely there will be a need for some C binding file that, for instance, defines the expected functions (the ones with the long names) and call the Go/Rust functions (or not).
in libjade/test/Makefile, before the following rule:

%/checksumbig: __phony | %/ %/$(CID)
	$(MAKE) -C $(IDIR) || true
	$(CIC)
	$(COMPILE) || true

add:

bin/crypto_kem/xwing/amd64/rust_ref/checksumbig: __phony | %/ %/$(CID)
	$(MAKE) -C $(IDIR) || true
	$(CIC)
	$(COMPILE) || true

where the COMPILE needs to be adjusted for your particular case: for instance, instead of $(ASM) (check compile variable) you set what you need

Ideally, this should be done in a way that it is easy to reproduce (if you push this to some branch, I can check it)
Thanks

JoaoDiogoDuarte · 2024-02-27T17:09:51Z

After messing around with supercop for a while and pulling out my hair because the checksums of the libjade vs C implementation did not match, I then realized that the C implementation uses ML-KEM and the libjade one uses Kyber...

Because of this, is it worth collecting more checksums or should I fork the C implementation to use Kyber instead of ML-KEM? The issue of using Kyber is that the resulting system is technically not X-Wing anymore.

JoaoDiogoDuarte · 2024-02-28T17:47:00Z

Ok, I have implemented X-Wing using Kyber768 in a separate branch of the C repo I mentioned and put all of it into supercop in my fork. The checksums for both implementations match, so when ML-KEM is available in libjade, I am sure they will also match!

JoaoDiogoDuarte and others added 23 commits January 17, 2024 00:12

Basic file layout for xwing

a1a6175

More boilerplate (may be incorrect at this stage)

b75edac

More boilerplate

b5f8f07

Even more boilerplate and params

cdfb876

Added fromstack64u8

4b1629a

Beginnings of xwing keygen

40090b9

Merge branch 'formosa-crypto:main' into main

e142b0c

keygen and enc work without label

f03dc79

Seems to work with label

79593bc

XWing works (untested, but compiles)

2e5770f

X-Wing implemented but either enc or dec does not work properly

7c2e6aa

Buggy but almost done version of xwing

7b108a8

Why does this not work I am crying pls

b142c88

fix

f475346

enc compiling (not tested)

9dd7105

Rewriten enc to use spills and unspills

e374602

dec compiles (untested)

d5307ad

Rewrote dec with spills

dea5e0c

readded return address

d8cd375

Removed commented code from functest

5e88eb3

Added gitkeep to xwing proof folder

17be507

Added .vscode to gitignore and removed .vscode from cache

fdaf05d

Renamed variables to be in line with rest of libjade

c8c59f5

Removed useless spill

4a0e8de

JoaoDiogoDuarte added 3 commits February 12, 2024 15:58

SHA3-256_134 implemented

a5c25af

Forgot to load result of sha to shared key

56cf808

AVX2 with faulty sha avx2 function

8bd006a

JoaoDiogoDuarte changed the title ~~Implemented X-Wing (ref) in libjade~~ Implemented X-Wing (ref and avx2) in libjade Feb 13, 2024

JoaoDiogoDuarte added 3 commits February 20, 2024 12:05

removed unused r variable

57b8754

Removed another unused var

6333226

Potentially fixed memory issue

202f10a

JoaoDiogoDuarte added 2 commits February 20, 2024 13:42

Added checksums, fixed bad param and cleaned up code

242496f

SHA3 accepting blocks of reg ptr of size 34 implemented

b87f78e

update the jasmin compiler from gitlab

131c586

Merge branch 'formosa-crypto:main' into main

c9fc954

libjade

e636ec9

JoaoDiogoDuarte force-pushed the main branch from 46c4235 to e636ec9 Compare March 19, 2024 18:39

tfaoliveira changed the base branch from main to feature/xwing April 16, 2024 08:07

tfaoliveira merged commit d144719 into formosa-crypto:feature/xwing Apr 16, 2024
5 of 6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implemented X-Wing (ref and avx2) in libjade #104

Implemented X-Wing (ref and avx2) in libjade #104

JoaoDiogoDuarte commented Feb 11, 2024

tfaoliveira commented Feb 12, 2024

JoaoDiogoDuarte commented Feb 12, 2024

JoaoDiogoDuarte commented Feb 12, 2024 •

edited

Loading

JoaoDiogoDuarte commented Feb 13, 2024 •

edited

Loading

JoaoDiogoDuarte commented Feb 13, 2024 •

edited

Loading

tfaoliveira commented Feb 20, 2024

JoaoDiogoDuarte commented Feb 20, 2024 •

edited

Loading

tfaoliveira commented Feb 21, 2024

JoaoDiogoDuarte commented Feb 21, 2024 •

edited

Loading

tfaoliveira commented Feb 21, 2024

JoaoDiogoDuarte commented Feb 21, 2024

tfaoliveira commented Feb 21, 2024 •

edited

Loading

JoaoDiogoDuarte commented Feb 27, 2024

JoaoDiogoDuarte commented Feb 28, 2024

Implemented X-Wing (ref and avx2) in libjade #104

Implemented X-Wing (ref and avx2) in libjade #104

Conversation

JoaoDiogoDuarte commented Feb 11, 2024

tfaoliveira commented Feb 12, 2024

JoaoDiogoDuarte commented Feb 12, 2024

JoaoDiogoDuarte commented Feb 12, 2024 • edited Loading

JoaoDiogoDuarte commented Feb 13, 2024 • edited Loading

JoaoDiogoDuarte commented Feb 13, 2024 • edited Loading

tfaoliveira commented Feb 20, 2024

JoaoDiogoDuarte commented Feb 20, 2024 • edited Loading

tfaoliveira commented Feb 21, 2024

JoaoDiogoDuarte commented Feb 21, 2024 • edited Loading

tfaoliveira commented Feb 21, 2024

JoaoDiogoDuarte commented Feb 21, 2024

tfaoliveira commented Feb 21, 2024 • edited Loading

JoaoDiogoDuarte commented Feb 27, 2024

JoaoDiogoDuarte commented Feb 28, 2024

JoaoDiogoDuarte commented Feb 12, 2024 •

edited

Loading

JoaoDiogoDuarte commented Feb 13, 2024 •

edited

Loading

JoaoDiogoDuarte commented Feb 13, 2024 •

edited

Loading

JoaoDiogoDuarte commented Feb 20, 2024 •

edited

Loading

JoaoDiogoDuarte commented Feb 21, 2024 •

edited

Loading

tfaoliveira commented Feb 21, 2024 •

edited

Loading