Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented X-Wing (ref and avx2) in libjade #104

Merged
merged 35 commits into from
Apr 16, 2024

Conversation

JoaoDiogoDuarte
Copy link
Collaborator

Hi,

I coded a almost-reference X-Wing implementation, the hybrid KEM defined in [draft-connolly-cfrg-xwing-kem](https://datatracker.ietf.org/doc/draft-connolly-cfrg-xwing-kem/ and analysed in https://eprint.iacr.org/2024/039. The almost-reference is because X-Wing technically uses ML-KEM-768, but as it has not been standardized or implemented in libjade (at the time of writing), I utilised Kyber 768.

IMO, two things need to be done, which are:

  • Code review by the maintainers of libjade as I am relatively new to Jasmin and the code standards for libjade
  • I am not sure how to generate the checksums for the META.yml file as these are not (obviously) on SUPERCOP yet.
  • Decide if we wait adding X-Wing to llibjade before or after MLKEM is added to libjade.

@tfaoliveira
Copy link
Member

Hi! thanks for the contribution!

Wrt testing, I ran a quick test and found some possible issues that I would like you to take a look at before I enable the CI:

libjade/test$ make distclean CI=1; make CI=1 FILTER=../src/crypto_kem/xwing/%; make CI=1 FILTER=../src/crypto_kem/xwing/% reporter

And there are the following 3 problems:

Execution status - checksumsmall: 
ERROR, 2, crypto_kem/xwing/amd64/ref/.ci/checksumsmall.out.error
Execution status - checksumbig: 
ERROR, 2, crypto_kem/xwing/amd64/ref/.ci/checksumbig.out.error
Execution status - memory: 
ERROR, 1, crypto_kem/xwing/amd64/ref/.ci/memory.out.error

Wrt the checksums executables:

libjade/test$ cat bin/crypto_kem/xwing/amd64/ref/.ci/checksumsmall.out.error
111
jade_kem_keypair - secret_key writes after output
libjade/test$ cat bin/crypto_kem/xwing/amd64/ref/.ci/checksumbig.out.error
111
jade_kem_keypair - secret_key writes after output

They don't finish the execution (and print a checksum, even if there is nothing to compare to) and return an error because there are memory safety issues with the current implementation (same tests as done in SUPERCOP). There might be additional information in file libjade/test/bin/crypto_kem/xwing/amd64/ref/memory.out, where the valgrind output can be checked and several errors are reporter.

A good check to do first is to see if the values in src/crypto_kem/xwing/amd64/ref/include/api.h are OK. If there is no issue, then the problem is likely somewhere in the implementation.


Wrt. producing SUPERCOP checksums with other implementations: if there any other implementation (the more the better) that we can use for this purpose (anything that can be linked to C code should be fine). This can be done, for instance, as in https://github.com/tfaoliveira/supercop/commits/dilithium/ or even inside libjade with a bit of patching.


After the testing issues are figured out, we think about code organization: I see that the paper is focused on MLKEM768, but if there are plans for MLKEM1024, one could consider that at this stage. Replacing Kyber by MLKEM should be fine and can be done later. Other x25519 implementations, such as mulx, might be worth trying.

Then, we can move on to code review/performance analysys/code size/etc; Thanks

@JoaoDiogoDuarte
Copy link
Collaborator Author

Thanks for the comments and suggestions! (btw nice formatting, I like the line separators, somehow I never thought of doing that for these kinds of things!)


In terms of the memory problem, the issue seems to be that I stupidly load from the stack a value of 134 bytes into a register that's only been malloced 32 bytes.

This complicates things a little as we cannot malloc in Jasmin and I do not want to break the KEM API by asking the user to give us a buffer pointer that has been malloced 134 bytes. This buffer would hold the input which we then pass to SHA3-256. The current SHA3-256 implementation seems to accept reg ptrs of 32 as input, and for any other sizes, it asks for a register as input, which is why I load the 134 bytes from the stack into a register.

The only solution I can think to solve this is rewriting SHA3-256 to accept stack arrays (well, reg ptrs) of 134 bytes.

My source of confusion in rewriting the SHA3-256 code (besides not being very familiar with the internals), is that for the implementation that accepts reg ptr u8[32] in, we have in lines 216-220:

for i=0 to KYBER_SYMBYTES/8 // KYBER_SYMBYTES/8 = 32/8 = 8
{
  t64 = in[u64 i];
  state[u64 i] = t64;
}

and I am not sure how to go about this since 134 does not divide by 8 and in the SHA3-512 implementation, we have have in line 292, state[u64 i] ^= t64; instead of state[u64 i] = t64;.

From what I understand we are loading this u8 reg ptr input into a u64 reg and then loading it into a stack array, but I am not entirely sure how to do this with a number that is not divisible by 8, and then if we also need to XOR the state for larger reg ptr u8[>32] in, as is done in the SHA3-512 implementation.

Any thoughts that may clarify this whole SHA3 problem and if maybe there is a way around?


In terms of the checksum I have a simple C implementation I can use/adapt to get the checksum values, I'll have a look. Some colleagues also have a Go and Rust implementation IIRC, so I can ask them if it's possible to link their implementation to C.


Lastly, I do not believe we have any plans for ML-KEM-1024, which may simplify code organization.

@JoaoDiogoDuarte
Copy link
Collaborator Author

JoaoDiogoDuarte commented Feb 12, 2024

I got the SHA3-256_134 implementation working.

Update: The memory issue is fixed, at least I do not get any errors*

*I always seem to get an error file in the memory test, even if I test the Kyber implementation, so it's a bit difficult to say for sure that it's fixed on my end. Even if I completely comment out all the main body in memory.c, I still get a return code of 1. Maybe it's my valgrind version? I still also get the checksum errors, but I will try to insert valid checksums and rerun the test.

@JoaoDiogoDuarte
Copy link
Collaborator Author

JoaoDiogoDuarte commented Feb 13, 2024

@tfaoliveira I quickly made a AVX2 implementation and I am trying to implement SHA3-256 with input reg ptr in[134] using AVX2, but I am having some difficulty with processing the input. For example, for the version that accepts a reg ptr in[32], we have the following code in lines 52-63:

state[0] = #VPBROADCAST_4u64(in[u64 0]);

for i=1 to KYBER_SYMBYTES/8
{
  t = in[u64 i];
  l = a_jagged_p[i];
  s_state[(int) l] = t;
}

l = a_jagged_p[KYBER_SYMBYTES/8];
l <<= 3;
s_state[u8 (int)l] = 0x06;

and since I do not fully understand why we are doing certain things, the best I can do is:

state[0] = #VPBROADCAST_4u64(in[u64 0]);

for i=1 to 16
{
 t = in[u64 i];
 l = a_jagged_p[i];
 s_state[(int) l] = t;
}


l = a_jagged_p[17];
l <<= 3;
s_state[u8 (int)l] = 0x06;

which compiles but is clearly wrong because I am not passing the last couple of bytes of the input into state. Do you know of any tips or tricks for this or who I can contact to help me with this, please?

Thanks!

@JoaoDiogoDuarte
Copy link
Collaborator Author

JoaoDiogoDuarte commented Feb 13, 2024

Also, sorry if I have not properly addressed the memory bug, the best I can do is guess because I cannot get memcheck to work on my distro as I get an annoying bug when testing the memory of any component of libjade.

Long annoying bug probably of no interest to you, just to demonstrate I am kind of shooting in the dark
  ==29133== Memcheck, a memory error detector
  ==29133== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
  ==29133== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
  ==29133== Command: ./bin/crypto_kem/xwing/amd64/ref/memory
  ==29133== Parent PID: 29132
  ==29133== 
  
  valgrind:  Fatal error at startup: a function redirection
  valgrind:  which is mandatory for this platform-tool combination
  valgrind:  cannot be set up.  Details of the redirection are:
  valgrind:  
  valgrind:  A must-be-redirected function
  valgrind:  whose name matches the pattern:      strlen
  valgrind:  in an object with soname matching:   ld-linux-x86-64.so.2
  valgrind:  was not found whilst processing
  valgrind:  symbols from the object with soname: ld-linux-x86-64.so.2
  valgrind:  
  valgrind:  Possible fixes: (1, short term): install glibc's debuginfo
  valgrind:  package on this machine.  (2, longer term): ask the packagers
  valgrind:  for your Linux distribution to please in future ship a non-
  valgrind:  stripped ld.so (or whatever the dynamic linker .so is called)
  valgrind:  that exports the above-named function using the standard
  valgrind:  calling conventions for this platform.  The package you need
  valgrind:  to install for fix (1) is called
  valgrind:  
  valgrind:    On Debian, Ubuntu:                 libc6-dbg
  valgrind:    On SuSE, openSuSE, Fedora, RHEL:   glibc-debuginfo
  valgrind:  
  valgrind:  Note that if you are debugging a 32 bit process on a
  valgrind:  64 bit system, you will need a corresponding 32 bit debuginfo
  valgrind:  package (e.g. libc6-dbg:i386).
  valgrind:  
  valgrind:  Cannot continue -- exiting now.  Sorry.

@JoaoDiogoDuarte JoaoDiogoDuarte changed the title Implemented X-Wing (ref) in libjade Implemented X-Wing (ref and avx2) in libjade Feb 13, 2024
@tfaoliveira
Copy link
Member

Hi, thanks for pushing.

My current view:

$ make distclean CI=1; make CI=1 FILTER=../src/crypto_kem/xwing/%; make CI=1 FILTER=../src/crypto_kem/xwing/% reporter

Execution status - checksumsmall: 
ERROR, 2, crypto_kem/xwing/amd64/avx2/.ci/checksumsmall.out.error
ERROR, 2, crypto_kem/xwing/amd64/ref/.ci/checksumsmall.out.error
Execution status - checksumbig: 
ERROR, 2, crypto_kem/xwing/amd64/avx2/.ci/checksumbig.out.error
ERROR, 2, crypto_kem/xwing/amd64/ref/.ci/checksumbig.out.error
Execution status - printparams: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/printparams.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/printparams.out.log
Execution status - functest: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/functest.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/functest.out.log
Execution status - safetyparams: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/safetyparams.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/safetyparams.out.log
Execution status - memory: 
ERROR, 1, crypto_kem/xwing/amd64/avx2/.ci/memory.out.error
ERROR, 1, crypto_kem/xwing/amd64/ref/.ci/memory.out.error

Next (I see that there are unused variables warnings, it would be nice to fixe those):

$ make CI=1 distclean; make JFLAGS=-g bin/crypto_kem/xwing/amd64/ref/checksumsmall.stdout
(...)
jade_kem_keypair - secret_key writes after output

Next:

$ make CI=1 distclean; make JFLAGS=-g bin/crypto_kem/xwing/amd64/ref/memory.out
$ cat bin/crypto_kem/xwing/amd64/ref/memory.out | grep "at 0x" | grep "jinc" | cut -d'?' -f4 | sort -u
(kem.jinc:31)
$ mv bin/crypto_kem/xwing/amd64/ref/memory.out bin/crypto_kem/xwing/amd64/ref/memory.txt # to attach here

Looking at line 31 from the corresponding file:

    for i=0 to X25519_PUBLICKEYBYTES {
        [skp + i] = [pkp + i];
    }

This might be missing a couple of (u8): (u8)[skp + i] = (u8)[pkp + i]; Please check the remaining code for this type of issue. Note: I usually prefer to use a temporary register to perform such copies, but we can check for this later in the review. Thanks

Note 2, until 4th of March I'm mostly offline as I'm on holiday, but I will drop by now and then

The valgrind output:
memory.txt

@JoaoDiogoDuarte
Copy link
Collaborator Author

JoaoDiogoDuarte commented Feb 20, 2024

Hi, thanks for the really helpful feedback and for the valgrind output! I've fixed that loop you mentioned and with a bit of embarrassment, I admit that I actually had the wrong value for the size of the ciphertext.......that is why the checksum was failing........................................I've fixed that.... (#dumbdumbtime).

I removed the unused variables r in the scalarmult.jinc file. Now the remaining unused files seem to be from Kyber.

After implementing the SHA3 function I was talking about, on my machine, after running:

make distclean CI=1; make CI=1 FILTER=../src/crypto_kem/xwing/%; make CI=1 FILTER=../src/crypto_kem/xwing/% reporter

I get:

Execution status - checksumsmall: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/checksumsmall.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/checksumsmall.out.log
Execution status - checksumbig: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/checksumbig.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/checksumbig.out.log
Execution status - printparams: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/printparams.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/printparams.out.log
Execution status - functest: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/functest.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/functest.out.log
Execution status - safetyparams: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/safetyparams.out.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/safetyparams.out.log
Execution status - memory: 
ERROR, 1, crypto_kem/xwing/amd64/avx2/.ci/memory.out.error
ERROR, 1, crypto_kem/xwing/amd64/ref/.ci/memory.out.error
make[1]: Leaving directory '/home/joao/Projects/libjade/test'
make reporter_checksums
make[1]: Entering directory '/home/joao/Projects/libjade/test'
./scripts/checksumsok
./../scripts/ci/reporter/jlog "Checksums status" test/bin checksum*.ok "1";
Checksums status: 
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/checksumbig.ok.log
OK, 0, crypto_kem/xwing/amd64/avx2/.ci/checksumsmall.ok.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/checksumbig.ok.log
OK, 0, crypto_kem/xwing/amd64/ref/.ci/checksumsmall.ok.log

This is great! The memory error is still there simply because valgrind does not run on my machine, so maybe it's gone?

Have a good holiday!

@tfaoliveira
Copy link
Member

Cool! I enabled the CI; Were you able to make some progress on getting the checksums from different implementations?

@JoaoDiogoDuarte
Copy link
Collaborator Author

JoaoDiogoDuarte commented Feb 21, 2024

The only implementations in C is the one the X-Wing team and I have been working on. I can try with this one!

The current checksums were obtained from running the checksumbig and checksumsmall scripts.

Besides that, there is one in Rust and this one in Go, but I genuinely do not know if I can use supercop and either languages.

@tfaoliveira
Copy link
Member

The CI failure is related with the jasmin compiler that is being used (no support for spill), which is set in scripts/ci/config/jasmin

If the CI for this PR goes well: #106
Then current libjade is compatible with latest Jasmin
and you can, for instance, cherry-pick that commit

@JoaoDiogoDuarte
Copy link
Collaborator Author

All done!

@tfaoliveira
Copy link
Member

tfaoliveira commented Feb 21, 2024

The only implementations in C is the one the X-Wing team and I have been working on. I can try with this one!

Cool!

The current checksums were obtained from running the checksumbig and checksumsmall scripts.

Besides that, there is one in Rust and this one in Go, but I genuinely do not know if I can use supercop and either languages.

This can also be done in libjade (but not to be pushed to main):

  • create a directory for the implementation (just as you did for the ref/ and avx2/ xwing)
  • likely there will be a need for some C binding file that, for instance, defines the expected functions (the ones with the long names) and call the Go/Rust functions (or not).
  • in libjade/test/Makefile, before the following rule:
%/checksumbig: __phony | %/ %/$(CID)
	$(MAKE) -C $(IDIR) || true
	$(CIC)
	$(COMPILE) || true

add:

bin/crypto_kem/xwing/amd64/rust_ref/checksumbig: __phony | %/ %/$(CID)
	$(MAKE) -C $(IDIR) || true
	$(CIC)
	$(COMPILE) || true

where the COMPILE needs to be adjusted for your particular case: for instance, instead of $(ASM) (check compile variable) you set what you need

Ideally, this should be done in a way that it is easy to reproduce (if you push this to some branch, I can check it)
Thanks

@JoaoDiogoDuarte
Copy link
Collaborator Author

After messing around with supercop for a while and pulling out my hair because the checksums of the libjade vs C implementation did not match, I then realized that the C implementation uses ML-KEM and the libjade one uses Kyber...

Because of this, is it worth collecting more checksums or should I fork the C implementation to use Kyber instead of ML-KEM? The issue of using Kyber is that the resulting system is technically not X-Wing anymore.

@JoaoDiogoDuarte
Copy link
Collaborator Author

Ok, I have implemented X-Wing using Kyber768 in a separate branch of the C repo I mentioned and put all of it into supercop in my fork. The checksums for both implementations match, so when ML-KEM is available in libjade, I am sure they will also match!

@tfaoliveira tfaoliveira changed the base branch from main to feature/xwing April 16, 2024 08:07
@tfaoliveira tfaoliveira merged commit d144719 into formosa-crypto:feature/xwing Apr 16, 2024
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants