Implemented X-Wing (ref and avx2) in libjade #104
Hi! thanks for the contribution! Wrt testing, I ran a quick test and found some possible issues that I would like you to take a look at before I enable the CI:
And there are the following 3 problems:
Wrt the checksum executables:
They do not run to completion (which would print a checksum, even if there is nothing to compare it to); instead they return an error, because there are memory safety issues in the current implementation (these are the same tests as done in SUPERCOP). There might be additional information in file
A good check to do first is to see if the values in
Wrt producing SUPERCOP checksums with other implementations: is there any other implementation (the more the better) that we can use for this purpose? Anything that can be linked to C code should be fine. This can be done, for instance, as in https://github.com/tfaoliveira/supercop/commits/dilithium/ or even inside libjade with a bit of patching.
After the testing issues are figured out, we can think about code organization: I see that the paper is focused on MLKEM768, but if there are plans for MLKEM1024, one could consider that at this stage. Replacing Kyber by MLKEM should be fine and can be done later. Other x25519 implementations, such as mulx, might be worth trying.
Then, we can move on to code review, performance analysis, code size, etc.
Thanks
Thanks for the comments and suggestions! (btw nice formatting, I like the line separators, somehow I never thought of doing that for these kinds of things!)

In terms of the memory problem, the issue seems to be that I stupidly load a value of 134 bytes from the stack into a register that has only been malloced 32 bytes. This complicates things a little, as we cannot malloc in Jasmin and I do not want to break the KEM API by asking the user to give us a pointer to a buffer that has been malloced with 134 bytes. This buffer would hold the input which we then pass to SHA3-256. The current SHA3-256 implementation seems to accept reg ptrs of 32 bytes as input, and for any other size it asks for a register as input, which is why I load the 134 bytes from the stack into a register. The only solution I can think of is rewriting SHA3-256 to accept stack arrays (well, reg ptrs) of 134 bytes.

My source of confusion in rewriting the SHA3-256 code (besides not being very familiar with the internals) is that for the implementation that accepts
for i=0 to KYBER_SYMBYTES/8 // KYBER_SYMBYTES/8 = 32/8 = 4
{
  t64 = in[u64 i];
  state[u64 i] = t64;
}
and I am not sure how to go about this since 134 is not divisible by 8, and in the SHA3-512 implementation we have in line 292,
From what I understand, we are loading this u8 reg ptr input into a u64 reg and then storing it into a stack array, but I am not entirely sure how to do this with a length that is not divisible by 8, and then whether we also need to XOR the state for larger
Any thoughts that may clarify this whole SHA3 problem, and whether there is maybe a way around it?

In terms of the checksum, I have a simple C implementation I can use/adapt to get the checksum values; I'll have a look. Some colleagues also have a Go and a Rust implementation IIRC, so I can ask them if it's possible to link their implementations to C. Lastly, I do not believe we have any plans for ML-KEM-1024, which may simplify code organization.
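One way to picture the "134 is not divisible by 8" question is to look at how a single SHA3-256 block is laid out as 64-bit lanes. The sketch below is plain Python, not Jasmin and not libjade code; all function names are made up for illustration. It assumes the input is shorter than the 136-byte SHA3-256 rate, so 134 bytes fill 16 whole lanes plus 6 bytes of lane 16, the domain byte 0x06 lands at byte offset 134, and the final padding bit 0x80 lands at byte offset 135. The result is checked against `hashlib`.

```python
import hashlib

RATE_BYTES = 136            # SHA3-256 rate: 1088 bits = 17 lanes of 64 bits
MASK64 = (1 << 64) - 1

def rol64(x, n):
    """Rotate a 64-bit word left by n bits."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64

def keccak_f1600(lanes):
    """Keccak-f[1600] on a 5x5 array of 64-bit lanes, indexed lanes[x][y]."""
    r = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ ((~row[(x + 1) % 5]) & row[(x + 2) % 5])
        # iota (round constant generated by an LFSR)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes

def absorb_single_block(msg):
    """Return the 25 state lanes after absorbing msg (shorter than the rate)."""
    assert len(msg) < RATE_BYTES
    block = bytearray(200)           # full 1600-bit state, initially zero
    block[:len(msg)] = msg           # whole lanes plus the leftover bytes
    block[len(msg)] ^= 0x06          # SHA-3 domain separation, right after the input
    block[RATE_BYTES - 1] ^= 0x80    # last padding bit, at the end of the rate
    return [int.from_bytes(block[8 * i:8 * i + 8], "little") for i in range(25)]

def sha3_256_single_block(msg):
    flat = absorb_single_block(msg)
    lanes = [[flat[x + 5 * y] for y in range(5)] for x in range(5)]
    lanes = keccak_f1600(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))

msg = bytes(range(134))              # arbitrary 134-byte test input
flat = absorb_single_block(msg)
print(hex(flat[16]))                 # lane 16: 6 input bytes, then 0x06 and 0x80
print(sha3_256_single_block(msg) == hashlib.sha3_256(msg).digest())
```

In this layout the loop over whole lanes runs for 16 iterations, and the remaining 6 bytes share lane 16 with both padding bytes, which is why the 32-byte code path cannot be reused by only changing the loop bound.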
I got the SHA3-256_134 implementation working.
Update: The memory issue is fixed, at least I do not get any errors*
*I always seem to get an error file in the memory test, even if I test the Kyber implementation, so it's a bit difficult to say for sure that it's fixed on my end. Even if I completely comment out all the main body in
@tfaoliveira I quickly made an AVX2 implementation and I am trying to implement SHA3-256 with input
state[0] = #VPBROADCAST_4u64(in[u64 0]);
for i=1 to KYBER_SYMBYTES/8
{
  t = in[u64 i];
  l = a_jagged_p[i];
  s_state[(int) l] = t;
}
l = a_jagged_p[KYBER_SYMBYTES/8];
l <<= 3;
s_state[u8 (int)l] = 0x06;
and since I do not fully understand why we are doing certain things, the best I can do is:
state[0] = #VPBROADCAST_4u64(in[u64 0]);
for i=1 to 16
{
  t = in[u64 i];
  l = a_jagged_p[i];
  s_state[(int) l] = t;
}
l = a_jagged_p[17];
l <<= 3;
s_state[u8 (int)l] = 0x06;
which compiles but is clearly wrong because I am not passing the last couple of bytes of the input into
Thanks!
Also, sorry if I have not properly addressed the memory bug. The best I can do is guess, because I cannot get memcheck to work on my distro: I get an annoying error when testing the memory of any component of libjade. The long output below is probably of no interest to you; it is just to show that I am kind of shooting in the dark:
==29133== Memcheck, a memory error detector
==29133== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==29133== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==29133== Command: ./bin/crypto_kem/xwing/amd64/ref/memory
==29133== Parent PID: 29132
==29133==
valgrind: Fatal error at startup: a function redirection
valgrind: which is mandatory for this platform-tool combination
valgrind: cannot be set up. Details of the redirection are:
valgrind:
valgrind: A must-be-redirected function
valgrind: whose name matches the pattern: strlen
valgrind: in an object with soname matching: ld-linux-x86-64.so.2
valgrind: was not found whilst processing
valgrind: symbols from the object with soname: ld-linux-x86-64.so.2
valgrind:
valgrind: Possible fixes: (1, short term): install glibc's debuginfo
valgrind: package on this machine. (2, longer term): ask the packagers
valgrind: for your Linux distribution to please in future ship a non-
valgrind: stripped ld.so (or whatever the dynamic linker .so is called)
valgrind: that exports the above-named function using the standard
valgrind: calling conventions for this platform. The package you need
valgrind: to install for fix (1) is called
valgrind:
valgrind: On Debian, Ubuntu: libc6-dbg
valgrind: On SuSE, openSuSE, Fedora, RHEL: glibc-debuginfo
valgrind:
valgrind: Note that if you are debugging a 32 bit process on a
valgrind: 64 bit system, you will need a corresponding 32 bit debuginfo
valgrind: package (e.g. libc6-dbg:i386).
valgrind:
valgrind: Cannot continue -- exiting now. Sorry.
Hi, thanks for pushing. My current view:
Next (I see that there are unused-variable warnings; it would be nice to fix those):
Next:
Looking at line 31 from the corresponding file:
This might be missing a couple of
Note 2: until the 4th of March I'm mostly offline as I'm on holiday, but I will drop by now and then.
The valgrind output:
Hi, thanks for the really helpful feedback and for the valgrind output! I've fixed the loop you mentioned and, with a bit of embarrassment, I admit that I actually had the wrong value for the size of the ciphertext... that is why the checksum was failing. I've fixed that (#dumbdumbtime). I also removed the unused variables.

After implementing the SHA3 function I was talking about, on my machine, after running:
make distclean CI=1; make CI=1 FILTER=../src/crypto_kem/xwing/%; make CI=1 FILTER=../src/crypto_kem/xwing/% reporter
I get:

This is great! The memory error is still reported simply because valgrind does not run on my machine, so maybe it's gone? Have a good holiday!
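As a quick sanity check on sizes like the ciphertext length mentioned above, the X-Wing public key and ciphertext are the Kyber768 (or ML-KEM-768) values with the 32-byte X25519 values appended. The constant names below are hypothetical and are not taken from libjade; the numbers assume the standard Kyber768 and X25519 parameter sizes. The thread does not say which wrong value was used originally.

```python
# Component sizes in bytes (assumed from the Kyber768 / X25519 parameter sets).
KYBER768_PUBLICKEYBYTES = 1184
KYBER768_CIPHERTEXTBYTES = 1088
X25519_BYTES = 32   # X25519 public keys and ephemeral "ciphertexts" are 32 bytes

# X-Wing concatenates the two components.
XWING_PUBLICKEYBYTES = KYBER768_PUBLICKEYBYTES + X25519_BYTES
XWING_CIPHERTEXTBYTES = KYBER768_CIPHERTEXTBYTES + X25519_BYTES

print(XWING_PUBLICKEYBYTES, XWING_CIPHERTEXTBYTES)
```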
Cool! I enabled the CI. Were you able to make some progress on getting the checksums from different implementations?
The only implementation in C is the one the X-Wing team and I have been working on. I can try with this one! The current checksums were obtained from running the checksumbig and checksumsmall scripts. Besides that, there is one in Rust and this one in Go, but I genuinely do not know if I can use supercop with either language.
The CI failure is related to the Jasmin compiler that is being used (no support for spill), which is set in
If the CI for this PR goes well: #106
All done!
Cool!
This can also be done in libjade (but not to be pushed to main):
add:
where the
Ideally, this should be done in a way that is easy to reproduce (if you push this to some branch, I can check it).
After messing around with supercop for a while and pulling out my hair because the checksums of the libjade and C implementations did not match, I realized that the C implementation uses ML-KEM and the libjade one uses Kyber... Given this, is it worth collecting more checksums, or should I fork the C implementation to use Kyber instead of ML-KEM? The issue with using Kyber is that the resulting scheme is technically not X-Wing anymore.
Ok, I have implemented X-Wing using Kyber768 in a separate branch of the C repo I mentioned and put all of it into supercop in my fork. The checksums for both implementations match, so when ML-KEM is available in libjade, I am sure they will also match!
Merged commit d144719 into formosa-crypto:feature/xwing
Hi,
I coded an almost-reference implementation of X-Wing, the hybrid KEM defined in [draft-connolly-cfrg-xwing-kem](https://datatracker.ietf.org/doc/draft-connolly-cfrg-xwing-kem/) and analysed in https://eprint.iacr.org/2024/039. It is only "almost" a reference implementation because X-Wing technically uses ML-KEM-768, but as that has not been standardized or implemented in libjade (at the time of writing), I used Kyber768.
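For readers unfamiliar with the construction, here is a minimal Python sketch of the X-Wing combiner, which is where the 134-byte SHA3-256 input comes from: a 6-byte label plus four 32-byte values. This is not the libjade code, and the function names are made up. The position of the label has differed between revisions of the draft; the sketch places it first, which is an assumption to check against the revision being targeted.

```python
import hashlib

# The 6-byte X-Wing label, the ASCII string \.//^\ written out in hex.
XWING_LABEL = bytes.fromhex("5c2e2f2f5e5c")

def combiner_input(ss_m, ss_x, ct_x, pk_x):
    """Concatenate the combiner inputs (label first: an assumption, see above).

    ss_m: 32-byte shared secret from Kyber768 / ML-KEM-768
    ss_x: 32-byte X25519 shared secret
    ct_x: 32-byte X25519 ephemeral public key (the X25519 "ciphertext")
    pk_x: 32-byte X25519 public key of the recipient
    """
    for part in (ss_m, ss_x, ct_x, pk_x):
        if len(part) != 32:
            raise ValueError("each combiner input must be 32 bytes")
    return XWING_LABEL + ss_m + ss_x + ct_x + pk_x

def xwing_combiner(ss_m, ss_x, ct_x, pk_x):
    """Derive the 32-byte X-Wing shared secret with SHA3-256."""
    return hashlib.sha3_256(combiner_input(ss_m, ss_x, ct_x, pk_x)).digest()

# Dummy inputs, only to show the lengths involved.
data = combiner_input(b"\x01" * 32, b"\x02" * 32, b"\x03" * 32, b"\x04" * 32)
print(len(data))
```

Note that the Kyber ciphertext and public key are not hashed; only the X25519 ciphertext and public key enter the combiner, which keeps the SHA3-256 input within a single 136-byte block.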
IMO, two things need to be done, which are: