Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pull SPHINCS+ update #1249

Closed
dstebila opened this issue Jul 7, 2022 · 8 comments · Fixed by #1420
Closed

Pull SPHINCS+ update #1249

dstebila opened this issue Jul 7, 2022 · 8 comments · Fixed by #1420
Milestone

Comments

@dstebila
Copy link
Member

dstebila commented Jul 7, 2022

sphincs/sphincsplus#28 and PQClean/PQClean#446. Note that this changes algorithm specification, so if we include this in liboqs 0.7.2 our policy would actually bump the version number to 0.8.0. Alternatively we could leave this until after the 0.7.2 release cycle.

@thomwiggers thomwiggers changed the title Pull SHPINCS+ update Pull SPHINCS+ update Jul 19, 2022
@baentsch
Copy link
Member

baentsch commented Jul 27, 2022

Isn't the goal for the next release to have all the "latest and greatest" code as per closing of round 3? If so, this should be part of it, no? In that case, shouldn't we skip 0.7.2 and go to 0.8? Edit/Add: If "all the latest and greatest code of round 3" is not the goal, why not also skip the blocking SIKE upgrade to get 0.7.2 out the door and move on beyond round 3?

@dstebila dstebila added this to the 0.8.0 milestone Sep 7, 2022
@dstebila
Copy link
Member Author

dstebila commented Sep 7, 2022

As discussed in today's status meeting, we can begin with algorithm updates like this one.

@baentsch
Copy link
Member

baentsch commented Sep 8, 2022

we can begin with algorithm updates like this one.

ACK. Going through the "chain of issues" (from here to PQClean/PQClean#446 to PQClean/PQClean#378) it seems the task would be to (semi-)manually create copies of SPHINCS+ parameterized code -- 36 of them. Is this understanding correct, @dstebila @thomwiggers ?

Now the question: Do we really want to (keep) do(ing) this? After the NIST decision, wouldn't it be time to consider creating a "real library" where parameters are just that, parameters, to one code base for a given algorithm?

To quantify the issue: A current liboqs build (release type on x64 with avx2) contains 720 object files bearing a name "sphincs*", creating 4.8MB of binary (in a 20MB liboqs.a): Nearly 25% of all code due to what conceptually is one algorithm: Does this still make sense? FWIW, McEliece "contributes" 5.3MB in this metric.

Don't get me wrong: I volunteered taking this issue on (admittedly not knowing how many issues are "behind" it :), so I'll do this, but I wonder whether it truly makes sense? Oh, and of course asking whether someone may have created a script already for doing this "code multiplication" driven by the SPHINCS parameter matrix?

@dstebila
Copy link
Member Author

dstebila commented Sep 8, 2022

we can begin with algorithm updates like this one.

ACK. Going through the "chain of issues" (from here to PQClean/PQClean#446 to PQClean/PQClean#378) it seems the task would be to (semi-)manually create copies of SPHINCS+ parameterized code -- 36 of them. Is this understanding correct, @dstebila @thomwiggers ?

Now the question: Do we really want to (keep) do(ing) this? After the NIST decision, wouldn't it be time to consider creating a "real library" where parameters are just that, parameters, to one code base for a given algorithm?

To quantify the issue: A current liboqs build (release type on x64 with avx2) contains 720 object files bearing a name "sphincs*", creating 4.8MB of binary (in a 20MB liboqs.a): Nearly 25% of all code due to what conceptually is one algorithm: Does this still make sense? FWIW, McEliece "contributes" 5.3MB in this metric.

Don't get me wrong: I volunteered taking this issue on (admittedly not knowing how many issues are "behind" it :), so I'll do this, but I wonder whether it truly makes sense? Oh, and of course asking whether someone may have created a script already for doing this "code multiplication" driven by the SPHINCS parameter matrix?

It's definitely worth revisiting. PQClean took the approach of avoiding generating multiple binaries from the same source code via pre-processor macro magic, leading to the duplication of source code. It made sense for the goals of PQClean, but doesn't necessarily make sense for the goals of liboqs.

An alternative approach for liboqs would be to have a single copy (well, modulo architectural optimizations) of the SPHINCS+ source code (pulling from https://github.com/sphincs/sphincsplus rather than PQClean), and then use compiler macros to generate the binary code for different parameterizations.

Note that we won't be able to eliminate having lots of binary objects for each different variant, because avoiding that would require runtime switching of parameters, which is not present in the source code we have available and would probably lead to worse performance. But we would be able to eliminate having lots of source code for each different variant, at the cost of having more complex CMake files. (And also at the cost of essentially doing an inclusion of SPHINCS+ from scratch, since we'd need to ditch all the existing files, import the ones from the SPHINCS+ repo upstream, do our own namespacing and cleanup, and code generation for the API.)

@baentsch
Copy link
Member

baentsch commented Sep 9, 2022

Note that we won't be able to eliminate having lots of binary objects for each different variant, because avoiding that would require runtime switching of parameters, which is not present in the source code we have available and would probably lead to worse performance.

Yup -- I now looked at the code and the many #define'd parameters and the myriad of locations they're used in the core logic: It should be possible to directly pull sphincs+ as per your suggestion and with quite some effort create liboqs-based patches to change this into runtime-parameter switching. To minimize code changes and avoid losing performance we could use max-(param-)size array allocations combined with globally set variables replacing the current defines. HOWEVER, this would

So... lots of work to create a "mixed-bag" result. Besides, users of embedded platforms could (and probably should) build liboqs only with those (few) algorithms/parameter sets that make sense on those platforms and leave all others away -- also optimizing code size-- without additional work on our side.

But we would be able to eliminate having lots of source code for each different variant, at the cost of having more complex CMake files.

The number of source files is pretty irrelevant in my eyes: They're "invisible" to all users (and most developers). It's the result that counts/is executed/stored much more often (than it is compiled). And adding complexity is nearly always not a good thing.

In sum, I withdraw my suggestion -- particularly for this algorithm :-/ Apologies for the spam in this issue.

@thomwiggers
Copy link
Contributor

To minimize code changes and avoid losing performance we could use max-(param-)size array allocations

The upstream SPHINCS+ code already requires variable-length arrays. The recently merged workaround (sphincs/sphincsplus#26) that should make the upstream code work with MSVC (in theory) is a set of #ifdefs (what else) that turn the SPX_VLA macro into _alloca calls on MSVC. It should be possible to extend this approach to all of SPHINCS+ where necessary to replace lengths currently hardcoded via the parameterset constants.

I'm probably going to adopt the VLA approach in the PQClean impls, by the way, as nobody has presented a compelling argument why we should not have VLAs in the reference code. (Anyone who told me about needing statically-sized stack couldn't handle large/any arrays on the stack anyway – if I'm missing a good argument against VLAs/alloca() please comment at PQClean/PQClean#452)

For comparison, the XMSS reference implementation does work by passing around a xmss_params struct that contains all of the settings for tree height, etc, rather than fixed parameters.

The amount of performance loss that you would get once you take away the compiler's ability to inline and do other optimizations based on hard-coded parameters is a separate, but still interesting question.

@thomwiggers
Copy link
Contributor

I should probably also comment here: I'm trying to upstream a few of the PQClean code changes we've done for SPHINCS+; but that takes a bit of time. Notably sphincs/sphincsplus#40 is a significant number of lines of code not exactly trivially changed.

@baentsch
Copy link
Member

baentsch commented Sep 9, 2022

I'm probably going to adopt the VLA approach in the PQClean impls

That surely sounds reasonable.

For comparison, the XMSS reference implementation does work by passing around a xmss_params struct that contains all of the settings for tree height, etc, rather than fixed parameters.

That sounds like a very reasonable approach, too -- but probably requiring even more code changes/effort/time.

Notably sphincs/sphincsplus#40 is a significant number of lines of code not exactly trivially changed.

So, given all of the above, can I ask for your recommendation how to proceed:

  1. (Try to) Do a PR to bring (properly parameterized copies of) Sphincs+3.1 into PQClean (and then pull into liboqs from there) now
  2. Wait for resolution of Refactor FIPS/SHA2 APIs sphincs/sphincsplus#40 and then do 1)
  3. Wait for resolution of Revisit VLAs? PQClean/PQClean#452 and then do 1)
  4. Do sth different (?)

There's surely no rush -- I just wanted to move this issue "forward" a bit... Resolution target date for me would be end of October.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants