Pull SPHINCS+ update #1249

dstebila · 2022-07-07T01:29:12Z

sphincs/sphincsplus#28 and PQClean/PQClean#446. Note that this changes algorithm specification, so if we include this in liboqs 0.7.2 our policy would actually bump the version number to 0.8.0. Alternatively we could leave this until after the 0.7.2 release cycle.

baentsch · 2022-07-27T05:26:47Z

Isn't the goal for the next release to have all the "latest and greatest" code as per closing of round 3? If so, this should be part of it, no? In that case, shouldn't we skip 0.7.2 and go to 0.8? Edit/Add: If "all the latest and greatest code of round 3" is not the goal, why not also skip the blocking SIKE upgrade to get 0.7.2 out the door and move on beyond round 3?

dstebila · 2022-09-07T20:34:11Z

As discussed in today's status meeting, we can begin with algorithm updates like this one.

baentsch · 2022-09-08T14:15:23Z

we can begin with algorithm updates like this one.

ACK. Going through the "chain of issues" (from here to PQClean/PQClean#446 to PQClean/PQClean#378) it seems the task would be to (semi-)manually create copies of SPHINCS+ parameterized code -- 36 of them. Is this understanding correct, @dstebila @thomwiggers ?

Now the question: Do we really want to (keep) do(ing) this? After the NIST decision, wouldn't it be time to consider creating a "real library" where parameters are just that, parameters, to one code base for a given algorithm?

To quantify the issue: A current liboqs build (release type on x64 with avx2) contains 720 object files bearing a name "sphincs*", creating 4.8MB of binary (in a 20MB liboqs.a): Nearly 25% of all code due to what conceptually is one algorithm: Does this still make sense? FWIW, McEliece "contributes" 5.3MB in this metric.

Don't get me wrong: I volunteered taking this issue on (admittedly not knowing how many issues are "behind" it :), so I'll do this, but I wonder whether it truly makes sense? Oh, and of course asking whether someone may have created a script already for doing this "code multiplication" driven by the SPHINCS parameter matrix?

dstebila · 2022-09-08T19:41:07Z

we can begin with algorithm updates like this one.

ACK. Going through the "chain of issues" (from here to PQClean/PQClean#446 to PQClean/PQClean#378) it seems the task would be to (semi-)manually create copies of SPHINCS+ parameterized code -- 36 of them. Is this understanding correct, @dstebila @thomwiggers ?

Now the question: Do we really want to (keep) do(ing) this? After the NIST decision, wouldn't it be time to consider creating a "real library" where parameters are just that, parameters, to one code base for a given algorithm?

To quantify the issue: A current liboqs build (release type on x64 with avx2) contains 720 object files bearing a name "sphincs*", creating 4.8MB of binary (in a 20MB liboqs.a): Nearly 25% of all code due to what conceptually is one algorithm: Does this still make sense? FWIW, McEliece "contributes" 5.3MB in this metric.

Don't get me wrong: I volunteered taking this issue on (admittedly not knowing how many issues are "behind" it :), so I'll do this, but I wonder whether it truly makes sense? Oh, and of course asking whether someone may have created a script already for doing this "code multiplication" driven by the SPHINCS parameter matrix?

It's definitely worth revisiting. PQClean took the approach of avoiding generating multiple binaries from the same source code via pre-processor macro magic, leading to the duplication of source code. It made sense for the goals of PQClean, but doesn't necessarily make sense for the goals of liboqs.

An alternative approach for liboqs would be to have a single copy (well, modulo architectural optimizations) of the SPHINCS+ source code (pulling from https://github.com/sphincs/sphincsplus rather than PQClean), and then use compiler macros to generate the binary code for different parameterizations.

Note that we won't be able to eliminate having lots of binary objects for each different variant, because avoiding that would require runtime switching of parameters, which is not present in the source code we have available and would probably lead to worse performance. But we would be able to eliminate having lots of source code for each different variant, at the cost of having more complex CMake files. (And also at the cost of essentially doing an inclusion of SPHINCS+ from scratch, since we'd need to ditch all the existing files, import the ones from the SPHINCS+ repo upstream, do our own namespacing and cleanup, and code generation for the API.)

baentsch · 2022-09-09T06:05:10Z

Note that we won't be able to eliminate having lots of binary objects for each different variant, because avoiding that would require runtime switching of parameters, which is not present in the source code we have available and would probably lead to worse performance.

Yup -- I now looked at the code and the many #define'd parameters and the myriad of locations they're used in the core logic: It should be possible to directly pull sphincs+ as per your suggestion and with quite some effort create liboqs-based patches to change this into runtime-parameter switching. To minimize code changes and avoid losing performance we could use max-(param-)size array allocations combined with globally set variables replacing the current defines. HOWEVER, this would

create smaller executables benefiting primarily (users of) embedded platforms
with (in the case of SPHINCS+ particularly high) runtime memory consumption hurting those very same embedded platforms

So... lots of work to create a "mixed-bag" result. Besides, users of embedded platforms could (and probably should) build liboqs only with those (few) algorithms/parameter sets that make sense on those platforms and leave all others away -- also optimizing code size-- without additional work on our side.

But we would be able to eliminate having lots of source code for each different variant, at the cost of having more complex CMake files.

The number of source files is pretty irrelevant in my eyes: They're "invisible" to all users (and most developers). It's the result that counts/is executed/stored much more often (than it is compiled). And adding complexity is nearly always not a good thing.

In sum, I withdraw my suggestion -- particularly for this algorithm :-/ Apologies for the spam in this issue.

thomwiggers · 2022-09-09T07:31:17Z

To minimize code changes and avoid losing performance we could use max-(param-)size array allocations

The upstream SPHINCS+ code already requires variable-length arrays. The recently merged workaround (sphincs/sphincsplus#26) that should make the upstream code work with MSVC (in theory) is a set of #ifdefs (what else) that turn the SPX_VLA macro into _alloca calls on MSVC. It should be possible to extend this approach to all of SPHINCS+ where necessary to replace lengths currently hardcoded via the parameterset constants.

I'm probably going to adopt the VLA approach in the PQClean impls, by the way, as nobody has presented a compelling argument why we should not have VLAs in the reference code. (Anyone who told me about needing statically-sized stack couldn't handle large/any arrays on the stack anyway – if I'm missing a good argument against VLAs/alloca() please comment at PQClean/PQClean#452)

For comparison, the XMSS reference implementation does work by passing around a xmss_params struct that contains all of the settings for tree height, etc, rather than fixed parameters.

The amount of performance loss that you would get once you take away the compiler's ability to inline and do other optimizations based on hard-coded parameters is a separate, but still interesting question.

thomwiggers · 2022-09-09T08:44:50Z

I should probably also comment here: I'm trying to upstream a few of the PQClean code changes we've done for SPHINCS+; but that takes a bit of time. Notably sphincs/sphincsplus#40 is a significant number of lines of code not exactly trivially changed.

baentsch · 2022-09-09T09:33:45Z

I'm probably going to adopt the VLA approach in the PQClean impls

That surely sounds reasonable.

For comparison, the XMSS reference implementation does work by passing around a xmss_params struct that contains all of the settings for tree height, etc, rather than fixed parameters.

That sounds like a very reasonable approach, too -- but probably requiring even more code changes/effort/time.

Notably sphincs/sphincsplus#40 is a significant number of lines of code not exactly trivially changed.

So, given all of the above, can I ask for your recommendation how to proceed:

(Try to) Do a PR to bring (properly parameterized copies of) Sphincs+3.1 into PQClean (and then pull into liboqs from there) now
Wait for resolution of Refactor FIPS/SHA2 APIs sphincs/sphincsplus#40 and then do 1)
Wait for resolution of Revisit VLAs? PQClean/PQClean#452 and then do 1)
Do sth different (?)

There's surely no rush -- I just wanted to move this issue "forward" a bit... Resolution target date for me would be end of October.

thomwiggers changed the title ~~Pull SHPINCS+ update~~ Pull SPHINCS+ update Jul 19, 2022

dstebila added this to the 0.8.0 milestone Sep 7, 2022

xvzcf mentioned this issue Nov 23, 2022

Classify algorithm status #1245

Closed

dstebila mentioned this issue Mar 18, 2023

Update SPHINCS+ #1420

Merged

1 task

baentsch mentioned this issue May 13, 2023

AVX2 support for Sphincs #1458

Closed

xvzcf closed this as completed in #1420 May 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pull SPHINCS+ update #1249

Pull SPHINCS+ update #1249

dstebila commented Jul 7, 2022

baentsch commented Jul 27, 2022 •

edited

Loading

dstebila commented Sep 7, 2022

baentsch commented Sep 8, 2022

dstebila commented Sep 8, 2022

baentsch commented Sep 9, 2022

thomwiggers commented Sep 9, 2022

thomwiggers commented Sep 9, 2022

baentsch commented Sep 9, 2022

Pull SPHINCS+ update #1249

Pull SPHINCS+ update #1249

Comments

dstebila commented Jul 7, 2022

baentsch commented Jul 27, 2022 • edited Loading

dstebila commented Sep 7, 2022

baentsch commented Sep 8, 2022

dstebila commented Sep 8, 2022

baentsch commented Sep 9, 2022

thomwiggers commented Sep 9, 2022

thomwiggers commented Sep 9, 2022

baentsch commented Sep 9, 2022

baentsch commented Jul 27, 2022 •

edited

Loading