Optimize Performance #14

georgwiese · 2023-05-15T08:55:10Z

Depends on #13

I think we should consider some basic performance optimizations, like:

Try to reduce the number of columns (by using more rows and copy constraints)
Check if we can disable equality checks on some advice columns
Use the V1 floor planner instead of SimpleFloorPlanner

I think the number of rows is already pretty optimized with #12.

The text was updated successfully, but these errors were encountered:

georgwiese · 2023-05-26T08:54:12Z

I've done a few experiments (including the changes in #17). This is an intermediate report.

Preliminaries

I added a branch on my Halo2 fork that prints out some useful information with regard to performance.

Here's a run with the "MNIST_small" model (models/model_28input_1024entry_2hash_2bpi.pickle.hdf5):

 === Degree overview:
  Minimum degree required by permutation argument: 3
  Minimum degree required by lookup arguments: 6
    bloom filter lookup: 5
    bit_lookup: 5
    byte_decomposition: 5
    x is byte: 5
    diff is byte: 5
    lookup: 6
  Minimum degree required by gates: 4
  Minimum degree required by circuit: 1
  Total degree: 6

----------------------------------------
create_proof Total: 29.679
  Before synthesize: 0.082 (0.3%)
  synthesize: 0.186 (0.6%)
  Compute advice and challenges: 0.388 (1.3%)
  Compute lookups: 2.652 (8.9%)
  Compute permutations: 1.869 (6.3%)
  Compute lookup products: 4.278 (14.4%)
  Compute vanishing argument: 0.604 (2.0%)
  Compute advice polys: 0.282 (1.0%)
  Compute h_poly: 12.958 (43.7%)
  Before create_proof: 3.642 (12.3%)
  create_proof: 2.732 (9.2%)
----------------------------------------

Took: 29.707698257s

You can see that:

Our degree bound is 6, and it's dominated by the range check gadget by halo2gadgets (they named their lookup lookup 👌)
With the changes of Performance optimization: Instantiate Chips only once #18, circuit synthesis now only takes 0.6% of the time.
Almost half of the proving time is used to compute the quotient polynomial (which is expected I guess)

Using the `V1` floor planner instead of `SimpleFloorPlanner`

It did not reduce rows enough to reduce k (using the model_28input_256entry_1hash_1bpi.pickle.hdf5 model), but increased synthesis time. I think that might be because I didn't Circuit::without_witnesses() such that the witness is actually None, which might lead to more computation.

In summary, I think it might still be worth doing, but will only help for some models.

Reducing the number of columns

I removed the selector_accumulator column in the ByteSelectorChip (also removing some gates), which allowed me to reduce the number of columns from 6 to 5. The speed-up was 3.9%.

In summary, I think there's not much to be gained from reducing the number of columns.

Reducing the number of gates and lookups

We currently define 11 gates and 5 table lookups. According to this lecture, table lookups are at least 4-times as expensive as gates.

To get a feel how much speed-up we can expect, I went to the GreaterThanChip and replaced:

The gate that checks that the result is bit with a range-check in the existing range-check gadget.
The lookup that checks that x is a byte with a range-check in the existing range-check gadget.
The lookup that checks that diff is a byte with a range-check in the existing range-check gadget.

The speed-up was 12%.

Note that for these experiments, I did increase k from 12 to 14! The additional range-checks need many rows, so we can only realize this speed-up if we manage to keep the number of rows the same.

Reducing the degree bound

Ironically, the range-check (implemented by halo2gadgets) also has the lookup with the highest degree (6, all other gates / lookups have degree of at most 5). So, on top of the previous step (removing 1 gate and 2 lookups), I removed the range checks as well (all while keeping k constant, of course).

The speed-up (compared to the previous step) was 27%.

Summary

I'd say there are two possible areas for improvement:

Using the V1 floor planner. It will probably help for some models, and when it does help, it reduces the number of rows by a factor of two (and proving time by about as much). Would be good if it doesn't lead to slower times when it doesn't make a difference.
It's worth investigating if we can implement the range check gadget more efficiently. This includes both row efficiency (perhaps there is a design using more than one column which could be packed more compactly) and reducing the degree (which by itself should give a significant performance boost).

georgwiese added this to Zero Gravity May 15, 2023

georgwiese converted this from a draft issue May 15, 2023

georgwiese moved this from Backlog to Todo in Zero Gravity May 16, 2023

georgwiese moved this from Todo to In Progress in Zero Gravity May 25, 2023

georgwiese self-assigned this May 25, 2023

georgwiese mentioned this issue Jun 5, 2023

Performance Optimizations #26

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize Performance #14

Optimize Performance #14

georgwiese commented May 15, 2023 •

edited

Loading

georgwiese commented May 26, 2023 •

edited

Loading

Optimize Performance #14

Optimize Performance #14

Comments

georgwiese commented May 15, 2023 • edited Loading

georgwiese commented May 26, 2023 • edited Loading

Preliminaries

Using the V1 floor planner instead of SimpleFloorPlanner

Reducing the number of columns

Reducing the number of gates and lookups

Reducing the degree bound

Summary

georgwiese commented May 15, 2023 •

edited

Loading

georgwiese commented May 26, 2023 •

edited

Loading

Using the `V1` floor planner instead of `SimpleFloorPlanner`