Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Performance #14

Open
georgwiese opened this issue May 15, 2023 · 1 comment
Open

Optimize Performance #14

georgwiese opened this issue May 15, 2023 · 1 comment
Assignees

Comments

@georgwiese
Copy link
Member

georgwiese commented May 15, 2023

Depends on #13

I think we should consider some basic performance optimizations, like:

  • Try to reduce the number of columns (by using more rows and copy constraints)
  • Check if we can disable equality checks on some advice columns
  • Use the V1 floor planner instead of SimpleFloorPlanner

I think the number of rows is already pretty optimized with #12.

@georgwiese georgwiese converted this from a draft issue May 15, 2023
@georgwiese georgwiese moved this from Backlog to Todo in Zero Gravity May 16, 2023
@georgwiese georgwiese moved this from Todo to In Progress in Zero Gravity May 25, 2023
@georgwiese georgwiese self-assigned this May 25, 2023
@georgwiese
Copy link
Member Author

georgwiese commented May 26, 2023

I've done a few experiments (including the changes in #17). This is an intermediate report.

Preliminaries

I added a branch on my Halo2 fork that prints out some useful information with regard to performance.

Here's a run with the "MNIST_small" model (models/model_28input_1024entry_2hash_2bpi.pickle.hdf5):

 === Degree overview:
  Minimum degree required by permutation argument: 3
  Minimum degree required by lookup arguments: 6
    bloom filter lookup: 5
    bit_lookup: 5
    byte_decomposition: 5
    x is byte: 5
    diff is byte: 5
    lookup: 6
  Minimum degree required by gates: 4
  Minimum degree required by circuit: 1
  Total degree: 6

----------------------------------------
create_proof Total: 29.679
  Before synthesize: 0.082 (0.3%)
  synthesize: 0.186 (0.6%)
  Compute advice and challenges: 0.388 (1.3%)
  Compute lookups: 2.652 (8.9%)
  Compute permutations: 1.869 (6.3%)
  Compute lookup products: 4.278 (14.4%)
  Compute vanishing argument: 0.604 (2.0%)
  Compute advice polys: 0.282 (1.0%)
  Compute h_poly: 12.958 (43.7%)
  Before create_proof: 3.642 (12.3%)
  create_proof: 2.732 (9.2%)
----------------------------------------

Took: 29.707698257s

You can see that:

  • Our degree bound is 6, and it's dominated by the range check gadget by halo2gadgets (they named their lookup lookup 👌)
  • With the changes of Performance optimization: Instantiate Chips only once #18, circuit synthesis now only takes 0.6% of the time.
  • Almost half of the proving time is used to compute the quotient polynomial (which is expected I guess)

Using the V1 floor planner instead of SimpleFloorPlanner

It did not reduce rows enough to reduce k (using the model_28input_256entry_1hash_1bpi.pickle.hdf5 model), but increased synthesis time. I think that might be because I didn't Circuit::without_witnesses() such that the witness is actually None, which might lead to more computation.

In summary, I think it might still be worth doing, but will only help for some models.

Reducing the number of columns

I removed the selector_accumulator column in the ByteSelectorChip (also removing some gates), which allowed me to reduce the number of columns from 6 to 5. The speed-up was 3.9%.

In summary, I think there's not much to be gained from reducing the number of columns.

Reducing the number of gates and lookups

We currently define 11 gates and 5 table lookups. According to this lecture, table lookups are at least 4-times as expensive as gates.

To get a feel how much speed-up we can expect, I went to the GreaterThanChip and replaced:

  • The gate that checks that the result is bit with a range-check in the existing range-check gadget.
  • The lookup that checks that x is a byte with a range-check in the existing range-check gadget.
  • The lookup that checks that diff is a byte with a range-check in the existing range-check gadget.

The speed-up was 12%.

Note that for these experiments, I did increase k from 12 to 14! The additional range-checks need many rows, so we can only realize this speed-up if we manage to keep the number of rows the same.

Reducing the degree bound

Ironically, the range-check (implemented by halo2gadgets) also has the lookup with the highest degree (6, all other gates / lookups have degree of at most 5). So, on top of the previous step (removing 1 gate and 2 lookups), I removed the range checks as well (all while keeping k constant, of course).

The speed-up (compared to the previous step) was 27%.

Summary

I'd say there are two possible areas for improvement:

  • Using the V1 floor planner. It will probably help for some models, and when it does help, it reduces the number of rows by a factor of two (and proving time by about as much). Would be good if it doesn't lead to slower times when it doesn't make a difference.
  • It's worth investigating if we can implement the range check gadget more efficiently. This includes both row efficiency (perhaps there is a design using more than one column which could be packed more compactly) and reducing the degree (which by itself should give a significant performance boost).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

No branches or pull requests

1 participant