Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIMD-0170: Reserve minimal CUs for builtins #170

Merged
120 changes: 120 additions & 0 deletions proposals/simd-0170-builtin-instruction-cost-and-budget.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
---
simd: '0170'
title: Reserve minimal CUs for builtins
authors:
- Tao Zhu (Anza)
category: Standard
type: Core
status: Review
created: 2024-08-26
feature: https://github.com/anza-xyz/agave/issues/2562
supersedes:
superseded-by:
extends:
---

## Summary

When transactions do not request a specific compute unit limit, the Solana
runtime conservatively allocates 200,000 compute units for each tx-level
instruction (excluding compute budget instructions) in the transaction's
compute budget. Since builtin program instructions consume far less than 200k
compute units, the runtime will be modified to only allocate 3,000 compute units
in the transaction compute budget for each of these instructions.

## Motivation

When allocating the transaction compute budget for transactions which don't
specify a specific compute unit limit, the Solana runtime over-allocates the
number of compute units needed for executing builtin programs which have a
small (mostly) static execution compute unit cost. The runtime will only release
the over-allocated compute units from the block cost tracker when transaction
execution finishes and the actual consume compute units is known.

Over-allocating CUs for builtin program instructions reduces block density,
lowers network throughput, and can degrade block producer performance by causing
repeated transaction retries (due to repeatedly over allocating cu's and then
releasing them after execution). Avoiding excessive CU allocation is critical
for maximizing block efficiency and minimizing network delays.

## New Terminology

None

## Detailed Design

Establish a static maximum allocation of 3,000 CU's for builtin programs denoted
as `MAX_BUILTIN_ALLOCATION_COMPUTE_UNIT_LIMIT`, which is larger than all actual
compute unit costs of builtin programs as well as the potential Cross-Program
Invocations (CPIs) they do. This uniform limit is applied to each builtin
program instruction and is used to configure both CU meter allocations and block
producer CU limits reservation for all builtin instructions.

tao-stones marked this conversation as resolved.
Show resolved Hide resolved
The static list of builtin program id's that will have 3000 compute units
allocated are:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a line here that also states the builtin must be owned by the native loader for this CU allocation to apply? Avoids any footguns when they move to BPF.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list can still be static, we just also want that explicit requirement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in 0e6e0ca

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should update this list of builtin program id's when each of the builtin-to-bpf feature gates are activated. We're not going to check the owner of the builtin program each time during cost calculation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me, as long as an ID can be popped out on feature activation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, @tao-stones please make sure @buffalojoec gets added as a reviewer on your impl pr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, will do


- Stake11111111111111111111111111111111111111
- Config1111111111111111111111111111111111111
- Vote111111111111111111111111111111111111111
- 11111111111111111111111111111111
- ComputeBudget111111111111111111111111111111
- AddressLookupTab1e1111111111111111111111111
- BPFLoaderUpgradeab1e11111111111111111111111
- BPFLoader1111111111111111111111111111111111
- BPFLoader2111111111111111111111111111111111
- LoaderV411111111111111111111111111111111111
- KeccakSecp256k11111111111111111111111111111
- Ed25519SigVerify111111111111111111111111111

Note that there are a few builtin programs not in the list including the
upcoming zk programs as well as the feature program.

The 3,000 CU threshold is based on analysis of current built-in programs and
reflects the unlikely need for additional complexity. For instance, the
`AddressLookupTable`’s `CreateLookupTable` performs the highest number of CPI
calls (up to three invocations of the System program), resulting in a maximum
CU demand of 1,200 CUs (750 + 3 × 150). Similarly, `UpgradeableLoader`’s
`ExtendProgram`, which may invoke the System program once, requires up to
2,520 CUs (2,370 + 150), representing the most resource-intensive operation
among current built-ins. The proposed 3,000 CU limit slightly exceeds this
requirement, allowing for controlled flexibility without an excessive margin.

If a transaction consists only of builtins, no explicit CU request should be
required. If a CU request is made, the requested limit will override the max
allocation in #2.

## Alternatives Considered

1. Do nothing and continue allocating 200k compute unit's for all non-compute
budget instructions. However, this fails to address over-allocation. Unlike
regular instructions, the execution of builtins is more predictable and usually
requires significantly fewer CUs than BPF instructions. This approach would
allocate more CUs than necessary, undermining the goal of efficient CU usage.

2. Declare both max and default CU values for builtins: A more precise
approach would be to require builtins to declare both a maximum CU
allocation (MAX_BUILTIN_ALLOCATION_COMPUTE_UNIT_LIMIT) and a default CU value
for each instruction (DEFAULT_BUILTIN_INSTRUCTION_COMPUTE_UNITS). This would
allow fine-tuned CU allocation based on each instruction’s potential execution
path. However, this introduces new constraints for existing and future
builtins, requiring updates to comply with these rules, which could
overcomplicate the design.

## Impact

Users who previously relied on including builtin instructions, instead of
explicitly setting compute-unit limits, to allocate budget for their
transactions may experience an increase in transaction failures. To avoid this,
users are encouraged to use set_compute_unit_limit to explicitly request the
necessary budget for their transactions.
tao-stones marked this conversation as resolved.
Show resolved Hide resolved

If those impacted users have issues fitting in the set compute unit limit
instruction into their transactions due to tx data size limits, they can also
migrate to using address lookup tables to fit in the compute budget instruction
call.

## Security Considerations

Both Agave and FD clients should implement this proposal to avoid breaking
consensus.

Loading