[Fix] Selection of transition ID in finalize. #2575

Open
wants to merge 13 commits into
base: staging
from


d0cd
Contributor

@d0cd commented Nov 14, 2024

Motivation

The motivating issue for this PR is here.

From the original discussion:

Fundamentally, while the call graph is constructed correctly, it is used in an unsound way to find the transition ID that corresponds to the Future passed in as an operand to an await command (source). The child_transition_id linked above is used to initialize the FinalizeRegisters. The transition_id in FinalizeRegisters is only used to initialize an RNG in the rand instructions. The purpose is to ensure that on-chain logic (in finalize scopes) has a unique seed each time it is invoked.

The current implementation has the following issues:

  1. In the case where an async function calls a mix of async and non-async functions, a non-existent transition is used in the call graph. Specifically, it must be an async function call that itself makes an async function call that precedes a non-async call. This results in the transaction being rejected. In this case, the impacted programs are those that use a specific mix of async and non-async calls.
  2. In the case where only async calls are made, but are awaited in a different order than the call graph, an incorrect transition ID is used for the FinalizeRegisters. The only place where the transition ID is used is in the rand.chacha instruction. In this case, the impacted programs are ones that make async calls to programs with rand.chacha but await the futures in a different order.
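To make issue 2 concrete, here is a toy Rust model. It assumes, purely for illustration, that the legacy lookup effectively pairs the i-th awaited Future with the i-th child transition in call-graph order; the real snarkVM logic is more involved, and `legacy_lookup` and the transition names are hypothetical.

```rust
/// Hypothetical stand-in for a transition ID.
type TransitionId = &'static str;

/// Toy model (an assumption, not snarkVM's code): the i-th awaited future is
/// paired with the i-th child transition in call-graph order.
fn legacy_lookup(call_order: &[TransitionId], await_index: usize) -> TransitionId {
    call_order[await_index]
}

fn main() {
    // Child transitions, in the order the async calls were made.
    let call_order = ["child_a", "child_b"];
    // The futures are awaited in reverse order: `child_b`'s future first.
    let await_order = ["child_b", "child_a"];

    for (i, actual) in await_order.iter().enumerate() {
        let chosen = legacy_lookup(&call_order, i);
        // Each awaited future is seeded with the *other* child's transition ID.
        println!("awaiting {actual}, seeded with {chosen}");
    }
}
```

Under this model the seeds are still deterministic, but each finalize scope is keyed by the wrong transition, which is the class of error that matters for `rand.chacha`.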

There are a number of possible ways this could be addressed:

  1. Update the implementation to get the correct transition ID. This would require more information to be passed in during finalization.
  2. Add a transition ID to the Future. This would be a breaking change in the data format, but conceptually sound.
  3. Fix the use of the transition ID to not require the precise sub-transition, while maintaining the invariant that RNG seeds are unique.

This PR proposes a solution to this issue by introducing a nonce to FinalizeRegisters. This nonce is used to seed the rand commands. Furthermore, instead of attempting to determine the child_transition_id that corresponds to an awaited Future from the call graph, this approach uses the main transition ID to initialize all FinalizeRegisters for a given transaction. The main transition ID along with the nonce ensures that each finalize context uses a unique seed for the RNG, while removing the need to correctly determine the transition ID for a given Future (a complicated process).
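A minimal sketch of the proposed scheme, assuming a simplified, hypothetical `FinalizeSeedInputs` type (the PR adds a nonce to `FinalizeRegisters`, whose real fields and types differ). Every finalize scope in a transaction shares the root transition ID, and the nonce is incremented per scope, so each scope gets a distinct (root ID, nonce) pair without any call-graph lookup, assuming scopes are finalized in a deterministic order.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `N::TransitionID`.
type TransitionId = u64;

/// Illustrative subset of the seed-relevant state (field names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct FinalizeSeedInputs {
    /// The root (main) transition ID, shared by every finalize scope in the transaction.
    root_transition_id: TransitionId,
    /// A per-scope counter, incremented each time a finalize scope is entered.
    nonce: u64,
}

/// Assigns seed inputs to `num_scopes` finalize scopes, in finalization order.
fn assign_seed_inputs(root_transition_id: TransitionId, num_scopes: usize) -> Vec<FinalizeSeedInputs> {
    let mut nonce = 0u64;
    let mut inputs = Vec::with_capacity(num_scopes);
    for _ in 0..num_scopes {
        inputs.push(FinalizeSeedInputs { root_transition_id, nonce });
        // Increment the nonce so the next scope gets a distinct pair.
        nonce += 1;
    }
    inputs
}

fn main() {
    let inputs = assign_seed_inputs(42, 3);
    // Every (root ID, nonce) pair is distinct within the transaction.
    let unique: HashSet<_> = inputs.iter().collect();
    assert_eq!(unique.len(), inputs.len());
    println!("{inputs:?}");
}
```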

Migration

This proposal has been written to migrate at N::CONSENSUS_V2_HEIGHT.
Given timelines, it's more likely that a new N::CONSENSUS_V3_HEIGHT would need to be introduced. The migration would follow the process introduced in ARC-0042.
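In code, the migration amounts to choosing the seed inputs by block height. The sketch below assumes a placeholder activation height (the real gate is `N::CONSENSUS_V3_HEIGHT`, or `N::CONSENSUS_V2_HEIGHT` as currently written) and uses hypothetical `SeedSource` / `seed_source` names rather than snarkVM types.

```rust
/// Placeholder activation height (hypothetical); the real gate is a network constant.
const CONSENSUS_V3_HEIGHT: u32 = 1_000;

/// Which inputs seed the finalize RNG (illustrative names, not snarkVM types).
#[derive(Debug, PartialEq, Eq)]
enum SeedSource {
    /// Before the migration: the child transition ID found via the call graph, no nonce.
    ChildTransition { child_transition_id: u64 },
    /// After the migration: the root transition ID plus a per-scope nonce.
    RootWithNonce { root_transition_id: u64, nonce: u64 },
}

/// Selects the seed inputs for one finalize scope at `block_height`.
fn seed_source(block_height: u32, root_transition_id: u64, child_transition_id: u64, nonce: u64) -> SeedSource {
    if block_height >= CONSENSUS_V3_HEIGHT {
        SeedSource::RootWithNonce { root_transition_id, nonce }
    } else {
        SeedSource::ChildTransition { child_transition_id }
    }
}

fn main() {
    println!("{:?}", seed_source(999, 10, 11, 0));
    println!("{:?}", seed_source(1_000, 10, 11, 0));
}
```

Keeping the pre-migration branch intact is what lets historical blocks replay with the old seeds.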

Test Plan

This PR includes a test whose expected output demonstrates the failure and the fix after CONSENSUS_V2_HEIGHT is reached.

Included are the CI branch and the CI diff.

Impact

As stated above, there are two classes of programs that are impacted by this issue:

  1. Those that use a mix of async and non-async calls in a specific way.
  2. Those that make async calls to programs with rand.chacha but await the futures in a different order than the async functions were called.

In a scan of all Aleo programs deployed on Mainnet as of 11/12/24 5PM PT:

  1. There are 10 functions among all programs deployed to Mainnet that contain a non-async call followed by an async call. None of them are impacted:
     - puzzle_spinner_v001.aleo/spin - Not impacted because puzzle_arcade_ticket_v001.aleo/mint does not make an async call
     - puzzle_spinner_v002.aleo/spin - Same as above
     - arcn_puc_in_helper_v2_2_3.aleo/swap_amm_credits_in - Not impacted because token_registry.aleo/transfer_private_to_public does not make an async call
     - arcn_puc_in_helper_v2_2_4.aleo/swap_amm_credits_in - Same as above
     - arcn_credits_in_helper_v2_2_2.aleo/remove_liq_credits_is_token1 - Same as above
     - arcn_credits_in_helper_v2_2_3.aleo/remove_liq_credits_is_token1 - Same as above
     - arcn_pub_v2_2_3.aleo/create_pool - Not impacted because arcn_pool_v2_2_2.aleo/transfer_lp_receipt_by_salt does not make an async call
     - arcn_priv_v2_2_2.aleo/create_pool - Same as above
     - arcn_priv_v2_2_3.aleo/create_pool - Same as above
     - arcn_pub_v2_2_2.aleo/create_pool - Same as above
  2. There are 12 programs that use the rand.chacha command, and 0 programs that import (and consequently call) them.

This was confirmed by a static analyzer on all programs on mainnet at the above date and time and double-checked by manual audit.
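The first filter in the scan above can be sketched as a single pass over a function's calls in order. This assumes a hypothetical `CallKind` representation of each call instruction and is not the analyzer that was actually used; it only flags candidates, and, as the list shows, each flagged function still needs a check of whether the callee makes an async call.

```rust
/// Hypothetical, simplified classification of a call made inside a function.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CallKind {
    Async,
    NonAsync,
}

/// Returns true if some non-async call is followed (at any later point) by an async call.
fn non_async_before_async(calls: &[CallKind]) -> bool {
    let mut seen_non_async = false;
    for call in calls {
        match call {
            CallKind::NonAsync => seen_non_async = true,
            // An async call after a non-async one: flag the function.
            CallKind::Async if seen_non_async => return true,
            CallKind::Async => {}
        }
    }
    false
}

fn main() {
    use CallKind::*;
    assert!(non_async_before_async(&[NonAsync, Async]));
    assert!(!non_async_before_async(&[Async, NonAsync]));
    println!("ok");
}
```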

Related PRs

To Reviewers

An important function of this PR is to provide an understanding of how the async execution mechanism works and why this PR is needed. If reviewers need context or clarification, please feel free to post your questions in this thread. I am also happy to do a call explaining the original and proposed design and code.

@d0cd requested a review from ljedrz on November 19, 2024 16:47
Collaborator

@ljedrz left a comment


I'm not very knowledgeable about this setup, but I don't see any engineering issues.

Contributor

@iamalwaysuncomfortable left a comment


A couple of comments about the migration process and seeding the nonce.

Comment on lines +107 to +111
let call_graph = if state.block_height() >= N::CONSENSUS_V3_HEIGHT {
Default::default()
} else {
self.construct_call_graph(execution)?
};
Contributor


This comment is a bit tangential to what's being fixed, but my fear is that this CONSENSUS_VX_HEIGHT approach is quickly creating a situation where we could have double-digit consensus versions, with migratory logic like this in many places within the snarkVM codebase. This could easily lead to nontrivial errors.

At some point soon, the community should come up with some kind of migration model and spec which lays out a standard format for migration.

I haven't thought this out in enough depth yet, but I imagine we might have some kind of decorator-style migration pattern (perhaps a macro? enums that apply migration functions?) that clearly denotes a change in consensus rules, specifies the logic at each version, and performs a standard set of checks.

After the higher-priority fixes to Mainnet, the community should do some active design work to keep future updates and fixes maintainable.

Contributor Author


It's a good callout.
Your macro suggestion reminded me of the malice approach; maybe we can borrow the design there.
We could also enforce stricter standards around migration logic, such as requiring that a README be maintained in this repo that defines the invariants around each new CONSENSUS_V*_HEIGHT.

Contributor

@vicsn Nov 22, 2024


+1 very much something worth thinking about and documenting.

One design principle I'm confident about (which also went wrong in the first design of the fee-lowering PR) is that wherever consensus changes are introduced, they should be independent, fully self-contained functionality rather than complex derivations of each other's functionality.

A second point to settle: whether we should always use <, >=, or a match for the comparison.
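One possible shape for the match-based option, offered as a sketch rather than something proposed in this thread: resolve the block height to a version once, then match on it exhaustively at each call site, so adding a version forces every site to be revisited. `ConsensusVersion`, `CONSENSUS_HEIGHTS`, `uses_call_graph_for_finalize`, and the heights are hypothetical placeholders.

```rust
/// Hypothetical consensus versions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ConsensusVersion {
    V1,
    V2,
    V3,
}

/// (activation height, version) pairs in ascending order; heights are placeholders.
const CONSENSUS_HEIGHTS: [(u32, ConsensusVersion); 3] = [
    (0, ConsensusVersion::V1),
    (100, ConsensusVersion::V2),
    (200, ConsensusVersion::V3),
];

/// Resolves the version active at `height`: the last entry whose activation height is <= `height`.
fn consensus_version(height: u32) -> ConsensusVersion {
    CONSENSUS_HEIGHTS
        .iter()
        .rev()
        .find(|(activation, _)| height >= *activation)
        .map(|(_, version)| *version)
        .expect("height 0 always maps to V1")
}

/// Call sites match exhaustively on the version instead of comparing heights directly.
fn uses_call_graph_for_finalize(height: u32) -> bool {
    match consensus_version(height) {
        ConsensusVersion::V1 | ConsensusVersion::V2 => true,
        ConsensusVersion::V3 => false,
    }
}

fn main() {
    for height in [0, 150, 250] {
        println!("{height}: {:?}, call graph: {}", consensus_version(height), uses_call_graph_for_finalize(height));
    }
}
```

With this shape, the `<` versus `>=` question is answered once, inside `consensus_version`, rather than at every call site.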

let mut call_graph = HashMap::new();
// Insert the fee transition.
call_graph.insert(*fee.transition_id(), Vec::new());
let call_graph = if state.block_height() >= N::CONSENSUS_V3_HEIGHT {
Contributor


Is there any realistic possibility this can use CONSENSUS_V2, or is the chain expected to pass that block height by the time this fix is applied?

Contributor Author


I don't think so; changes for CONSENSUS_V2 were locked in as of last week.

};

// Increment the nonce.
nonce += 1;
Contributor


Is this random enough? It seems like this might be in danger of creating aborted transitions if, say, the user accidentally broadcasts twice. Maybe some kind of replay shenanigans are possible? Nothing immediately comes to mind, but it's worth thinking through the scenarios.

Would it be worth initializing the nonce not at 0 but at a PRN (pseudo-random number) within a range that couldn't reach saturation?

Contributor Author


Prior to this change, the on-chain RNG for any given transition was seeded by the bits of:

registers.state().random_seed(),
**registers.transition_id(),
stack.program_id(),
registers.function_name(),
self.destination.locator(),
self.destination_type.type_id(),
seeds

In the new model, we use:

registers.state().random_seed(),
**registers.transition_id(),
stack.program_id(),
registers.function_name(),
registers.nonce(),
self.destination.locator(),
self.destination_type.type_id(),
seeds

The subtle difference is that in the new model, we use the transition ID of the root transition in a transaction. The nonce is incremented for each child transition, ensuring the seed is unique without needing to determine the child transition ID.

In general, the seed should be deterministic but unique for each on-chain execution of a transition.
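To see why the nonce is needed once every scope uses the root transition ID, the sketch below hashes the seed inputs listed above. It uses `std`'s `DefaultHasher` only as a self-contained stand-in for the cryptographic hash the real code applies to the bits of these values; the program and function names and all field values are hypothetical. Two finalize scopes of the same function in one transaction collide without the nonce and diverge with it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative stand-ins for the seed inputs listed above (not snarkVM types).
#[derive(Hash)]
struct SeedInputs<'a> {
    random_seed: u64,         // registers.state().random_seed()
    transition_id: u64,       // **registers.transition_id()
    program_id: &'a str,      // stack.program_id()
    function_name: &'a str,   // registers.function_name()
    nonce: Option<u64>,       // registers.nonce(); `None` models the old input list
    destination_locator: u64, // self.destination.locator()
    destination_type_id: u8,  // self.destination_type.type_id()
    seeds: &'a [u64],         // seeds
}

fn seed(inputs: &SeedInputs) -> u64 {
    let mut hasher = DefaultHasher::new();
    inputs.hash(&mut hasher);
    hasher.finish()
}

/// Seed for one `rand.chacha` in a finalize scope keyed by the root transition ID.
fn scope_seed(root_transition_id: u64, nonce: Option<u64>) -> u64 {
    seed(&SeedInputs {
        random_seed: 7,
        transition_id: root_transition_id,
        program_id: "example.aleo", // hypothetical program
        function_name: "roll",      // hypothetical function
        nonce,
        destination_locator: 0,
        destination_type_id: 0,
        seeds: &[],
    })
}

fn main() {
    let root = 1234;
    // Same function finalized twice in one transaction: without a nonce the seeds collide...
    assert_eq!(scope_seed(root, None), scope_seed(root, None));
    // ...with the per-scope nonce they differ.
    assert_ne!(scope_seed(root, Some(0)), scope_seed(root, Some(1)));
    println!("ok");
}
```

The seed stays deterministic (the same inputs always hash to the same value), which is the property the comment above requires alongside uniqueness.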

@@ -104,7 +104,11 @@ impl<N: Network> Process<N> {
lap!(timer, "Verify the number of transitions");

// Construct the call graph.
let call_graph = self.construct_call_graph(execution)?;
let call_graph = if state.block_height() >= N::CONSENSUS_V3_HEIGHT {
Default::default()
Contributor


Would None be clearer?

Contributor Author


The value here should be an empty hash map.
Are you suggesting defining call_graph on L188 as an Option<HashMap<N::TransitionID, Vec<N::TransitionID>>>?

Contributor


That is indeed the suggestion!
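A sketch of the `Option` suggestion, using hypothetical stand-ins (`CallGraph`, `call_graph_for`, a placeholder `CONSENSUS_V3_HEIGHT`, and a dummy `construct_call_graph`) in place of the snarkVM types. Returning `None` makes "no call graph after the migration" explicit in the type instead of encoding it as an empty map.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `N::TransitionID`.
type TransitionId = u64;
/// Call graph: transition ID -> IDs of the transitions it calls.
type CallGraph = HashMap<TransitionId, Vec<TransitionId>>;

/// Placeholder activation height (hypothetical).
const CONSENSUS_V3_HEIGHT: u32 = 1_000;

/// Dummy stand-in for `self.construct_call_graph(execution)?`.
fn construct_call_graph() -> CallGraph {
    HashMap::from([(1, vec![2, 3]), (2, vec![]), (3, vec![])])
}

/// `None` states explicitly that no call graph is used at this height.
fn call_graph_for(block_height: u32) -> Option<CallGraph> {
    if block_height >= CONSENSUS_V3_HEIGHT {
        None
    } else {
        Some(construct_call_graph())
    }
}

fn main() {
    match call_graph_for(500) {
        Some(graph) => println!("legacy path: {} transitions in the call graph", graph.len()),
        None => println!("no call graph needed"),
    }
}
```

The trade-off is that every consumer must handle the `None` case, which surfaces any code still relying on the call graph after the migration.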

// Insert the fee transition.
call_graph.insert(*fee.transition_id(), Vec::new());
let call_graph = if state.block_height() >= N::CONSENSUS_V3_HEIGHT {
Default::default()
Contributor


Would None be clearer?

Contributor Author


See above.

synthesizer/process/src/finalize.rs (outdated)

program outer.aleo;

// A call to `call_mid` should be rejected because the complex non-async call is before the async ones.
Contributor


Can you explain why there are still cases that are rejected after this PR? Is it effectively incorrect syntax? The same question applies to the comment in the next function.

Contributor Author


Ah, I realize the comment is not clear. Updated for clarity.

Labels
None yet
Projects
None yet

4 participants