Atomic execution protocol and primitives #13

adlrocha · 2022-07-15T16:30:34Z

This PR:

Implements all SCA function to handle atomic execution.
Implements the basic atomic primitives and types required for the implementation of the SCA functions.

In the next PR:

We'll finish writing all the required primtives to implement actors that support atomic executions (i.e. hierarchical_sca/src/atomic/mod.rs).
We'll write a sample actor (potentially in another repo) that use these primitives to support atomic executions.

adlrocha · 2022-07-21T15:18:05Z

@aakoshh, may I have a quick review? This PR only includes the SCA functions to orchestrate atomic executions. I will work in the primitives and a sample actor in the next PR (to avoid making this one too long). Thanks 🙏

actors/hierarchical_sca/src/atomic/mod.rs

actors/hierarchical_sca/src/cross.rs

actors/hierarchical_sca/src/exec.rs

actors/hierarchical_sca/src/lib.rs

adlrocha · 2022-07-26T16:34:40Z

Thank you, as always, for your thorough review, @aakoshh. I think I've addressed almost all your comments (and deferred some to future PRs). Let me know if we are happy to merge or if we should address something else 🙏

actors/hierarchical_sca/src/state.rs

aakoshh · 2022-07-26T18:50:27Z

Thanks for following through with all my comments.

I hope that the next PR with the lockable actor will make things clearer, but I was wondering: is the call to init_atomic_exec of the SCA coming from a user directly? Or is it coming as a bottom-up message via checkpoints?

I'm curious how the system can ensure that the same state is not locked in multiple atomic executions that have different levels as their common ancestor. Say we have subnets like this:

     S0
    /  \ 
   S1  S2
  /  \
S3    S4

And there's an actor in S3 that wants to do an atomic swap with one in S4 and another in S2. It locks some state. Then what prevents the user from sending the same locked CID to both S1 and S0? Or sending something that isn't locked at all?

If the "I locked some state for atomic execution X" actually comes from the actor in S3 then it's fine because it can only lock once. But that's not the impression I get from the spec.

Ultimately I want guarantee that if something is locked it's locked, if somebody is committed to the output they are committed, and I can only see this guaranteed if the messages are unforgeable and come from the actors involved, not crafted by users.

You might say that the users can go to each others' subnets and check with the respective actors whether the state is locked. But looking at the model, all I can see is in the LockedMapCid is that the state they claimed was locked is indeed locked now, but there's no guarantee it's locked because of the atomic execution I am interested in. The next block might unlock them if they also locked it for a different atomic execution that I am unaware of and it's already finishing, and then I'll be committed to something they no longer are, and when the output from our atomic execution reaches them they'll just drop it.

aakoshh · 2022-07-26T19:13:19Z

Just theorizing, but maybe the atomic execution protocol could go like this:

Participants call lock on some state of theirs to get the corresponding input CID.
They all agree on the AtomicExecutionParams from the msgs and inputs they now know.
One of them sends the AtomicExecutionParams to the nearest common ancestor, they all learn the resulting CID of the record - although it's a good question how they can trust anyone telling them what it is because of changing the address strings to IDs. Maybe the CID should be that of the original, raw params?
New step: Everyone calls pre_commit(lock_cid, atomic_exec_params_cid, output) in their own subnet. The lockable actor now associates the locked data with this execution, and sends the commitment to output_cid up the chain as a bottom-up message.
The SCA in the atomic execution subnet gathers to pre-commits, and issues the commit(atomic_exec_params_cid) once everyone's vote has arrived.
The actors in the descendant subnets receive the commit message and do the unlock.

So, the change are that the output isn't coming from a top-down message, it's the participants entering it into their respective subnets; doing so triggers a pre-commit message going up, and the locked state being associated with the atomic execution, so it can't be associated with anything else, until abort or commit arrives.

aakoshh · 2022-07-26T20:59:34Z

@adlrocha I see that this force-push undid the changes in my PR: https://github.com/adlrocha/builtin-actors/compare/12d67b98bc7c144b8956a0db119a9792cfa06651..a24c90e249da694a1ef35723b94d26e2479a0155

Not sure if it was intentional or you just forgot to pull?

adlrocha · 2022-07-27T07:07:44Z

@adlrocha I see that this force-push undid the changes in my PR: https://github.com/adlrocha/builtin-actors/compare/12d67b98bc7c144b8956a0db119a9792cfa06651..a24c90e249da694a1ef35723b94d26e2479a0155
Not sure if it was intentional or you just forgot to pull?

It was intentional, sorry. Using addresses broke something in the tests and I had to revert the commit so I could keep addressing your comments (see https://github.com/adlrocha/builtin-actors/pull/13#discussion_r930110428). Did tests passed for you? I actually just tested your branch again and it fails, and I am afraid that this is because Address is not supported as a key for maps for the CBOR serialization (this is why I used Strings in the first place :( ).

adlrocha · 2022-07-27T07:42:18Z

I hope that the next PR with the lockable actor will make things clearer, but I was wondering: is the call to init_atomic_exec of the SCA coming from a user directly? Or is it coming as a bottom-up message via checkpoints?

In the current implementation it is the user the one that directly sends a message in the common parent to initialize or submit the execution, now cross-net messages involved.

I'm curious how the system can ensure that the same state is not locked in multiple atomic executions that have different levels as their common ancestor.

You are totally right. We wouldn't be able right now to identify that a user is using the same locked state for two executions. Actually, in the current implementation we wouldn't even be able to identify that the same locked state is used in two computations in the same common parent (we are not doing that check, and we should).

I opened this discussion to continue there (I don't want comments to get lost in this PR).

aakoshh · 2022-07-27T13:27:59Z

and I am afraid that this is because Address is not supported as a key for maps for the CBOR serialization (this is why I used Strings in the first place :(

How odd. Then maybe we can wrap it with yet another type, an AddressKey type, that overrides the Cbor serialisation to use the String format of the wrapped address.

Sorry, I didn't run the tests, they don't even compile for me because some cases in a match aren't handled. Let me give it another shot then.

adlrocha · 2022-07-27T13:33:35Z

Sorry, I didn't run the tests, they don't even compile for me because some cases in a match aren't handled. Let me give it another shot then.

I've moved a lot of things around, so you may need to rebase if we want to be able to merge the change directly, I am really sorry :( I am starting to work on the sample actor supporting atomic execution, I won't touch anything from this PR so you can do your magic. Thanks 🙏

aakoshh · 2022-07-27T15:11:49Z

Here's a second take, tested this time: https://github.com/adlrocha/builtin-actors/pull/16

Add TAddressKey to get around the CBOR limitation with HashMap

aakoshh · 2022-07-27T16:50:27Z

Okay, so I'm still not 100% convinced about these:

But if you think these are okay, then merge ahead :)

adlrocha · 2022-07-28T09:38:27Z

Okay, so I'm still not 100% convinced about these:

* [Atomic execution protocol and primitives #13 (comment)](https://github.com/adlrocha/builtin-actors/pull/13#discussion_r930284319)

* [Atomic execution protocol and primitives #13 (comment)](https://github.com/adlrocha/builtin-actors/pull/13#discussion_r930287000)

But if you think these are okay, then merge ahead :)

Answered, let me know if it makes sense. In any case, as we are proposing some changes to the overall protocol, maybe it's a good idea to have this merged so we have the core to start including the new changes (once we have the whole design).

adlrocha · 2022-07-28T13:42:42Z

Quick update on this: I think that with the recent discussions we're having, the basic atomic primitives will change a lot, and we will be able to lock several parts of the state using different messages over the same actor. The new architecture should remove many of the simplifications used in the MVP. Parts of the code from this PR will change a bit with this new approach. Let's note down our concerns, merge this PR and review them in the next one. WDYT?

I am already tinkering with these primtives and the states. I'll try to have a draft PR soon so we can discuss over it.

aakoshh

I agree, let's merge this PR, it was a great platform to discuss, but we already know some things will change so there's no reason to keep arguing about the minute details here.

wip: locking primitives

f7709ab

adlrocha mentioned this pull request Jul 19, 2022

🚧 | B3: Turn B1 into production-ready system for core actors and FVM consensus-shipyard/consensuslab#6

Closed

adlrocha added 4 commits July 19, 2022 19:04

wip: sca functions almost done

085ea15

wip: writing unit tests

2a5a89c

atomic execution functions and unit tests

4deb68a

added comments and use right generic types in atomic for next PR

5ca0e7e

adlrocha requested a review from aakoshh July 21, 2022 15:17

adlrocha marked this pull request as ready for review July 21, 2022 15:17