Skip to content
This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

Atomic execution protocol and primitives #13

Merged
merged 25 commits into from
Jul 28, 2022
Merged

Atomic execution protocol and primitives #13

merged 25 commits into from
Jul 28, 2022

Conversation

adlrocha
Copy link

@adlrocha adlrocha commented Jul 15, 2022

This PR:

  • Implements all SCA function to handle atomic execution.
  • Implements the basic atomic primitives and types required for the implementation of the SCA functions.

In the next PR:

  • We'll finish writing all the required primtives to implement actors that support atomic executions (i.e. hierarchical_sca/src/atomic/mod.rs).
  • We'll write a sample actor (potentially in another repo) that use these primitives to support atomic executions.

@adlrocha adlrocha requested a review from aakoshh July 21, 2022 15:17
@adlrocha adlrocha marked this pull request as ready for review July 21, 2022 15:17
@adlrocha
Copy link
Author

@aakoshh, may I have a quick review? This PR only includes the SCA functions to orchestrate atomic executions. I will work in the primitives and a sample actor in the next PR (to avoid making this one too long). Thanks 🙏

@adlrocha
Copy link
Author

Thank you, as always, for your thorough review, @aakoshh. I think I've addressed almost all your comments (and deferred some to future PRs). Let me know if we are happy to merge or if we should address something else 🙏

@aakoshh
Copy link

aakoshh commented Jul 26, 2022

Thanks for following through with all my comments.

I hope that the next PR with the lockable actor will make things clearer, but I was wondering: is the call to init_atomic_exec of the SCA coming from a user directly? Or is it coming as a bottom-up message via checkpoints?

I'm curious how the system can ensure that the same state is not locked in multiple atomic executions that have different levels as their common ancestor. Say we have subnets like this:

     S0
    /  \ 
   S1  S2
  /  \
S3    S4

And there's an actor in S3 that wants to do an atomic swap with one in S4 and another in S2. It locks some state. Then what prevents the user from sending the same locked CID to both S1 and S0? Or sending something that isn't locked at all?

If the "I locked some state for atomic execution X" actually comes from the actor in S3 then it's fine because it can only lock once. But that's not the impression I get from the spec.

Ultimately I want guarantee that if something is locked it's locked, if somebody is committed to the output they are committed, and I can only see this guaranteed if the messages are unforgeable and come from the actors involved, not crafted by users.

You might say that the users can go to each others' subnets and check with the respective actors whether the state is locked. But looking at the model, all I can see is in the LockedMapCid is that the state they claimed was locked is indeed locked now, but there's no guarantee it's locked because of the atomic execution I am interested in. The next block might unlock them if they also locked it for a different atomic execution that I am unaware of and it's already finishing, and then I'll be committed to something they no longer are, and when the output from our atomic execution reaches them they'll just drop it.

@aakoshh
Copy link

aakoshh commented Jul 26, 2022

Just theorizing, but maybe the atomic execution protocol could go like this:

  1. Participants call lock on some state of theirs to get the corresponding input CID.
  2. They all agree on the AtomicExecutionParams from the msgs and inputs they now know.
  3. One of them sends the AtomicExecutionParams to the nearest common ancestor, they all learn the resulting CID of the record - although it's a good question how they can trust anyone telling them what it is because of changing the address strings to IDs. Maybe the CID should be that of the original, raw params?
  4. New step: Everyone calls pre_commit(lock_cid, atomic_exec_params_cid, output) in their own subnet. The lockable actor now associates the locked data with this execution, and sends the commitment to output_cid up the chain as a bottom-up message.
  5. The SCA in the atomic execution subnet gathers to pre-commits, and issues the commit(atomic_exec_params_cid) once everyone's vote has arrived.
  6. The actors in the descendant subnets receive the commit message and do the unlock.

So, the change are that the output isn't coming from a top-down message, it's the participants entering it into their respective subnets; doing so triggers a pre-commit message going up, and the locked state being associated with the atomic execution, so it can't be associated with anything else, until abort or commit arrives.

@aakoshh
Copy link

aakoshh commented Jul 26, 2022

@adlrocha I see that this force-push undid the changes in my PR: https://github.com/adlrocha/builtin-actors/compare/12d67b98bc7c144b8956a0db119a9792cfa06651..a24c90e249da694a1ef35723b94d26e2479a0155

Not sure if it was intentional or you just forgot to pull?

@adlrocha
Copy link
Author

@adlrocha I see that this force-push undid the changes in my PR: https://github.com/adlrocha/builtin-actors/compare/12d67b98bc7c144b8956a0db119a9792cfa06651..a24c90e249da694a1ef35723b94d26e2479a0155
Not sure if it was intentional or you just forgot to pull?

It was intentional, sorry. Using addresses broke something in the tests and I had to revert the commit so I could keep addressing your comments (see https://github.com/adlrocha/builtin-actors/pull/13#discussion_r930110428). Did tests passed for you? I actually just tested your branch again and it fails, and I am afraid that this is because Address is not supported as a key for maps for the CBOR serialization (this is why I used Strings in the first place :( ).

@adlrocha
Copy link
Author

I hope that the next PR with the lockable actor will make things clearer, but I was wondering: is the call to init_atomic_exec of the SCA coming from a user directly? Or is it coming as a bottom-up message via checkpoints?

In the current implementation it is the user the one that directly sends a message in the common parent to initialize or submit the execution, now cross-net messages involved.

I'm curious how the system can ensure that the same state is not locked in multiple atomic executions that have different levels as their common ancestor.

You are totally right. We wouldn't be able right now to identify that a user is using the same locked state for two executions. Actually, in the current implementation we wouldn't even be able to identify that the same locked state is used in two computations in the same common parent (we are not doing that check, and we should).

I opened this discussion to continue there (I don't want comments to get lost in this PR).

@aakoshh
Copy link

aakoshh commented Jul 27, 2022

and I am afraid that this is because Address is not supported as a key for maps for the CBOR serialization (this is why I used Strings in the first place :(

How odd. Then maybe we can wrap it with yet another type, an AddressKey type, that overrides the Cbor serialisation to use the String format of the wrapped address.

Sorry, I didn't run the tests, they don't even compile for me because some cases in a match aren't handled. Let me give it another shot then.

@adlrocha
Copy link
Author

adlrocha commented Jul 27, 2022

Sorry, I didn't run the tests, they don't even compile for me because some cases in a match aren't handled. Let me give it another shot then.

I've moved a lot of things around, so you may need to rebase if we want to be able to merge the change directly, I am really sorry :( I am starting to work on the sample actor supporting atomic execution, I won't touch anything from this PR so you can do your magic. Thanks 🙏

@aakoshh
Copy link

aakoshh commented Jul 27, 2022

Here's a second take, tested this time: https://github.com/adlrocha/builtin-actors/pull/16

Add TAddressKey to get around the CBOR limitation with HashMap
@aakoshh
Copy link

aakoshh commented Jul 27, 2022

Okay, so I'm still not 100% convinced about these:

But if you think these are okay, then merge ahead :)

@adlrocha
Copy link
Author

Okay, so I'm still not 100% convinced about these:

* [Atomic execution protocol and primitives #13 (comment)](https://github.com/adlrocha/builtin-actors/pull/13#discussion_r930284319)

* [Atomic execution protocol and primitives #13 (comment)](https://github.com/adlrocha/builtin-actors/pull/13#discussion_r930287000)

But if you think these are okay, then merge ahead :)

Answered, let me know if it makes sense. In any case, as we are proposing some changes to the overall protocol, maybe it's a good idea to have this merged so we have the core to start including the new changes (once we have the whole design).

@adlrocha
Copy link
Author

adlrocha commented Jul 28, 2022

Quick update on this: I think that with the recent discussions we're having, the basic atomic primitives will change a lot, and we will be able to lock several parts of the state using different messages over the same actor. The new architecture should remove many of the simplifications used in the MVP. Parts of the code from this PR will change a bit with this new approach. Let's note down our concerns, merge this PR and review them in the next one. WDYT?

I am already tinkering with these primtives and the states. I'll try to have a draft PR soon so we can discuss over it.

Copy link

@aakoshh aakoshh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, let's merge this PR, it was a great platform to discuss, but we already know some things will change so there's no reason to keep arguing about the minute details here.

@adlrocha adlrocha merged commit fa871a8 into master Jul 28, 2022
@adlrocha adlrocha deleted the feat/atomic-exec branch July 28, 2022 16:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants