MuSig trade protocol #3108

HenrikJannsen · 2025-01-18T00:46:57Z

HenrikJannsen
Jan 18, 2025
Maintainer

I would like to give an overview of the current state machine used in Bisq Easy and discuss the requirements, challenges and architecture for the Bisq MuSig trade protocol.

There might be other candidates for modelling a trade protocol besides a state machine. A Petri net was discussed (and implemented?) by the Farcaster Monero cross-chain swap protocol. I am not familiar with that and will only discuss the state machine approach below.

Bisq 2 FSM

Overview

The bisq.common.fsm.FSM is a simple trade-domain specific state machine.

It defines state transitions as a map where the key is a pair of source state and event and the value is a transition.
The Transition has a set of source states, a target state, the event class (which we use as event type) and an optional event handler class which gets executed at the transition.
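
To make the lookup idea concrete, here is a minimal sketch of such a transition table. It is not the actual bisq.common.fsm code: the names MiniFsm, TradeState, SimpleTransition and the two event classes are hypothetical, and the key is simplified to a (source state, event class) pair.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

// Hypothetical states and events, for illustration only.
enum TradeState { INIT, TAKE_OFFER_RECEIVED, COMPLETED }

class TakeOfferEvent {}

class PaymentConfirmedEvent {}

// A transition knows its target state and an optional handler (may be null).
class SimpleTransition {
    final TradeState target;
    final Runnable handler;

    SimpleTransition(TradeState target, Runnable handler) {
        this.target = target;
        this.handler = handler;
    }
}

class MiniFsm {
    // Key: pair of (source state, event class). Value: the transition.
    private final Map<Map.Entry<TradeState, Class<?>>, SimpleTransition> transitions = new HashMap<>();
    private TradeState state = TradeState.INIT;

    void addTransition(TradeState source, Class<?> eventClass, TradeState target, Runnable handler) {
        transitions.put(new SimpleImmutableEntry<>(source, eventClass), new SimpleTransition(target, handler));
    }

    // Returns true if a transition was defined for (current state, event type).
    synchronized boolean handle(Object event) {
        SimpleTransition transition = transitions.get(new SimpleImmutableEntry<>(state, event.getClass()));
        if (transition == null) {
            return false;
        }
        if (transition.handler != null) {
            transition.handler.run();
        }
        state = transition.target;
        return true;
    }

    synchronized TradeState getState() {
        return state;
    }
}
```

In this sketch an event without a matching entry is simply rejected; the real FSM queues it instead.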

It supports events arriving out of order by storing events which arrived at the wrong state in a queue, which gets reprocessed at the next state transition. The queued pending events are not persisted.
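
The queueing behaviour could be sketched roughly as follows. This is an assumption-laden illustration, not the Bisq implementation: QueueingFsm, Step and the string events "request" and "confirm" are made up, and the transitions are hard-coded to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

class QueueingFsm {
    enum Step { INIT, REQUESTED, CONFIRMED }

    // In-memory only: pending events are lost on restart (not persisted).
    private final Deque<String> pending = new ArrayDeque<>();
    private Step step = Step.INIT;

    synchronized Step getStep() {
        return step;
    }

    synchronized int getPendingCount() {
        return pending.size();
    }

    synchronized void handle(String event) {
        if (!tryTransition(event)) {
            pending.add(event); // arrived at the wrong state, keep it for later
            return;
        }
        // A transition happened, so queued events get another chance.
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Iterator<String> it = pending.iterator(); it.hasNext(); ) {
                if (tryTransition(it.next())) {
                    it.remove();
                    progressed = true;
                    break; // state changed, rescan the queue from the start
                }
            }
        }
    }

    // Hard-coded hypothetical transitions: INIT -> REQUESTED -> CONFIRMED.
    private boolean tryTransition(String event) {
        if (step == Step.INIT && event.equals("request")) {
            step = Step.REQUESTED;
            return true;
        }
        if (step == Step.REQUESTED && event.equals("confirm")) {
            step = Step.CONFIRMED;
            return true;
        }
        return false;
    }
}
```

If "confirm" arrives before "request", it waits in the queue and is applied right after the "request" transition.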

It adds domain-specific restrictions, such as not allowing a return to any state with a lower enum ordinal.
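
A minimal sketch of such an ordinal check, assuming a hypothetical ProtocolState enum and ForwardOnlyModel class (neither is taken from the Bisq code base):

```java
// Hypothetical states; the declaration order defines the allowed direction.
enum ProtocolState { INIT, OFFER_TAKEN, PAYMENT_SENT, COMPLETED }

class ForwardOnlyModel {
    private ProtocolState state = ProtocolState.INIT;

    ProtocolState getState() {
        return state;
    }

    // Rejects any move to a state with a lower enum ordinal than the current one.
    void setState(ProtocolState newState) {
        if (newState.ordinal() < state.ordinal()) {
            throw new IllegalStateException("Cannot go back from " + state + " to " + newState);
        }
        state = newState;
    }
}
```

The check relies on the declaration order of the enum constants, so reordering the enum changes which transitions are legal.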

It provides a fluent interface which should make the protocol easier to read and understand. Some of those constructs have no functional purpose and only serve readability and structure.

Here is a basic transition definition:

.from(INIT) // Source state
.on(BisqEasyTakeOfferRequest.class)  // The event
.run(BisqEasyTakeOfferRequestHandler.class)  // Event handler
.to(MAKER_SENT_TAKE_OFFER_RESPONSE...) // Target state
.then() // Syntactic sugar

The .then() is here only for readability and has no function.

We also support fromAny(), which means that the transition can happen from any state. This is used, for instance, for handling errors.
fromStates(...) allows to transition from a set of source states.
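
One way to think about both constructs is as producers of a set of source states. The sketch below assumes a hypothetical FlowState enum and static helper methods; the real fluent API is shaped differently and is not reproduced here.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical states, for illustration only.
enum FlowState { INIT, OFFER_TAKEN, PAYMENT_SENT, FAILED }

class SourceStates {
    // "From any state": every enum constant is a valid source.
    static Set<FlowState> fromAny() {
        return EnumSet.allOf(FlowState.class);
    }

    // "From a set of states": only the listed constants are valid sources.
    static Set<FlowState> fromStates(FlowState first, FlowState... rest) {
        return EnumSet.of(first, rest);
    }
}
```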

branch() and path() are used to model a branch of the protocol, e.g. an alternative path. The path is for informational purposes only and takes a description parameter. The branch likewise only serves to lay out the protocol in a more visual structure of multiple branches.

The current FSM executes the event handler in the same thread as a blocking method call.
The handle method (state transition) is triggered, for instance, by the message handler, which runs on the dispatcher thread, or in case of UI events on the UI thread. We could easily wrap that call in a dedicated thread if the caller thread should not get blocked for the time the event gets processed. The FSM is blocked until the method completes, as we synchronize on the whole object.
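
Wrapping the call in a dedicated thread could look roughly like this. The sketch uses a single-thread executor from java.util.concurrent; BlockingFsm and AsyncDispatcher are hypothetical stand-ins, not existing Bisq classes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stand-in for the FSM: handle() is synchronized, so one event at a time.
class BlockingFsm {
    private int handledCount;

    synchronized void handle(String event) {
        handledCount++; // a real handler would do its (possibly slow) work here
    }

    synchronized int getHandledCount() {
        return handledCount;
    }
}

class AsyncDispatcher {
    // A single worker thread keeps the events in submission order.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final BlockingFsm fsm;

    AsyncDispatcher(BlockingFsm fsm) {
        this.fsm = fsm;
    }

    // Returns immediately; the caller (e.g. UI or network thread) is not blocked.
    Future<?> dispatch(String event) {
        return executor.submit(() -> fsm.handle(event));
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

A caller that needs the outcome can still wait on the returned Future, while a UI thread can ignore it and stay responsive.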

Events usually contain data. When a state transition is defined for the current state and the received event we enter a state transition in which the event handler gets executed.

Here we have 3 phases:

Validation of the data we received from the event. This is especially important if the event comes from the trade peer.
Process the event. We do not apply changes to the data models at that stage but keep the changes local only. If an exception happens in that phase, we do not want the model to have changed its state while the state machine's state transition has not completed, as that would represent an inconsistency.
Commit to model: After successful processing we apply the locally stored data to the underlying model.
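
The three phases above can be sketched as a single handler method. The classes TradeModel, PaymentAccountEvent and PaymentAccountHandler are hypothetical and only illustrate the pattern; real handlers in Bisq Easy are structured differently.

```java
// Stand-in for the underlying trade model.
class TradeModel {
    String peerPaymentAccount; // only written in the commit phase
}

// Stand-in for an event carrying data from the trade peer.
class PaymentAccountEvent {
    final String account;

    PaymentAccountEvent(String account) {
        this.account = account;
    }
}

class PaymentAccountHandler {
    private final TradeModel model;

    PaymentAccountHandler(TradeModel model) {
        this.model = model;
    }

    void handle(PaymentAccountEvent event) {
        // 1. Validate: reject bad peer data before doing anything else.
        if (event.account == null || event.account.trim().isEmpty()) {
            throw new IllegalArgumentException("Payment account must not be empty");
        }
        // 2. Process: results are kept in local variables only.
        String normalizedAccount = event.account.trim();
        // 3. Commit: apply the local results to the model as the last step.
        model.peerPaymentAccount = normalizedAccount;
    }
}
```

If validation or processing throws, the model is untouched, so the state machine and the model cannot drift apart.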

In case it is not possible to avoid affecting underlying models during processing (e.g. the wallet state gets changed during processing, and if a later processing step fails we have already changed the wallet state), we should consider splitting the state transition so that the model state change is reflected by a state machine change.

The underlying model is the Trade class, which extends FsmModel. Changes from the event processing are reflected in the state of that Trade model.

See bisq.trade.bisq_easy.protocol.BisqEasyBuyerAsMakerProtocol as an example.

Design goals

The goal of the trade protocol state machine is to be structurally as restrictive as possible, to avoid invalid transitions which could be exploited or caused by design bugs. E.g. do not permit going back to states with a lower ordinal. Or, in case of multiple paths, do not permit moving from the new path back to the old path.

A reason why no existing general-purpose FSM implementation was used is that those are designed without such restrictions in mind. The Bisq FSM was inspired by https://github.com/j-easy/easy-states.

Another goal is to provide a protocol definition which is easy to read and to reason about. It should make it easier for protocol designers to cover all potential cases and avoid design bugs and security vulnerabilities.

Open issues

The branch feature does not result in any structural constraint as it is implemented now. If a wrong state transition is defined, the FSM would not prevent a jump from one branch to another.
One way to enforce such a structural constraint might be to fork into a new FSM when branching and terminate the first one, so that there is no way back from the forked FSM to the previous one.
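
The forking idea could be explored with something like the following sketch. It is purely hypothetical: ForkableFsm does not exist in Bisq, states are plain strings for brevity, and the mediation state names are invented for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: each branch is its own FSM with a closed set of states.
class ForkableFsm {
    private final Set<String> allowedStates;
    private String current;
    private boolean terminated;

    ForkableFsm(String initial, String... others) {
        allowedStates = new HashSet<>(Arrays.asList(others));
        allowedStates.add(initial);
        current = initial;
    }

    String getCurrent() {
        return current;
    }

    boolean isTerminated() {
        return terminated;
    }

    void moveTo(String target) {
        if (terminated) {
            throw new IllegalStateException("FSM was terminated by a fork");
        }
        if (!allowedStates.contains(target)) {
            throw new IllegalStateException(target + " is not part of this branch");
        }
        current = target;
    }

    // Terminates this FSM and returns a new one that only knows the branch states.
    ForkableFsm fork(String initial, String... others) {
        if (terminated) {
            throw new IllegalStateException("FSM was already forked");
        }
        terminated = true;
        return new ForkableFsm(initial, others);
    }
}
```

Because the forked instance never learns the states of the original, a wrongly defined transition back to the old branch fails structurally rather than by convention.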

The event handler has no enforced structure for the validate - process - commit phases. The developers are responsible for sticking to that pattern. Enforcement by structure would be preferred, but no good solution has been found yet. That is a minor issue, though.

There is no support for groups of states (phases) where several detail states are collected. Such grouping would help to give a better high-level overview. Even for Bisq Easy the number of states is quite large and not easy to reason about.
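
One possible way to group detail states into phases is to attach a phase to each enum constant. This is only a sketch of the idea; TradePhase, DetailState and their constants are hypothetical.

```java
// Hypothetical high-level phases.
enum TradePhase { NEGOTIATION, PAYMENT, CLOSED }

// Hypothetical detail states, each assigned to exactly one phase.
enum DetailState {
    INIT(TradePhase.NEGOTIATION),
    OFFER_TAKEN(TradePhase.NEGOTIATION),
    FIAT_SENT(TradePhase.PAYMENT),
    FIAT_RECEIVED(TradePhase.PAYMENT),
    COMPLETED(TradePhase.CLOSED);

    private final TradePhase phase;

    DetailState(TradePhase phase) {
        this.phase = phase;
    }

    TradePhase getPhase() {
        return phase;
    }
}
```

A UI or log line could then show the phase for the overview and the detail state only when needed.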

Discussion

The current FSM has worked well for Bisq Easy, though this is a much simpler protocol than MuSig, and even here we have seen that multiple optional paths can add a lot of complexity. It also does not handle the mediation case, which is represented as a property in the trade but not as a state. This is OK for Bisq Easy, but maybe not the preferred way to design it for other protocols where entering mediation or arbitration has more impact on the trade process.

We do not have recoverable error states. Any error will lead to a terminal error state. For other protocols we might want to model certain non-critical or recoverable error states as dedicated protocol states, maybe as alternative paths.

Support for hierarchical state machines, where a new FSM is created for modelling the non-happy path, might be needed.

For an older discussion about the current FSM see also: #938

MuSig protocol

For MuSig we can assume 3 types of events:

User intents (e.g. button click to signal to have received the fiat payment)
P2P network events: e.g. trade message arrived
Blockchain events: e.g. transaction got confirmed

We have to ensure that processing an event is an atomic transaction and cannot cause an inconsistent state in case the processing fails (e.g. do not commit data to a wallet before the processing is completed).

Testing

Test protocol consistency

We should find a test setup which comes with minimal integration requirements (wallet, blockchain, p2p network) for a basic protocol test, which is focused on ensuring there are no protocol bugs or security vulnerabilities.

Wallet/blockchain data related tests

I guess there is already a test setup in place to test the wallet and cryptographic aspects.

Integration test

I assume a full integration test covering wallet/blockchain and p2p network comes with considerable effort.
@alvasw has developed a test setup for regtest bitcoind which might be useful for integration tests. Maybe he can give some input here.

Rust/Java interaction

An open question is whether we should develop the trade protocol in Rust or Java.

Arguments for Java:

Most (all?) involved developers have more Java experience than Rust experience
Protocol is not performance critical, thus would not benefit from better Rust performance
We can more easily integrate existing infrastructure and data models
Most events are triggered by UI and P2P network messages which arrive on the Java layer.

The communication between the 2 processes could happen with protobuf and gRPC. ZeroMQ might be another option.
I don't think a REST API with JSON is the right approach here, as it is low-level inter-process communication.

We can assume that the 2 processes are running on the same machine on localhost. With that environment I assume no authentication is required for the communication setup.

ChrisSon15 · 2025-01-21T19:29:46Z

ChrisSon15
Jan 21, 2025

The approach seems to be exactly what is needed. I like the simplicity, and yet one gets a good overview from reading it.
I can see that fitting very well with the bisq-musig-rust stuff I am developing right now. The Rust code would fit very well in the handlers. I also agree the FSM should be in Java for the reasons you mentioned, with maybe one exception: the access to the blockchain may be better in Rust, so we can keep all low-level Bitcoin libraries in Rust.
Right now there is already some test environment in the project (nigiri docker with bitcoind, esplora and others). That is needed for testing the Rust stuff.
Thanks for the writeup, it seems like a good fit. I will keep the structure of the FSM in mind going forward, so we will have an easy integration.

3 replies

ChrisSon15 Jan 22, 2025

I have been reading a bit in bisq.trade.bisq_easy.protocol.BisqEasyBuyerAsMakerProtocol. The handlers in the bisq-musig protocol will be significantly more complex, and more data needs to be passed between Java and Rust. Also, the number of rounds per user interaction is much higher. I don't think this is a performance issue, but it is something we need to keep an eye on.
This way the Rust stuff can be accessed in a library style (which means that Java calls Rust, not the other way around). This makes the Rust development easier, better testable and smaller.

Testing

To keep the Rust development in sync with the idiomatic FSM pattern, I will write the test code to simulate the FSM calls. The test call which tests the generation of the DepositTx is here:
https://github.com/ChrisSon15/bisq-musig/blob/8596557981f7f271b18cbba1b47590e8fea55034/bdktest/src/nigiri.rs#L63
I think this is very close to what the protocol handlers will look like. Each method call into the MusigProtocol object will be one state in the FSM and implemented as one ProtocolHandler that calls the Rust method.

It's important to test the Rust code independently of the Java code, which we do right now by using some docker images of bitcoind, esplora, electrs, etc. through nigiri. Integration testing, with the FSM calling Rust and the Rust backend tied to the nigiri servers, would be easy as well.

HenrikJannsen Jan 22, 2025
Maintainer Author

...with maybe one exception: the access to the blockchain may be better in Rust, so we can keep all low-level Bitcoin libraries in Rust.

Yes, the handler (in Java) can call the Rust library via the network protocol we use (e.g. gRPC). We have to take into account the asynchronous nature of that call, which differs from current use cases, but that should not be an issue. The caller which triggers the state transition needs to run that in a thread.

I think all wallet/crypto/blockchain related tasks are that way.

I think this is very close to what the protocol handlers will look like. Each method call into the MusigProtocol object will be one state in the FSM and implemented as one ProtocolHandler that calls the Rust method.

I think we do not even need any close correlation. The Java FSM event handler just calls a remote method on the Rust process and gets the result. The Java handler blocks until the result is returned from the remote call and then processes the result to be applied to the trade model.
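
A blocking handler around an asynchronous remote call could be sketched as below. The remote call is represented by a plain Supplier of a CompletableFuture, which is an assumption made to keep the example free of any gRPC dependency; RemoteCallHandler and committedResult are hypothetical names, and the timeout is an addition not discussed above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class RemoteCallHandler {
    // Stand-in for the remote method on the Rust process (hypothetical).
    private final Supplier<CompletableFuture<String>> remoteCall;

    // Stand-in for a field of the trade model; only set on success.
    String committedResult;

    RemoteCallHandler(Supplier<CompletableFuture<String>> remoteCall) {
        this.remoteCall = remoteCall;
    }

    // Blocks the calling thread until the remote result arrives or the timeout expires.
    void handle(long timeoutMillis) {
        String result;
        try {
            result = remoteCall.get().get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting for the remote call", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("Remote call failed or timed out", e);
        }
        // Commit only after the remote call has succeeded.
        committedResult = result;
    }
}
```

Since the handler blocks, the caller that triggers the state transition would run it on its own thread, as mentioned above.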

HenrikJannsen Jan 22, 2025
Maintainer Author

Do you have any opinion about the data transfer protocol?

Most communication will be remote method calls from the Java app to the Rust process.
But there are also cases where we need it the other way round, e.g. for blockchain confirmations.
We could simply run a gRPC server on both sides, so both can communicate in both directions. I guess that would be the easiest way.

Any reasons not to use gRPC? There is the daemon module, which has gRPC dependencies and a POC skeleton for some use case which has not been developed further so far.
