swap, swap/chain, contracts/swap: add transaction queue #2124
base: master
Conversation
Amazing work!!
My comments are mainly for explanation, but at points I also note that you have something defined but not implemented. In these cases it may be better to leave the definition out and add it when you actually need it (especially since this PR will be used as a reference by developers who are going to implement the queue for other blockchain interactions in Swarm).
I do think that a markdown file, perhaps with diagrams, would help to explain the architecture of this PR and assist developers who are going to work with this code. Maybe @vojtechsimetka could help with this.
After you have addressed my comments and clarified things for me, I would like to go over this PR once again.
This is quite a long PR and most of the implementation is pretty great. 👏 My comments are related to lock handling, which I think could be better designed.
// A lock must be held and kept until after the trigger function was called or the batch write failed
func (pq *PersistentQueue) Queue(b *state.StoreBatch, v interface{}) (key string, trigger func(), err error) {
	// the nonce guarantees keys don't collide if multiple transactions are queued in the same second
	pq.nonce++
A possible data race on the nonce field on concurrent Queue calls. There is a comment about the lock, but it would be nicer to have an API where locking is implicit.
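For illustration, a minimal sketch of what implicit locking could look like, assuming a simplified queue struct; the batch and trigger handling of the real PersistentQueue is omitted and the names here are not from the PR.

package chain

import "sync"

// Simplified sketch, not the PR's code: the nonce is guarded by a mutex
// internal to the queue, so concurrent Queue calls cannot race on it.
type lockedQueue struct {
	mu    sync.Mutex
	nonce uint64
}

// nextNonce increments and returns the nonce under the internal lock.
func (q *lockedQueue) nextNonce() uint64 {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.nonce++
	return q.nonce
}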
see #2124 (comment)
lock.Lock()
key, exists, err = pq.Peek(i)
if exists {
	return key, nil
A deadlock if exists is true, as the lock is not unlocked.
see #2124 (comment)
lock.Lock()
key, exists, err := pq.Peek(i)
if exists {
	return key, nil
A deadlock if exists is true, as the lock is not unlocked.
see #2124 (comment)
swap/chain/persistentqueue.go
// No lock should be held when this is called. Only a single call to Next may be active at any time
// If the key is not "", the value exists, the supplied lock was acquired and must be released by the caller after processing the item
// The supplied lock should be the same one that is used for the other functions
func (pq *PersistentQueue) Next(ctx context.Context, i interface{}, lock *sync.Mutex) (key string, err error) {
Lock handling is a bit strange. The function can return leaving the lock either unlocked or locked. I think that the lock should be internal to the implementation, not exposed through the package API.
It would be better to protect the queue with an internal lock than to rely on the queue user to do the locking. With wrong usage it is easy to unlock an already unlocked lock or to cause a deadlock.
Batch processing and writing require the lock, but that could be encapsulated by different functions.
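One possible shape of such an API, sketched under the assumption of a mutex internal to the queue and a simplified Peek-based Next; the waiting/trigger logic of the real implementation is omitted and the names and signatures here are illustrative, not a concrete proposal from this PR.

// Next returns the next queued key, or "" if nothing is available. When a key
// is returned, the caller calls the release function once it has finished
// processing the item; the mutex itself stays internal to the queue.
func (pq *PersistentQueue) Next(ctx context.Context, i interface{}) (key string, release func(), err error) {
	pq.mu.Lock()
	key, exists, err := pq.Peek(i)
	if err != nil || !exists {
		pq.mu.Unlock()
		return "", nil, err
	}
	return key, pq.mu.Unlock, nil
}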
This locking design was the consequence of an earlier version of the code where the request queue and a notification queue were modified at the same time, and I wanted to avoid having to hold three locks of three different objects at the same time as the risk of deadlock seemed high. Anyway, that case no longer exists in this version, so perhaps a lock can be put back into pq again. I will attempt a redesign next week.
agree with @janos here.
we can have a separate PR for this
I would suggest leaving it as is for now. Using any of the pq functions without holding the main txqueue lock is always wrong. A lock managed internally by the pq is insufficient, because making sure that batches don't overlap and that trigger signals are not missed requires locking beyond the scope of single functions. I also tried various approaches to make the locking part of the batch itself, but those only complicated things further.
I think for now we should consider persistentQueue (which is now unexported; it was never supposed to be used elsewhere anyway) a helper structure that is used exclusively by the txqueue, and therefore have it share the txqueue's lock. I think we should merge this to finish the work on this codebase and, if necessary, consider further redesigns when migrating transaction sending to bee.
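Under the current contract, the caller side then looks roughly like this; a simplified sketch of the usage based on the documented behaviour of Next, not a verbatim excerpt from the PR.

// Sketch of the intended usage: Next acquires txq.lock when it returns a
// non-empty key, so the caller is responsible for releasing it once the
// item has been processed.
key, err := txq.requestQueue.Next(txq.ctx, &id, &txq.lock)
if err != nil {
	return nil, err
}
if key == "" {
	// Next did not return an item; per its contract the lock was not acquired
	return nil, nil
}
defer txq.lock.Unlock() // released after the item has been processed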
My opinion is that we should not share the lock this way, as it creates code that is hard to maintain. I would not like to approve it in this state.
If we are focusing on bee and will not add new features to the swarm repo, we do not need to merge this PR now, but can leave it for the bee project.
Fair enough. I'll leave this PR open for further experimentation. Then we can either still merge this at some point in the future or just continue the redesign on a PR on the bee repo (although I assume it will take a while until we get to tx sending there).
count := 200

var errout error // stores the last error that occurred in one of the routines
errout can race, as two goroutines may set it at the same time.
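A minimal sketch of one way to address this in the test, assuming the goroutines currently assign errout directly: guard the shared variable with a mutex (alternatively, golang.org/x/sync/errgroup can collect the first error from a group of goroutines).

// Simplified sketch: concurrent goroutines record their error through a
// mutex-protected setter instead of writing errout directly.
var (
	mu     sync.Mutex
	errout error // stores the last error that occurred in one of the routines
)
setErr := func(err error) {
	mu.Lock()
	defer mu.Unlock()
	if err != nil {
		errout = err
	}
}

Each goroutine then calls setErr(err) instead of assigning errout itself.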
fixed
@@ -0,0 +1,7 @@
package chain |
Add copyright header to every new file in this package.
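For reference, the new files would start with the repository's standard license notice; the exact text and year should be copied from an existing file in the package, roughly along these lines.

// Copyright 2020 The Swarm Authors
// This file is part of the Swarm library.
//
// The Swarm library is free software: you can redistribute it and/or modify
// it under the terms of the GNU Lesser General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
// ... (remaining paragraphs of the standard notice, copied verbatim)

package chain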
done
swap/chain/txqueue.go
func (txq *TxQueue) waitForNextRequest() (requestMetadata *txRequestData, err error) {
	var id uint64
	// get the id of the next request in the queue
	key, err := txq.requestQueue.Next(txq.ctx, &id, &txq.lock)
This also relates to the lock usage. I have no concrete suggestion how to implement it differently, but with lock state tracked across TxQueue and PersistentQueue, deadlocks or data races may be quite hard to debug.
see #2124 (comment)
@@ -126,6 +129,7 @@ func newTestSwap(t *testing.T, key *ecdsa.PrivateKey, backend *swapTestBackend)
	usedBackend = newTestBackend(t)
}
swap, dir := newBaseTestSwap(t, key, usedBackend)
swap.txScheduler.Start()
Check for the returned error.
It doesn't return an error.
a few questions from my first pass through the code:

- could you give a conceptual example of another struct that would implement the TxScheduler interface, other than TxQueue? (i want to make sure i understand the difference between these 2 in terms of responsibilities)
- why does persistentQueue have a prefix, if all entries have their own separate keys? is it the idea to have multiple persistentQueue structs in the same state.Store? is this the case already?
- i understand the situation of having a transaction with an unknown status, but why is there a func to actually notify this? would this take place in the future, when we allow transactions to expire, or is it already happening?
- regarding future PRs: can you please explain what the node's actions would be in terms of confirmation monitoring? would this be basically issue Wait for sufficient amount of transaction confirmations #1633?
i definitely will review this PR again (even if it is merged before i manage to do so) as i would like to have a more in-depth understanding of some of the code here.
looks good so far though 👍
swap/chain/persistentqueue.go
// It returns the generated key and a trigger function which must be called once the batch was successfully written
// This only returns an error if the encoding fails which is an unrecoverable error
// A lock must be held and kept until after the trigger function was called or the batch write failed
func (pq *PersistentQueue) Queue(b *state.StoreBatch, v interface{}) (key string, trigger func(), err error) {
as a developer, i'm not sure this comment
// call trigger function after writing to the batch to prevent undefined behaviour
is really clear about what to do here.
but it would at least be a sign that i would have to be careful when using these functions
}

// ToSignedTx returns a signed types.Transaction for the given request and nonce
func (request *TxRequest) ToSignedTx(nonce uint64, opts *bind.TransactOpts) (*types.Transaction, error) {
i am for ToSignedTx since it operates on the receiver request
- An alternative would be a scheduler which tracks the nonce count locally and allows for parallel requests instead of queueing. Another one might be a simple mock for testing the rest of the code without running the entire queue mechanism.
- Yes, there are already multiple in the same store now: the request queue plus one notification queue per handler. Also, this is the same state store as for swap in production and we want to avoid key collisions.
- This can already happen now if transactions don't confirm in time or
- Not fully sure yet about that. There would at least be a notification once the confirmation number has been reached.
This PR introduces a central component, the TxScheduler, in charge of sending transactions, waiting for their result afterwards and ensuring the result was successfully processed. In the future this component is supposed to take care of more chain-related tasks.

The TxScheduler is an interface currently only implemented by the TxQueue, which executes transactions in sequence (see #2006, the next one is only sent after the previous one confirmed) in order to avoid most nonce-related issues (#1929).

The general idea behind the TxScheduler is as follows:

- A component creates a chain.TxRequest and schedules it with the TxScheduler, which returns an assigned id and takes care of the rest.
- Each request is scheduled with a handlerID which specifies which handler to notify of events for this request. A component should register itself as the handler for the handlerIDs it uses on startup.
- The TxScheduler will execute the requests at some point and notify the appropriate handler. If the handler function failed, the notification does not count as delivered and will be tried again in the future. This guarantee is also preserved across restarts of the swarm client.

The idea behind this is that other places in the code which need to send transactions no longer need to be concerned with issues like io errors, network problems, client restarts, etc. They just queue the request and are guaranteed to be notified of its result at some point.
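To make this flow concrete, here is a rough sketch in Go. The type and method names below (TxRequestHandlers, SetHandlers, ScheduleRequest and the TxRequest fields) are illustrative assumptions, not the actual API of this PR; see the chain package for the real definitions.

package example

import "math/big"

// Hypothetical mirror of chain.TxRequest; field names are assumptions.
type TxRequest struct {
	To    [20]byte // recipient address
	Data  []byte   // call data
	Value *big.Int // amount of ETH to send along
}

// Hypothetical handler callbacks a component registers for its handlerID on
// startup. A failed notification is retried later, also across restarts.
type TxRequestHandlers struct {
	NotifyReceipt func(id uint64) error
	NotifyError   func(id uint64, err error) error
}

// Hypothetical scheduler interface: queue a request and get back its id,
// then wait to be notified through the registered handlers.
type TxScheduler interface {
	SetHandlers(handlerID string, handlers *TxRequestHandlers) error
	ScheduleRequest(handlerID string, request TxRequest) (id uint64, err error)
}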
For the TxQueue, transactions are processed in sequence from a persistent request queue; the PersistentQueue was introduced as a helper structure for this.

On the cashing out side, the CashoutProcessor now accepts a CashoutResultHandler which it will notify of the cashout result. This is usually Swap. During tests this handler is overridden to keep track of cashed out cheques. This mechanism replaces the cashDone channel on the backend and therefore obsoletes the global cashCheque function variable and the setupContractTest function.

In this PR only the cashout transactions for the chequebook go through this mechanism. This was done to keep the PR small; the remaining work is left to smaller future PRs.
This PR is quite large, so it might be useful to look at the commits individually. The PR has been split into 3 commits: the first one is the PersistentQueue, the second one implements the actual queue and the third one integrates it with the cashout transactions.

closes #2006
closes #2005
closes #1929
closes #1634