Revisit locking design for State object #17

karlfloersch · 2019-01-09T23:36:56Z

Is your feature request related to a problem? Please describe.
The state object requires locking logic when starting a new block or ingesting transactions & deposits. An example of a race condition without locking is as follows:

Transaction A & B are ingested at the same time.
Transaction A transfers range to Alice in block 5, transaction B transfers the same range to Bob in block 5. Both transactions are valid, but only one can be included as they have the same block number & transfer the same range twice.
Because there is no lock, both transactions make it past the validity checks concurrently.
Both transactions are applied to the state and saved in the tx_log... uh oh!
Consumers of the tx_log now have two transactions effecting the same range in the same block! Oh no--that's invalid!

I am currently solving this by locking the sending account when a transaction is added. That means that each of these txs would attempt to acquire a lock for the same sender & one would fail.

How do locks currently get acquired and released?

Right now, locking is very primitive. Lock acquiring looks like this:

state.js

  attemptAcquireLocks (keywords) {
    keywords.push('all')
    if (keywords.some((val) => { return this.lock.hasOwnProperty(val) })) {
      // Failed to acquire locks
      return false
    }
    // Acquire locks
    for (let i = 0; i < keywords.length - 1; i++) {
      this.lock[keywords[i]] = true
    }
    return true
  }

Where keywords is every sender address in every transfer record of the transaction being added. If a lock fails to be acquired, then we wait some amount of time & try again with:

    while (!this.attemptAcquireLocks([type])) {
      // Wait before attempting again
      await timeout(timeoutAmt())
    }

Problems with this approach

I used this as a quick and dirty solution but I believe it is a bit dangerous in that it may fill up the event queue if there are a large number of callbacks competing for the same lock. I don't actually know how bad this is but I imagine it could be a problem if someone were to intentionally fill up the event queue with tons of txs which are from the same sender.

Solutions?

I don't know the best way to approach this I just intuit there's a solution I haven't yet explored. Maybe to do with queues and promises, but my thoughts in this area are not developed enough to describe at the moment.

I'd love feedback!

Some resources I've taken a look at...

The text was updated successfully, but these errors were encountered:

ben-chain · 2019-01-10T00:21:58Z

For what it's worth, I will note that the race condition you describe does not produce an "invalid" block in the same way that would be used in an invalid history challenge. At the leaf parsing step of the merkle sum tree, all but one of the transactions would be given a sum of 0 and so those branches would not be able to be used in an exit game at all. So, the race condition doesn't actually invalidate anything in the sense of having to exit or lose money.

That said, it obviously seems like horrible practice to not solve this issue--especially if the operator is signing off on inclusion promise attestations to get the "0-conf"-like property.

Another note--locking based on account does have the issue that it restricts the user from sending more than one transaction per block. Ideally, we would want to lock by range... not sure how much overhead this would end up adding.

It does seem like a proper queue module could be useful here depending on how quick locking/unlocking is--from queue.js, "it can be significantly faster than using arrays." I'm not sure whether the DOS issue you described above is preventable without proper anti-DOS measures, though.

On recieving a transaction, you'd add it to the queue, with a callback to include/log the transaction, and then you would just repeatedly dequeue in a loop:

check if any parts of the ranges are locked, if so, reject
lock all the ranges,
accept transaction (run callback)

karlfloersch · 2019-01-31T15:30:02Z

locking based on account does have the issue that it restricts the user from sending more than one transaction per block

These account locks are released upon completion of the addTransaction() function, so there should be no problem with having multiple transactions for the same account in the same block.

It may be worth mentioning that I've implemented a simple promise queue & simply processed transactions in sequence but this approach seemed to have even worse memory limit issues & was slightly slower.

gitcoinbot · 2019-01-31T17:43:49Z

Issue Status: 1. Open 2. Started 3. Submitted 4. Done

This issue now has a funding of 300.0 DAI (300.0 USD @ $1.0/DAI) attached to it.

If you would like to work on this issue you can 'start work' on the Gitcoin Issue Details page.
Want to chip in? Add your own contribution here.
Questions? Checkout Gitcoin Help or the Gitcoin Slack
$53,237.11 more funded OSS Work available on the Gitcoin Issue Explorer

mvayngrib · 2019-02-01T02:08:44Z

@karlfloersch how are you measuring the memory limit / speed issues?

gitcoinbot · 2019-02-01T11:19:35Z

Issue Status: 1. Open 2. Started 3. Submitted 4. Done

Work has been started.

These users each claimed they can complete the work by 1 week, 5 days ago.
Please review their action plans below:

1) famiiz has applied to start work (Funders only: approve worker | reject worker).

Good
Vsjsisbsbsnskzzbbz zip, jinx Xbox, nzkzbz znzkzz z zkzkznznzznnzmzkzkzznzbzbzsbjzjzjzn

Learn more on the Gitcoin Issue Details page.

2) lsaether has applied to start work (Funders only: approve worker | reject worker).

Would love to take a stab at this and see what kind of solution I can implement. I'm proficient in JS/ TS but can't describe a solution to this off the top of my head, would need to play around with the codebase first.

Learn more on the Gitcoin Issue Details page.

3) mvayngrib has been approved to start work.

i can factor out the queueing/locking logic into a separate module. You'll probably be able to get rid of your async-lock dependency too while you're at it.

Learn more on the Gitcoin Issue Details page.

mvayngrib · 2019-02-01T15:52:10Z

is execution in order of invocation important? Currently, if the attempted lock acquisitions are something like 1. ['a', 'b'], 2. ['b', 'c'], 3. ['a', 'c'], 4. ['a', 'b'], 4 might end up executing in parallel with 2, and before 3

if execution in order of invocation is not important, then at every lock release, u can check the enqueued operations and choose the maximum number that can be run in parallel, as opposed to just the luck of the draw

frndxyz · 2019-02-03T07:23:11Z

isnt the solution if this to add timestamp with it to solve concurrency and queue jam issue.

karlfloersch · 2019-02-04T23:45:29Z

@mvayngrib

if execution in order of invocation is not important, then at every lock release, u can check the enqueued operations and choose the maximum number that can be run in parallel, as opposed to just the luck of the draw

This seems reasonable. It makes sense to have a datastructure which makes it easy to "choose the maximum number of parallelizable transactions" and then begin processing those transactions. The tricky part is managing this complex queue.

@frndxyz

isnt the solution if this to add timestamp with it to solve concurrency and queue jam issue.

How does this work? Slightly confused about exactly how to use the timestamp

mvayngrib · 2019-02-05T19:08:21Z

@karlfloersch after playing with it on paper, and discussing with a friend, I'm no longer sure it matters whether parallelizable tasks run earlier or later. I think your locking design is actually fine efficiency-wise, the main issues being:

it's embedded in state.js (lots of lock/unlock)
it uses timeouts
this kind of intersecting-lock-set queue might be useful on its own, in other parts of the project

i see @Sprimage has been approved, but if he changes his mind, i'd like to give this a try

Sprimage · 2019-02-05T19:37:35Z

Yea @mvayngrib i figured the polling loop too was an issue and decided using https://www.npmjs.com/package/promise-queue should wrap it up. If you're really interested, we can work together to come up with the best solution as soon as possible

Sprimage · 2019-02-05T22:11:53Z

Ok so queuing promises can take a lot of memory. What if we move the job queue out of program memory and use something like fs? So each file represents a job (the json of the functions args) and we loop through the dir to execute each job in the queue and delete the file once executed?

mvayngrib · 2019-02-05T23:04:58Z

@Sprimage fs sounds like a big compromise speed-wise. Blocks are processed one a time, what's the max # txs in a block? Also, as far as I understand, queueing is an exception rather than a rule. Txs can be processed in parallel and only queue up when they collide on a lock (as they are currently)

karlfloersch · 2019-02-05T23:14:17Z

@mvayngrib

what's the max # txs in a block?

This depends on the block time & tps as there is no hard upper limit on txs in a block. Right now the block time is set to 60 seconds, and I've benchmarked the state object alone and gotten around 3k tps on my laptop. So if we were to make the express API handle more tps, theoretically with 1 minute block times we could get blocks with 180,000 transactions, which would be approx 18MB.

mvayngrib · 2019-02-05T23:37:59Z

@karlfloersch are you considering using a persistent buffer of some sort, e.g. redis or fs, or is that out of the scope?

Sprimage · 2019-02-06T09:41:27Z

@mvayngrib

Also, as far as I understand, queueing is an exception rather than a rule. Txs can be processed in parallel and only queue up when they collide on a lock (as they are currently)

I know queuing is an exception. But the same is the root of this issue isn't it?

Sprimage · 2019-02-06T10:31:12Z

Also external dependencies and platform specific dependencies would make fs the better option compared to redis. @karlfloersch do you think fs will work?

Sprimage · 2019-02-06T10:32:51Z

If you have slack or something similar so we can discuss this in real time and consider the solution together would be great too

gitcoinbot · 2019-02-07T17:16:55Z