Gotcha with async prepared statement execution #199

Closed
mjmasn opened this issue Dec 10, 2024 · 9 comments

Comments
@mjmasn
Contributor

mjmasn commented Dec 10, 2024

I don't think this is a bug report as such, more of a gotcha that might be worth mentioning in docs.

The scenario is fetching some data in parallel and inserting each batch as it's received over the network.

Previously, with the synchronous API, when calling bind / execute on a prepared statement shared between all batches, there was never a case where execute would run with bound data other than the data bound immediately before the execute call.

Now, with the asynchronous API, it seems possible for race conditions to occur in this scenario, where I'm guessing the prepared statement is reset or rebound by another network request handler between binding and executing. This causes occasional errors:

```
SQLite code: 19 execution error: NOT NULL constraint failed: table.id
```

This can be resolved by queuing up the data to be inserted, or probably by not reusing the prepared statement between batches - haven't tried that yet though.
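To make the gotcha concrete, here is a minimal self-contained sketch (not the actual op-sqlite API - the statement class and the `tick` delay are stand-ins for the async hop to the worker thread) showing how two concurrent handlers sharing one statement can interleave, so a handler executes with the other handler's bound value:

```typescript
// Stand-in for the async hop to a worker thread.
const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

class FakePreparedStatement {
  private bound: number | null = null;

  async bind(value: number): Promise<void> {
    await tick(); // statement can be re-bound by someone else around this point
    this.bound = value;
  }

  async execute(): Promise<number | null> {
    await tick();
    return this.bound; // whatever was bound most recently, by ANY caller
  }
}

// Two "HTTP response handlers" sharing one statement: because bind and
// execute are separate awaits, the two bind calls can both land before
// either execute runs.
async function demoRace(): Promise<[number | null, number | null]> {
  const stmt = new FakePreparedStatement();
  const [a, b] = await Promise.all([
    (async () => { await stmt.bind(1); return stmt.execute(); })(),
    (async () => { await stmt.bind(2); return stmt.execute(); })(),
  ]);
  return [a, b];
}

demoRace().then(([a, b]) => console.log(a, b)); // handler 1 bound 1 but observes handler 2's value
```

With the synchronous API the bind and execute in each handler ran back-to-back on one thread, so this interleaving could not happen.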

@ospfranco
Contributor

Interesting. You should be using transactions to insert data anyway, since it is a disk operation that can fail. Does using prepared statements for inserts have a speed benefit? It should be mostly an append operation, plus maybe an index update.

@ospfranco
Contributor

Maybe there is something we can do here @rflopezm. We should hold a mutex while a prepared statement is executing to prevent this race condition. This can be done in TypeScript in index.ts; no need to modify the C++ code.
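A minimal sketch of what such a TypeScript-side mutex could look like (hypothetical - this is not the actual index.ts implementation, and the `stmt.bind` / `stmt.execute` names in the usage comment are assumptions): a promise chain that serializes whole bind + execute pairs on a shared statement.

```typescript
// A tiny async mutex: callbacks passed to runExclusive run one at a time,
// in submission order, each starting only after the previous one settles.
class AsyncMutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    // Keep the chain alive even if fn rejects.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

// Usage sketch (statement API names are assumptions):
// const mutex = new AsyncMutex();
// await mutex.runExclusive(async () => {
//   await stmt.bind(row);
//   await stmt.execute();
// });
```

The key point is that the *pair* of calls is the critical section, not each call individually: locking bind and execute separately would still allow another handler's bind to slip in between them.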

@mjmasn
Contributor Author

mjmasn commented Dec 10, 2024

This is inside a transaction (manually calling BEGIN TRANSACTION, not using db.transaction) so we can do one transaction per 'record type' we're syncing down. It was the fastest method when we tested about a year+ ago, but that was quite a few versions of this library ago, so maybe it's worth us retesting perf at some point.

@ospfranco
Contributor

ospfranco commented Dec 10, 2024

Manually calling BEGIN TRANSACTION is not the same as using db.transaction. db.transaction has a built-in mutex to prevent race conditions between statements. I guess the speed-up you are seeing is not because of the prepared statement, but because of avoiding this mutex - though you lose the consistency guarantees.

Maybe you are better off using executeBatch, which wraps multiple statements in the native code.

@ospfranco
Contributor

I finally understood the issue. Unfortunately, the changes I was making won't solve this directly by limiting the threads to one. The issue is that queueing work to a thread takes time before the next task is consumed. There is one easy solution though, which is making bind async. I will do this. Paired with the changes in #204, it will fix any concurrency issues within the same connection.

@mjmasn
Contributor Author

mjmasn commented Dec 16, 2024

@ospfranco I tried a few different methods including the ones above and this was the fastest way to insert - we're loading data from an API and inserting hundreds of thousands of records across around 100 tables in batches of 200-2000. Using executeBatch was significantly slower (and on par with queuing up the batches for insertion rather than letting them insert immediately). We can't easily use db.transaction for this use case because we wanted basically one transaction per table, which is split across multiple HTTP response batches.

We were originally reusing the prepared statement across simultaneous HTTP response handlers when doing the inserts. Removing that (i.e. going to 1 PS per handler) hasn't changed performance too much and has resolved the issue for us as we were already binding/executing records in a loop within each batch, so no concurrency issues there.
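The resolution described above - one prepared statement per handler, binding and executing rows sequentially within each batch - can be sketched like this (hypothetical names; the fake statement and `pause` delay stand in for the real async API):

```typescript
// Stand-in for the async hop to a worker thread.
const pause = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

class FakeStatement {
  private bound: number | null = null;
  async bind(value: number): Promise<void> { await pause(); this.bound = value; }
  async execute(): Promise<number | null> { await pause(); return this.bound; }
}

// Each handler owns its statement, so no other handler can re-bind it,
// and awaiting each row in order keeps bind/execute paired within the batch.
async function handleBatch(rows: number[]): Promise<(number | null)[]> {
  const stmt = new FakeStatement(); // one statement per handler
  const out: (number | null)[] = [];
  for (const row of rows) {         // sequential within the batch
    await stmt.bind(row);
    out.push(await stmt.execute());
  }
  return out;
}
```

Two batches can now run concurrently without observing each other's bound data, because nothing is shared between them.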

Re making bind async - is that so it can be queued with the execute calls in the correct order? If so, that makes sense. If not, I'm not sure it'll help - the issue is basically that bind / execute aren't an atomic unit, so the most recently bound data isn't necessarily what you expect if multiple in-flight async function calls try to call bind / execute on a shared prepared statement.

I think I'm leaning towards this belonging in the category of holding it wrong, but if there's a way to make it more foolproof then awesome :)

Thanks for looking into it 👍🏻

@ospfranco
Contributor

You are not holding it wrong. It is a built-in race condition.

@ospfranco
Contributor

The new version is out. bind is now async and needs to be awaited; with a single thread you should not face statements getting lost:

```ts
let preparedStatement = ...
await preparedStatement.bind(...)
await preparedStatement.execute(...)
```

@mjmasn
Contributor Author

mjmasn commented Dec 24, 2024

Awesome, thanks again @ospfranco ❤️
