Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepared statements #14

Open
Azarattum opened this issue Jul 15, 2023 · 9 comments
Open

Prepared statements #14

Azarattum opened this issue Jul 15, 2023 · 9 comments

Comments

@Azarattum
Copy link
Collaborator

Previously discussed here: #4 (review)

We need to think on the ways to implemented prepared statements in typed-sql. The good API should be:

  • simple: user should have to worry about preparing queries by default, yet receive all the performance benefits of the prepared statements
  • flexible: user might want to explicitly control when queries are prepared for niche use cases
  • efficient: preparing statements should not leak memory
  • modern: we might want to consider supporting observable-like reactive patterns in sql parameters instead of manually binding them before each call

Here I want to explicitly discuss prepared statements in the executional part of typed-sql. Meaning if a user only uses query generation, they can already do prepared statements the way they want:

const query = sql`...`;
db.prepare(query.sql).all(query.params); // better-sqlite3 example

We want to provide an opinionated way to prepare and run SQL queries that makes common use cases simple and complex cases possible.

The API

Given that we need to prepare a statement to execute it, we can save this work to not be done twice.

const query = sql`...`;
const data = await query; // prepares & executes
// ...
const data = await query; // just executes

This satisfies simple requirement as there will be no difference in code with prepared statements vs without them from user's perspective. To satisfy flexibility, an advanced user might explicitly request preparing a statement:

const query = sql`...`;
await query.prepare();
// ...
const data = await query; // just executes

Scoped Prepare

In this approach prepared statements are tied to their query and its scope.

{
  const query = sql`SELECT * FROM foo`;
  const data = await query; // prepares & executes
}
{
  const query = sql`SELECT * FROM foo`; // same query
  const data = await query; // prepares (again) & executes
}

This assumes that better-sqlite3 or whatever provider is used does the same. Currently I found no evidence of this being the case. sqlite3_finalize should be called to free the memory from a prepared statement. From looking at the code I've found that it is called only when using exec, closing the database or incorrectly suppling multiple queries. So, maybe SQLite does some kind of automatic cleanup? Or maybe the memory impact from the prepared statements is so negligible it can be overlooked? If this is the case, the next approach might be favoured over this one.

SQL Cache Map

Another thing we can do is to maintain the cache map of all the used queries. This way we can quickly lookup a correct prepared statement from the cache to be used across multiple scopes.

{
  await sql`SELECT ${1}`; // prepares and caches as `SELECT ?` 
}
{
  await sql`SELECT ${2}`; // finds `SELECT ?` is cache, quickly executes
}

Let's look at a common way people write their DB functions:

// ...
async function getUserByID(id: number) {
  return await sql`SELECT * FROM users WHERE id = ${id}`.then(x => x[0]);
}
// ...

Note, that here the query is always in its own scope and is recreated on every call. Putting stuff in the global scope can get really ugly really fast.

const getUserByIDQuery = sql`SELECT * FROM users WHERE id = ${/* what do we put here? */}`; // this one will be discussed below

async function getUserByID(id: number) {
  return await getUserByIDQuery.then(x => x[0]);
}

In this case having a global cache that is handled internally by typed-sql can bring huge benefits.

What shall we do to satisfy efficient requirement then? If the cache is global when (or should at all) we evict it? Maybe LRU or MFU or similar approaches would be appropriate here?

Observable Parameters

Given the current state of JS, observable and reactive patterns are everywhere and for a good reason. We need to think how we can integrate this behaviour with query parameters. This needs to be framework/library agnostic, yet we don't want to write our own RxJS just for this features.

I think we should integrate with the most common observable interfaces without implementing our own (similar to what we do with coercers). Some implementations we should definitely consider supporting: Solid Stores, Svelte Stores, RxJS observables, plain JS functions and maybe there is something similar in React?

const name = observable("Alice"); // this is pseudo-function just for demo purposes 

const query = sql`SELECT * FROM users WHERE name = ${name}`;
await query; // returns Alice's data
//...
name.set("Bob"); // this may be triggered somewhere in UI
//...
await query; // next call to the query returns Bob's data

This declarative approach would play much nicer than the imperative call to the .bind function in many modern scenarios. We will support .bind though. The observables would be in addition to that.


So, @tantaman, what do you think? Maybe we should invite more people to discuss this?

@tantaman
Copy link
Contributor

Scoped Prepare

Currently I found no evidence of this being the case. sqlite3_finalize should be called to free the memory from a prepared statement. From looking at the code I've found that it is called only when using exec, closing the database or incorrectly suppling multiple queries. So, maybe SQLite does some kind of automatic cleanup? Or maybe the memory impact from the prepared statements is so negligible it can be overlooked? If this is the case, the next approach might be favoured over this one.

I remember them mentioning that they hook calls to sqlite3_finalize into the JS garbage collector. E.g., you can do this in the browser with WeakRef and FinalizationRegistry.

Looking through their code, looks like they hook into the garbage collector by extending node::ObjectWrap and implementing a destructor on the statement class.

The call to CloseHandles in the destructor finalizes the statement.

I think this model works well with Scoped Prepare since as long as the query object holds a reference to the prepared statement it'll stay alive and be usable.

SQL Cache Map

Another thing we can do is to maintain the cache map of all the used queries. This way we can quickly lookup a correct prepared statement from the cache to be used across multiple scopes.

This can be problematic for a few reasons.

If we ever allow users to "step" through statements (get one row at a time rather than all rows at once) this caching setup could easily break.

const query1 = sql`SELECT * FROM foo`;
const query2 = sql`SELECT  * FROM foo`;

// under the hood, query1 and 2 will bot point to the same prepared statement
while (query1.step()) {
  while(query2.step()) {
     // query1 and query2 are both stepping through the same result set rather than through independent result sets!
  }
}

Even if we never exposed step to the user, async connections would implicitly expose it since, under the hood, all rows are gathered by calling step repeatedly. An async connection awaits on each step which would allow the above error cases to arise if two references to the same statement were given out at the same time.

If we can figure out how to clone a prepared statement when giving it out from the cache then I think this would be a good approach since:

  • Cloning a statement's bytes would be quick vs actually doing the work to compute the statement from a string
  • By cloning, no two callers would ever share the same reference to the same statement. This prevents callers from clobbering one another's statement cursor.

... Maybe LRU or MFU or similar approaches would be appropriate here?

That or some sort of ref counting. Each query object would bump the ref count on the cached query backing it by 1. When a query object is reclaimed by the garbage collector, we would decrement the count by 1. When we reach 0, reclaim the statement or mark the statement as eligible for removal in some later pass.

If we can't clone statements and users need to use scoped prepare and start throwing things into global scope to keep prepared statements around your example:

const getUserByIDQuery = sql`SELECT * FROM users WHERE id = ${/* what do we put here? */}`

would look like:

const getUserByIDQuery = sql`SELECT * FROM users WHERE id = ?`

Observable Parameters

I like the idea (setting bind params with an index is not very fun) although it is orthogonal to statement preparation, correct? Seems like you'd still implement statement preparation through one of the two options above.

@tantaman
Copy link
Contributor

tantaman commented Jul 16, 2023

Rusqlite has a statement cache. They solve the above issues by removing a statement from the cache while it is in use and putting it back when the caller is done.

https://docs.rs/rusqlite/0.13.0/src/rusqlite/cache.rs.html#127

Idk how well this'll work in the browser where all access to SQLite is async. Seems like you could have many components concurrently asking for the same statement and having to prepare it since it would be removed from the cache by the first caller.

We could get clever with queueing and waiting your turn before using the cached statement. I.e., if the statement you need is in use, you just queue up for it. Since SQLite in the browser is single-threaded it won't slow stuff down to do this.

The TCL bindings also have a statement cache that, according to forum posts (https://users.rust-lang.org/t/sqlite-caching-prepared-statements-again/15626/19), does the same thing as the rust impl. The rust impl being based on the TCL one -- https://www.sqlite.org/src/artifact?ci=trunk&filename=src/tclsqlite.c

@Azarattum
Copy link
Collaborator Author

Queueing sounds interesting, but it would result in a deadlock with the example you showed above:

while (query1.step()) while(query2.step())

Removing statements from cache during execution sounds very clever, though.

Idk how well this'll work in the browser where all access to SQLite is async. Seems like you could have many components concurrently asking for the same statement and having to prepare it since it would be removed from the cache by the first caller.

If only we had more data on how it behaves in a real scenario...

I suggest we mark statements as busy instead of removing them from cache. When a new call comes we can race current cached statement unlocking vs preparing a new one. This would result in the fastest execution, but not necessarily the least amount of work being done.

@tantaman
Copy link
Contributor

tantaman commented Jul 16, 2023

but it would result in a deadlock with the example you showed above:

Right :(

This is only in the scenario where we expose a step api though. If they're doing await query1() I think we're ok since the execution of query won't await another query.

Maybe for stepping would could expose something lower level.

Or hopefully I can figure out how to safely clone these things.

@tantaman
Copy link
Contributor

Had a chat with @schickling today. He's not too interested in anything related to statement execution at the moment given they already have a specific execution model and framework (https://riffle.systems/essays/prelude/) so I don't think he'll be weighing in here.

This is also part of the reason I pushed back on execution since Riffle is the first target use case.


Looks like we can't clone statements so if we want to cache them we'll need to:

  1. Mark the statement as in-use while it is being used
  2. Have other users that want an in-use statement queue for it
  3. Do not expose step so people cannot deadlock themselves

@Azarattum
Copy link
Collaborator Author

So, I take it as you don't like racing the queueing with preparing a new statement?

@tantaman
Copy link
Contributor

So, I take it as you don't like racing the queueing with preparing a new statement?

Well I think we've entered the territory of making lots of assumptions about how things work without measuring or testing :)

For synchronous backends, like better-sqlite3 or in-memory browser DBs, the normal caching mechanism of "remove while in use, put back" shouldn't be an issue.

For browser backends we're assuming the "remove, put back" approach will cause lots of re-preparations of the same statement due to the async interface.

Maybe we should test to see how true that is?

I think racing will end up acting almost the same as queuing with the one advantage of being able to expose step if we'd like.

The one risk with racing is doing extra work so I'd like to know if we can avoid that. In theory we can cancel the losers of the race before they actually start executing.

@tantaman
Copy link
Contributor

tantaman commented Jul 23, 2023

I put execution related stuff into https://github.com/vlcn-io/typed-sql/tree/main/incubator so I can start publishing the type related packages since those are ready for early adopters.

The repo is also public now so the normal forking flow will work again.

@Azarattum
Copy link
Collaborator Author

Sounds great, going public is a big step. :)

I'm still working on prepared statements. I've been a bit busy lately, so things don't move as fast as I would want. I'll make a PR as soon as I have caching fully working (without eviction). Then we can do some tests in different environments and discuss the eviction strategy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants