
Allow customizing how SQL rows are unpacked #1187

Merged: 4 commits merged into master on Jan 28, 2025

Conversation

@1st1 (Member) commented Jan 27, 2025:

No description provided.

@1st1 force-pushed the sqlobj branch 2 times, most recently from 4f0324e to 9917bfc on January 28, 2025 00:04
This commit enables a `client.withCodecs({ sql_row: {...} })` call.
Caveat: `toDatabase()` must be specified even though it's not currently
possible to send rows as query arguments. Here's an example:

  res = await client
    .withCodecs({
      sql_row: {
        fromDatabase(data, desc) {
          return Object.fromEntries(
            desc.names.map((key, index) => [key, data[index]]),
          );
        },
        toDatabase() {
          throw "cannot encode SQL record as a query argument";
        },
      },
    })
    .querySQL( ... );

Here we decode SQL rows into JS objects {colName: colValue}. Note
that toDatabase() is specified even though it's effectively a stub;
it's there just to satisfy the type checker.

Most likely we'll end up adding a more high level API for this, e.g.

  client.withSQLRowsAsObjects().querySQL(...)

or

  client.withCodecs(gel.SQLRowsAsObjects).querySQL(...)

Note that either of these APIs would allow us to specialize unpacking
SQL rows into objects inside the actual codec, for better performance.

In any case, the underlying machinery for those future new APIs
will likely be the one that this commit implements.
packages/driver/src/codecs/context.ts (review thread resolved)
Comment on lines 64 to 67
const overload = ctx.getContainerOverload("sql_row");
if (overload != null) {
return overload.fromDatabase(result, { names: this.names });
}
@scotttrinh (Collaborator) commented:

My suggestion in the meeting was to intercept building the result and have the fromDatabase build the data structure directly. So pass it an iterable that returns an "entries-like" [key, value] where the value has already been decoded from the subcodec. That means the fromDatabase looks more like:

const objRecord = client.withCodecs({
  sql_row: {
    fromDatabase: (elements: Iterable<[string, any]>): Record<string, any> => {
      return Object.fromEntries(elements);
    },
  },
});

I think what that looks like is: if you don't specify a container overload, we use this same iterator ourselves to fill the `result: any[]` array here, like:

const results: any[] = Array.from(elements, (e) => e[1]);

Mostly this is a memory/space thing, but I can imagine people requesting pretty large rowsets where you don't want a bunch of allocations. We can probably do a more memory-optimized version of the implementation you already have below, though. Without a bunch of actual benchmarking, I'm not sure I can make my case much better than "vibes" here. 😂

@scotttrinh (Collaborator) commented Jan 28, 2025:


Another idea would be to return a data structure that has .entries, .values, .keys methods that return different iterables, to avoid the extra allocations for the key name and tuple when they're not needed, and to make the Array case more like Array.from(elements.values()) instead of needing to define a mapping function.
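
A minimal sketch of that idea (purely illustrative; RowView and its methods are not driver API), where each accessor is a lazy generator over the already-decoded values:

// Hypothetical lazy row view over decoded values and column names.
class RowView {
  constructor(
    private data: any[],
    private names: string[],
  ) {}
  *keys(): IterableIterator<string> {
    yield* this.names;
  }
  *values(): IterableIterator<any> {
    yield* this.data;
  }
  *entries(): IterableIterator<[string, any]> {
    for (let i = 0; i < this.data.length; i++) {
      yield [this.names[i], this.data[i]];
    }
  }
}

// Array mode then skips the per-field [key, value] tuple allocations:
const view = new RowView([1, "x"], ["id", "label"]);
const asArray = Array.from(view.values()); // [1, "x"]
const asObject = Object.fromEntries(view.entries()); // { id: 1, label: "x" }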

@1st1 (Member, author) commented Jan 28, 2025:


> My suggestion in the meeting was to intercept building the result and have the fromDatabase build the data structure directly. So pass it an iterable that returns an "entries-like" [key, value] where the value has already been decoded from the subcodec.

and

> Mostly this is a memory/space thing, but I can imagine people requesting pretty large rowsets where you don't want a bunch of allocations.

Sadly this won't work. In order to build a classic synchronous iterator/generator to pass it to fromDatabase() we'd need to first parse the entire incoming data into an array and then return an iterator over it.

> Another idea would be to return a data structure that has .entries, .values, .keys methods that return different iterables, to avoid the extra allocations for the key name and tuple when they're not needed, and to make the Array case more like Array.from(elements.values()) instead of needing to define a mapping function.

This doesn't change much: the data will still all be allocated for this data structure before fromDatabase() is even called.


If we want to progressively push values into fromDatabase() as we parse them, then the only way is to use an async iterator. Sadly, the overhead of that will be so large that we'll just kill performance (compare array[i] = el on a pre-sized array vs. a chain of callbacks and function calls).


If we really want to make it possible for users to have super fast decoding for rows, then we have to have a special codec API for collections.

fromDatabase(values, names) needs to be replaced with a pair of makeFromDatabaseContainer(size) and setFromDatabaseContainerIndex(index, name, value).

This way we can efficiently pre-allocate the container (if need be) and progressively populate it with values. This is the fastest way possible.
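
As a rough sketch (hypothetical; none of these names exist in the driver, and the container is threaded explicitly here rather than held as codec state):

// Hypothetical two-phase codec API for collection types.
interface ContainerOverload<C> {
  // Pre-allocate the output container for a row with `size` fields.
  makeFromDatabaseContainer(size: number): C;
  // Called once per field, in decode order, with the already-decoded value.
  setFromDatabaseContainerIndex(
    container: C,
    index: number,
    name: string,
    value: unknown,
  ): void;
}

// "Object mode": fill a plain record keyed by column name.
const objectRows: ContainerOverload<Record<string, unknown>> = {
  makeFromDatabaseContainer: () => ({}),
  setFromDatabaseContainerIndex(container, _index, name, value) {
    container[name] = value;
  },
};

// "Array mode": pre-size the array once, then write by position.
const arrayRows: ContainerOverload<unknown[]> = {
  makeFromDatabaseContainer: (size) => new Array(size),
  setFromDatabaseContainerIndex(container, index, _name, value) {
    container[index] = value;
  },
};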

But this is very complex, so let's ask ourselves: do we actually need any of this?


Let's step back and analyze what's going on here (doing this also for the sake of finally documenting the decision tree somewhere).

In an ideal world we'd just have ONE data type for querySQL. JavaScript is dynamic enough to let us build a hybrid type, where a field can be accessed either by an index or by its name.

We could do that by extending the array type (creating types dynamically per data descriptor and caching them) or by creating objects with a custom prototype with nice name getters.

Either approach would work, and we even implemented this before for named tuples! Everything worked super fast and was quite ergonomic. But we ended up removing all of that, opting for using pure JS objects or arrays.
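
For illustration only (not actual driver code; note that a column name colliding with an Array method like "map" would clash), the array-subclass variant could have looked roughly like:

// Build an Array subclass with a named getter per column, created once
// per row descriptor and cached.
function makeRowClass(names: string[]) {
  class Row extends Array<any> {}
  names.forEach((name, index) => {
    Object.defineProperty(Row.prototype, name, {
      get(this: any[]) {
        return this[index];
      },
    });
  });
  return Row;
}

const UserRow = makeRowClass(["id", "email"]);
const row: any = UserRow.from([1, "ada@example.com"]);
console.log(row[1] === row.email); // true: indexable by position and by name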

Why? Because next-js (and a few other frameworks) for some reason enforces runtime validation on user data before allowing it to be serialized to JSON. I thought this could be a thing of the past (see the last comment in this thread), but I decided to check the latest nextjs source and found this and that. So nextjs still enforces serialization checks somewhere, and given that nextjs isn't the only offender, we probably should just not attempt doing anything smart ever again.


Back to square one: what do users want? They want to access values in a row by index and/or by name. With JavaScript we don't have a way to allow both at the same time. So it's one or the other, and the user has to choose ahead of time.

This is what pglite offers: rowMode: "object" | "array".

Can we do this? Yes, that's almost what the current PR is. I'll push an update to it now to streamline it further for the user and make it close to pglite.

I still think it's useful to keep the API this PR introduces. Why? Because with the current API you can just wrap the passed values array and implement getters/setters on top of it (no need to re-allocate or copy anything). But I do know that this would be a super low-level API that few people would ever use.
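
For illustration, wrapping without copying could look like this (hypothetical sketch; it reuses the fromDatabase(data, desc) shape from the commit message above and assumes an existing client):

const lazyClient = client.withCodecs({
  sql_row: {
    fromDatabase(data: any[], desc: { names: string[] }) {
      const row: Record<string, unknown> = Object.create(null);
      desc.names.forEach((name, index) => {
        Object.defineProperty(row, name, {
          get: () => data[index], // reads through to the original array
          enumerable: true,
        });
      });
      return row;
    },
    toDatabase() {
      throw new Error("cannot encode SQL record as a query argument");
    },
  },
});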

@scotttrinh (Collaborator) commented:


I'm going to do a little bit of experimenting and benchmarking to validate our conflicting vibes here to ensure we don't end up performing poorly versus other drivers when returning SQL results. This may end up "breaking" the interface, but since this isn't yet released, and the actual public interface doesn't involve setting up the codec manually, I think that's fine.

@scotttrinh (Collaborator) commented:


Alright! Did some benchmarking and I was not able to give a nicer interface for consumers that didn't measurably impact the performance of the default case, so I'm going to withdraw my suggestion here. The closest I got was giving the consumer an entries-like array ([key, value][]) which I feel like is a more natural API than (values, { names }), but to get the same performance involves just as much duplication of logic, and if the downstream consumer wants to ignore the names, they suffer a small penalty for doing so.

@1st1 (Member, author) commented:


Yeah, and I also realized I was wrong suggesting that a generator-based API can't be implemented. It sure can be. But it's still going to be much slower than what we have now, so let's not bother.

Comment on lines 2621 to 2624
// toDatabase is required to be specified (limitation
// of TypeScript type system). This isn't super elegant
// and most likely we'll need a higher-level nicer API
// on top of this.
@scotttrinh (Collaborator) commented:


I assume you already tried just leaving off the toDatabase in the type def, and ran into some errors?

@1st1 (Member, author) commented Jan 28, 2025:


Believe it or not, I spent a considerable amount of time debugging the typing here, ending up quite frustrated that I couldn't solve this puzzle.

Out of complete desperation I asked chatgpt and here's the sacred knowledge it shared with me.
[screenshot: ChatGPT's explanation]

This stems from this:

[screenshot: the CodecSpec type definition, with the relevant part highlighted]

See the highlighted part. CodecSpec is used to type client.withCodecs. I literally tried everything, so I don't think I can solve this. You're welcome to try.

@1st1 (Member, author) commented:


Well, with the new commit just pushed we don't care about this (for now).

// and most likely we'll need a higher-level nicer API
// on top of this.
//
// Maybe `client.withCodecs(gel.SQLRowAsObject)`?
@scotttrinh (Collaborator) commented:


Yes, absolutely, 100%. Could even short-cut this with .withSQLObjectCodec(). I almost think that this should be the default and have the array of tuples be the opt-in option.

@1st1 (Member, author) commented:


OK, let's do this. I'll push some updates in a few.

Use it to switch the return type of client.querySQL() from
JS array (indexable by position) to JS object (indexable by
field name).

"Hide" `client.withCodecs({sql_row: ...})` by renaming
"sql_row" to "_private_sql_row".

If there's ever a use-case to make it possible for users
to overload how SQL rows are decoded beyond choosing between
'array' and 'object' we can rename it back to `sql_row`.

But I do want to keep the door open to us potentially implementing
a different codec API for "collection" types -- an API that would
have better performance but look more low level. See [1] for
details.

[1] #1187 (comment)
@1st1 (Member, author) commented Jan 28, 2025:


@scotttrinh @jaclarke alright, I've pushed another commit to "hide" the withCodecs setup for the "SQL row" type and added a new high-level API: withSQLRowMode(mode: 'array' | 'object'): Client.

If either of you decide to merge, please don't "squash" but instead "rebase". I'd like both commits of this PR to survive in the history.

@1st1 (Member, author) commented Jan 28, 2025:


Alright, turns out that both node-postgres and pglite by default return arrays of objects, not arrays of arrays. And so should we.

I've just pushed another commit to flip the default.

client.querySQL() -> array<record<string, any>>
client.withSQLRowMode('array').querySQL() -> array<array<any>>
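
A minimal usage sketch (assuming an existing client; the SQL literal is illustrative):

// Default: each row is an object keyed by column name.
const objRows = await client.querySQL("SELECT 1 AS a, 'x' AS b");
// objRows: [{ a: 1, b: "x" }]

// Opting back in to positional rows:
const arrRows = await client
  .withSQLRowMode("array")
  .querySQL("SELECT 1 AS a, 'x' AS b");
// arrRows: [[1, "x"]]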

@1st1 requested a review from scotttrinh on January 28, 2025 06:46
@scotttrinh merged commit 62a2199 into master on Jan 28, 2025
9 checks passed
scotttrinh pushed a commit that referenced this pull request Jan 28, 2025
@scotttrinh deleted the sqlobj branch on January 28, 2025 15:59
Comment on lines +283 to +284
// and it's set to "object mode", the we want the codec mapping to be
// empty instead. Why? Empty codec mapping short circuits a lot of
A Member commented:


Suggested change:
- // and it's set to "object mode", the we want the codec mapping to be
- // empty instead. Why? Empty codec mapping short circuits a lot of
+ // and it's set to "object mode", we want the codec mapping to be
+ // empty instead. Why? Empty codec mapping short circuits a lot of

The same Member commented:


Was 4 minutes too late 😅

@scotttrinh (Collaborator) commented:


ha! PRs welcome 😂
