Support for querying DuckDB views #474

naoyak · 2024-12-04T02:27:38Z

naoyak
Dec 4, 2024

I brought this up through another channel as well, but raising here for public visibility: is there any technical reason why querying DuckDB views through pg_duckdb is not yet supported? This would be really key for us.

JelteF · 2024-12-04T10:08:44Z

JelteF
Dec 4, 2024
Collaborator

Thanks for starting this thread. I managed to finish writing down my thoughts on this. Supporting views is not a trivial change, but it's also not that different from supporting tables:

Tables, databases and schemas are currently the only objects that are synced from MotherDuck. This means that if people create views in MD they cannot query them in Postgres.

To query them from postgres we'll need to create Postgres objects that we can query in the same way. There are three viable object types that we could use:

Views
Tables
Foreign tables

All of these would work the same with regards to querying them, but there are other things to consider as well.

Tl;dr conclusion

We should sync view definitions from MotherDuck to Postgres as a Postgres VIEW. The resulting view will look as if it was created using a Postgres command that looks like this:

CREATE VIEW myview AS
    SELECT * FROM duckdb.view($$ FROM 's3://some.parquet' $$) AS (column1 int, column2 text);

Being able to create MotherDuck views from within Postgres would be a follow up task, with less priority. Dropping MotherDuck views should be fairly simple to implement though, and thus should probably be part of the initial syncing support.

Inspecting views

Catalogs in Postgres are also inspected, either using the tables in information_schema, psql it’s meta commands like \d and \dv, or using some visual client application like DBeaver or pgAdmin.

The MotherDuck views will show up in these places as the type of object that we choose.

So creating them as views would feel most natural to Postgres users when inspecting them.

There are two slight problems with VIEWs though. VIEW definitions contain the query string, and these are also shown when inspecting views. But this query string needs to be understandable by postgres. So we need to store some other string than the actual query string. But it seems nice to still embed the actual query string though, so people can see it.

SELECT duckdb.view($$ FROM 's3://some.parquet' $$);

Differentiation from Postgres objects

Another thing that users want to do when inspecting their catalog. Is to differentiate between MotherDuck and Postgres views. With tables they can do so by looking at the table's table access method. If that’s “heap” it’s a Postgres table, if that's “duckdb” it’s an MD table. That’s not the case for duckdb views, because they don’t have a table access method. Using the duckdb.view function inside the query of every view, even for the ones that use syntax that Postgres can parse, allows user to differentiate MotherDuck and Postgres views.

DDL on MotherDuck views

Views can not only be read from, but they can also be created/altered/dropped. Actually implementing this might not be necessary for the first level of support, but it’s something to consider.

Dropping and altering MotherDuck views

Dropping views is easy with all approaches. But for users having it be a view is most natural, because then DROP VIEW works. Otherwise people need to use DROP TABLE or DROP FOREIGN TABLE.

The same is true for altering views. Right now RENAME is the only supported ALTER command by DuckDB. All proposed objects.

Creating MotherDuck views from Postgres

Creating views is the most difficult operation. An approach that would work for all of our objects would be to use a procedure that would build the DuckDB CREATE VIEW command for us.

CALL duckdb.create_view('myview', $$ FROM 's3://some.parquet' $$)

That seems quite straightforward, but it gets more complicated once we want to support some other CREATE VIEW options, because we need to encode all these things in Postgres arguments. Take for instance the following CREATE VIEW command, which uses OR REPLACE and specifies the column names:

CREATE OR REPLACE VIEW v1(a) AS SELECT 42;

Another approach is to use Postgres' dedicated DDL to create the object in question.

If we use views that could look something like this:

CREATE OR REPLACE VIEW myview(a) WITH (using = duckdb) AS SELECT 42;

Note that this uses WITH (using = duckdb) instead of USING duckdb, that is because USING is not a valid keyword for creating views in Postgres. Also the “using” here is open for discussion, we could use any word there only WITH is required, WITH (storage = duckdb) or WITH (duckdb = true) or WITH (motherduck = true)are all options we could choose.

This doesn’t work though if the view query is not valid postgres syntax. For those we’d somehow need to encode the actual query in a valid Postgres query. We could then detect this special query pattern instead of or in addition to detecting using = 'duckdb'. Something like the below would work:

CREATE VIEW myview AS SELECT duckdb.view($$ FROM 's3://some.parquet' $$);

With tables CREATE TABLE something like this would work:

CREATE TABLE myview() USING duckdb WITH (view = $$ FROM 's3://some.parquet' $$);

And with CREATE FOREIGN TABLE we’d need syntax like this:

CREATE FOREIGN TABLE myview() SERVER duckdb 
    OPTIONS (view = $$ FROM 's3://some.parquet' $$);

Both foreign CREATE TABLE and CREATE FOREIGN TABLE would only be able to support OR REPLACE using another WITH option though.

Conclusion

Based on the options above it seems clear that using a VIEW is the most natural option. The only problem is that we need to wrap the query in a duckdb.view call when storing it in Postgres.

0 replies

naoyak · 2024-12-04T20:16:14Z

naoyak
Dec 4, 2024
Author

Basic support for read-only syncing and querying of MD views (i.e. mirroring them with Postgres views) would already be a significant win!

0 replies

Leo-XM-Zeng · 2024-12-12T09:04:10Z

Leo-XM-Zeng
Dec 12, 2024

There is a not-so-good way to query the contents of duckdb.

SELECT duckdb.raw_query($$ SELECT * from main.duckdb_tables $$);

I brought this up through another channel as well, but raising here for public visibility: is there any technical reason why querying DuckDB views through pg_duckdb is not yet supported? This would be really key for us.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for querying DuckDB views #474

{{title}}

Replies: 3 comments

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

Select a reply

Support for querying DuckDB views #474

naoyak Dec 4, 2024

Replies: 3 comments

JelteF Dec 4, 2024 Collaborator

Tl;dr conclusion

Inspecting views

Differentiation from Postgres objects

DDL on MotherDuck views

Dropping and altering MotherDuck views

Creating MotherDuck views from Postgres

Conclusion

naoyak Dec 4, 2024 Author

Leo-XM-Zeng Dec 12, 2024

naoyak
Dec 4, 2024

JelteF
Dec 4, 2024
Collaborator

naoyak
Dec 4, 2024
Author

Leo-XM-Zeng
Dec 12, 2024