Basic implementation for DuckDB as PG Engine #437
We can use |
Thanks for your contribution. There are a few points I don't understand. I'd be grateful if you could explain them when you have time.
import java.util.List;

public interface PgMetastore
This interface will be implemented by many data sources other than PG. Should we drop the Pg prefix to avoid confusion?
The PG prefix refers to the PG Wire Protocol, which is responsible for executing the metadata queries. Metastore refers to where we store the metadata.
public class DuckDBMetadata
- implements Metadata
+ implements Metadata, PgMetastore
Could you help me understand the difference between Metadata and PgMetastore? I see they have the same methods.
Metadata is an interface for accessing the data source. PgMetastore is also used to access a data source, but that data source is the one that executes PG metadata queries. In our design, level-1 and level-2 queries are executed by PgMetastore, and level-3 queries are executed by the normal Metadata.
LGTM
Description
Previously, we executed all SQL on the data source side, with Accio serving as a pure SQL conversion layer. However, this approach makes it hard to support new data source types: implementing a data source connector requires extensive SQL rewriting or the implementation of a specific SQL dialect. Each data source presents its own issues, which makes adding new ones cumbersome.
In this PR, we present an architecture that uses DuckDB as a single PG engine: every PG-related query runs in DuckDB, and all other queries run in the data source.
How it works
DuckDB is a highly PG-compatible SQL engine
It implements many pg_catalog and information_schema views. We sync the MDL directly to DuckDB and can then query those schemas.
3-Level Query Flow
The main purpose of implementing the PG Wire Protocol is to expand our client ecosystem. We don't expose PG-specific features for general use; we only care about the SQL that BI tools and PG drivers issue. In our experience, most PG-related SQL is used to fetch metadata, which means we don't need to execute every query on the data source side.
Level 1 - Metastore Fully Supported
The SQL is fully supported by DuckDB without any rewrite.
Level 2 - Metastore Semi-supported
The SQL isn't supported by DuckDB, but it's related to PG metadata. We need to rewrite it before running it.
Level 3 - Data source
The SQL queries real data. We should execute it on the data source side.
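The three levels can be read as a routing decision made per statement. The following is only a minimal sketch of that idea, not the code in this PR: QueryLevel, QueryLevelRouter, and both marker lists are hypothetical names invented for illustration, and the substring checks stand in for whatever statement analysis the real implementation performs. In particular, the NEEDS_REWRITE entries are assumed examples and say nothing about what DuckDB actually supports.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical names: nothing in this block comes from the Accio code base.
enum QueryLevel {
    METASTORE_FULL, // Level 1: run unchanged in DuckDB
    METASTORE_SEMI, // Level 2: rewrite first, then run in DuckDB
    DATA_SOURCE     // Level 3: run on the data source side
}

final class QueryLevelRouter {
    // Assumed markers of a PG metadata query.
    private static final List<String> METADATA_MARKERS =
            List.of("pg_catalog.", "information_schema.");

    // Assumed examples of constructs that would need a rewrite (illustrative only).
    private static final List<String> NEEDS_REWRITE =
            List.of("::regclass", "::regproc");

    private QueryLevelRouter() {}

    static QueryLevel classify(String sql) {
        String normalized = sql.toLowerCase(Locale.ROOT);
        boolean touchesMetadata = METADATA_MARKERS.stream().anyMatch(normalized::contains);
        if (!touchesMetadata) {
            // Not a metadata query: it reads real data.
            return QueryLevel.DATA_SOURCE;
        }
        boolean needsRewrite = NEEDS_REWRITE.stream().anyMatch(normalized::contains);
        return needsRewrite ? QueryLevel.METASTORE_SEMI : QueryLevel.METASTORE_FULL;
    }

    public static void main(String[] args) {
        System.out.println(classify("SELECT typname FROM pg_catalog.pg_type"));
        System.out.println(classify("SELECT 'pg_catalog.pg_class'::regclass"));
        System.out.println(classify("SELECT count(*) FROM orders"));
    }
}
```

A real router would more likely inspect the parsed statement than match substrings; the sketch only shows where the split between the metastore and the data source falls.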
New Configuration
duckdb.max-concurrent-metadata-queries
We use HikariCP as our connection pool to avoid creating connections repeatedly. This config controls the maximum number of concurrent metadata queries Accio will keep.
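To illustrate the effect of such a limit, here is a minimal sketch that reads the setting and caps the number of metadata queries running at once. It deliberately uses a plain java.util.concurrent.Semaphore as a stand-in for the HikariCP pool so that it runs with the JDK alone. MetadataQueryLimiter and ASSUMED_DEFAULT are hypothetical, and the default value is an assumption, not the value used in this PR.

```java
import java.util.Properties;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical stand-in for a bounded connection pool; not the PR's implementation.
final class MetadataQueryLimiter {
    static final String CONFIG_KEY = "duckdb.max-concurrent-metadata-queries";
    static final int ASSUMED_DEFAULT = 10; // assumption for illustration only

    private final int maxConcurrent;
    private final Semaphore permits;

    MetadataQueryLimiter(Properties config) {
        this.maxConcurrent = Integer.parseInt(
                config.getProperty(CONFIG_KEY, String.valueOf(ASSUMED_DEFAULT)));
        if (maxConcurrent < 1) {
            throw new IllegalArgumentException(CONFIG_KEY + " must be at least 1");
        }
        this.permits = new Semaphore(maxConcurrent);
    }

    int maxConcurrent() {
        return maxConcurrent;
    }

    int available() {
        return permits.availablePermits();
    }

    // Blocks while maxConcurrent queries are already running, then runs the query.
    <T> T run(Supplier<T> query) {
        permits.acquireUninterruptibly();
        try {
            return query.get();
        } finally {
            // Always hand the slot back, even if the query throws.
            permits.release();
        }
    }
}
```

With the setting at 2, a third caller of run waits until one of the two running queries finishes, which is the same back-pressure a bounded connection pool applies when all of its connections are in use.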