BigQuery Connector: Ability to use a different field for partitioning #39001
@timorkal Could you describe a little more about why partitioning specifically needs to be on another field in the Airbyte destination table itself?

Generally, when you're dealing with incremental syncs, you want the destination table to be partitioned by ingestion time, because it means only the current partition is affected and it minimizes the partitions that need to be scanned when merging the data from the temporary table.

Typically I don't recommend using the Airbyte destination tables directly in a lot of cases. It often makes more sense to use something like dbt or Dataform (or even scheduled or triggered queries) to materialize a read-optimized version of the data that is partitioned and clustered based on the needs of the use case (and pre-aggregated to the appropriate granularity by default). That doesn't mean you can't use them, just that they're generally optimized for write performance and not so much for read performance. At least that's my experience.

Clustering has a similar problem. While you might be able to add additional fields, the key field (…). Which, again, takes us back to the fact that most people would use a modeling tool to build reporting models that optimize for read. Curious if you're already doing that with this or other sources, and why or why not.

(With that said, if the source is non-incremental, I do think it would be nice to be able to set partitioning and clustering fields, since in that case they don't need to be optimized for deduplication.)
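To make the "materialize a read-optimized version" idea concrete, here is a minimal sketch in Python that only builds the BigQuery DDL string for such a table. The helper name `build_materialize_sql` and the table and column names in the usage lines (`my_project.airbyte_raw.orders`, `order_date`, `customer_id`) are made up for illustration and are not part of Airbyte or this thread. It assumes the statement would be run by a scheduled query, dbt, or Dataform rather than by the connector.

```python
def build_materialize_sql(source_table, target_table, partition_expr, cluster_fields=()):
    """Build a CREATE OR REPLACE TABLE ... AS SELECT statement for BigQuery.

    partition_expr is passed through as-is, e.g. "order_date" for a DATE
    column or "DATE(created_at)" for a TIMESTAMP column (hypothetical names).
    """
    # BigQuery accepts at most four clustering columns per table.
    if len(cluster_fields) > 4:
        raise ValueError("BigQuery allows at most four clustering columns")

    lines = [
        f"CREATE OR REPLACE TABLE `{target_table}`",
        f"PARTITION BY {partition_expr}",
    ]
    if cluster_fields:
        lines.append("CLUSTER BY " + ", ".join(cluster_fields))
    lines.append("AS")
    lines.append(f"SELECT * FROM `{source_table}`")
    return "\n".join(lines)


# Hypothetical usage: rebuild a reporting table partitioned by the business date.
sql = build_materialize_sql(
    source_table="my_project.airbyte_raw.orders",
    target_table="my_project.reporting.orders",
    partition_expr="order_date",
    cluster_fields=["customer_id"],
)
print(sql)
```

The Airbyte-managed table stays partitioned by `_airbyte_extracted_at` for cheap merges, while the derived table uses whatever partitioning and clustering the queries against it need.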
Currently, the table that gets created uses "_airbyte_extracted_at" as the partitioning field, but I would like to use a different field.
This is very important for tables with historical data, where there is already another field that they need to be partitioned by.
The default "_airbyte_extracted_at" is not a good choice because, as soon as the sync happens, all historical data lands in the same partition: the day of the sync.
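A small Python sketch of the effect described above, using made-up rows and a hypothetical business-date column called `order_date` (nothing here comes from a real table or from the connector's code). It only counts how many daily partitions the same rows would fall into under each partitioning key.

```python
from collections import Counter
from datetime import date, datetime, timezone

# Hypothetical historical rows whose business dates span several years.
rows = [
    {"id": 1, "order_date": date(2021, 3, 14)},
    {"id": 2, "order_date": date(2022, 7, 1)},
    {"id": 3, "order_date": date(2023, 11, 30)},
    {"id": 4, "order_date": date(2023, 11, 30)},
]

# A single initial sync stamps every row with the same extraction time.
sync_time = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
for row in rows:
    row["_airbyte_extracted_at"] = sync_time


def partition_counts(rows, key):
    """Count rows per daily partition for the given partitioning key."""
    return Counter(key(row) for row in rows)


by_extracted_at = partition_counts(rows, lambda r: r["_airbyte_extracted_at"].date())
by_order_date = partition_counts(rows, lambda r: r["order_date"])

print(len(by_extracted_at))  # 1: every row lands in the sync-day partition
print(len(by_order_date))    # 3: rows are spread across their own dates
```

With the default key, a query filtering on `order_date` cannot prune any partitions, because all of the history sits in the single partition for the day of the sync.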