Add optimized cube materialization job #1294

Open · shangyian (Contributor) opened this issue Jan 28, 2025 · 0 comments
This will aim to consolidate the different types of cube materialization into a single job that chooses the most efficient way to materialize the cube for downstream use. It will be added as a new materialization job to avoid conflicts with existing jobs, and it will take advantage of the ability from #1242 to build queries that compute pre-aggregated measures for a set of metrics and dimensions.
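For concreteness, a measures query in the sense of #1242 pre-aggregates the additive components of each metric at the grain of the requested dimensions, so that a downstream engine (e.g., Druid) can roll them up further. A minimal sketch, assuming a hypothetical orders cube; none of these table or column names come from DJ:

```python
# Illustrative only: roughly what a measures query for one subset of
# metrics/dimensions could look like. The SQL pre-aggregates the additive
# measures (a SUM and a COUNT) at the dimension grain rather than computing
# final metric values; all names here are hypothetical.
MEASURES_QUERY = """
SELECT
    order_date,                             -- temporal partition dimension
    customer_region,                        -- grouping dimension
    SUM(order_amount) AS order_amount_sum,  -- additive measure
    COUNT(order_id)   AS order_id_count     -- additive measure
FROM orders_fact
GROUP BY
    order_date,
    customer_region
"""
```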

The core DJ API will need to build the following pieces of materialization job metadata (sketched as data structures after the list):

  • A list of measures queries: Each query computes the pre-aggregated measures for a specific subset of metrics and dimensions. For each measures query, we will additionally keep track of:
    • The node it was generated for
    • A list of measures and dimensions that it provides
    • Spark configuration (it is unclear how we would configure this at the moment)
  • A combiner query: This query merges the results of the above measures queries into a single dataset.
  • Druid ingestion spec: Druid-specific configuration for the combined dataset
  • The temporal partition: This will need to be a shared dimension for all metrics in the cube.
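Taken together, the metadata might be modeled along these lines. This is a minimal sketch to make the pieces concrete; the class and field names are hypothetical, not DJ's actual API:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MeasuresQuery:
    """One query computing pre-aggregated measures for a subset of
    metrics and dimensions."""
    node: str            # the node the query was generated for
    query: str           # SQL that computes the pre-aggregated measures
    measures: list[str]        # measures this query provides
    dimensions: list[str]      # dimensions this query provides
    # Open question per the list above: how Spark should be configured
    spark_conf: dict[str, str] = field(default_factory=dict)


@dataclass
class CubeMaterializationJobMetadata:
    """Everything the core DJ API needs to hand to the new job."""
    measures_queries: list[MeasuresQuery]  # one per metric/dimension subset
    combiner_query: str                    # merges the measures query results
    druid_ingestion_spec: dict[str, Any]   # Druid-specific ingestion config
    temporal_partition: str                # dimension shared by all metrics
```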