Add optimized cube materialization job #1294

Open · shangyian (Contributor) opened this issue Jan 28, 2025 · 0 comments
This will aim to consolidate the different types of cube materialization into a single job that chooses the most efficient way to materialize the cube for downstream use. It will be added as a new materialization job to avoid conflicts with existing jobs, and it will take advantage of the ability from #1242 to build queries that compute pre-aggregated measures for a set of metrics and dimensions.
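For concreteness, a measures query in the sense of #1242 pre-aggregates the additive components of each metric at the grain of the requested dimensions, so that a downstream engine (e.g., Druid) can roll them up further. A minimal sketch, assuming a hypothetical orders cube; none of these table or column names come from DJ:

```python
# Illustrative only: roughly what a measures query for one subset of
# metrics/dimensions could look like. The SQL pre-aggregates the additive
# measures (a SUM and a COUNT) at the dimension grain rather than computing
# final metric values; all names here are hypothetical.
MEASURES_QUERY = """
SELECT
    order_date,                             -- temporal partition dimension
    customer_region,                        -- grouping dimension
    SUM(order_amount) AS order_amount_sum,  -- additive measure
    COUNT(order_id)   AS order_id_count     -- additive measure
FROM orders_fact
GROUP BY
    order_date,
    customer_region
"""
```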

The core DJ API will need to build the following pieces of materialization job metadata (sketched as data structures after the list):

  • A list of measures queries: Each query computes the pre-aggregated measures for a specific subset of metrics and dimensions. For each measures query, we will additionally keep track of:
    • The node it was generated for
    • A list of measures and dimensions that it provides
    • Spark configuration (it is unclear how we would configure this at the moment)
  • A combiner query: This query merges the results of the above measures queries into a single dataset.
  • Druid ingestion spec: Druid-specific configuration for the combined dataset
  • The temporal partition: This will need to be a shared dimension for all metrics in the cube.
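Taken together, the metadata might be modeled along these lines. This is a minimal sketch to make the pieces concrete; the class and field names are hypothetical, not DJ's actual API:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MeasuresQuery:
    """One query computing pre-aggregated measures for a subset of
    metrics and dimensions."""
    node: str            # the node the query was generated for
    query: str           # SQL that computes the pre-aggregated measures
    measures: list[str]        # measures this query provides
    dimensions: list[str]      # dimensions this query provides
    # Open question per the list above: how Spark should be configured
    spark_conf: dict[str, str] = field(default_factory=dict)


@dataclass
class CubeMaterializationJobMetadata:
    """Everything the core DJ API needs to hand to the new job."""
    measures_queries: list[MeasuresQuery]  # one per metric/dimension subset
    combiner_query: str                    # merges the measures query results
    druid_ingestion_spec: dict[str, Any]   # Druid-specific ingestion config
    temporal_partition: str                # dimension shared by all metrics
```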