Add Cost Estimator Using Past Statistics for Schedule Generator #3156

Open
wants to merge 10 commits into
base: master

@Xiao-zhen-Liu (Collaborator) commented Dec 15, 2024

This PR introduces the CostEstimator trait which estimates the cost of a region, given some resource units.

  • The cost estimator is used by CostBasedScheduleGenerator to calculate the cost of a schedule during search.
  • Currently we only consider one type of schedule for each region plan: a total order of the regions. The cost of the schedule (and therefore of the region plan) is the sum of the costs of its regions.
  • The resource units are currently passed as placeholders because we assume a region has all the resources available when doing the estimation. The units may be used in the future if we consider other schedule-generation methods. For example, if we allow two regions to run concurrently, each region would receive half of the units.
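
The interface described above could be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `Region`, `estimate`, `ScheduleCost.scheduleCost`, and the `Int` type used for resource units are hypothetical names and shapes chosen for the example.

```scala
// Hypothetical stand-in for a region; the real class carries much more state.
final case class Region(id: String)

// Sketch of the estimator interface: the cost of one region, given some resource units.
trait CostEstimator {
  def estimate(region: Region, resourceUnits: Int): Double
}

object ScheduleCost {
  // A schedule is modeled here as a total order of regions, so its cost is
  // the sum of the per-region costs. Every region is assumed to receive all
  // resource units, which is why the same value is passed to each call.
  def scheduleCost(
      schedule: Seq[Region],
      estimator: CostEstimator,
      resourceUnits: Int
  ): Double =
    schedule.map(region => estimator.estimate(region, resourceUnits)).sum
}
```

If concurrent regions were supported later, only `scheduleCost` would need to change in this sketch (for instance, by passing a fraction of the units to each concurrently running region), while the estimator interface could stay the same.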

A DefaultCostEstimator implementation is also added, which uses past execution statistics to estimate the wall-clock runtime of a region:

  • The runtime of each region is represented by the runtime of its longest-running operator.
  • The runtime of each operator is estimated using the statistics from the latest successful execution of the workflow.
  • If such statistics do not exist (e.g., if it is the first execution, or if all past executions failed), we fall back to using the number of materialized edges as the cost.
  • Added test cases using mock MySQL data.
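
The estimation rule in the list above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the `DefaultCostEstimator` from this PR: `RegionSketch`, `RegionCostSketch`, and the `Option[Map[String, Double]]` representation of past statistics are hypothetical, and treating an operator with no recorded runtime as 0.0 is an assumption made only for this example.

```scala
// Hypothetical shape: the operators in a region and its number of materialized edges.
final case class RegionSketch(operatorIds: Set[String], materializedEdgeCount: Int)

object RegionCostSketch {
  // pastRuntimes maps operator id -> runtime (in seconds) from the latest
  // successful execution, or is None when no such execution exists.
  def estimate(region: RegionSketch, pastRuntimes: Option[Map[String, Double]]): Double =
    pastRuntimes match {
      case Some(stats) =>
        // The region's runtime is represented by its longest-running operator.
        // Assumption: operators missing from the statistics contribute 0.0.
        region.operatorIds.foldLeft(0.0) { (longest, id) =>
          math.max(longest, stats.getOrElse(id, 0.0))
        }
      case None =>
        // Fallback: use the number of materialized edges as the cost.
        region.materializedEdgeCount.toDouble
    }
}
```

Note that the two branches return values in different units (seconds versus an edge count), so in this sketch costs are only comparable between regions estimated through the same branch.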
