Normalize and Improve Operator Runtime Statistics Handling #3171
This PR addresses schema normalization and logic improvements for tracking operator runtime statistics in a workflow execution system. It introduces changes to the database schema, migration scripts, and Scala code responsible for inserting and managing runtime statistics. The goal is to reduce redundancy, improve maintainability, and ensure data consistency between `operator_executions` and `operator_runtime_statistics`.

### Schema Design
- `operator_executions`: Tracks execution metadata for each operator in a workflow execution. Each row contains `operator_execution_id`, `workflow_execution_id`, `operator_id`, and `num_workers`. This table ensures that operator executions are uniquely identifiable.
- `operator_runtime_statistics`: Tracks runtime statistics for each operator execution at specific timestamps. It includes `operator_execution_id` as a foreign key, ensuring a direct reference to `operator_executions`.

The schema changes are:

- Replaces `execution_id` and `operator_id` in `workflow_runtime_statistics` with a single foreign key, `operator_execution_id`, pointing to `operator_executions`.
- Splits the `workflow_runtime_statistics` table into smaller, more manageable tables, eliminating redundancy and improving data integrity.
- Adds an index on `operator_execution_id` and `time` in `operator_runtime_statistics` to speed up joins and queries ordered by time.

### Testing
The `core/scripts/sql/update/19.sql` script creates the two new tables, `operator_executions` and `operator_runtime_statistics`, and migrates the data from `workflow_runtime_statistics` into them. After the review is approved, I will add a `drop table workflow_runtime_statistics` statement later in the script to remove the old table.
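
For reviewers who want to reason about the migration without a full database, here is a minimal sketch of the split described in this PR. It is not the contents of `19.sql`. It assumes SQLite through Python's standard `sqlite3` module as a stand-in for the real database, and it invents the column types, the `input_tuple_cnt` metric column, the presence of `num_workers` in the legacy table, the index name, and the `MAX(num_workers)` rule. Only the table names and the key columns come from the description above.

```python
import sqlite3

# Hypothetical legacy layout. Only execution_id, operator_id and time are named
# in the PR description; the remaining columns and all types are assumptions.
LEGACY_DDL = """
CREATE TABLE workflow_runtime_statistics (
    execution_id    INTEGER NOT NULL,
    operator_id     TEXT    NOT NULL,
    time            TEXT    NOT NULL,
    num_workers     INTEGER NOT NULL,
    input_tuple_cnt INTEGER NOT NULL
);
"""

# The two normalized tables plus the (operator_execution_id, time) index.
NEW_DDL = """
CREATE TABLE operator_executions (
    operator_execution_id INTEGER PRIMARY KEY AUTOINCREMENT,
    workflow_execution_id INTEGER NOT NULL,
    operator_id           TEXT    NOT NULL,
    num_workers           INTEGER NOT NULL,
    UNIQUE (workflow_execution_id, operator_id)
);
CREATE TABLE operator_runtime_statistics (
    operator_execution_id INTEGER NOT NULL
        REFERENCES operator_executions (operator_execution_id),
    time                  TEXT    NOT NULL,
    input_tuple_cnt       INTEGER NOT NULL
);
CREATE INDEX idx_operator_runtime_statistics_exec_time
    ON operator_runtime_statistics (operator_execution_id, time);
"""

# Step 1: one operator_executions row per distinct (execution, operator) pair.
# Step 2: copy every statistics row, swapping the pair for the new surrogate key.
MIGRATION_SQL = """
INSERT INTO operator_executions (workflow_execution_id, operator_id, num_workers)
SELECT execution_id, operator_id, MAX(num_workers)  -- assumed aggregation rule
FROM workflow_runtime_statistics
GROUP BY execution_id, operator_id;

INSERT INTO operator_runtime_statistics (operator_execution_id, time, input_tuple_cnt)
SELECT oe.operator_execution_id, w.time, w.input_tuple_cnt
FROM workflow_runtime_statistics AS w
JOIN operator_executions AS oe
  ON oe.workflow_execution_id = w.execution_id
 AND oe.operator_id = w.operator_id;
"""

# Made-up sample data: three operator executions, four statistics rows.
DEMO_ROWS = [
    (1, "op-A", "2024-01-01 00:00:00", 2, 10),
    (1, "op-A", "2024-01-01 00:00:01", 2, 25),
    (1, "op-B", "2024-01-01 00:00:00", 1, 5),
    (2, "op-A", "2024-01-02 00:00:00", 4, 7),
]


def migrate(conn: sqlite3.Connection) -> None:
    """Create the new tables and copy the legacy rows into them."""
    conn.executescript(NEW_DDL + MIGRATION_SQL)


def build_demo() -> sqlite3.Connection:
    """Build an in-memory legacy table, fill it, and run the migration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(LEGACY_DDL)
    conn.executemany(
        "INSERT INTO workflow_runtime_statistics VALUES (?, ?, ?, ?, ?)",
        DEMO_ROWS,
    )
    conn.commit()
    migrate(conn)
    return conn


if __name__ == "__main__":
    demo = build_demo()
    for table in ("operator_executions", "operator_runtime_statistics"):
        count = demo.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        print(table, count)
```

A useful check after running a migration of this shape is that the row count of `operator_runtime_statistics` matches the row count of the legacy table, and that every row joins back to exactly one `operator_executions` row. The sketch deliberately leaves `workflow_runtime_statistics` in place, mirroring the plan to add the `drop table` statement only after review.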