feat(cost-model): migration of compute cost parts #32

Closed
wants to merge 52 commits into from
52 commits
c92a3af
Partial migration of filter
lanlou1554 Nov 14, 2024
7b0158c
Change col to attr
lanlou1554 Nov 14, 2024
81f8d50
implement cost computation for limit
xx01cyx Nov 14, 2024
2a5740e
add author
xx01cyx Nov 14, 2024
f7f6857
introduce ColumnCombValueStats
xx01cyx Nov 14, 2024
be430ac
refactor AttributeCombValueStats and introduce statistic-related data…
xx01cyx Nov 14, 2024
089cfef
Change col to attr in filter
lanlou1554 Nov 14, 2024
69607f1
Complete partial implementation of filter
lanlou1554 Nov 14, 2024
6518e00
Add get_attribute_comb_stats
lanlou1554 Nov 14, 2024
59a8889
Finish first draft version of filter functionality
lanlou1554 Nov 14, 2024
5070a78
Add comment for the guideline of re-designing PredicateNode
lanlou1554 Nov 15, 2024
740ab11
introduce IdPred and make AttributeRefPred store table id and attr index
xx01cyx Nov 15, 2024
85cd0d1
add get method for id pred and add comments
xx01cyx Nov 15, 2024
7775b88
add check for derived column in AttributeRefPred
xx01cyx Nov 15, 2024
b60c632
make get_attributes_comb_statistics return Option
xx01cyx Nov 15, 2024
3646eca
implement agg cost computation
xx01cyx Nov 15, 2024
db555ff
move filter-related constants to stats crate
xx01cyx Nov 15, 2024
64f4a10
fix clippy
xx01cyx Nov 15, 2024
cafd01c
Resolve the optional comb stats, remove table id in filter
lanlou1554 Nov 15, 2024
5c5a40f
Refactor filter implementation
lanlou1554 Nov 15, 2024
dd6598a
Resolve conflict with main
lanlou1554 Nov 16, 2024
03b6ec3
Refactor cost model storage
lanlou1554 Nov 16, 2024
a3b8088
Move storage attribute to mod
lanlou1554 Nov 16, 2024
c07b9fc
Add initial test framework in cost_model.rs
lanlou1554 Nov 16, 2024
86f6fc2
Fix typo in initial test framework
lanlou1554 Nov 16, 2024
2c1f09b
Modify initial test framework
lanlou1554 Nov 17, 2024
ebab829
Finish most tests for filter
lanlou1554 Nov 17, 2024
a8f92c3
Finish all tests for filter
lanlou1554 Nov 17, 2024
2c9240f
Add important tricky todo
lanlou1554 Nov 17, 2024
d6e1825
Improve filter tests
lanlou1554 Nov 17, 2024
082f0be
refine test infra
xx01cyx Nov 17, 2024
e183f02
add test for cost model agg
xx01cyx Nov 17, 2024
0059141
make all data types u64 instead of usize
xx01cyx Nov 17, 2024
303d73c
merge main and resolve conflicts
xx01cyx Nov 18, 2024
ec0afa6
copy paste join cardinality calculation
xx01cyx Nov 18, 2024
6d50843
make join compile
xx01cyx Nov 18, 2024
a4ff526
rename col -> attr
xx01cyx Nov 18, 2024
0ba4132
refactor join to not pass in logical props
xx01cyx Nov 18, 2024
ab15f05
make statistics f64 instead of u64
xx01cyx Nov 18, 2024
b682c73
split join into multiple files
xx01cyx Nov 18, 2024
5d73141
reorganize join
xx01cyx Nov 18, 2024
51f917d
refine test infra
xx01cyx Nov 18, 2024
5197090
add test infra for join
xx01cyx Nov 18, 2024
68b2885
refine mock interface
xx01cyx Nov 18, 2024
36b93b9
make CostModelStorageManagerImpl::get_attribute_info unimplemented
xx01cyx Nov 18, 2024
11a3a4e
modify MemoExt interface
xx01cyx Nov 18, 2024
8c4191f
rename AttrRefPred -> AttrIndexPred and revert back to initial design
xx01cyx Nov 18, 2024
1569fc5
Modify the tests of filter and agg
lanlou1554 Nov 19, 2024
489ff48
add join test
xx01cyx Nov 19, 2024
be71afb
pass group id to join and fix filter-related tests
xx01cyx Nov 19, 2024
624d040
fix all join tests
xx01cyx Nov 19, 2024
f8a0e70
Change filter controller name
lanlou1554 Nov 19, 2024
708 changes: 705 additions & 3 deletions Cargo.lock

Large diffs are not rendered by default.

675 changes: 656 additions & 19 deletions optd-cost-model/Cargo.lock

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions optd-cost-model/Cargo.toml
@@ -2,6 +2,7 @@
name = "optd-cost-model"
version = "0.1.0"
edition = "2021"
authors = ["Yuanxin Cao", "Lan Lou", "Kunle Li"]

[dependencies]
optd-persistent = { path = "../optd-persistent", version = "0.1" }
@@ -10,10 +11,15 @@ serde_json = "1.0"
serde_with = { version = "3.7.0", features = ["json"] }
arrow-schema = "53.2.0"
datafusion-expr = "32.0.0"
datafusion = "32.0.0"
ordered-float = "4.0"
chrono = "0.4"
itertools = "0.13"
assert_approx_eq = "1.1.0"
trait-variant = "0.1.2"
tokio = { version = "1.0.1", features = ["macros", "rt-multi-thread"] }

[dev-dependencies]
crossbeam = "0.8"
rand = "0.8"
test-case = "3.3"
39 changes: 35 additions & 4 deletions optd-cost-model/src/common/nodes.rs
@@ -1,4 +1,5 @@
-use std::sync::Arc;
+use core::fmt;
+use std::{fmt::Display, sync::Arc};

use arrow_schema::DataType;

@@ -24,6 +25,12 @@ pub enum JoinType {
     RightAnti,
 }

+impl Display for JoinType {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        write!(f, "{:?}", self)
+    }
+}
+
 /// TODO: documentation
 #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
 pub enum PhysicalNodeType {
@@ -49,8 +56,7 @@ impl std::fmt::Display for PhysicalNodeType {
 pub enum PredicateType {
     List,
     Constant(ConstantType),
-    AttributeRef,
-    ExternAttributeRef,
+    AttrIndex,
     UnOp(UnOpType),
     BinOp(BinOpType),
     LogOp(LogOpType),
@@ -77,7 +83,7 @@ pub struct PredicateNode {
     /// A generic predicate node type
     pub typ: PredicateType,
     /// Child predicate nodes, always materialized
-    pub children: Vec<PredicateNode>,
+    pub children: Vec<ArcPredicateNode>,
     /// Data associated with the predicate, if any
     pub data: Option<Value>,
 }
@@ -94,3 +100,28 @@ impl std::fmt::Display for PredicateNode {
         write!(f, ")")
     }
 }
+
+impl PredicateNode {
+    pub fn child(&self, idx: usize) -> ArcPredicateNode {
+        self.children[idx].clone()
+    }
+
+    pub fn unwrap_data(&self) -> Value {
+        self.data.clone().unwrap()
+    }
+}
+pub trait ReprPredicateNode: 'static + Clone {
+    fn into_pred_node(self) -> ArcPredicateNode;
+
+    fn from_pred_node(pred_node: ArcPredicateNode) -> Option<Self>;
+}
+
+impl ReprPredicateNode for ArcPredicateNode {
+    fn into_pred_node(self) -> ArcPredicateNode {
+        self
+    }
+
+    fn from_pred_node(pred_node: ArcPredicateNode) -> Option<Self> {
+        Some(pred_node)
+    }
+}
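
A minimal, dependency-free sketch of two things this file's diff introduces: `Display` for `JoinType` delegating to the derived `Debug` output, and `child()` handing out a cloned `Arc` so that subtrees are shared rather than copied. The `Node` type, the `leaf` helper, and the `Inner` variant are assumptions made for illustration; only `RightAnti` is visible in the diff above.

```rust
use std::fmt;
use std::sync::Arc;

/// Simplified stand-in for the PR's `JoinType`. Only `RightAnti` appears in
/// the visible diff; `Inner` is assumed here for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinType {
    Inner,
    RightAnti,
}

impl fmt::Display for JoinType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Reuse the derived Debug representation, as the diff does.
        write!(f, "{:?}", self)
    }
}

/// Hypothetical, stripped-down predicate node: a label plus shared children.
#[derive(Debug)]
pub struct Node {
    pub label: &'static str,
    pub children: Vec<Arc<Node>>,
}

impl Node {
    /// Returns another handle to the child at `idx` (panics if out of bounds).
    /// Cloning an `Arc` bumps a reference count; the subtree is not copied.
    pub fn child(&self, idx: usize) -> Arc<Node> {
        self.children[idx].clone()
    }
}

/// Hypothetical helper that builds a childless node.
pub fn leaf(label: &'static str) -> Arc<Node> {
    Arc::new(Node { label, children: vec![] })
}
```

Because the `Display` output is tied to the variant names, renaming a variant also changes any string produced with `to_string()`; that is worth keeping in mind if those strings ever end up in persisted plans or test snapshots.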
42 changes: 42 additions & 0 deletions optd-cost-model/src/common/predicates/attr_index_pred.rs
@@ -0,0 +1,42 @@
+use crate::common::{
+    nodes::{ArcPredicateNode, PredicateNode, PredicateType, ReprPredicateNode},
+    values::Value,
+};
+
+/// [`AttrIndexPred`] represents the position of an attribute in a schema or
+/// [`GroupAttrRefs`].
+///
+/// The `data` field holds the index of the attribute in the schema or [`GroupAttrRefs`].
+#[derive(Clone, Debug)]
+pub struct AttrIndexPred(pub ArcPredicateNode);
+
+impl AttrIndexPred {
+    pub fn new(attr_idx: u64) -> AttrIndexPred {
+        AttrIndexPred(
+            PredicateNode {
+                typ: PredicateType::AttrIndex,
+                children: vec![],
+                data: Some(Value::UInt64(attr_idx)),
+            }
+            .into(),
+        )
+    }
+
+    /// Gets the attribute index.
+    pub fn attr_index(&self) -> u64 {
+        self.0.data.as_ref().unwrap().as_u64()
+    }
+}
+
+impl ReprPredicateNode for AttrIndexPred {
+    fn into_pred_node(self) -> ArcPredicateNode {
+        self.0
+    }
+
+    fn from_pred_node(pred_node: ArcPredicateNode) -> Option<Self> {
+        if pred_node.typ != PredicateType::AttrIndex {
+            return None;
+        }
+        Some(Self(pred_node))
+    }
+}
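
To illustrate how the typed wrapper is meant to round-trip through the generic node, here is a self-contained sketch. The `Value` and `PredicateType` enums are cut down to two variants each, and the `constant` helper is hypothetical; the real crate's types are richer than what is shown here.

```rust
use std::sync::Arc;

/// Cut-down stand-in for the crate's `Value` type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    UInt64(u64),
    Bool(bool),
}

impl Value {
    /// Panics when the value is not a `UInt64`.
    pub fn as_u64(&self) -> u64 {
        match self {
            Value::UInt64(v) => *v,
            other => panic!("expected UInt64, got {:?}", other),
        }
    }
}

/// Cut-down stand-in for the crate's `PredicateType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PredicateType {
    AttrIndex,
    Constant,
}

#[derive(Debug)]
pub struct PredicateNode {
    pub typ: PredicateType,
    pub children: Vec<ArcPredicateNode>,
    pub data: Option<Value>,
}

pub type ArcPredicateNode = Arc<PredicateNode>;

pub trait ReprPredicateNode: 'static + Clone {
    fn into_pred_node(self) -> ArcPredicateNode;
    fn from_pred_node(pred_node: ArcPredicateNode) -> Option<Self>;
}

#[derive(Clone, Debug)]
pub struct AttrIndexPred(pub ArcPredicateNode);

impl AttrIndexPred {
    pub fn new(attr_idx: u64) -> Self {
        AttrIndexPred(Arc::new(PredicateNode {
            typ: PredicateType::AttrIndex,
            children: vec![],
            data: Some(Value::UInt64(attr_idx)),
        }))
    }

    pub fn attr_index(&self) -> u64 {
        self.0.data.as_ref().unwrap().as_u64()
    }
}

impl ReprPredicateNode for AttrIndexPred {
    fn into_pred_node(self) -> ArcPredicateNode {
        self.0
    }

    fn from_pred_node(pred_node: ArcPredicateNode) -> Option<Self> {
        // Only the type tag is checked; the payload is trusted.
        if pred_node.typ != PredicateType::AttrIndex {
            return None;
        }
        Some(Self(pred_node))
    }
}

/// Hypothetical helper: a constant node, used to show a failed conversion.
pub fn constant(v: bool) -> ArcPredicateNode {
    Arc::new(PredicateNode {
        typ: PredicateType::Constant,
        children: vec![],
        data: Some(Value::Bool(v)),
    })
}
```

With these definitions, `AttrIndexPred::from_pred_node(AttrIndexPred::new(3).into_pred_node())` yields a wrapper whose `attr_index()` is `3`, while `AttrIndexPred::from_pred_node(constant(true))` is `None`. Note that `from_pred_node` checks only `typ`, so `attr_index()` can still panic on a node that carries the `AttrIndex` tag but no `UInt64` payload.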
47 changes: 47 additions & 0 deletions optd-cost-model/src/common/predicates/bin_op_pred.rs
@@ -1,3 +1,5 @@
+use crate::common::nodes::{ArcPredicateNode, PredicateNode, PredicateType, ReprPredicateNode};
+
 /// TODO: documentation
 #[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
 pub enum BinOpType {
@@ -38,3 +40,48 @@ impl BinOpType {
         )
     }
 }
+
+#[derive(Clone, Debug)]
+pub struct BinOpPred(pub ArcPredicateNode);
+
+impl BinOpPred {
+    pub fn new(left: ArcPredicateNode, right: ArcPredicateNode, op_type: BinOpType) -> Self {
+        BinOpPred(
+            PredicateNode {
+                typ: PredicateType::BinOp(op_type),
+                children: vec![left, right],
+                data: None,
+            }
+            .into(),
+        )
+    }
+
+    pub fn left_child(&self) -> ArcPredicateNode {
+        self.0.child(0)
+    }
+
+    pub fn right_child(&self) -> ArcPredicateNode {
+        self.0.child(1)
+    }
+
+    pub fn op_type(&self) -> BinOpType {
+        if let PredicateType::BinOp(op_type) = self.0.typ {
+            op_type
+        } else {
+            panic!("not a bin op")
+        }
+    }
+}
+
+impl ReprPredicateNode for BinOpPred {
+    fn into_pred_node(self) -> ArcPredicateNode {
+        self.0
+    }
+
+    fn from_pred_node(pred_node: ArcPredicateNode) -> Option<Self> {
+        if !matches!(pred_node.typ, PredicateType::BinOp(_)) {
+            return None;
+        }
+        Some(Self(pred_node))
+    }
+}
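
The sketch below shows how a binary predicate such as `#0 = #1` could be assembled and taken apart with this wrapper. It is a simplification under stated assumptions: the `BinOpType` variants are guesses (the enum body is collapsed in the diff), `data` is reduced to `Option<u64>`, `attr` is a hypothetical leaf constructor, and `from_pred_node` is written as an inherent method instead of going through `ReprPredicateNode`.

```rust
use std::sync::Arc;

/// Assumed variants; the real enum body is not visible in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BinOpType {
    Eq,
    Lt,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PredicateType {
    AttrIndex,
    BinOp(BinOpType),
}

#[derive(Debug)]
pub struct PredicateNode {
    pub typ: PredicateType,
    pub children: Vec<ArcPredicateNode>,
    /// Simplified payload; the crate uses `Option<Value>`.
    pub data: Option<u64>,
}

pub type ArcPredicateNode = Arc<PredicateNode>;

/// Hypothetical leaf constructor: an attribute reference by index.
pub fn attr(idx: u64) -> ArcPredicateNode {
    Arc::new(PredicateNode {
        typ: PredicateType::AttrIndex,
        children: vec![],
        data: Some(idx),
    })
}

#[derive(Clone, Debug)]
pub struct BinOpPred(pub ArcPredicateNode);

impl BinOpPred {
    pub fn new(left: ArcPredicateNode, right: ArcPredicateNode, op_type: BinOpType) -> Self {
        BinOpPred(Arc::new(PredicateNode {
            typ: PredicateType::BinOp(op_type),
            children: vec![left, right],
            data: None,
        }))
    }

    /// Checked conversion: accepts only nodes tagged as `BinOp`.
    pub fn from_pred_node(node: ArcPredicateNode) -> Option<Self> {
        match node.typ {
            PredicateType::BinOp(_) => Some(BinOpPred(node)),
            _ => None,
        }
    }

    pub fn left_child(&self) -> ArcPredicateNode {
        self.0.children[0].clone()
    }

    pub fn right_child(&self) -> ArcPredicateNode {
        self.0.children[1].clone()
    }

    /// Panics if the wrapped node is not a `BinOp`, which can only happen
    /// when the public tuple field is used to bypass the constructors.
    pub fn op_type(&self) -> BinOpType {
        match self.0.typ {
            PredicateType::BinOp(op_type) => op_type,
            _ => panic!("not a bin op"),
        }
    }
}
```

One design point the sketch makes visible: because the tuple field is `pub`, `BinOpPred(some_node)` compiles for any node, so the `panic!` in `op_type` is reachable in principle. Going through `new` or `from_pred_node` keeps the type tag consistent, although neither checks that exactly two children are present.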
49 changes: 49 additions & 0 deletions optd-cost-model/src/common/predicates/cast_pred.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
+use arrow_schema::DataType;
+
+use crate::common::nodes::{ArcPredicateNode, PredicateNode, PredicateType, ReprPredicateNode};
+
+use super::data_type_pred::DataTypePred;
+
+/// [`CastPred`] casts a column from one data type to another.
+///
+/// A [`CastPred`] has two children:
+/// 1. The original data to cast
+/// 2. The target data type to cast to
+#[derive(Clone, Debug)]
+pub struct CastPred(pub ArcPredicateNode);
+
+impl CastPred {
+    pub fn new(child: ArcPredicateNode, cast_to: DataType) -> Self {
+        CastPred(
+            PredicateNode {
+                typ: PredicateType::Cast,
+                children: vec![child, DataTypePred::new(cast_to).into_pred_node()],
+                data: None,
+            }
+            .into(),
+        )
+    }
+
+    pub fn child(&self) -> ArcPredicateNode {
+        self.0.child(0)
+    }
+
+    pub fn cast_to(&self) -> DataType {
+        DataTypePred::from_pred_node(self.0.child(1))
+            .unwrap()
+            .data_type()
+    }
+}
+
+impl ReprPredicateNode for CastPred {
+    fn into_pred_node(self) -> ArcPredicateNode {
+        self.0
+    }
+
+    fn from_pred_node(pred_node: ArcPredicateNode) -> Option<Self> {
+        if !matches!(pred_node.typ, PredicateType::Cast) {
+            return None;
+        }
+        Some(Self(pred_node))
+    }
+}
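
A small sketch of the "target type stored as a second child" layout used by `CastPred`. To stay dependency-free it replaces `arrow_schema::DataType` with a two-variant stand-in and assumes the data type node carries its type inside the tag as `PredicateType::DataType(..)`; how `DataTypePred` actually stores it is not shown in this diff. The `try_cast_to` accessor and the `node` helper are hypothetical additions, not part of the PR.

```rust
use std::sync::Arc;

/// Stand-in for `arrow_schema::DataType`, so the sketch has no dependencies.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DataType {
    Int32,
    Utf8,
}

/// Simplified tags; carrying the type inside `DataType(..)` is an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PredicateType {
    AttrIndex,
    DataType(DataType),
    Cast,
}

#[derive(Debug)]
pub struct PredicateNode {
    pub typ: PredicateType,
    pub children: Vec<ArcPredicateNode>,
}

pub type ArcPredicateNode = Arc<PredicateNode>;

/// Hypothetical helper that wraps a node in an `Arc`.
pub fn node(typ: PredicateType, children: Vec<ArcPredicateNode>) -> ArcPredicateNode {
    Arc::new(PredicateNode { typ, children })
}

#[derive(Clone, Debug)]
pub struct CastPred(pub ArcPredicateNode);

impl CastPred {
    /// Child 0 is the expression to cast, child 1 encodes the target type.
    pub fn new(child: ArcPredicateNode, cast_to: DataType) -> Self {
        let target = node(PredicateType::DataType(cast_to), vec![]);
        CastPred(node(PredicateType::Cast, vec![child, target]))
    }

    pub fn child(&self) -> ArcPredicateNode {
        self.0.children[0].clone()
    }

    /// Hypothetical non-panicking accessor: `None` when the second child is
    /// missing or is not a data type node.
    pub fn try_cast_to(&self) -> Option<DataType> {
        match &self.0.children.get(1)?.typ {
            PredicateType::DataType(dt) => Some(dt.clone()),
            _ => None,
        }
    }

    /// Mirrors the PR's `cast_to`: panics on a malformed node.
    pub fn cast_to(&self) -> DataType {
        self.try_cast_to()
            .expect("cast node without a data type child")
    }
}
```

Since `from_pred_node` in the diff validates only the `Cast` tag, a node with that tag but fewer than two children would make `cast_to` panic. A fallible accessor along the lines of `try_cast_to` is one possible way to surface that case to callers without unwinding.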