[GLUTEN-3951][CH]Bug fix floor diff #3956
Run Gluten Clickhouse CI
@@ -180,6 +179,7 @@ static const std::map<std::string, std::string> SCALAR_FUNCTIONS
    {"add_months", "addMonths"},
    {"date_trunc", "dateTrunc"},
    {"floor_datetime", "dateTrunc"},
    {"floor", "spark_floor"},
Keep the camelCase style.
done
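For readers unfamiliar with the table touched by this hunk, here is a minimal sketch of how a name map of this shape can be consulted. The four entries are copied from the diff as shown; `resolveScalarFunction` is a hypothetical helper for illustration and is not taken from the PR. The name the author settled on after the camelCase request is not visible in this thread, so the value from the diff is kept unchanged.

```cpp
#include <map>
#include <optional>
#include <string>

// Entries copied from the diff hunk; the real table holds many more functions.
static const std::map<std::string, std::string> SCALAR_FUNCTIONS = {
    {"add_months", "addMonths"},
    {"date_trunc", "dateTrunc"},
    {"floor_datetime", "dateTrunc"},
    {"floor", "spark_floor"},
};

// Hypothetical helper (not from the PR): map a Spark function name to the
// backend function name, or return std::nullopt when there is no entry.
std::optional<std::string> resolveScalarFunction(const std::string & spark_name)
{
    const auto it = SCALAR_FUNCTIONS.find(spark_name);
    if (it == SCALAR_FUNCTIONS.end())
        return std::nullopt;
    return it->second;
}
```

With this table, looking up `floor` yields the dedicated Spark-compatible function instead of the backend's own floor, which is the point of the added entry.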
LGTM
What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
(Fixes: #3951)
How was this patch tested?
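Tested by unit tests (UT).
End-to-end performance test
Data type Int64, table schema test_tbl(d Int64). Test SQL:
select count(1) from test_tbl where floor(d) > 1
Total data: 30 million rows, three runs. Before the PR change: 1.13s, 0.92s, 0.985s
After the PR change: 1.064s, 1.077s, 0.984s
Data type Float64, table schema test_tbl(d Float64). Test SQL:
select count(1) from test_tbl where floor(d) > 1
Total data: 30 million rows, three runs. Before the PR change: 1.417s, 1.386s, 1.426s
After the PR change: 1.568s, 1.476s, 1.508s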
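As a check on how these timings translate into a relative change, the following sketch compares the mean of the three runs before and after the change. Using the arithmetic mean is an assumption; the PR does not say how its percentage was derived. The function names are illustrative only.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a set of run times, in seconds.
double meanSeconds(const std::vector<double> & runs)
{
    assert(!runs.empty());
    return std::accumulate(runs.begin(), runs.end(), 0.0) / static_cast<double>(runs.size());
}

// Relative change of the mean run time in percent.
// A positive result means the runs after the change were slower.
double regressionPercent(const std::vector<double> & before, const std::vector<double> & after)
{
    const double base = meanSeconds(before);
    return (meanSeconds(after) - base) / base * 100.0;
}
```

Applied to the timings listed here, this gives roughly 7.6% for Float64 and roughly 3% for Int64. The Int64 runs before the change already range from 0.92s to 1.13s, so the Int64 difference is well inside run-to-run variation.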
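For Int64, performance before and after the change is essentially the same. For Float64 there is a regression of about 7.6%, which comes mainly from the added checks and assignments that handle NaN and INF values that may appear in the data.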
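To illustrate where the extra per-row cost comes from, here is a minimal sketch of a floor loop with a NaN/INF check and assignment. It is not the PR's code: the function name, the flag column, and the choice to write 0 for flagged rows are all assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not the PR's implementation.
// Floors every input value. Rows holding NaN or +/-INF get the value 0 and a
// flag of 1 (an assumed policy); all other rows get std::floor and a flag of 0.
void checkedFloor(
    const std::vector<double> & in,
    std::vector<double> & out,
    std::vector<std::uint8_t> & flags)
{
    out.resize(in.size());
    flags.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        // The extra comparison and the conditional assignment below are the
        // kind of per-row work a plain floor loop does not have.
        const bool special = std::isnan(in[i]) || std::isinf(in[i]);
        flags[i] = special ? 1 : 0;
        out[i] = special ? 0.0 : std::floor(in[i]);
    }
}
```

An integer column needs no such check, since integers cannot hold NaN or INF, which is consistent with the Int64 timings staying flat.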
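Benchmark performance test
Tested with the newly written benchmark_spark_floor_function.cpp.
Int64 test
For CH's floor function, the results are as follows:
For the newly developed floor function, the results are as follows:
Float64 test
For CH's floor function, the results are as follows:
For the newly developed floor function, the results are as follows:
So Int64 shows a regression of about 3%, and Float64 a regression of about 70%.
The Spark UT for the floor function is covered by org.apache.spark.sql.GlutenMathFunctionsSuite, which has been enabled.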