[Bug] Different select expressions lead to a dozens-fold performance difference #44353

Closed
2 of 3 tasks
BS490 opened this issue Nov 20, 2024 · 3 comments

@BS490

BS490 commented Nov 20, 2024

Search before asking

  • I have searched the issues and found no similar issues.

Version

2.0.8

What's Wrong?

Table definition:
CREATE TABLE dws_mediago_bidder_dsp_multi_measures_hourly (
  campaign_id int(11) NULL,
  asset_id int(11) NULL,
  ssp int(11) NULL,
  ad_id int(11) NULL,
  crid varchar(50) NULL,
  domain varchar(2000) NULL,
  ip_country varchar(200) NULL,
  account_id varchar(32) NULL,
  account_name varchar(500) NULL,
  company_id varchar(32) NULL,
  company_name varchar(500) NULL,
  am_name varchar(50) NULL,
  platform_type varchar(10) NULL,
  account_category varchar(100) NULL,
  company_region varchar(20) NULL,
  charge_type varchar(20) NULL,
  target_cpa DECIMAL(15, 9) NULL,
  d_s date NULL,
  h_s int(11) NULL,
  ad_count bigint(20) SUM NULL,
  all_req_num bigint(20) SUM NULL,
  account_gross_click_cost double SUM NULL,
  click bigint(20) SUM NULL,
  click_cost double SUM NULL,
  conversion bigint(20) SUM NULL,
  cv bigint(20) SUM NULL,
  imp bigint(20) SUM NULL,
  imp_cost double SUM NULL,
  vimp bigint(20) SUM NULL,
  mcv bigint(20) SUM NULL,
  flr_sum_fix double SUM NULL,
  bid_price_sum double SUM NULL,
  req_num double SUM NULL,
  prctr double SUM NULL,
  pclick double SUM NULL,
  req_ad_num bigint(20) SUM NULL
) ENGINE=OLAP
AGGREGATE KEY(campaign_id, asset_id, ssp, ad_id, crid, domain, ip_country,
  account_id, account_name, company_id, company_name, am_name, platform_type,
  account_category, company_region, charge_type, target_cpa, d_s, h_s)
COMMENT 'OLAP'
PARTITION BY RANGE(d_s, h_s)()
DISTRIBUTED BY HASH(campaign_id) BUCKETS 36
PROPERTIES (
  "replication_allocation" = "tag.location.default: 3",
  "bloom_filter_columns" = "ssp, domain",
  "is_being_synced" = "false",
  "storage_format" = "V2",
  "light_schema_change" = "true",
  "disable_auto_compaction" = "false",
  "enable_single_replica_compaction" = "false"
);

First SQL: [screenshot not preserved]

Second SQL: [screenshot not preserved]

As the screenshots show, the different select expressions lead to a dozens-fold difference in performance.

EXPLAIN output of the first SQL: [two screenshots not preserved]

EXPLAIN output of the second SQL: [two screenshots not preserved]

Difference:
PREAGGREGATION is OFF for the VOlapScanNode in the first SQL's plan, and ON in the second SQL's plan.

What You Expected?

1. Why is PREAGGREGATION OFF for the VOlapScanNode of the first SQL?
2. Is PREAGGREGATION the cause of the performance difference?
3. Is there any way to solve it?

How to Reproduce?

Create the table:
CREATE TABLE test2 (
  k1 date NULL,
  v2 int(11) SUM NULL DEFAULT "1"
) ENGINE=OLAP
AGGREGATE KEY(k1)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(k1) BUCKETS 32
PROPERTIES (
  "replication_allocation" = "tag.location.default: 3",
  "is_being_synced" = "false",
  "storage_format" = "V2",
  "light_schema_change" = "true",
  "disable_auto_compaction" = "false",
  "enable_single_replica_compaction" = "false"
);

Then run:

explain select sum(if(k1='12',v2,0)) from test2;

Anything Else?

No response

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Code of Conduct

@morrySnow
Contributor

The table uses the aggregate model. When pre-aggregation is OFF, the storage layer merges tuples with the same key into one tuple before returning data to the execution layer.
When pre-aggregation is ON, the storage layer does no merging.

For example, suppose we have an aggregate model table t1 with:

c1 key,
c2 int sum

and then insert data into t1 with:

insert into t1 values(1, 1), (1, 2)

When pre-aggregation is OFF, the execution layer gets:

1, 3

But when pre-aggregation is ON, the execution layer gets:

1, 2
1, 1
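The example above can be modeled with a short Python sketch. It is a simplified, hypothetical model, not Doris code: `merge_on_read` and `scan` are illustrative names, and it assumes the two rows of t1 are still stored separately (for instance, written by separate loads and not yet compacted) and that c2 is the only value column, aggregated with SUM.

```python
from collections import defaultdict

# Hypothetical, simplified model of an AGGREGATE KEY table with one key
# column (c1) and one SUM value column (c2). Not Doris code.


def merge_on_read(rows):
    """Merge rows that share a key by summing c2 (what OFF implies)."""
    merged = defaultdict(int)
    for c1, c2 in rows:
        merged[c1] += c2
    return sorted(merged.items())


def scan(rows, preaggregation_on):
    """Return what the execution layer receives from the storage layer."""
    if preaggregation_on:
        # ON: rows are passed through as stored, without merging.
        return list(rows)
    # OFF: rows with the same key are merged before being returned.
    return merge_on_read(rows)


# Assumption: both inserted rows are still stored separately.
t1_rows = [(1, 1), (1, 2)]

print(scan(t1_rows, preaggregation_on=False))  # [(1, 3)]
print(scan(t1_rows, preaggregation_on=True))   # [(1, 1), (1, 2)]
```

The row order under ON depends on how the rows happen to be stored; what matters is only that they are not merged.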

So we set pre-aggregation to ON only when we can ensure that skipping the merge will not lead to a wrong result. Currently, we only support this when the query's aggregate function is exactly the same as the column's aggregation type in the table. We will support more patterns in the future.
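The correctness condition can be checked with another hypothetical Python sketch based on the reproduction table test2 (k1 key, v2 SUM), using string keys instead of dates for simplicity. A query can safely skip the merge only if it returns the same result on merged and unmerged rows. In this simplified model, `sum(v2)` and `sum(if(k1='12', v2, 0))` are unaffected by merging (the condition reads only the key column, so all rows sharing a key are treated alike), while `count(*)` is not. This only illustrates the condition; it is not how the Doris planner is implemented.

```python
from collections import defaultdict

# Hypothetical model of test2: k1 is the key, v2 is a SUM value column.
# Assumption: three stored rows, two of them sharing the key '12'.
ROWS = [("12", 1), ("12", 2), ("13", 5)]


def merge_on_read(rows):
    """Merge rows with the same k1 by summing v2."""
    merged = defaultdict(int)
    for k1, v2 in rows:
        merged[k1] += v2
    return list(merged.items())


def sum_v2(rows):
    # select sum(v2) from test2
    return sum(v2 for _, v2 in rows)


def count_star(rows):
    # select count(*) from test2
    return len(rows)


def sum_if_key(rows, target="12"):
    # select sum(if(k1 = '12', v2, 0)) from test2
    return sum(v2 if k1 == target else 0 for k1, v2 in rows)


QUERIES = {
    "sum(v2)": sum_v2,
    "count(*)": count_star,
    "sum(if(k1='12', v2, 0))": sum_if_key,
}

merged_rows = merge_on_read(ROWS)
for name, query in QUERIES.items():
    # True means skipping the merge (pre-aggregation ON) gives the same answer.
    safe = query(merged_rows) == query(ROWS)
    print(f"{name}: merged={query(merged_rows)} unmerged={query(ROWS)} safe={safe}")
```

With these rows, `sum(v2)` gives 8 and the conditional sum gives 3 whether or not the rows are merged, while `count(*)` gives 2 on merged rows but 3 on unmerged rows.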

@ixzc
Contributor

ixzc commented Dec 5, 2024

You can upgrade Doris to the latest 2.1 version; we have fixed this in 2.1. We didn't merge the fix into 2.0 because it requires too many changes there.
#34738

@BS490
Author

BS490 commented Dec 5, 2024

Thank you for your answer.

BS490 closed this as completed Dec 5, 2024

4 participants