Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](nereids) fix merge_percentile_to_array when has same agg function #44783

Merged
merged 1 commit into from
Dec 2, 2024

Conversation

feiniaofeiafei
Copy link
Contributor

@feiniaofeiafei feiniaofeiafei commented Nov 29, 2024

What problem does this PR solve?

Related PR: #34313

Problem Summary
The original PR did not handle the following scenario:

SELECT SUM(a), PERCENTILE(pk, 0.1) AS c1, PERCENTILE(pk, 0.1) AS c2, PERCENTILE(pk, 0.4) AS c3 FROM test_merge_percentile;

In this case, the aggregate outputs include two identical functions (PERCENTILE(pk, 0.1)). When constructing the LogicalProject, a map was used where the key is the child of an Alias and the value is the Alias itself. However, this approach loses information when two Aliases share the same child.
This PR modifies the map structure to use the child of an Alias as the key and a list of Alias objects as the value. This ensures that all Alias instances with the same child are preserved, resolving the issue of lost information in such cases.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@feiniaofeiafei
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 40039 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 89069fbdecf22de26e07bf87579ec5c33faa47cb, data reload: false

------ Round 1 ----------------------------------
q1	17588	8175	7308	7308
q2	2082	189	168	168
q3	10586	1102	1149	1102
q4	10588	795	661	661
q5	7593	2733	2726	2726
q6	238	152	148	148
q7	960	613	588	588
q8	9237	1835	1952	1835
q9	6629	6491	6501	6491
q10	6965	2312	2329	2312
q11	475	253	261	253
q12	409	213	211	211
q13	17777	3028	3029	3028
q14	253	210	210	210
q15	567	542	516	516
q16	682	591	598	591
q17	981	552	583	552
q18	7464	6645	6579	6579
q19	1333	990	962	962
q20	461	186	180	180
q21	3986	3311	3321	3311
q22	374	324	307	307
Total cold run time: 107228 ms
Total hot run time: 40039 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7375	7282	7279	7279
q2	333	232	232	232
q3	2927	2902	2933	2902
q4	2029	1833	1822	1822
q5	5647	5646	5622	5622
q6	241	150	152	150
q7	2274	1836	1834	1834
q8	3372	3552	3560	3552
q9	8955	8980	9017	8980
q10	3600	3574	3544	3544
q11	613	510	495	495
q12	790	622	601	601
q13	10600	3324	3281	3281
q14	311	276	291	276
q15	585	522	534	522
q16	677	670	638	638
q17	1835	1622	1616	1616
q18	8290	7740	7474	7474
q19	1689	1487	1540	1487
q20	2142	1879	1854	1854
q21	5562	5363	5468	5363
q22	649	592	626	592
Total cold run time: 70496 ms
Total hot run time: 60116 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 197747 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 89069fbdecf22de26e07bf87579ec5c33faa47cb, data reload: false

query1	1247	975	986	975
query2	6246	2120	2025	2025
query3	11079	4448	4373	4373
query4	68040	28639	23592	23592
query5	4932	468	453	453
query6	395	188	181	181
query7	5442	304	290	290
query8	309	219	243	219
query9	8203	2681	2665	2665
query10	413	250	251	250
query11	16994	15277	16008	15277
query12	165	104	104	104
query13	1405	413	408	408
query14	10627	7847	7342	7342
query15	216	195	197	195
query16	6761	535	498	498
query17	1053	585	592	585
query18	1019	329	321	321
query19	220	163	169	163
query20	118	113	112	112
query21	210	111	112	111
query22	4642	4456	4401	4401
query23	34972	34645	34638	34638
query24	5386	2562	2492	2492
query25	480	389	381	381
query26	642	153	150	150
query27	1808	300	291	291
query28	4194	2436	2480	2436
query29	669	428	431	428
query30	206	150	163	150
query31	1019	845	858	845
query32	65	55	58	55
query33	423	301	293	293
query34	912	533	551	533
query35	845	784	762	762
query36	1109	960	961	960
query37	117	76	81	76
query38	4494	4407	4378	4378
query39	1527	1519	1525	1519
query40	214	105	100	100
query41	44	41	43	41
query42	108	104	105	104
query43	551	509	500	500
query44	1205	848	850	848
query45	195	177	173	173
query46	1182	735	734	734
query47	2023	1981	1895	1895
query48	436	330	317	317
query49	724	402	393	393
query50	834	415	405	405
query51	7561	7270	7121	7121
query52	106	92	89	89
query53	253	182	178	178
query54	519	393	409	393
query55	85	75	77	75
query56	249	243	229	229
query57	1223	1125	1106	1106
query58	210	211	216	211
query59	3269	3094	3027	3027
query60	266	251	246	246
query61	111	106	109	106
query62	795	674	643	643
query63	220	185	193	185
query64	1330	675	635	635
query65	3279	3247	3216	3216
query66	700	303	301	301
query67	16095	15534	15807	15534
query68	3500	577	570	570
query69	433	250	247	247
query70	1141	1151	1146	1146
query71	403	252	249	249
query72	6380	4088	4086	4086
query73	781	357	360	357
query74	9712	8970	9138	8970
query75	3336	2691	2661	2661
query76	1889	1109	1090	1090
query77	546	281	274	274
query78	10491	9409	9315	9315
query79	2060	605	601	601
query80	1281	434	524	434
query81	510	230	233	230
query82	1281	127	115	115
query83	200	149	148	148
query84	285	75	69	69
query85	1040	312	309	309
query86	421	299	302	299
query87	4755	4561	4702	4561
query88	3800	2204	2190	2190
query89	433	305	282	282
query90	1901	194	189	189
query91	140	146	102	102
query92	63	49	54	49
query93	2874	549	543	543
query94	763	300	293	293
query95	346	258	256	256
query96	637	282	273	273
query97	2852	2619	2630	2619
query98	224	190	198	190
query99	1587	1298	1329	1298
Total cold run time: 318456 ms
Total hot run time: 197747 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 33.15 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 89069fbdecf22de26e07bf87579ec5c33faa47cb, data reload: false

query1	0.04	0.03	0.05
query2	0.07	0.03	0.03
query3	0.24	0.07	0.07
query4	1.62	0.10	0.10
query5	0.44	0.42	0.40
query6	1.17	0.66	0.65
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.57	0.51	0.50
query10	0.56	0.54	0.56
query11	0.14	0.10	0.11
query12	0.14	0.11	0.11
query13	0.62	0.59	0.61
query14	2.83	2.76	2.74
query15	0.90	0.83	0.83
query16	0.38	0.39	0.37
query17	1.06	1.00	1.06
query18	0.22	0.21	0.21
query19	1.87	1.90	1.97
query20	0.01	0.01	0.01
query21	15.38	0.59	0.58
query22	2.67	2.80	1.88
query23	17.15	0.86	0.80
query24	3.00	1.33	1.38
query25	0.24	0.11	0.17
query26	0.34	0.14	0.15
query27	0.04	0.05	0.04
query28	10.11	1.10	1.07
query29	12.60	3.26	3.24
query30	0.25	0.07	0.06
query31	2.87	0.39	0.39
query32	3.26	0.47	0.46
query33	3.19	2.99	3.04
query34	17.19	4.49	4.56
query35	4.55	4.55	4.55
query36	0.67	0.49	0.48
query37	0.09	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.02	0.03
query40	0.16	0.13	0.13
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.92 s
Total hot run time: 33.15 s

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 2, 2024
Copy link
Contributor

github-actions bot commented Dec 2, 2024

PR approved by at least one committer and no changes requested.

Copy link
Contributor

github-actions bot commented Dec 2, 2024

PR approved by anyone and no changes requested.

@starocean999 starocean999 merged commit 70b0a08 into apache:master Dec 2, 2024
27 of 29 checks passed
github-actions bot pushed a commit that referenced this pull request Dec 2, 2024
…on (#44783)

Related PR: #34313

Problem Summary
The original PR did not handle the following scenario:
```sql
SELECT SUM(a), PERCENTILE(pk, 0.1) AS c1, PERCENTILE(pk, 0.1) AS c2, PERCENTILE(pk, 0.4) AS c3 FROM test_merge_percentile;
```
In this case, the aggregate outputs include two identical functions
(PERCENTILE(pk, 0.1)). When constructing the LogicalProject, a map was
used where the key is the child of an Alias and the value is the Alias
itself. However, this approach loses information when two Aliases share
the same child.
This PR modifies the map structure to use the child of an Alias as the
key and a list of Alias objects as the value. This ensures that all
Alias instances with the same child are preserved, resolving the issue
of lost information in such cases.
github-actions bot pushed a commit that referenced this pull request Dec 2, 2024
…on (#44783)

Related PR: #34313

Problem Summary
The original PR did not handle the following scenario:
```sql
SELECT SUM(a), PERCENTILE(pk, 0.1) AS c1, PERCENTILE(pk, 0.1) AS c2, PERCENTILE(pk, 0.4) AS c3 FROM test_merge_percentile;
```
In this case, the aggregate outputs include two identical functions
(PERCENTILE(pk, 0.1)). When constructing the LogicalProject, a map was
used where the key is the child of an Alias and the value is the Alias
itself. However, this approach loses information when two Aliases share
the same child.
This PR modifies the map structure to use the child of an Alias as the
key and a list of Alias objects as the value. This ensures that all
Alias instances with the same child are preserved, resolving the issue
of lost information in such cases.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.x dev/3.0.x reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants