[CH-436] support columns with different nullable type when split for union #437

Open
wants to merge 765 commits into base: clickhouse_backend
Changes from 1 commit
765 commits
1b298db
Improve black check: show diff in the output
Felixoid Mar 28, 2022
046e742
Fix version string update, fix #35518
Felixoid Mar 24, 2022
3d43bce
Make GITHUB_RUN_URL variable and use it
Felixoid Mar 24, 2022
7b01fe2
Add build-url label to built docker images
Felixoid Mar 24, 2022
e93a06c
Merge pull request #33664 from ClickHouse/release-steps
alesapin Mar 22, 2022
dec0ec8
Merge pull request #35533 from ClickHouse/simplify_strip
alesapin Mar 25, 2022
5b06fba
Push only to the new CI DB
alesapin Mar 25, 2022
5a5a174
Remove outdated links from CI
alesapin Mar 28, 2022
8cd46fa
Merge pull request #35766 from ClickHouse/resurrect_official_flag
alesapin Mar 30, 2022
f2e41bc
Merge pull request #35774 from ClickHouse/ressurect_build_hash_v2
alesapin Mar 31, 2022
9234a3c
Merge pull request #35308 from ClickHouse/clickhouse-keeper
alesapin Mar 28, 2022
49bafdc
Merge pull request #35211 from ClickHouse/release-docker
Felixoid Apr 1, 2022
899d7f2
Merge pull request #35854 from ClickHouse/docker-master-head
Felixoid Apr 1, 2022
b03e8ca
Backport tests/ci/upload_result_helper.py
Felixoid Apr 4, 2022
b4cd668
Backport #35733 to 22.3: Added settings for insert of invalid IPv6, I…
Apr 4, 2022
c9a1d9c
Backport #35820 to 22.3: Avoid processing per-column TTL multiple times
Apr 4, 2022
697dd21
Merge pull request #35909 from ClickHouse/backport/22.3-release
Felixoid Apr 4, 2022
ad0a62d
Merge pull request #35881 from ClickHouse/backport/22.3/35799
alexey-milovidov Apr 4, 2022
82735cb
Merge pull request #35928 from ClickHouse/backport/22.3/35733
kitaisreal Apr 5, 2022
abb756d
Merge pull request #35938 from ClickHouse/backport/22.3/35820
CurtizJ Apr 5, 2022
025a573
remove input rel and support java iter from local files
liuneng1994 Apr 6, 2022
55da56d
Update version to 22.3.4.44
Felixoid Apr 6, 2022
84020f5
Tiny improvements to git and version helpers
Felixoid Apr 6, 2022
0a43cfe
Improve and fix edge cases for docker_server.py
Felixoid Apr 6, 2022
27ee3a4
add coalesce operator
liuneng1994 Apr 7, 2022
574025a
fix memory double free
liuneng1994 Apr 7, 2022
b228517
Merge remote-tracking branch 'origin/21.9' into local_engine_with_col…
liuneng1994 Apr 7, 2022
507e81a
fix compile error in benchmark
liuneng1994 Apr 7, 2022
131f6a6
fix metrics error
liuneng1994 Apr 7, 2022
ee1b577
Fix action for docker images build
Felixoid Apr 7, 2022
7d6fd3d
A temporary fix for artifactory push before multiple architectures
Felixoid Apr 7, 2022
e8168a1
Add python unit tests to backport workflow
Felixoid Apr 7, 2022
a365ef5
Move version_arg to version_helper, add tests
Felixoid Apr 7, 2022
df57f8e
Merge pull request #36028 from ClickHouse/backport/fix-release-workflow
Felixoid Apr 7, 2022
2cf5ddf
add mergetree data generate tool
liuneng1994 Apr 8, 2022
04f61b8
change jni package name
liuneng1994 Apr 8, 2022
995a6fb
add new benchmark
liuneng1994 Apr 11, 2022
8323fd6
Merge remote-tracking branch 'origin/21.9' into local_engine_with_col…
liuneng1994 Apr 11, 2022
2307ada
fix cmake error for googlebenchmark
liuneng1994 Apr 11, 2022
59ebc57
fix rebase error
liuneng1994 Apr 12, 2022
97c4bc2
change benchmark error column
liuneng1994 Apr 12, 2022
401d6ec
add time print in transform
liuneng1994 Apr 12, 2022
502c26c
fix filter actionDag has useless column
liuneng1994 Apr 12, 2022
a1c4e68
fix parse alias function failed
liuneng1994 Apr 12, 2022
5ed3db3
fix compile error in benchmark
liuneng1994 Apr 7, 2022
f212656
fix metrics error
liuneng1994 Apr 7, 2022
69660e5
add mergetree data generate tool
liuneng1994 Apr 8, 2022
bafd8ff
change jni package name
liuneng1994 Apr 8, 2022
f642f82
add new benchmark
liuneng1994 Apr 11, 2022
95f5b44
change benchmark error column
liuneng1994 Apr 12, 2022
69a445f
add time print in transform
liuneng1994 Apr 12, 2022
b9287b6
fix filter actionDag has useless column
liuneng1994 Apr 12, 2022
0d8fce3
fix parse alias function failed
liuneng1994 Apr 12, 2022
43acb9b
Revert "add time print in transform"
liuneng1994 Apr 13, 2022
8c6bb93
Merge branch '22.3' into local_engine_with_columnar_shuffle_no_rebase
liuneng1994 Apr 13, 2022
c342630
fix rebase error
liuneng1994 Apr 13, 2022
3c99723
fix rebase error
liuneng1994 Apr 13, 2022
2c22878
fix cmake rebase error
liuneng1994 Apr 13, 2022
705001a
fix rebase compile error
liuneng1994 Apr 16, 2022
e4d2372
add decompress benchmakr
liuneng1994 Apr 18, 2022
8153afc
resolve conflicts
liuneng1994 Apr 19, 2022
b0cd49c
Merge pull request #2 from liuneng1994/local_engine_with_columnar_shu…
liuneng1994 Apr 19, 2022
74c6a7e
Revert "add time print in transform"
liuneng1994 Apr 19, 2022
04b4e51
Merge pull request #4 from liuneng1994/revert_time_print
liuneng1994 Apr 19, 2022
ee7d3c8
Support TPCH Q1 (#8)
liuneng1994 Apr 27, 2022
1d9ce86
fix hash partition error (#14)
liuneng1994 Apr 28, 2022
c4cbcd6
[CH-18] Supported substrait cast node (#19)
zzcclp May 9, 2022
e4f0339
add blob and s3 read support (#20)
liuneng1994 May 10, 2022
15db61f
Support join (#25)
liuneng1994 May 25, 2022
bfe7a84
support tpch q14 (#26)
liuneng1994 May 25, 2022
0ce6a3a
fix join use nulls and support post join filter (#27)
liuneng1994 Jun 1, 2022
5c2784b
Fix shuffle column error (#29)
liuneng1994 Jun 6, 2022
a4cf20e
add stacktrace log (#30)
liuneng1994 Jun 6, 2022
f99279e
Support extract and substring function (#31)
zzcclp Jun 6, 2022
4765f0b
add function result cast (#32)
liuneng1994 Jun 7, 2022
fd77999
add native c2r (#33)
liuneng1994 Jun 9, 2022
5def534
Support broadCast join (#34)
liuneng1994 Jun 15, 2022
391b322
a lot of optimization (#35)
liuneng1994 Jun 22, 2022
4f14406
Support new shuffle (#36)
liuneng1994 Jun 30, 2022
fa2eaa0
fix mem leak (#37)
liuneng1994 Jul 1, 2022
d59bc9f
Using NewWeakGlobalRef instead of NewGlobalRef (#38)
zzcclp Jul 12, 2022
c695853
fix join duplicate table error (#39)
liuneng1994 Jul 13, 2022
7abd83d
add context clean when unload lib (#40)
liuneng1994 Jul 14, 2022
f4dfe2d
Optimize clickhouse arrow parquet reader (#41)
liuneng1994 Jul 22, 2022
6aac94d
Support nullable datatype (#51)
liuneng1994 Aug 18, 2022
b898c4b
support const column in c2r (#54)
liuneng1994 Aug 19, 2022
f3d754d
fix columnar shuffle split failed (#57)
liuneng1994 Aug 22, 2022
56c4bfc
upgrade substrait (#59)
liuneng1994 Aug 22, 2022
1385675
add BlockIterator to manage memory between java and cpp (#62)
liuneng1994 Aug 24, 2022
47b50f0
Support expr on broadcast (#64)
liuneng1994 Aug 25, 2022
169c1bb
issue #48 Optimize Arrow Parquet Reader (#61)
Aug 25, 2022
121d05a
fix compile error (#66)
liuneng1994 Aug 25, 2022
c0f737e
issue #48 fix null value case (#90)
Aug 29, 2022
1df1d49
Support expr eval and Support NULL literal (#92)
liuneng1994 Aug 29, 2022
b14d26a
[CH-87] fix min max on date32 (#96)
Aug 30, 2022
9feb3d2
fix aggregate nullable boolean column failed (#99)
liuneng1994 Aug 31, 2022
086a081
support singular_or_list (#101)
liuneng1994 Sep 1, 2022
874ad45
skip empty block when read shuffle data (#102)
liuneng1994 Sep 1, 2022
c635380
remove unused column before filter (#103)
liuneng1994 Sep 2, 2022
4b36351
support select constant (#104)
liuneng1994 Sep 2, 2022
6a40950
[CH-107] Support Native RowToColumnar (#114)
Sep 8, 2022
1643fd1
Fix warnings during compilation (#106)
taiyang-li Sep 14, 2022
db56bf1
add check style (#122)
liuneng1994 Sep 15, 2022
dee2ac6
Multiple processes transfer parquet to mergetree (#110)
zhanglistar Sep 15, 2022
2c0c4cc
catch c++ exceptions and rethrow java exceptions (#126)
lgbo-ustc Sep 16, 2022
2706a4d
fixed : cover more jni interfaces for catching c++ exceptions (#127)
lgbo-ustc Sep 20, 2022
0889631
[CH-129][Followup] Support substrait SingularOrList (#131)
zzcclp Sep 21, 2022
c5ba2bf
Solve conflict symbols of DB::ParquetBlockInputFormat and add some be…
taiyang-li Sep 21, 2022
31a4432
Support functions for clickhouse backend: lower/upper/ltrim/rtrim (#117)
taiyang-li Sep 22, 2022
f5909f2
Support conversion between spark timestamp and ch datetime64 (#119)
taiyang-li Sep 22, 2022
3d1db97
Reduce log output (#134)
liuneng1994 Sep 27, 2022
198ba4e
refactor the file sources (#130)
lgbo-ustc Sep 30, 2022
d6a6cc9
improve the performance of converting row batch to column batch (#136)
lgbo-ustc Sep 30, 2022
1eed312
improve: catch java exception in c++ (#143)
lgbo-ustc Oct 10, 2022
e3b48c1
Support loading setting from config file and improve logging. (#118)
taiyang-li Oct 12, 2022
48322be
revert log level to error (#148)
liuneng1994 Oct 13, 2022
729900c
[CH-123] Support short/byte/binary/decimal/array/map/struct (#128)
taiyang-li Oct 14, 2022
8ba0bed
Revert "[CH-123] Support short/byte/binary/decimal/array/map/struct (…
liuneng1994 Oct 17, 2022
04939a8
support right semi join on substrait (#164)
liuneng1994 Oct 20, 2022
50f3be5
[CH-120] Support memory manager (#168)
liuneng1994 Oct 21, 2022
057f1e9
fixed a bug: coredump caused by transform a row batch with empty requ…
lgbo-ustc Oct 24, 2022
45d115f
[#156]support sort op (#160)
lgbo-ustc Oct 24, 2022
1b53b64
[CH-169] add reserve memory no exception (#173)
liuneng1994 Oct 26, 2022
17bdf88
add micro for jni env (#177)
liuneng1994 Oct 26, 2022
844e7f5
[CH-45]support count(*/count(1) (#175)
lgbo-ustc Oct 31, 2022
ce3bc0b
[CH-180] Support non-HA mode for ClickHouse reading from HDFS (#181)
zzcclp Oct 31, 2022
fc4843e
fix concurrent problem in allocator (#183)
liuneng1994 Nov 2, 2022
34245a0
[CH-170] Implement strings functions between spark and clickhouse: co…
taiyang-li Nov 3, 2022
d2b8933
[CH-123] Support short/byte/binary/decimal/array/map/struct (#163)
taiyang-li Nov 4, 2022
bc28ef0
[CH-187]Support spark math functions (#188)
taiyang-li Nov 11, 2022
1cbbf75
[CH-184] support prewhere (#185)
liuneng1994 Nov 11, 2022
4fda39e
[CH-190] enable tests in GlutenDataFrameAggregateSuite (#192)
Nov 13, 2022
0fc2fae
[CH-197] Fix bug when c2r with const columns (#198)
taiyang-li Nov 18, 2022
af81cb0
close https://github.com/Kyligence/ClickHouse/issues/199 (#200)
taiyang-li Nov 18, 2022
f48541f
change any to string value (#193)
liuneng1994 Nov 18, 2022
60e9258
fixed (#207)
lgbo-ustc Nov 21, 2022
e225f47
[CH-204] Fixed a bug in initializing settings
lgbo-ustc Nov 21, 2022
e5b7449
[CH-186] support `RangePartitioning` (#189)
lgbo-ustc Nov 22, 2022
b67c21b
finish shift left and right (#217)
taiyang-li Nov 30, 2022
294cd16
fix failed ch ut in https://github.com/oap-project/gluten/pull/620 (#…
taiyang-li Dec 1, 2022
97c3f4f
support trim both (#211)
taiyang-li Dec 5, 2022
ad6dc45
finish debug (#209)
taiyang-li Dec 6, 2022
79b4286
[CH-225]Fix decimal bug cased by big-endian encoding in spark row. (#…
taiyang-li Dec 7, 2022
0a7ccba
Improve: reducing the open operation for the same file when reading p…
lgbo-ustc Dec 12, 2022
e571255
[CH-191] Support generate exec (#194)
taiyang-li Dec 13, 2022
dc9b918
Add spark check_overflow function and cast toDecimal32/64/128 (#231)
loneylee Dec 13, 2022
d5af5eb
support map[key] and array[index] operator for gluten (#216)
taiyang-li Dec 15, 2022
0f5b2ac
support orc format files (#214)
lgbo-ustc Dec 19, 2022
a9e77e2
[CH-236] support split/pmod/factorial/rand/ascii/concat_ws function …
taiyang-li Dec 19, 2022
f091941
Follow substrait's naming for bitwise functions (#238)
loneylee Dec 19, 2022
11ecd0e
[CH-219] Support rlike/regexp_replace/regexp_extract/coalesce/DATE_AD…
taiyang-li Dec 20, 2022
5ab0b59
fixed (#243)
lgbo-ustc Dec 26, 2022
de33c20
support window (#235)
lgbo-ustc Dec 26, 2022
d7b372d
optimization for order by limit (#246)
lgbo-ustc Jan 5, 2023
5cd2323
Merge remote-tracking branch 'offical/master' into clickhouse_backend…
liuneng1994 Jan 6, 2023
6d44de1
[CH-253]Fix nullable struct/array/map cast error (#255)
loneylee Jan 9, 2023
5968499
[CH-227] Support more fields of function extract (#252)
taiyang-li Jan 10, 2023
19f2501
[CH-239] Support queries on struct fields (#244)
lgbo-ustc Jan 10, 2023
037ffdf
fixed a bug related to count(1)/count(*) (#260)
lgbo-ustc Jan 11, 2023
a4b75a7
fix problems
liuneng1994 Jan 12, 2023
ad74941
[CH-262] Skip memory copy between onheap and native when using column…
zzcclp Jan 12, 2023
fff7ada
[Gluten-826]fix Inset is empty (#258)
loneylee Jan 13, 2023
e23a88d
Update version to 23.1.2.1
Felixoid Jan 25, 2023
c82a0ee
Backport #45636 to 23.1: Trim refs/tags/ from GITHUB_TAG in release w…
robot-clickhouse Jan 26, 2023
67a7409
Merge pull request #45647 from ClickHouse/backport/23.1/45636
Felixoid Jan 26, 2023
6deedac
Backport #45603 to 23.1: Fix wiping sensitive info in logs
robot-clickhouse Jan 26, 2023
0f4d326
Backport #45630 to 23.1: Fix performance of short queries with `Array…
robot-clickhouse Jan 27, 2023
e5ddaf7
Merge pull request #45705 from ClickHouse/backport/23.1/45630
robot-ch-test-poll4 Jan 27, 2023
05d4087
Backport #45686 to 23.1: Fix key description when encountering duplic…
robot-clickhouse Jan 28, 2023
ce39cf7
[Gluten-867] support base64&&unbase64 functions (#266)
zheniantoushipashi Jan 28, 2023
e5daa08
Merge pull request #45730 from ClickHouse/backport/23.1/45686
alexey-milovidov Jan 29, 2023
8dfb170
Merge pull request #45673 from ClickHouse/backport/23.1/45603
alexey-milovidov Jan 29, 2023
a29c0f0
Update version to 23.1.3.1
alesapin Jan 29, 2023
025bf56
[CH-86] Support Spark 3.3.1 for Clickhouse Backend (#274)
zzcclp Jan 30, 2023
2000c2b
Backport #45818 to 23.1: Get rid of progress timestamps in release pu…
robot-clickhouse Jan 31, 2023
b121ddb
Merge pull request #45853 from ClickHouse/backport/23.1/45818
Felixoid Jan 31, 2023
fb70f6c
[CH-232] Support DataTypeNothing, which close https://github.com/Kyli…
taiyang-li Feb 1, 2023
90a3d69
[Gluten-860]fix insert function result not correct (#265)
loneylee Feb 1, 2023
f472d55
[Gluten-898]Update substrait proto (#277)
loneylee Feb 1, 2023
f9c3fb3
Backport #45871 to 23.1: Fix ipv6 parser
robot-clickhouse Feb 1, 2023
548b494
Merge pull request #45896 from ClickHouse/backport/23.1/45871
alexey-milovidov Feb 2, 2023
d5efbbf
[250] Support grouping sets (#251)
lgbo-ustc Feb 2, 2023
3d2b48f
[CH-264] support more json functions (#264)
lgbo-ustc Feb 7, 2023
f2a277f
[CH-272]Fix union join error (#273)
loneylee Feb 7, 2023
af020b5
[Gluten-926]sync timezone to backend (#286)
loneylee Feb 8, 2023
15d8a81
support function md5/lpad/rpad/reverse (#281)
taiyang-li Feb 8, 2023
8e35706
fixed: count(1)/sum(1) on window (#288)
lgbo-ustc Feb 8, 2023
c18942f
support more struct functions (#269)
lgbo-ustc Feb 10, 2023
83c5c94
Merge branch 'clickhouse_backend' into upgrade_clickhouse
liuneng1994 Feb 13, 2023
7ce9bb0
ignore warning
liuneng1994 Jan 30, 2023
53ae983
Merge branch 'mine-23.1' into upgrade_clickhouse
liuneng1994 Feb 13, 2023
b581eb8
window function, lead/lag/first_value/last_value (#279)
lgbo-ustc Feb 15, 2023
73d2c47
fix problems
liuneng1994 Jan 4, 2023
64dee75
Merge branch 'clickhouse_backend' into upgrade_clickhouse
liuneng1994 Feb 21, 2023
9d38a6b
fix problems
liuneng1994 Feb 21, 2023
0173703
[CH-293] Upgrade clickhouse to 23.1.3.5-stable (#292)
liuneng1994 Feb 22, 2023
36f211e
Revert "[CH-293] Upgrade clickhouse to 23.1.3.5-stable (#292)"
liuneng1994 Feb 22, 2023
6ea4b94
Merge pull request #308 from Kyligence/revert-292-upgrade_clickhouse
liuneng1994 Feb 22, 2023
387335b
Merge branch 'clickhouse_backend' into upgrade_clickhouse
liuneng1994 Feb 22, 2023
82520d0
Merge pull request #309 from liuneng1994/upgrade_clickhouse
liuneng1994 Feb 22, 2023
5928196
[304] Print jvm stacktrace when the exception is caught in C++ (#305)
lgbo-ustc Feb 22, 2023
9ceb2f2
fix write array failed (#312)
liuneng1994 Feb 23, 2023
4267c57
[CH-306]Fix read empty parquet (#307)
loneylee Feb 24, 2023
3adff0b
[CH-300] fix multi aggregation fail due to serialize partial aggregat…
shuai-xu Feb 24, 2023
7adc58c
Support repeat function (#298)
KevinyhZou Feb 24, 2023
94ac14c
[Gluten-934] enhance trim function (#287)
zheniantoushipashi Feb 24, 2023
47580c9
[CH-316] Fix memory leak when executing BHJ (#317)
zzcclp Feb 24, 2023
4942797
[CH-279] Support more window functions, lead/lag/dense_rank (#322)
lgbo-ustc Mar 1, 2023
78c527f
Fix bug when struct contains decimal types (#324)
taiyang-li Mar 2, 2023
7691735
[CH-278] Fix failed uts of lpad/rpad in https://github.com/oap-projec…
taiyang-li Mar 3, 2023
d62ca1e
[CH-294] support collect_list (#295)
taiyang-li Mar 6, 2023
9e78b30
[CH-290] support to_date/date_format/trunc/add_months/map_from_arrays…
taiyang-li Mar 6, 2023
b9f8c8f
[CH-278] part2: support xxhash64/hash (#285)
taiyang-li Mar 6, 2023
b10f642
[CH-256] fully support from_unixtime and arrayJoin (#257)
taiyang-li Mar 7, 2023
89fd53e
[VL-1004]Upgrade substrait proto by velox (#319)
loneylee Mar 9, 2023
fbe5e2b
[CH-325]Support `stddev_samp` (#350)
lgbo-ustc Mar 10, 2023
0abb12f
[CH-349]fix substrait parser to support some singular_or_list cases (…
lhuang09287750 Mar 13, 2023
4f9758d
[CH-355] print pipeline if log level is debug (#356)
taiyang-li Mar 13, 2023
8dbb72f
Fix decimal128 value error (#348)
loneylee Mar 14, 2023
ee8e63a
[CH-328] support function isNaN (#333)
exmy Mar 14, 2023
a518329
Fix failed building of uts and benchmarks (#357)
taiyang-li Mar 14, 2023
99d3415
Fix some asan complain about inside_main (#365)
taiyang-li Mar 16, 2023
09942be
[CH-369] Fix coredump caused by CHByteArrayChunkedRecordReader (#370)
taiyang-li Mar 17, 2023
ea7dc61
[CH-362] fix bad_alloc when converting spark rows to ch column (#363)
shuai-xu Mar 17, 2023
4d8f84e
[CH-359] fix libhdfs3 conf not correctly set bug (#360)
shuai-xu Mar 17, 2023
47f5b40
Add clickhouse dependency module (#371)
liuneng1994 Mar 20, 2023
405e8e3
[CH-313] Support functions position/locate (#314)
exmy Mar 21, 2023
9b896e0
[CH-353] ShuffleSplitter improvement: support multiple subdirs (#354)
exmy Mar 21, 2023
0c8ea4e
Support full join with join condition (#366)
loneylee Mar 23, 2023
ea45be8
[VL] Support Decimal type in Gluten (#378)
loneylee Mar 29, 2023
355a070
[CH-387] Fix error caused by case-sensitive matching of ORC/Parquet, …
taiyang-li Mar 29, 2023
6aad021
[CH-375] Fix create nullable column used ColumnInt8 (#376)
loneylee Mar 29, 2023
ea4a475
[CH-364] Make sure global_context is initialized once at the first in…
taiyang-li Mar 29, 2023
be666e8
Fix join key with alpha ASCII (#390)
loneylee Mar 30, 2023
eaefd7c
Fix convert empty string to null (#389)
liuneng1994 Mar 30, 2023
dcba13d
Support function space (#321)
KevinyhZou Mar 30, 2023
8e5feef
Optimize shuffle write (#315)
liuneng1994 Mar 31, 2023
f57ffab
[CH-383] Fix 'Unknown identifier' exception (#385)
exmy Mar 31, 2023
3eca6b8
[CH-326] Support partitioning with expressions (#332)
lgbo-ustc Apr 4, 2023
5d0e686
fix size of ColumnBuffer not configurable (#396)
shuai-xu Apr 6, 2023
6fc029a
[394]Print exception message (#395)
lgbo-ustc Apr 10, 2023
7d33ae4
[CH-410] Fix coredump issue caused by broadcast join (#411)
taiyang-li Apr 10, 2023
3028f6b
Fix UT of benchmark_local_engine.cpp (#400)
zhanglistar Apr 10, 2023
876a30d
[CH-402]Enable return null and complex type for get_json_object funct…
KevinyhZou Apr 10, 2023
0417ceb
[CH-397] support posexplode/sequence functions (#399)
taiyang-li Apr 12, 2023
bffd750
[405] Eliminate data skew in hash shuffle (#406)
lgbo-ustc Apr 12, 2023
2b37553
[CH-358]Support json_tuple function (#361)
KevinyhZou Apr 13, 2023
ddf2f50
[398]Fixed: `groupArray`'s result could be null (#404)
lgbo-ustc Apr 13, 2023
ea6d8dd
[CH-419] support map type for JSONExtract and fix bugs for explode/po…
taiyang-li Apr 13, 2023
4d8e7fa
[CH-436] support columns with different nullable type when split for …
shuai-xu Apr 14, 2023
Add spark check_overflow function and cast toDecimal32/64/128 (#231)
* add spark check_overflow function and cast toDecimal32/64/128

* fix check overflow allow null
loneylee authored Dec 13, 2022
commit dc9b918244280e132ac68ec5458874e601273b43
84 changes: 83 additions & 1 deletion utils/local-engine/Parser/SerializedPlanParser.cpp
@@ -181,6 +181,35 @@ std::shared_ptr<DB::ActionsDAG> SerializedPlanParser::expressionsToActionsDAG(
return actions_dag;
}

std::string getDecimalFunction(const substrait::Type_Decimal & decimal, const bool null_on_overflow) {
std::string ch_function_name;
UInt32 precision = decimal.precision();
UInt32 scale = decimal.scale();

if (precision <= DataTypeDecimal32::maxPrecision())
{
ch_function_name = "toDecimal32";
}
else if (precision <= DataTypeDecimal64::maxPrecision())
{
ch_function_name = "toDecimal64";
}
else if (precision <= DataTypeDecimal128::maxPrecision())
{
ch_function_name = "toDecimal128";
}
else
{
throw Exception(ErrorCodes::UNKNOWN_TYPE, "Spark doesn't support decimal type with precision {}", precision);
}

if (null_on_overflow) {
ch_function_name = ch_function_name + "OrNull";
}

return ch_function_name;
}

/// TODO: This function needs to be improved for Decimal/Array/Map/Tuple types.
std::string getCastFunction(const substrait::Type & type)
{
@@ -226,6 +255,10 @@ std::string getCastFunction(const substrait::Type & type)
{
ch_function_name = "toUInt8";
}
else if (type.has_decimal())
{
ch_function_name = getDecimalFunction(type.decimal(), false);
}
else
throw Exception(ErrorCodes::UNKNOWN_TYPE, "doesn't support cast type {}", type.DebugString());

@@ -1046,6 +1079,12 @@ SerializedPlanParser::getFunctionName(const std::string & function_signature, co
else
throw Exception(ErrorCodes::BAD_ARGUMENTS, "The first arg of extract function is wrong.");
}
else if (function_name == "check_overflow")
{
if (args.size() != 2)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "check_overflow function requires two args.");
ch_function_name = getDecimalFunction(output_type.decimal(), args.at(1).value().literal().boolean());
}
else
ch_function_name = SCALAR_FUNCTIONS.at(function_name);

@@ -1120,6 +1159,33 @@ const ActionsDAG::Node * SerializedPlanParser::parseFunctionWithDAG(
args.erase(args.begin());
}

if (function_signature.find("check_overflow:", 0) != function_signature.npos)
{
if (scalar_function.arguments().size() != 2)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "check_overflow function requires two args.");

// the toDecimalXXOrNull variants need their first argument as a string
if (scalar_function.arguments().at(1).value().literal().boolean())
{
std::string check_overflow_args_trans_function = "toString";
DB::ActionsDAG::NodeRawConstPtrs to_string_args({args[0]});

auto to_string_cast = FunctionFactory::instance().get(check_overflow_args_trans_function, context);
std::string to_string_cast_args_name;
join(to_string_args, ',', to_string_cast_args_name);
result_name = check_overflow_args_trans_function + "(" + to_string_cast_args_name + ")";
const auto * to_string_cast_node = &actions_dag->addFunction(to_string_cast, to_string_args, result_name);
args[0] = to_string_cast_node;
}

// drop the last argument (the null_on_overflow flag); the scale replaces it below
args.pop_back();
auto type = std::make_shared<DataTypeUInt32>();
UInt32 scale = rel.scalar_function().output_type().decimal().scale();
args.emplace_back(
&actions_dag->addColumn(ColumnWithTypeAndName(type->createColumnConst(1, scale), type, getUniqueName(toString(scale)))));
}

auto function_builder = FunctionFactory::instance().get(function_name, context);
std::string args_name;
join(args, ',', args_name);
@@ -1130,6 +1196,15 @@ const ActionsDAG::Node * SerializedPlanParser::parseFunctionWithDAG(
{
auto cast_function = getCastFunction(rel.scalar_function().output_type());
DB::ActionsDAG::NodeRawConstPtrs cast_args({function_node});

if (cast_function.starts_with("toDecimal"))
{
auto type = std::make_shared<DataTypeUInt32>();
UInt32 scale = rel.scalar_function().output_type().decimal().scale();
cast_args.emplace_back(&actions_dag->addColumn(
ColumnWithTypeAndName(type->createColumnConst(1, scale), type, getUniqueName(toString(scale)))));
}

auto cast = FunctionFactory::instance().get(cast_function, context);
std::string cast_args_name;
join(cast_args, ',', cast_args_name);
@@ -1329,7 +1404,7 @@ const ActionsDAG::Node * SerializedPlanParser::parseArgument(ActionsDAGPtr actio
std::string ch_function_name = getCastFunction(rel.cast().type());
DB::ActionsDAG::NodeRawConstPtrs args;
auto cast_input = rel.cast().input();
-if (cast_input.has_selection())
+if (cast_input.has_selection() || cast_input.has_literal())
{
args.emplace_back(parseArgument(action_dag, rel.cast().input()));
}
@@ -1348,6 +1423,13 @@ const ActionsDAG::Node * SerializedPlanParser::parseArgument(ActionsDAGPtr actio
{
throw Exception(ErrorCodes::BAD_ARGUMENTS, "unsupported cast input {}", rel.cast().input().DebugString());
}

if (ch_function_name.starts_with("toDecimal"))
{
UInt32 scale = rel.cast().type().decimal().scale();
args.emplace_back(add_column(std::make_shared<DataTypeUInt32>(), scale));
}

const auto * function_node = toFunctionNode(action_dag, ch_function_name, args);
action_dag->addOrReplaceInIndex(*function_node);
return function_node;
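
The changes above can be summarized with a small standalone sketch. It is not the parser code: `decimalFunctionName` and `rewriteCheckOverflow` are hypothetical names, the expression is rendered as a plain string instead of `ActionsDAG` nodes, and the precision limits 9/18/38 are assumed to match what `DataTypeDecimal32/64/128::maxPrecision()` return. It only illustrates the two ideas in this commit: the target function is chosen by precision (with an `OrNull` suffix when overflow should yield NULL), and the boolean second argument of `check_overflow` is replaced by the output scale.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Assumed maximum decimal digits for Decimal32 / Decimal64 / Decimal128.
constexpr std::uint32_t kMaxPrecision32 = 9;
constexpr std::uint32_t kMaxPrecision64 = 18;
constexpr std::uint32_t kMaxPrecision128 = 38;

// Hypothetical stand-in for getDecimalFunction: choose the conversion
// function from the precision, optionally using the ...OrNull variant.
std::string decimalFunctionName(std::uint32_t precision, bool null_on_overflow)
{
    std::string name;
    if (precision <= kMaxPrecision32)
        name = "toDecimal32";
    else if (precision <= kMaxPrecision64)
        name = "toDecimal64";
    else if (precision <= kMaxPrecision128)
        name = "toDecimal128";
    else
        throw std::invalid_argument("unsupported decimal precision " + std::to_string(precision));

    if (null_on_overflow)
        name += "OrNull";
    return name;
}

// Hypothetical helper: render the call that check_overflow(input, flag)
// is rewritten to, as a string rather than as DAG nodes.
std::string rewriteCheckOverflow(
    const std::string & input, std::uint32_t precision, std::uint32_t scale, bool null_on_overflow)
{
    assert(scale <= precision);
    // Per the comment in the diff, the ...OrNull variants need a string
    // first argument, so the input is wrapped in toString.
    const std::string first_arg = null_on_overflow ? "toString(" + input + ")" : input;
    // The boolean flag is dropped and the target scale takes its place.
    return decimalFunctionName(precision, null_on_overflow) + "(" + first_arg + "," + std::to_string(scale) + ")";
}
```

Under these assumptions, `rewriteCheckOverflow("x", 12, 2, true)` renders `toDecimal64OrNull(toString(x),2)`, while the plain cast path with `null_on_overflow = false` and precision 5 renders `toDecimal32(x,2)`.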
1 change: 1 addition & 0 deletions utils/local-engine/Parser/SerializedPlanParser.h
@@ -89,6 +89,7 @@ static const std::map<std::string, std::string> SCALAR_FUNCTIONS = {
{"quarter", "toQuarter"},
{"shiftleft", "bitShiftLeft"},
{"shiftright", "bitShiftRight"},
{"check_overflow", "check_overflow"},

/// string functions
{"like", "like"},