[CH] Performance regression when reading files from Hive(HDFS) #7177

Closed
zhanglistar opened this issue Sep 10, 2024 · 1 comment · Fixed by #7187
Labels
bug (Something isn't working), triage

Comments

@zhanglistar
Contributor

zhanglistar commented Sep 10, 2024

Backend

CH (ClickHouse)

Bug description

SQL: select count(distinct country) from ttt where day = '2024-08-26' and hour = '00'

Related PR: #6841

Commit 371be6f: OK
[2 screenshots]

Commit 371d448: NOT OK
[2 screenshots]

Vanilla:
[1 screenshot]

About 4.6x slower than vanilla 332, and 18x slower than the previous commit.

Spark version

Spark-3.3.x

Spark configurations

No response

System information

No response

Relevant logs

No response

zhanglistar added the bug and triage labels on Sep 10, 2024
@loneylee
Member

Thanks for finding the problem, I will fix it as soon as possible.
