Minor: add "clickbench extended" queries to slt tests #11763

Merged 1 commit, Aug 1, 2024

Contributor

alamb commented Aug 1, 2024

Which issue does this PR close?

Related to #11723

Rationale for this change

While working to enable StringView by default I hit an error running the ClickBench "extended" queries:

 Finished `release` profile [optimized] target(s) in 0.77s
     Running `/home/alamb/arrow-datafusion2/target/release/dfbench clickbench --iterations 5 --path /home/alamb/arrow-datafusion/benchmarks/data/hits.parquet --queries-path /home/alamb/arrow-datafusion/benchmarks/queries/clickbench/extended.sql -o /home/alamb/arrow-datafusion/benchmarks/results/alamb_string_view_default/clickbench_extended.json`
Running benchmarks with the following options: RunOpt { query: None, common: CommonOpt { iterations: 5, partitions: None, batch_size: 8192, debug: false }, path: "/home/alamb/arrow-datafusion/benchmarks/data/hits.parquet", queries_path: "/home/alamb/arrow-datafusion/benchmarks/queries/clickbench/extended.sql", output_path: Some("/home/alamb/arrow-datafusion/benchmarks/results/alamb_string_view_default/clickbench_extended.json") }
Q0: SELECT COUNT(DISTINCT "SearchPhrase"), COUNT(DISTINCT "MobilePhone"), COUNT(DISTINCT "MobilePhoneModel") FROM hits;
thread 'tokio-runtime-worker' panicked at datafusion/physical-expr-common/src/binary_view_map.rs:220:18:
internal error: entered unreachable code: Utf8/Binary should use `ArrowBytesSet`
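
The panic message suggests a type dispatch whose fallback arm was assumed to be impossible. As a rough illustration only (this is not DataFusion's code; `OutputType` and `build_view_set` are hypothetical names), a minimal Python sketch of that failure pattern could look like this:

```python
from enum import Enum, auto


class OutputType(Enum):
    """Hypothetical stand-in for the string/binary Arrow types involved."""

    UTF8 = auto()
    BINARY = auto()
    UTF8_VIEW = auto()
    BINARY_VIEW = auto()


def build_view_set(output_type: OutputType) -> str:
    """Pretend to finalize a set that only supports the *view* types."""
    if output_type is OutputType.UTF8_VIEW:
        return "StringViewArray"
    if output_type is OutputType.BINARY_VIEW:
        return "BinaryViewArray"
    # Any other type reaching this point is a caller bug, so fail loudly.
    # The message copies the wording of the panic in the log above.
    raise AssertionError(
        "entered unreachable code: Utf8/Binary should use `ArrowBytesSet`"
    )
```

Under this assumption, the crash would mean that a non-view type reached a code path written only for view types, which is the kind of regression a test that merely runs the query can catch.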

What changes are included in this PR?

  1. Adds test coverage for those queries to the slt tests
  2. Drive-by typo fix in the README

Are these changes tested?

All tests

Are there any user-facing changes?

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Aug 1, 2024
alamb marked this pull request as ready for review August 1, 2024 13:35
----
1 1 1

query TIIII
Contributor

Just to verify: is it okay to benchmark data that groups everything into a single bucket?

Contributor Author

I think the idea of this test is to ensure that the queries run without errors, rather than to verify the results.

# This file contains the clickbench schema and queries
# and the first 10 rows of data. Since ClickBench contains case sensitive queries
# this is also a good test of that usecase too
# create.sql came from
# https://github.com/ClickHouse/ClickBench/blob/8b9e3aa05ea18afa427f14909ddc678b8ef0d5e6/datafusion/create.sql
# Data file made with DuckDB:
# COPY (SELECT * FROM 'hits.parquet' LIMIT 10) TO 'clickbench_hits_10.parquet' (FORMAT PARQUET);

The actual input is a 13 GB Parquet file, so I don't think it is feasible to verify the results as part of CI.
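
To make the "runs without errors" idea concrete, here is a minimal Python sketch. It assumes SQLite as a stand-in engine (not DataFusion), a made-up three-column `hits` table, and made-up rows that all hold the same values; only the query text comes from the log above. The helper name `run_smoke_test` is hypothetical.

```python
import sqlite3

# Q0 from the ClickBench "extended" queries, as quoted in the log above.
Q0 = (
    'SELECT COUNT(DISTINCT "SearchPhrase"), COUNT(DISTINCT "MobilePhone"), '
    'COUNT(DISTINCT "MobilePhoneModel") FROM hits'
)


def run_smoke_test(num_rows: int = 10) -> tuple:
    """Run Q0 against a tiny in-memory table and return the single result row."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: only the three columns that Q0 touches.
    conn.execute(
        'CREATE TABLE hits ("SearchPhrase" TEXT, "MobilePhone" INTEGER, '
        '"MobilePhoneModel" TEXT)'
    )
    # Made-up sample data: every row is identical, so each column
    # has exactly one distinct value.
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?, ?)", [("", 0, "")] * num_rows
    )
    row = conn.execute(Q0).fetchone()
    conn.close()
    return row


print(run_smoke_test())  # prints (1, 1, 1)
```

With identical rows every distinct count collapses to 1, which is the "single bucket" effect raised in the review. The value of such a test is that the query plans and executes at all, not that the counts are interesting.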

Contributor

comphead left a comment

lgtm

Contributor Author

alamb commented Aug 1, 2024

Thanks for the review @comphead

alamb merged commit 45b40c7 into apache:main Aug 1, 2024
24 checks passed