Minor: add "clickbench extended" queries to slt tests #11763

alamb · 2024-08-01T13:30:16Z

Which issue does this PR close?

Related to #11723

Rationale for this change

While working to enable StringView by default I hit an error running the ClickBench "extended" queries:

    Finished `release` profile [optimized] target(s) in 0.77s
     Running `/home/alamb/arrow-datafusion2/target/release/dfbench clickbench --iterations 5 --path /home/alamb/arrow-datafusion/benchmarks/data/hits.parquet --queries-path /home/alamb/arrow-datafusion/benchmarks/queries/clickbench/extended.sql -o /home/alamb/arrow-datafusion/benchmarks/results/alamb_string_view_default/clickbench_extended.json`
Running benchmarks with the following options: RunOpt { query: None, common: CommonOpt { iterations: 5, partitions: None, batch_size: 8192, debug: false }, path: "/home/alamb/arrow-datafusion/benchmarks/data/hits.parquet", queries_path: "/home/alamb/arrow-datafusion/benchmarks/queries/clickbench/extended.sql", output_path: Some("/home/alamb/arrow-datafusion/benchmarks/results/alamb_string_view_default/clickbench_extended.json") }
Q0: SELECT COUNT(DISTINCT "SearchPhrase"), COUNT(DISTINCT "MobilePhone"), COUNT(DISTINCT "MobilePhoneModel") FROM hits;
thread 'tokio-runtime-worker' panicked at datafusion/physical-expr-common/src/binary_view_map.rs:220:18:
internal error: entered unreachable code: Utf8/Binary should use `ArrowBytesSet`

What changes are included in this PR?

Adds unit test coverage for those queries to the slt benchmarks
Drive-by typo fix in the README

Are these changes tested?

All tests

Are there any user-facing changes?

comphead · 2024-08-01T15:54:00Z

datafusion/sqllogictest/test_files/clickbench.slt

+----
+1 1 1
+
+query TIIII


Just to verify: is it okay to benchmark data that groups everything into a single bucket?

I think the idea of this test is to ensure that the queries run without errors, rather than verify the results

datafusion/datafusion/sqllogictest/test_files/clickbench.slt

Lines 19 to 26 in 6e2ff29

# This file contains the clickbench schema and queries
# and the first 10 rows of data. Since ClickBench contains case sensitive queries
# this is also a good test of that usecase too

# create.sql came from
# https://github.com/ClickHouse/ClickBench/blob/8b9e3aa05ea18afa427f14909ddc678b8ef0d5e6/datafusion/create.sql
# Data file made with DuckDB:
# COPY (SELECT * FROM 'hits.parquet' LIMIT 10) TO 'clickbench_hits_10.parquet' (FORMAT PARQUET);

The actual input is a 13GB parquet file so I don't think it is feasible to verify the results as part of CI.
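
To make the "runs without errors" idea concrete, here is a minimal sketch of such a smoke test. It uses Python's built-in sqlite3 module as a stand-in, not DataFusion or sqllogictest. The three-column table and its rows are hypothetical placeholders for the 10-row sample, and `smoke_test` is an illustrative helper name. Only the text of Q0 is taken from the output quoted in the PR description.

```python
import sqlite3

# Hypothetical stand-in for the small sample file: only the three columns
# that Q0 touches, all declared as TEXT for simplicity (an assumption, not
# the real ClickBench schema).
ROWS = [
    ("", "0", ""),
    ("", "0", ""),
    ("", "0", ""),
]

# Q0 as quoted in the benchmark output in the PR description.
Q0 = (
    'SELECT COUNT(DISTINCT "SearchPhrase"), COUNT(DISTINCT "MobilePhone"), '
    'COUNT(DISTINCT "MobilePhoneModel") FROM hits;'
)


def smoke_test(conn, queries):
    """Run each query and record only its shape (row and column counts).

    Any SQL error propagates as an exception, which is the failure signal
    a smoke test cares about.
    """
    results = []
    for query in queries:
        cur = conn.execute(query)
        rows = cur.fetchall()
        results.append((query, len(rows), len(cur.description)))
    return results


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE hits ("SearchPhrase" TEXT, "MobilePhone" TEXT, '
    '"MobilePhoneModel" TEXT)'
)
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", ROWS)

report = smoke_test(conn, [Q0])
for _, n_rows, n_cols in report:
    print(f"ok: {n_rows} row(s), {n_cols} column(s)")
```

With a sample this small, every column collapses to a single distinct value, so the output says little about correctness on the full data set. What it does show is that the query plans and executes end to end, which is the kind of failure the panic above represents.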

comphead

lgtm

alamb · 2024-08-01T16:36:16Z

Thanks for the review @comphead

Minor: add "clickbench extended" queries to unit tests

c9e49e6

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Aug 1, 2024

alamb marked this pull request as ready for review August 1, 2024 13:35

comphead reviewed Aug 1, 2024


comphead approved these changes Aug 1, 2024


alamb merged commit 45b40c7 into apache:main Aug 1, 2024
24 checks passed
