README and cleanup
matthewmturner committed Nov 7, 2024
1 parent d45b2a1 commit a0ea7f8
Showing 3 changed files with 9 additions and 32 deletions.
10 changes: 9 additions & 1 deletion README.md
@@ -98,13 +98,21 @@ Both of the commands above support the `--flightsql` parameter to run the SQL wi

The CLI can also run your configured DDL prior to executing the query by adding the `--run-ddl` parameter.

#### Benchmarking
#### Benchmarking Queries
You can benchmark queries by adding the `--bench` parameter. This runs the query a configurable number of times and outputs a breakdown of the query's execution time, with summary statistics for each component of the query (logical planning, physical planning, execution time, and total time).

Optionally, you can use the `--run-before` parameter to run a query before the benchmark starts. This is useful when your benchmark query depends on something that must exist first, such as a temp table to query or a file written to disk.

To save benchmark results to a file, use the `--save` parameter with a file path. Add the `--append` parameter to append to the file instead of overwriting it.

#### Analyze Queries

The output from `EXPLAIN ANALYZE` provides a wealth of information on a query's execution; however, the sheer amount of information makes connecting the dots difficult and manual. Further, some detail in the `MetricSet`s of the underlying `ExecutionPlan`s is lost in the output.

To help with this, the `--analyze` flag can be used to generate a summary of the underlying `ExecutionPlan` `MetricSet`s. The summary presents the information in a way that is hopefully easier to understand and makes it easier to draw conclusions about a query's performance.

This feature is still in its early stages and is expected to evolve. Once it has gone through enough real-world testing and the metrics have been confirmed to make sense, documentation of the exact calculations will be added; until then, the source must be inspected to see how they are computed.

## `dft` FlightSQL Server

The `dft` FlightSQL server (feature flag `experimental-flightsql-server`) is a Flight service that can be used to execute SQL queries against DataFusion. The server is started by running `dft --serve` and can optionally run your configured DDL with the `--run-ddl` parameter.
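The `--bench` description above says each component of the query (logical planning, physical planning, execution, and total time) gets summary statistics. As a rough illustration only, the Rust sketch below summarizes the per-run durations of one such component. It is not `dft`'s implementation: the `PhaseSummary` type, the `summarize` function, and the choice of min/max/mean/median are assumptions made for this example.

```rust
use std::time::Duration;

/// Hypothetical summary for one benchmarked component
/// (e.g. logical planning, physical planning, execution, or total time).
#[derive(Debug, PartialEq)]
struct PhaseSummary {
    min: Duration,
    max: Duration,
    mean: Duration,
    median: Duration,
}

/// Summarize the per-run durations of a single component.
/// Returns `None` when no runs were recorded.
fn summarize(runs: &[Duration]) -> Option<PhaseSummary> {
    if runs.is_empty() {
        return None;
    }
    // Sort a copy so min, max, and median can be read by position.
    let mut sorted = runs.to_vec();
    sorted.sort();
    let n = sorted.len();
    let total: Duration = sorted.iter().sum();
    let mean = total / n as u32;
    // For an even number of runs, average the two middle values.
    let median = if n % 2 == 0 {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2
    } else {
        sorted[n / 2]
    };
    Some(PhaseSummary {
        min: sorted[0],
        max: sorted[n - 1],
        mean,
        median,
    })
}

fn main() {
    // Hypothetical execution times from four benchmark runs.
    let execution = [
        Duration::from_millis(12),
        Duration::from_millis(10),
        Duration::from_millis(15),
        Duration::from_millis(11),
    ];
    if let Some(summary) = summarize(&execution) {
        println!("{:?}", summary);
    }
}
```

Calling the same function once per component would give the per-component breakdown; sorting a copy costs O(n log n), which is negligible for typical benchmark run counts.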
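The `--save` / `--append` behavior described in the README maps naturally onto `std::fs::OpenOptions`: overwrite by truncating, or open in append mode. A minimal sketch, assuming results are written as plain text lines; the `save_results` helper and the CSV-style header are hypothetical and not taken from `dft`.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: write benchmark result lines to `path`.
/// `append == false` overwrites the file (like plain `--save`);
/// `append == true` adds to the end of it (like `--save` plus `--append`).
fn save_results(path: &Path, lines: &[String], append: bool) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .create(true)
        .write(true)
        // Exactly one of append/truncate is enabled; enabling both is an error.
        .append(append)
        .truncate(!append)
        .open(path)?;
    for line in lines {
        writeln!(file, "{}", line)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("bench_results_sketch.csv");
    // First run overwrites, second run appends.
    save_results(&path, &["query,run,total_ms".to_string()], false)?;
    save_results(&path, &["q1,1,12".to_string()], true)?;
    print!("{}", std::fs::read_to_string(&path)?);
    Ok(())
}
```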
16 changes: 0 additions & 16 deletions src/execution/local_analyze.rs

This file was deleted.

15 changes: 0 additions & 15 deletions src/execution/stats.rs
@@ -93,17 +93,6 @@ impl ExecutionStats {
}
}

/// A ratio of the selectivity of the query to the effectiveness of pruning the parquet file.
///
/// V1: Simply look at row groups pruned by statistics and the row selectivity
/// TODO: Incorporate bloom filter and page index pruning
///
/// Example calculations:
///
/// 1. No pruning and select all rows
/// - row groups pruned: 0
/// - row groups matched: 10
/// - row selectivity: 1.0
pub fn selectivity_efficiency(&self) -> f64 {
if let Some(io) = &self.io {
io.parquet_rg_pruned_stats_ratio() / self.rows_selectivity()
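The `selectivity_efficiency` method in the hunk above divides `io.parquet_rg_pruned_stats_ratio()` by `self.rows_selectivity()`. Neither helper's body appears in this diff, so the sketch below assumes the pruned ratio is pruned row groups divided by all row groups considered, and row selectivity is output rows divided by scanned rows. `ParquetIoStats`, its fields, and the zero-selectivity guard (returning `None`) are hypothetical additions for illustration.

```rust
/// Hypothetical stand-in for the parquet I/O stats used by `selectivity_efficiency`.
struct ParquetIoStats {
    row_groups_pruned_stats: u64,
    row_groups_matched_stats: u64,
}

impl ParquetIoStats {
    /// Assumed definition: fraction of row groups pruned by statistics.
    fn parquet_rg_pruned_stats_ratio(&self) -> f64 {
        let total = self.row_groups_pruned_stats + self.row_groups_matched_stats;
        if total == 0 {
            0.0
        } else {
            self.row_groups_pruned_stats as f64 / total as f64
        }
    }
}

/// Assumed definition: fraction of scanned rows that reach the output.
fn rows_selectivity(output_rows: u64, scanned_rows: u64) -> f64 {
    if scanned_rows == 0 {
        0.0
    } else {
        output_rows as f64 / scanned_rows as f64
    }
}

/// Pruned ratio divided by row selectivity, as in the method above.
/// Returns `None` instead of dividing by a zero selectivity.
fn selectivity_efficiency(io: &ParquetIoStats, selectivity: f64) -> Option<f64> {
    if selectivity == 0.0 {
        None
    } else {
        Some(io.parquet_rg_pruned_stats_ratio() / selectivity)
    }
}

fn main() {
    // Example 1 from the removed doc comment: 0 pruned, 10 matched, selectivity 1.0.
    let no_pruning = ParquetIoStats { row_groups_pruned_stats: 0, row_groups_matched_stats: 10 };
    println!("{:?}", selectivity_efficiency(&no_pruning, 1.0));
    // Hypothetical case: 8 of 10 row groups pruned, 20 of 100 rows selected.
    let heavy = ParquetIoStats { row_groups_pruned_stats: 8, row_groups_matched_stats: 2 };
    println!("{:?}", selectivity_efficiency(&heavy, rows_selectivity(20, 100)));
}
```

For the first example from the removed doc comment, the pruned ratio is 0, so the efficiency is 0.0.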
@@ -496,10 +485,6 @@ impl std::fmt::Display for ExecutionComputeStats {
.unwrap_or("None".to_string()),
)?;
writeln!(f)?;
writeln!(
f,
"=========================================================================================="
)?;
writeln!(f)?;
self.display_compute(f, &self.projection_compute, "Projection")?;
writeln!(f)?;
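The `Display` impl in this hunk prints compute statistics per operator category, for example via `self.display_compute(f, &self.projection_compute, "Projection")`. One way to produce per-category numbers is to bucket each operator's elapsed compute time by operator name. The sketch below is hypothetical: `OperatorMetrics`, the name rules in `category`, and the nanosecond unit are assumptions, not `dft`'s or DataFusion's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical flattened record of one operator's metrics.
struct OperatorMetrics {
    /// Operator name such as "ProjectionExec" or "FilterExec".
    name: String,
    /// Elapsed compute time in nanoseconds.
    elapsed_compute_nanos: u64,
}

/// Map an operator name to a display category.
/// The categories here are an assumption for illustration.
fn category(name: &str) -> &'static str {
    if name.starts_with("Projection") {
        "Projection"
    } else if name.starts_with("Filter") {
        "Filter"
    } else if name.contains("Aggregate") {
        "Aggregate"
    } else {
        "Other"
    }
}

/// Sum elapsed compute per category; BTreeMap keeps the output order stable.
fn compute_by_category(ops: &[OperatorMetrics]) -> BTreeMap<&'static str, u64> {
    let mut totals = BTreeMap::new();
    for op in ops {
        *totals.entry(category(&op.name)).or_insert(0) += op.elapsed_compute_nanos;
    }
    totals
}

fn main() {
    let ops = vec![
        OperatorMetrics { name: "ProjectionExec".to_string(), elapsed_compute_nanos: 100 },
        OperatorMetrics { name: "FilterExec".to_string(), elapsed_compute_nanos: 50 },
        OperatorMetrics { name: "ProjectionExec".to_string(), elapsed_compute_nanos: 25 },
    ];
    for (cat, nanos) in compute_by_category(&ops) {
        println!("{}: {} ns", cat, nanos);
    }
}
```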