README and cleanup

datafusion-contrib · Nov 7, 2024 · a0ea7f8
1 parent d45b2a1
commit a0ea7f8
Showing 3 changed files with 9 additions and 32 deletions.
diff --git a/README.md b/README.md
@@ -98,13 +98,21 @@ Both of the commands above support the `--flightsql` parameter to run the SQL wi
 
 The CLI can also run your configured DDL prior to executing the query by adding the `--run-ddl` parameter.
 
-#### Benchmarking
+#### Benchmarking Queries
 You can benchmark queries by adding the `--bench` parameter.  This will run the query a configurable number of times and output a breakdown of the query's execution time with summary statistics for each component of the query (logical planning, physical planning, execution time, and total time).
 
 Optionally you can use the `--run-before` param to run a query before the benchmark is run.  This is useful in cases where you want to hit a temp table or write a file to disk that your benchmark query will use.
 
 To save benchmark results to a file use the `--save` parameter with a file path.  Further, you can use the `--append` parameter to append to the file instead of overwriting it.
 
+#### Analyze Queries
+
+The output from `EXPLAIN ANALYZE` provides a wealth of information on a query's execution.  However, the volume of information makes connecting the dots difficult and manual.  Further, some detail in the `MetricSet`s of the underlying `ExecutionPlan`s is lost in the output.
+
+To help with this, the `--analyze` flag can be used to generate a summary of the underlying `ExecutionPlan` `MetricSet`s.  The summary presents the information in a way that is hopefully easier to understand and makes it easier to draw conclusions about a query's performance.
+
+This feature is still in its early stages and is expected to evolve.  Once it has gone through enough real-world testing and the metrics have been confirmed to make sense, documentation will be added on the exact calculations.  Until then, the source will need to be inspected to see the calculations.
+
 ## `dft` FlightSQL Server
 
 The `dft` FlightSQL server (feature flag `experimental-flightsql-server`) is a Flight service that can be used to execute SQL queries against DataFusion.  The server is started by running `dft --serve` and can optionally run your configured DDL with the `--run-ddl` parameter.

diff --git a/src/execution/local_analyze.rs b/src/execution/local_analyze.rs
diff --git a/src/execution/stats.rs b/src/execution/stats.rs
@@ -93,17 +93,6 @@ impl ExecutionStats {
         }
     }
 
-    /// A ratio of the selectivity of the query to the effectiveness of pruning the parquet file.
-    ///
-    /// V1: Simply look at row groups pruned by statistics and the row selectivity
-    /// TODO: Incorporate bloom filter and page index pruning
-    ///
-    /// Example calculations:
-    ///
-    /// 1. No pruning and select all rows
-    ///    - row groups pruned: 0
-    ///    - row groups matched: 10
-    ///    - row selectivity: 1.0
     pub fn selectivity_efficiency(&self) -> f64 {
         if let Some(io) = &self.io {
             io.parquet_rg_pruned_stats_ratio() / self.rows_selectivity()
@@ -496,10 +485,6 @@ impl std::fmt::Display for ExecutionComputeStats {
                 .unwrap_or("None".to_string()),
         )?;
         writeln!(f)?;
-        writeln!(
-            f,
-            "=========================================================================================="
-        )?;
         writeln!(f)?;
         self.display_compute(f, &self.projection_compute, "Projection")?;
         writeln!(f)?;
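
The `selectivity_efficiency` method in the `src/execution/stats.rs` hunk divides a row-group pruning ratio by the row selectivity. The diff does not show how either input is computed, so the following is only a minimal sketch under stated assumptions: the `PruningSketch` struct, its field names, and both formulas (pruned / (pruned + matched) and rows output / rows scanned) are hypothetical and are not taken from the `dft` source. It also returns `None` rather than dividing by zero, which the original may or may not do.

```rust
/// Hypothetical stand-in for the metrics behind `selectivity_efficiency`.
/// Field names are assumptions for illustration, not the `dft` types.
#[derive(Debug, Clone, Copy)]
pub struct PruningSketch {
    pub row_groups_pruned: u64,
    pub row_groups_matched: u64,
    pub rows_scanned: u64,
    pub rows_output: u64,
}

impl PruningSketch {
    /// Assumed definition: share of row groups skipped via statistics.
    pub fn rg_pruned_ratio(&self) -> Option<f64> {
        let total = self.row_groups_pruned + self.row_groups_matched;
        if total == 0 {
            None
        } else {
            Some(self.row_groups_pruned as f64 / total as f64)
        }
    }

    /// Assumed definition: share of scanned rows that reach the output.
    pub fn rows_selectivity(&self) -> Option<f64> {
        if self.rows_scanned == 0 {
            None
        } else {
            Some(self.rows_output as f64 / self.rows_scanned as f64)
        }
    }

    /// Pruning ratio divided by row selectivity, mirroring the shape of
    /// the expression in the diff. Returns `None` when an input is
    /// undefined or the selectivity is zero.
    pub fn selectivity_efficiency(&self) -> Option<f64> {
        let ratio = self.rg_pruned_ratio()?;
        let selectivity = self.rows_selectivity()?;
        if selectivity == 0.0 {
            None
        } else {
            Some(ratio / selectivity)
        }
    }
}
```

Under these assumed definitions, the case listed in the removed doc comment (0 row groups pruned, 10 matched, row selectivity 1.0) gives a pruning ratio of 0 and therefore an efficiency of 0.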