Include IMDB in benchmark README (#13107)
2010YOUY01 authored Oct 25, 2024
1 parent 813220d commit bdcf822
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions benchmarks/README.md
@@ -330,6 +330,16 @@ steps.
The tests sort the entire dataset using several different sort
orders.

## IMDB

Run the Join Order Benchmark (JOB) on the IMDB dataset.

The Internet Movie Database (IMDB) dataset contains real-world movie data. Synthetic datasets such as TPCH assume uniformly distributed data and uncorrelated columns. The IMDB dataset instead has skewed data and correlated columns, both of which are common in real-world datasets. This makes it better suited for testing query optimizers, particularly their cardinality estimation.
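The sketch below shows why correlated columns make cardinality estimation hard. It compares the textbook independence-assumption estimate, `N * sel(a) * sel(b)`, with the true row count on a tiny, hypothetical table. The rows are invented for illustration and do not come from IMDB. The estimator is a generic textbook formula, not DataFusion's actual cardinality estimator.

```python
# Hypothetical toy rows (NOT real IMDB data): (country, language).
# The two columns are strongly correlated, as is common in real data.
ROWS = (
    [("Germany", "German")] * 20
    + [("France", "French")] * 30
    + [("USA", "English")] * 50
)


def is_german_country(row):
    return row[0] == "Germany"


def is_german_language(row):
    return row[1] == "German"


def independence_estimate(rows, pred_a, pred_b):
    """Estimate |rows matching a AND b| as N * sel(a) * sel(b),
    i.e. assuming the two predicates are statistically independent."""
    n = len(rows)
    sel_a = sum(1 for r in rows if pred_a(r)) / n
    sel_b = sum(1 for r in rows if pred_b(r)) / n
    return n * sel_a * sel_b


def actual_count(rows, pred_a, pred_b):
    """Count the rows that really satisfy both predicates."""
    return sum(1 for r in rows if pred_a(r) and pred_b(r))


est = independence_estimate(ROWS, is_german_country, is_german_language)
act = actual_count(ROWS, is_german_country, is_german_language)
print(f"estimated={est:.1f} actual={act}")  # prints "estimated=4.0 actual=20"
```

On this toy table, the independence assumption predicts 4 matching rows, but 20 rows actually match, a 5x underestimate. In multi-join queries like those in JOB, such errors can compound from one join to the next.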

This benchmark is derived from the [Join Order Benchmark](https://github.com/gregrahn/join-order-benchmark).

See the paper [How Good Are Query Optimizers, Really?](http://www.vldb.org/pvldb/vol9/p204-leis.pdf) for more details.

## TPCH

Run the tpch benchmark.
