This benchmark is designed to measure the performance of various search engines for logs and traces use cases and more generally for append-only semi-structured data.
The benchmark makes use of two datasets:
- A 1TB dataset sampled from the GitHub Archive dataset.
- A 1TB log dataset generated with https://github.com/elastic/elastic-integration-corpus-generator-tool
We plan to add a trace dataset soon.
The supported engines are:
- Quickwit
- Elasticsearch
- Loki (only for generated logs)
- Make to ease the running of the benchmark.
- Docker to run the benchmarked engines, including the Python API.
- Python3 to download the dataset and run queries against the benchmarked engines.
- Rust and `openssl-devel` to build the ingestion tool `qbench`.
- gcloud to download datasets.
- Various Python packages installed with `pip install -r requirements.txt`.
To build `qbench`:

```bash
cd qbench
cargo build --release
```
For the generated logs dataset:
```bash
mkdir -p datasets
gcloud storage cp "gs://quickwit-datasets-public/benchmarks/generated-logs/generated-logs-v1-????.ndjson.gz" datasets/
```
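Once the download completes, the shards can be sanity-checked with a few lines of Python. The helper below is a hypothetical convenience, not part of the benchmark tooling; it counts documents and uncompressed bytes across the gzipped NDJSON files:

```python
import glob
import gzip
import json

def inspect_ndjson_gz(pattern):
    """Count documents and uncompressed bytes across gzipped NDJSON files.

    A small, hypothetical helper for sanity-checking downloaded dataset
    shards; it also validates that every line parses as JSON.
    """
    num_docs = 0
    num_bytes = 0
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                json.loads(line)  # raises if a line is not valid JSON
                num_docs += 1
                num_bytes += len(line.encode("utf-8"))
    return num_docs, num_bytes
```

For example, `inspect_ndjson_gz("datasets/generated-logs-v1-????.ndjson.gz")` returns the total document count and uncompressed size of the downloaded shards.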
Go to the desired engine's subdirectory `engines/<engine_name>` and run `make start`.
```bash
python3 run.py --engine quickwit --storage SSD --track generated-logs --instance m1 --tags my-bench-run --indexing-only
```
By default, this will export results to the benchmark service accessible at this
address.
The first time this runs, you will be redirected to a web page where
you should log in with your Google account and pass back a token to `run.py` (just follow the
instructions the tool prints).
Exporting to the benchmark service can be disabled by passing the flag `--export-to-endpoint ""`.
After indexing (and if exporting to the service was not disabled), the tool will print a URL to access the results, e.g. https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=678
Results will also be saved to a `results/{track}.{engine}.{tag}.{instance}/indexing-results.json` file.
```json
{
  "doc_per_second": 8752.761519421289,
  "engine": "quickwit",
  "index": "generated-logs",
  "indexing_duration_secs": 1603.68884367,
  "mb_bytes_per_second": 22.77175235654048,
  "num_indexed_bytes": 18840178633,
  "num_indexed_docs": 14036706,
  "num_ingested_bytes": 36518805205,
  "num_ingested_docs": 14036706,
  "num_splits": 12
}
```
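The throughput figures in this sample report can be cross-checked against the raw counters: `doc_per_second` matches `num_indexed_docs / indexing_duration_secs`, and `mb_bytes_per_second` appears to be derived from `num_ingested_bytes` rather than `num_indexed_bytes` (an observation about this sample output, not a statement about the tool's internals):

```python
import json

# The sample indexing-results.json content from above.
report = json.loads("""
{
  "doc_per_second": 8752.761519421289,
  "indexing_duration_secs": 1603.68884367,
  "mb_bytes_per_second": 22.77175235654048,
  "num_indexed_bytes": 18840178633,
  "num_indexed_docs": 14036706,
  "num_ingested_bytes": 36518805205
}
""")

# Documents per second: indexed docs over wall-clock indexing time.
doc_rate = report["num_indexed_docs"] / report["indexing_duration_secs"]
assert abs(doc_rate - report["doc_per_second"]) < 1.0

# MB per second: consistent with ingested (pre-index) bytes, not indexed bytes.
mb_rate = report["num_ingested_bytes"] / report["indexing_duration_secs"] / 1e6
assert abs(mb_rate - report["mb_bytes_per_second"]) < 0.1
```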
```bash
python3 run.py --engine quickwit --storage SSD --track generated-logs --instance m1 --tags my-bench-run --search-only
```
The results will also be exported to the service and saved to a `results/{track}.{engine}.{tag}.{instance}/search-results.json` file.
```json
{
  "engine": "quickwit",
  "index": "generated-logs",
  "queries": [
    {
      "id": 0,
      "query": {
        "query": "payload.description:the",
        "sort_by_field": "-created_at",
        "max_hits": 10
      },
      "tags": [
        "search"
      ],
      "count": 138290,
      "duration": [
        8843,
        9131,
        9614
      ],
      "engine_duration": [
        7040,
        7173,
        7508
      ]
    }
  ]
}
```
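Each query entry carries two timing lists, `duration` and `engine_duration`, which can be summarized with a short script like the one below. Reading them as client-side vs. engine-reported timings is inferred from the field names, and the sample does not state the unit, so treat both as assumptions:

```python
import json
import statistics

# Sample search-results.json content (taken from the report above).
sample = """
{
  "engine": "quickwit",
  "index": "generated-logs",
  "queries": [
    {
      "id": 0,
      "query": {"query": "payload.description:the", "sort_by_field": "-created_at", "max_hits": 10},
      "tags": ["search"],
      "count": 138290,
      "duration": [8843, 9131, 9614],
      "engine_duration": [7040, 7173, 7508]
    }
  ]
}
"""

results = json.loads(sample)
for q in results["queries"]:
    # "duration" vs "engine_duration" are read here as client-side vs
    # engine-reported timings; the unit is not stated in the sample.
    median_total = statistics.median(q["duration"])
    median_engine = statistics.median(q["engine_duration"])
    print(f'query {q["id"]}: median duration={median_total}, '
          f'median engine_duration={median_engine}, hits={q["count"]}')
```

Running the script over a real `search-results.json` instead of the inline sample only requires swapping `json.loads(sample)` for `json.load(open(path))`.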
Use the Benchmark Service web page.
The default page allows selecting and comparing runs: example.
Runs are identified by a numerical ID and are automatically named `<engine>.<storage>.<instance>.<short_commit_hash>.<tag>`.
For now, names are allowed to collide, i.e. a given name can refer to
multiple runs. In that case, selecting a name in the list of runs to
compare will show the most recent indexing run with that name, and the
most recent search run with that name.
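The collision-resolution rule above (most recent run per name, resolved separately for indexing and search) can be sketched as follows. The run-record fields used here are hypothetical, not the service's actual schema:

```python
def most_recent_by_name(runs, name, run_type):
    """Pick the most recent run of a given type ("indexing" or "search")
    among runs sharing the same name.

    `runs` is a list of dicts with hypothetical "name", "type" and
    "timestamp" fields; the real service schema may differ.
    """
    candidates = [r for r in runs if r["name"] == name and r["type"] == run_type]
    return max(candidates, key=lambda r: r["timestamp"], default=None)
```

With this rule, selecting a colliding name picks the newest indexing run and the newest search run independently, so the two halves of a comparison may come from different submissions.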
Tips:
- The URL of the page is a permanent link to the runs shown. This is a convenient way to share results.
- Clicking on the run name in the comparison shows the raw run results with additional information.
- It's fine if a run only has indexing or search results.
- The full list of runs is loaded when the web page is loaded, so you may need to reload it to see your latest runs.
The graphs page allows plotting graphs of indexing and search run results over time
(example). Only runs with source `continuous_benchmarking` or `github_workflow` are
shown there. Runs are identified by a string `<engine>.<storage>.<instance>.<tag>`
(note the absence of the commit hash) which refers to a series of indexing and search runs over time.
Tips:
- The URL of the page is a permanent link to the series of runs shown. Later visits can contain additional data points.
- Clicking on a point in any graph opens the comparison page between the run that contributed the point and the run that contributed the previous point.
See here for running the benchmark service.
Details of the comparison can be found here.