Façade-based Data Access Benchmark

This folder provides a benchmark derived from GTFS-Madrid-Bench for evaluating Façade-based Data Access (FBDA) engines, such as SPARQL Anything.

The extension consists of:

  • a set of query templates that translate the GTFS-Madrid-Bench queries and RML mappings into FBDA queries;
  • a query executor that runs the queries and measures the performance of FBDA engines under four experimental regimes:
    • In-memory execution over a complete materialised view (in-memory+complete);
    • In-memory execution optimised by a triple-filtering approach (in-memory+triple-filtering);
    • In-memory execution over a sliced materialised view and optimised by triple-filtering (sliced+triple-filtering);
    • On-disk execution optimised by triple-filtering (on-disk+triple-filtering).

More details can be found in this article.

Requirements

Java 11 or later, installed locally.
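The Java requirement can be checked before running the scripts by inspecting the output of `java -version`. The following is a minimal sketch: the function names `java_major` and `check_java` are hypothetical helpers and are not part of the benchmark.

```shell
# Hypothetical helpers, not part of the benchmark scripts.

# Print the major version for a Java version string,
# e.g. "17.0.2" -> 17 and the legacy "1.8.0_292" -> 8.
java_major() {
  ver=$1
  case "$ver" in
    1.*) ver=${ver#1.} ;;   # legacy scheme: 1.x means Java x
  esac
  printf '%s\n' "${ver%%[!0-9]*}"
}

# Succeed only if the java found on PATH reports major version 11 or later.
check_java() {
  # java -version writes a line such as: openjdk version "11.0.2" ... to stderr
  ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  [ -n "$ver" ] && [ "$(java_major "$ver")" -ge 11 ]
}
```

For example, `check_java || echo "Java 11 or later is required"` could be run once before starting the experiments.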

Using FBDA Benchmark

  1. Generate data using GTFS-Madrid-Bench and move the result folder it generates into the experiments folder. At the moment only the csv, json and xml formats are supported.

  2. Generate FBDA queries for the scales passed to GTFS-Madrid-Bench (e.g. 1, 10, 100):

./generate_queries.sh "1 10 100" "TMP_FOLDER" "xml csv json"

where:

  • "1 10 100" are the scales passed to GTFS-Madrid-Bench
  • TMP_FOLDER is the path to a temporary folder that will be used during the experiments
  • "xml csv json" are the formats passed to GTFS-Madrid-Bench
  3. Download the executable jar file of the FBDA engine to evaluate (e.g. SPARQL Anything v0.9.0)

  4. Run the queries:

./execute_queries.sh /path/to/fbda_engine.jar "1 10 100" "xml csv json" "/path/to/results" "TMP_FOLDER"

where:

  • "1 10 100" are the scales passed to GTFS-Madrid-Bench
  • "xml csv json" are the formats passed to GTFS-Madrid-Bench
  • "/path/to/results" is the path to a folder where the results of the execution of the queries (i.e. measures) will be stored
  • TMP_FOLDER is the path to a temporary folder that will be used during the experiments
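
Note that the two scripts take the shared arguments in a different order: generate_queries.sh expects scales, temporary folder and formats, while execute_queries.sh expects jar, scales, formats, results folder and temporary folder. A small wrapper can keep the two calls consistent. This is only a sketch: `run_step`, `run_benchmark` and the `DRY_RUN` variable are hypothetical names that do not exist in the repository, and only the two script invocations are taken from the steps above.

```shell
# Hypothetical wrapper, not part of the benchmark scripts.

# Run a command, or only print it when DRY_RUN=1.
# The printed form drops the quoting, so it is for inspection only.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: run_benchmark JAR "SCALES" "FORMATS" RESULTS_DIR TMP_FOLDER
run_benchmark() {
  jar=$1 scales=$2 formats=$3 results=$4 tmp=$5
  # The second step runs only if query generation succeeded.
  run_step ./generate_queries.sh "$scales" "$tmp" "$formats" &&
    run_step ./execute_queries.sh "$jar" "$scales" "$formats" "$results" "$tmp"
}
```

Setting `DRY_RUN=1` before calling `run_benchmark /path/to/fbda_engine.jar "1 10 100" "xml csv json" /path/to/results TMP_FOLDER` prints the two commands without executing them.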

Analysing the results

The execution of the queries generates two TSV files for each query executed on a given format, namely time_q<query_id>_<format>.tsv and mem_q<query_id>_<format>.tsv. These files trace the execution of the queries in terms of computational resources used by the engine (i.e. memory footprint, CPU and time).

The files are stored in the directory /path/to/results passed as argument of execute_queries.sh.
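
The naming scheme above can be turned into a list of expected paths, for instance to check that a run produced all of its output. This is a sketch: `result_files` is a hypothetical helper, and the form of the query identifier (a plain number in the usage below) is an assumption.

```shell
# Hypothetical helper: print the result file paths expected for one query.
# Usage: result_files RESULTS_DIR QUERY_ID "FORMATS"
result_files() {
  dir=$1 query_id=$2
  for fmt in $3; do          # word splitting on the format list is intended
    printf '%s/time_q%s_%s.tsv\n' "$dir" "$query_id" "$fmt"
    printf '%s/mem_q%s_%s.tsv\n' "$dir" "$query_id" "$fmt"
  done
}
```

For example, `result_files /path/to/results 1 "csv json"` prints four paths, a time file and a mem file for each format.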

The time_q<query_id>_<format>.tsv file records the execution time of each query for a given format. The table has the following columns:

Query InputSize Strategy Slice Ondisk MemoryLimit Run Time Unit Status STDErr
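
A time file can be summarised with awk, for example by averaging the Time column per Strategy. The sketch below assumes that the file is tab-separated, that its first line is the header shown above, and that Time is numeric with the same Unit on every row. `mean_time_by_strategy` is a hypothetical helper, not part of the benchmark.

```shell
# Hypothetical helper: mean of the Time column for each Strategy value.
# Usage: mean_time_by_strategy time_q<query_id>_<format>.tsv
mean_time_by_strategy() {
  awk -F'\t' '
    NR == 1 {                       # header row: locate the columns by name
      for (i = 1; i <= NF; i++) {
        if ($i == "Strategy") s = i
        if ($i == "Time") t = i
      }
      next
    }
    {
      sum[$s] += $t                 # accumulate time per strategy
      n[$s]++                       # count rows per strategy
    }
    END {
      for (k in sum) printf "%s\t%.2f\n", k, sum[k] / n[k]
    }
  ' "$1" | sort
}
```

The sketch ignores the Status column; failed runs may need to be filtered out first, depending on the values the executor writes there.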

The mem_q<query_id>_<format>.tsv file records the engine's CPU and memory usage during query evaluation. The table has the following columns:

Query InputSize Strategy Slice Ondisk MemoryLimit Run PID %cpu %mem vsz rss
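
A mem file can be reduced to one figure per run, for example the peak value of the rss column. The sketch below assumes that the file is tab-separated, that its first line is the header shown above, and that each run may contribute several sampled rows. `peak_rss_by_run` is a hypothetical helper, and the unit of rss is whatever the executor recorded.

```shell
# Hypothetical helper: highest rss value observed for each Run.
# Usage: peak_rss_by_run mem_q<query_id>_<format>.tsv
peak_rss_by_run() {
  awk -F'\t' '
    NR == 1 {                       # header row: locate the columns by name
      for (i = 1; i <= NF; i++) {
        if ($i == "Run") r = i
        if ($i == "rss") m = i
      }
      next
    }
    {
      key = $r
      val = $m + 0                  # force a numeric comparison
      if (!(key in peak) || val > peak[key]) peak[key] = val
    }
    END {
      for (k in peak) printf "%s\t%d\n", k, peak[k]
    }
  ' "$1" | sort -n
}
```

Grouping by Run alone is a simplification; if a file mixes several values of Strategy or InputSize, those columns would need to be added to the key.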