This folder provides a benchmark derived from GTFS-Madrid-Bench for evaluating Façade-based Data Access (FBDA) engines, such as SPARQL Anything.
The extension consists of:
- a set of query templates that translate the GTFS-Madrid-Bench queries and RML mappings into FBDA queries;
- a query executor which fires the queries and measures the performance of the FBDA engines under four experimental regimes:
  - In-memory execution over a complete materialised view (in-memory+complete);
  - In-memory execution optimised by a triple-filtering approach (in-memory+triple-filtering);
  - In-memory execution over a sliced materialised view and optimised by triple-filtering (sliced+triple-filtering);
  - On-disk execution optimised by triple-filtering (on-disk+triple-filtering).
More details can be found in this article.
Prerequisite: Java 11 (or later) installed locally.
- Generate data using GTFS-Madrid-Bench and move the result folder it generates into the experiments folder. At the moment, only the csv, json and xml formats are supported.
- Generate FBDA queries for the scales passed to GTFS-Madrid-Bench (e.g. 1, 10, 100):

      ./generate_queries.sh "1 10 100" "TMP_FOLDER" "xml csv json"

  where:
  - TMP_FOLDER is the path to a temporary folder that will be used during the experiments
  - "xml csv json" are the formats passed to GTFS-Madrid-Bench
- Download the executable jar file of the FBDA engine to evaluate (e.g. SPARQL Anything v0.9.0).
- Run the queries:

      ./execute_queries.sh /path/to/fbda_engine.jar "1 10 100" "xml csv json" "/path/to/results" "TMP_FOLDER"

  where:
  - "1 10 100" are the scales passed to GTFS-Madrid-Bench
  - "xml csv json" are the formats passed to GTFS-Madrid-Bench
  - "/path/to/results" is the path to a folder where the results of the execution of the queries (i.e. measures) will be stored
  - TMP_FOLDER is the path to a temporary folder that will be used during the experiments
The execution of the queries generates two TSV files for each query executed on a given format, namely time_q<query_id>_<format>.tsv and mem_q<query_id>_<format>.tsv.
These files trace the execution of the queries in terms of the computational resources used by the engine (i.e. memory footprint, CPU and time).
The files are stored in the directory /path/to/results passed as an argument to execute_queries.sh.
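As a small illustration of the naming scheme above, the following sketch builds the two expected file paths for one query and one format. The helper name `result_files` is hypothetical and not part of this repository. The sketch also assumes that the query identifier is substituted directly after the `q` prefix (e.g. `1` gives `time_q1_csv.tsv`).

```python
from pathlib import Path


def result_files(results_dir, query_id, data_format):
    """Return the (time, mem) TSV paths for one query and one input format.

    Hypothetical helper: it only mirrors the naming pattern
    time_q<query_id>_<format>.tsv / mem_q<query_id>_<format>.tsv.
    """
    base = Path(results_dir)
    suffix = f"q{query_id}_{data_format}.tsv"
    return base / f"time_{suffix}", base / f"mem_{suffix}"


# "/path/to/results" is the placeholder used in this README.
time_path, mem_path = result_files("/path/to/results", 1, "csv")
print(time_path.name)
print(mem_path.name)
```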
The time_q<query_id>_<format>.tsv file records the execution time of the queries for a given format. The table has the following structure:
Query | InputSize | Strategy | Slice | Ondisk | MemoryLimit | Run | Time | Unit | Status | STDErr |
---|---|---|---|---|---|---|---|---|---|---|
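A file with this structure could be summarised as in the sketch below, which averages the Time column over runs for each configuration. It assumes that the file is tab-separated and starts with a header row carrying the column names listed above, and that Time holds a number. Neither assumption is confirmed here. The function name and the sample rows (including the `strategy_a` and `ok` values) are made up for illustration only.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_time_per_configuration(tsv_file):
    """Average Time over runs for each (InputSize, Strategy, Slice, Ondisk)."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    times = defaultdict(list)
    for row in reader:
        try:
            elapsed = float(row["Time"])
        except (TypeError, ValueError):
            continue  # skip rows whose Time is missing or not numeric
        key = (row["InputSize"], row["Strategy"], row["Slice"], row["Ondisk"])
        times[key].append(elapsed)
    return {key: mean(values) for key, values in times.items()}


# Made-up sample data; real values come from execute_queries.sh.
header = ["Query", "InputSize", "Strategy", "Slice", "Ondisk",
          "MemoryLimit", "Run", "Time", "Unit", "Status", "STDErr"]
rows = [
    ["q1", "1", "strategy_a", "false", "false", "256m", "1", "10", "ms", "ok", ""],
    ["q1", "1", "strategy_a", "false", "false", "256m", "2", "20", "ms", "ok", ""],
    ["q1", "1", "strategy_a", "true", "false", "256m", "1", "", "ms", "ok", ""],
]
sample = io.StringIO("\n".join("\t".join(r) for r in [header] + rows))

for configuration, average in mean_time_per_configuration(sample).items():
    print(configuration, average)
```

With a real file, the same function can be called on `open(path, newline="")` instead of the in-memory sample.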
The mem_q<query_id>_<format>.tsv file records the engine's CPU and memory usage during the evaluation of the queries. The table has the following structure:
Query | InputSize | Strategy | Slice | Ondisk | MemoryLimit | Run | PID | %cpu | %mem | vsz | rss |
---|---|---|---|---|---|---|---|---|---|---|---|
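A similar sketch can extract the peak resident memory observed for each configuration from a file with this structure. As before, a tab-separated file with a header row is assumed, and rss is assumed to be numeric. Its unit is whatever the executor records and is not interpreted here. The function name and the sample rows are hypothetical.

```python
import csv
import io


def peak_rss_per_configuration(tsv_file):
    """Return the largest rss sample for each (InputSize, Strategy, Slice, Ondisk)."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    peaks = {}
    for row in reader:
        try:
            rss = float(row["rss"])
        except (TypeError, ValueError):
            continue  # skip samples without a numeric rss value
        key = (row["InputSize"], row["Strategy"], row["Slice"], row["Ondisk"])
        peaks[key] = max(peaks.get(key, rss), rss)
    return peaks


# Made-up sample data; real values come from execute_queries.sh.
header = ["Query", "InputSize", "Strategy", "Slice", "Ondisk", "MemoryLimit",
          "Run", "PID", "%cpu", "%mem", "vsz", "rss"]
rows = [
    ["q1", "1", "strategy_a", "false", "false", "256m", "1", "4242", "50.0", "1.0", "2000", "1000"],
    ["q1", "1", "strategy_a", "false", "false", "256m", "1", "4242", "75.0", "1.5", "2500", "1500"],
    ["q1", "1", "strategy_a", "false", "true", "256m", "1", "4243", "40.0", "0.5", "1200", "600"],
]
sample = io.StringIO("\n".join("\t".join(r) for r in [header] + rows))

for configuration, peak in peak_rss_per_configuration(sample).items():
    print(configuration, peak)
```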