Add the stream-compression task
Ostrzyciel committed May 10, 2024
1 parent 91c4ec9 commit 777197d
Showing 5 changed files with 70 additions and 3 deletions.
33 changes: 33 additions & 0 deletions tasks/stream-compression/index.md
@@ -0,0 +1,33 @@
A benchmark task measuring the compression efficiency of serializations for grouped RDF streams.

## Methodology

### Data

Stream distributions of any dataset in the [`stream` category](../../categories/stream/index.md) of RiverBench may be used for this task.

### Workload

The task consists of serializing RDF data in a grouped form (that is, as a stream of RDF graphs or RDF datasets) to bytes and measuring the size of the obtained representation.

In this task, the time taken to serialize and deserialize the data is not considered – see the [`stream-serialization-throughput`](../stream-serialization-throughput/index.md) and [`stream-deserialization-throughput`](../stream-deserialization-throughput/index.md) tasks for that aspect.
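
As a rough illustration of the workload, the sketch below serializes a tiny grouped stream (a list of graphs, each a list of triples) to bytes and sums the sizes. The `serialize_element` helper, the `serialized_size` function, and the sample data are hypothetical and not part of RiverBench; a simple N-Triples-like line format stands in for whichever serializer is actually being benchmarked.

```python
from typing import Iterable, List, Tuple

# A triple is modelled as three already-formatted terms (assumption for this sketch).
Triple = Tuple[str, str, str]
Graph = List[Triple]


def serialize_element(graph: Graph) -> bytes:
    """Hypothetical stand-in serializer: one N-Triples-like line per triple."""
    lines = [f"{s} {p} {o} .\n" for s, p, o in graph]
    return "".join(lines).encode("utf-8")


def serialized_size(stream: Iterable[Graph]) -> int:
    """Total number of bytes produced by serializing every stream element."""
    return sum(len(serialize_element(graph)) for graph in stream)


# Made-up sample stream with two elements (RDF graphs).
stream = [
    [("<http://example.org/s1>", "<http://example.org/p>", '"1"')],
    [
        ("<http://example.org/s2>", "<http://example.org/p>", '"2"'),
        ("<http://example.org/s2>", "<http://example.org/q>", '"3"'),
    ],
]

total_bytes = serialized_size(stream)
print(f"Serialized size: {total_bytes} bytes")
```

Only the byte count matters here; timing is deliberately left out, in line with the scope of this task.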

### Metrics

- The primary metric is the serialized representation size of the RDF data, in bytes.
- Additionally, the compression ratio can be calculated as the ratio of the reference size to the compressed size. The reference size is the size of the same data serialized using a baseline method, e.g., the N-Triples serialization format.
- In the RDF literature, the "compression ratio" is often defined as the inverse of the above definition (compressed size divided by reference size) and expressed as a percentage. For example, a compression ratio of 50% means that the compressed data is half the size of the reference data.
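
The two conventions for the compression ratio can be compared with a minimal sketch. The function names and the byte counts below are made up for illustration and are not taken from any benchmark run.

```python
def compression_ratio(reference_size: int, compressed_size: int) -> float:
    """Reference size divided by compressed size (higher is better)."""
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    return reference_size / compressed_size


def relative_size_percent(reference_size: int, compressed_size: int) -> float:
    """Inverse convention: compressed size as a percentage of the reference size."""
    if reference_size <= 0:
        raise ValueError("reference_size must be positive")
    return 100.0 * compressed_size / reference_size


# Hypothetical sizes in bytes: the reference could be an N-Triples serialization.
reference_size = 1000
compressed_size = 500

ratio = compression_ratio(reference_size, compressed_size)
percent = relative_size_percent(reference_size, compressed_size)
print(f"ratio = {ratio}, relative size = {percent}%")
```

When reporting results, it helps to state which of the two conventions is used, since the same measurement reads as 2.0 under one and 50% under the other.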

## Results

There are no results with RiverBench available for this task yet.

## Examples and references

- In the paper about the Jelly streaming protocol, such a benchmark is performed in Section IV.C. The authors measured the output size of several methods. The "Compression ratio" metric presented there is the ratio of the compressed data size to the reference data size, with N-Triples used as the reference.
- Sowiński, P., Wasielewska-Michniewska, K., Ganzha, M., & Paprzycki, M. (2022, October). Efficient RDF streaming for the edge-cloud continuum. In 2022 IEEE 8th World Forum on Internet of Things (WF-IoT) (pp. 1-8). IEEE.
- https://doi.org/10.1109/WF-IoT54382.2022.10152225

## See also

- Version of this task for flat RDF streams: [`flat-compression`](../flat-compression/index.md)
22 changes: 22 additions & 0 deletions tasks/stream-compression/metadata.ttl
@@ -0,0 +1,22 @@
@prefix : <https://w3id.org/riverbench/temp#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rb: <https://w3id.org/riverbench/schema/metadata#> .
@prefix rbdoc: <https://w3id.org/riverbench/schema/documentation#> .

:task
# General information
a rb:Task ;
dcterms:conformsTo <https://w3id.org/riverbench/schema/metadata> ;
dcterms:identifier "stream-compression" ;
dcterms:title "Grouped RDF stream compression"@en ;
dcterms:description "A benchmark task measuring the compression efficiency of serializations for grouped RDF streams."@en ;

# Authors
dcterms:creator [
foaf:name "Piotr Sowiński" ;
foaf:nick "Ostrzyciel" ;
foaf:homepage <https://github.com/Ostrzyciel>, <https://orcid.org/0000-0002-2543-9461> ;
rbdoc:hasDocWeight 1 ;
]
.
8 changes: 7 additions & 1 deletion tasks/stream-deserialization-throughput/index.md
@@ -4,7 +4,7 @@ A benchmark task measuring the throughput of deserializing a grouped RDF stream (

### Data

Stream distributions of any dataset in the `stream` category of RiverBench may be used for this task.
Stream distributions of any dataset in the [`stream` category](../../categories/stream/index.md) of RiverBench may be used for this task.

### Workload

@@ -29,3 +29,9 @@ There are no results with RiverBench available for this task yet.
- In the paper about the Jelly streaming protocol, such a benchmark is performed in Section IV.B. The corresponding task in the paper is named "Raw deserialization throughput" and the performance is measured in terms of the number of triples deserialized per second.
- Sowiński, P., Wasielewska-Michniewska, K., Ganzha, M., & Paprzycki, M. (2022, October). Efficient RDF streaming for the edge-cloud continuum. In 2022 IEEE 8th World Forum on Internet of Things (WF-IoT) (pp. 1-8). IEEE.
- https://doi.org/10.1109/WF-IoT54382.2022.10152225


## See also

- Version of this task for flat RDF streams: [`flat-deserialization-throughput`](../flat-deserialization-throughput/index.md)
- The inverse task: [`stream-serialization-throughput`](../stream-serialization-throughput/index.md)
8 changes: 7 additions & 1 deletion tasks/stream-serialization-throughput/index.md
@@ -4,7 +4,7 @@ A benchmark task measuring the throughput of serializing a grouped RDF stream (th

### Data

Stream distributions of any dataset in the `stream` category of RiverBench may be used for this task.
Stream distributions of any dataset in the [`stream` category](../../categories/stream/index.md) of RiverBench may be used for this task.

### Workload

@@ -30,3 +30,9 @@ There are no results with RiverBench available for this task yet.
- In the paper about the Jelly streaming protocol, such a benchmark is performed in Section IV.B. The corresponding task in the paper is named "Raw serialization throughput" and the performance is measured in terms of the number of triples serialized per second.
- Sowiński, P., Wasielewska-Michniewska, K., Ganzha, M., & Paprzycki, M. (2022, October). Efficient RDF streaming for the edge-cloud continuum. In 2022 IEEE 8th World Forum on Internet of Things (WF-IoT) (pp. 1-8). IEEE.
- https://doi.org/10.1109/WF-IoT54382.2022.10152225


## See also

- Version of this task for flat RDF streams: [`flat-serialization-throughput`](../flat-serialization-throughput/index.md)
- The inverse task: [`stream-deserialization-throughput`](../stream-deserialization-throughput/index.md)
2 changes: 1 addition & 1 deletion tasks/stream-serialization-throughput/metadata.ttl
@@ -9,7 +9,7 @@
a rb:Task ;
dcterms:conformsTo <https://w3id.org/riverbench/schema/metadata> ;
dcterms:identifier "stream-serialization-throughput" ;
dcterms:title "Grouped streaming serialization throughput"@en ;
dcterms:title "Grouped RDF stream serialization throughput"@en ;
dcterms:description "A benchmark task measuring the throughput of serializing a grouped RDF stream (that is, a stream in which the elements are either RDF graphs or RDF datasets)."@en ;

# Authors