Client-side chunks 1: introduce Chunk and its shuffle/sort routines (#6438)

Introduces the new `re_chunk` crate:
> A chunk of Rerun data, encoded using Arrow. Used for logging, transport, storage and compute.

Specifically, it introduces the `Chunk` type itself, and all methods and
helpers related to sorting.
A `Chunk` is self-describing: it contains all the data _and_ metadata
needed to index it into storage.
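The sketch below illustrates what "self-describing" means, under clearly stated assumptions. The `Chunk` here is a hypothetical, heavily simplified stand-in and not the actual `re_chunk` type. It uses plain `Vec`s instead of Arrow arrays and `String` keys instead of real entity paths, timelines and component names. The point is that the entity path, per-row IDs and time columns travel with the data, so indexing metadata such as a timeline's time range can be derived from the chunk alone.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a chunk: every column has one entry per row.
#[derive(Debug, Clone)]
struct Chunk {
    /// The entity that all rows in this chunk belong to.
    entity_path: String,
    /// One unique ID per row (stand-in for real row IDs).
    row_ids: Vec<u64>,
    /// Time columns, keyed by timeline name: one time value per row.
    timelines: BTreeMap<String, Vec<i64>>,
    /// Component columns, keyed by component name: `None` means "no data for this row".
    components: BTreeMap<String, Vec<Option<f64>>>,
}

impl Chunk {
    fn num_rows(&self) -> usize {
        self.row_ids.len()
    }

    /// Every column must have exactly one entry per row.
    fn sanity_check(&self) -> Result<(), String> {
        let n = self.num_rows();
        for (name, times) in &self.timelines {
            if times.len() != n {
                return Err(format!("timeline {name:?} has {} entries, expected {n}", times.len()));
            }
        }
        for (name, values) in &self.components {
            if values.len() != n {
                return Err(format!("component {name:?} has {} entries, expected {n}", values.len()));
            }
        }
        Ok(())
    }

    /// The (min, max) time covered on a timeline. This is the kind of metadata
    /// a store could use to index the chunk without inspecting component data.
    fn time_range(&self, timeline: &str) -> Option<(i64, i64)> {
        let times = self.timelines.get(timeline)?;
        let min = *times.iter().min()?;
        let max = *times.iter().max()?;
        Some((min, max))
    }
}

fn main() {
    let chunk = Chunk {
        entity_path: "/points".to_owned(),
        row_ids: vec![1, 2, 3],
        timelines: BTreeMap::from([("frame".to_owned(), vec![10, 12, 11])]),
        components: BTreeMap::from([("radius".to_owned(), vec![Some(0.5), None, Some(1.0)])]),
    };
    chunk.sanity_check().unwrap();
    assert_eq!(chunk.time_range("frame"), Some((10, 12)));
    println!("{} rows for {}", chunk.num_rows(), chunk.entity_path);
}
```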

There are a lot of things that need to be sorted within a `Chunk`, and
as such we must keep track of what is or isn't sorted at all times, to
avoid needlessly re-sorting things every time a chunk changes hands.
This necessitates a bunch of sanity checking all over the place to make
sure we never end up in undefined states.
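The bookkeeping described above can be sketched as follows. This is a hypothetical illustration and not the `re_chunk` API. The names `is_sorted`, `sort_if_unsorted`, `shuffle_rows` and `sanity_check` are invented for this example, and a tiny xorshift generator stands in for the `rand` crate to keep the sketch dependency-free. The chunk caches whether its rows are sorted by row ID and skips the sort when they already are. It applies a single permutation to every column so rows stay aligned, and it checks that the cached flag never disagrees with the data.

```rust
/// Hypothetical, simplified chunk: each row has a row ID and one value.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    /// Cached answer to "are `row_ids` in ascending order?", so we never re-sort needlessly.
    is_sorted: bool,
    row_ids: Vec<u64>,
    values: Vec<f64>,
}

impl Chunk {
    fn new(row_ids: Vec<u64>, values: Vec<f64>) -> Self {
        // Compute the flag once on construction rather than assuming it.
        let is_sorted = row_ids.windows(2).all(|w| w[0] <= w[1]);
        Self { is_sorted, row_ids, values }
    }

    /// Sorts rows by row ID, but only if they are not already known to be sorted.
    fn sort_if_unsorted(&mut self) {
        if self.is_sorted {
            return;
        }
        // Compute one permutation, then apply it to every column so rows stay aligned.
        let mut order: Vec<usize> = (0..self.row_ids.len()).collect();
        order.sort_by_key(|&i| self.row_ids[i]);
        self.row_ids = order.iter().map(|&i| self.row_ids[i]).collect();
        self.values = order.iter().map(|&i| self.values[i]).collect();
        self.is_sorted = true;
    }

    /// Sanity check: columns are aligned and the cached flag never lies.
    fn sanity_check(&self) -> Result<(), String> {
        if self.row_ids.len() != self.values.len() {
            return Err("columns have different lengths".to_owned());
        }
        let actually_sorted = self.row_ids.windows(2).all(|w| w[0] <= w[1]);
        if self.is_sorted && !actually_sorted {
            return Err("chunk claims to be sorted but is not".to_owned());
        }
        Ok(())
    }
}

/// Hypothetical helper: deterministic Fisher-Yates shuffle driven by an xorshift PRNG.
fn shuffle_rows(chunk: &mut Chunk, mut seed: u64) {
    let n = chunk.row_ids.len();
    for i in (1..n).rev() {
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        let j = (seed % (i as u64 + 1)) as usize;
        // Swap whole rows: every column must move together.
        chunk.row_ids.swap(i, j);
        chunk.values.swap(i, j);
    }
    // After shuffling we can no longer vouch for the order, so recompute the flag.
    chunk.is_sorted = chunk.row_ids.windows(2).all(|w| w[0] <= w[1]);
}

fn main() {
    let mut chunk = Chunk::new(vec![1, 2, 3, 4, 5], vec![0.1, 0.2, 0.3, 0.4, 0.5]);
    let original = chunk.clone();
    shuffle_rows(&mut chunk, 0x9E37_79B9_7F4A_7C15);
    chunk.sanity_check().unwrap();
    chunk.sort_if_unsorted();
    chunk.sanity_check().unwrap();
    // Shuffle-then-sort must round-trip, with rows still aligned.
    assert_eq!(chunk, original);
}
```

Shuffling, then sorting, then comparing against the original is a cheap way to exercise the sort path and the alignment of columns in tests.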

`Chunk` is not about transport: it's about providing a nice-to-work-with
representation when manipulating a chunk in memory.
Transporting a `Chunk` happens in the next PR.

- Fixes #1981

---

Part of a PR series to implement our new chunk-based data model on the
client-side (SDKs):
- #6437
- #6438
- #6439
- #6440
- #6441
teh-cmc authored May 31, 2024
1 parent defefae commit 6d94947
Showing 12 changed files with 1,017 additions and 9 deletions.
13 changes: 7 additions & 6 deletions ARCHITECTURE.md
@@ -168,12 +168,13 @@ Update instructions:

### Low-level store

-| Crate | Description |
-|-----------------|-----------------------------------------------------------------------------|
-| re_data_store | An in-memory time series database for Rerun log data, based on Apache Arrow |
-| re_log_types | The basic building blocks of the Rerun data types and tables. |
-| re_types_core | The core traits and types that power Rerun's data model. |
-| re_format_arrow | Formatting of Apache Arrow tables |
+| Crate | Description |
+|-----------------|-----------------------------------------------------------------------------------------------|
+| re_chunk | A chunk of Rerun data, encoded using Arrow. Used for logging, transport, storage and compute. |
+| re_data_store | An in-memory time series database for Rerun log data, based on Apache Arrow. |
+| re_log_types | The basic building blocks of the Rerun data types and tables. |
+| re_types_core | The core traits and types that power Rerun's data model. |
+| re_format_arrow | Formatting of Apache Arrow tables. |


### Data flow
30 changes: 30 additions & 0 deletions Cargo.lock
@@ -4273,6 +4273,36 @@ dependencies = [
"walkdir",
]

[[package]]
name = "re_chunk"
version = "0.17.0-alpha.3"
dependencies = [
"ahash",
"anyhow",
"backtrace",
"criterion",
"crossbeam",
"document-features",
"itertools 0.13.0",
"mimalloc",
"nohash-hasher",
"rand",
"re_arrow2",
"re_build_info",
"re_format",
"re_format_arrow",
"re_log",
"re_log_types",
"re_string_interner",
"re_tracing",
"re_tuid",
"re_types_core",
"similar-asserts",
"smallvec",
"static_assertions",
"thiserror",
]

[[package]]
name = "re_context_menu"
version = "0.17.0-alpha.3"
1 change: 1 addition & 0 deletions Cargo.toml
@@ -32,6 +32,7 @@ re_analytics = { path = "crates/re_analytics", version = "=0.17.0-alpha.3", defa
re_blueprint_tree = { path = "crates/re_blueprint_tree", version = "=0.17.0-alpha.3", default-features = false }
re_build_info = { path = "crates/re_build_info", version = "=0.17.0-alpha.3", default-features = false }
re_build_tools = { path = "crates/re_build_tools", version = "=0.17.0-alpha.3", default-features = false }
re_chunk = { path = "crates/re_chunk", version = "=0.17.0-alpha.3", default-features = false }
re_context_menu = { path = "crates/re_context_menu", version = "=0.17.0-alpha.3", default-features = false }
re_crash_handler = { path = "crates/re_crash_handler", version = "=0.17.0-alpha.3", default-features = false }
re_data_loader = { path = "crates/re_data_loader", version = "=0.17.0-alpha.3", default-features = false }
66 changes: 66 additions & 0 deletions crates/re_chunk/Cargo.toml
@@ -0,0 +1,66 @@
[package]
name = "re_chunk"
authors.workspace = true
description = "A chunk of Rerun data, encoded using Arrow. Used for logging, transport, storage and compute."
edition.workspace = true
homepage.workspace = true
include.workspace = true
license.workspace = true
publish = true
readme = "README.md"
repository.workspace = true
rust-version.workspace = true
version.workspace = true

[lints]
workspace = true

[package.metadata.docs.rs]
all-features = true


[features]
default = []


[dependencies]

# Rerun
re_build_info.workspace = true
re_format.workspace = true
re_format_arrow.workspace = true
re_log.workspace = true
re_log_types.workspace = true
re_string_interner.workspace = true
re_tracing.workspace = true
re_tuid.workspace = true
re_types_core.workspace = true

# External
ahash.workspace = true
anyhow.workspace = true
arrow2 = { workspace = true, features = [
"io_ipc",
"io_print",
"compute_comparison",
"compute_concatenate",
] }
backtrace.workspace = true
document-features.workspace = true
itertools.workspace = true
nohash-hasher.workspace = true
rand = { workspace = true, features = ["std_rng"] }
similar-asserts.workspace = true
smallvec.workspace = true
static_assertions.workspace = true
thiserror.workspace = true

# Native dependencies:
[target.'cfg(not(target_arch = "wasm32"))'.dependencies]
crossbeam.workspace = true


[dev-dependencies]
criterion.workspace = true
mimalloc.workspace = true
similar-asserts.workspace = true
10 changes: 10 additions & 0 deletions crates/re_chunk/README.md
@@ -0,0 +1,10 @@
# re_chunk

Part of the [`rerun`](https://github.com/rerun-io/rerun) family of crates.

[![Latest version](https://img.shields.io/crates/v/re_chunk.svg)](https://crates.io/crates/re_chunk?speculative-link)
[![Documentation](https://docs.rs/re_chunk/badge.svg)](https://docs.rs/re_chunk?speculative-link)
![MIT](https://img.shields.io/badge/license-MIT-blue.svg)
![Apache](https://img.shields.io/badge/license-Apache-blue.svg)

A chunk of Rerun data, encoded using Arrow. Used for logging, transport, storage and compute.