Client-side chunks 2: introduce TransportChunk (#6439)
A `TransportChunk` is a `Chunk` that is ready for transport and/or
storage.
It is very cheap to go from a `Chunk` to a `TransportChunk` and
vice versa.

A `TransportChunk` maps 1:1 to a native Arrow `RecordBatch`. It has a
stable ABI, and can be cheaply sent across process boundaries.
`arrow2` has no `RecordBatch` type; we will get one once we migrate to
`arrow-rs`.

A `TransportChunk` is self-describing: it contains all the data _and_
metadata needed to index it into storage.

We rely heavily on chunk-level and field-level metadata to communicate
Rerun-specific semantics over the wire, e.g. whether some columns are
already properly sorted.

The Arrow metadata system is fairly limited (it's all untyped strings),
but for now that seems good enough. It will be trivial to switch to
something else later, if need be.
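
As a rough illustration of what "untyped strings" means in practice, here is a minimal sketch of chunk-level and field-level metadata carrying a sortedness flag. It uses only the standard library; the `Schema` and `Field` structs, the `example.*` keys, and the helper functions are hypothetical stand-ins and are not the types or keys used by `re_chunk` or Arrow.

```rust
use std::collections::BTreeMap;

/// Arrow-style metadata: untyped string keys mapped to untyped string values.
type Metadata = BTreeMap<String, String>;

// Hypothetical keys, for illustration only (not the keys used by `re_chunk`).
const KEY_CHUNK_ID: &str = "example.chunk_id";
const KEY_IS_SORTED: &str = "example.is_sorted";

/// A column description carrying field-level metadata.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    metadata: Metadata,
}

/// A schema carrying chunk-level metadata plus one `Field` per column.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    metadata: Metadata,
    fields: Vec<Field>,
}

/// Builds a time column description and records its sortedness as a string flag.
fn time_field(name: &str, is_sorted: bool) -> Field {
    let mut metadata = Metadata::new();
    metadata.insert(KEY_IS_SORTED.to_owned(), is_sorted.to_string());
    Field {
        name: name.to_owned(),
        metadata,
    }
}

/// Reads the flag back. Because values are untyped, a missing or malformed
/// entry is treated as "not sorted", which is the conservative choice.
fn is_sorted(field: &Field) -> bool {
    field
        .metadata
        .get(KEY_IS_SORTED)
        .and_then(|value| value.parse::<bool>().ok())
        .unwrap_or(false)
}

/// Builds a schema with one chunk-level entry and two time columns.
fn example_schema() -> Schema {
    let mut metadata = Metadata::new();
    metadata.insert(KEY_CHUNK_ID.to_owned(), "chunk-0001".to_owned());
    Schema {
        metadata,
        fields: vec![time_field("frame", true), time_field("log_time", false)],
    }
}
```

Since every value has to be parsed back out of a string, the reader decides what a missing or malformed entry means; this sketch falls back to "not sorted" so that a receiver would re-sort rather than trust bad metadata.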

- Fixes #1760
- Fixes #1692
- Fixes #3360 
- Fixes #1696

---

Part of a PR series to implement our new chunk-based data model on the
client-side (SDKs):
- #6437
- #6438
- #6439
- #6440
- #6441
teh-cmc authored May 31, 2024
1 parent 6d94947 commit b4b7ec4
Showing 6 changed files with 835 additions and 5 deletions.
13 changes: 11 additions & 2 deletions crates/re_chunk/src/chunk.rs
@@ -35,7 +35,7 @@ pub type ChunkId = re_tuid::Tuid;
/// Its time columns might or might not be ascendingly sorted, depending on how the data was logged.
///
/// This is the in-memory representation of a chunk, optimized for efficient manipulation of the
-/// data within.
+/// data within. For transport, see [`crate::TransportChunk`] instead.
#[derive(Debug, Clone)]
pub struct Chunk {
pub(crate) id: ChunkId,
@@ -337,7 +337,16 @@ impl Chunk {
}
}

-// TODO(cmc): display impl
+impl std::fmt::Display for Chunk {
+    #[inline]
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        let chunk = self.to_transport().map_err(|err| {
+            re_log::error_once!("couldn't display Chunk: {err}");
+            std::fmt::Error
+        })?;
+        chunk.fmt(f)
+    }
+}

// TODO(cmc): sizebytes impl + sizebytes caching + sizebytes in transport metadata
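
The `Display` impl in this hunk delegates to the transport form and has to squeeze a fallible conversion into `std::fmt::Result`. The standalone sketch below shows that pattern using only the standard library. The `Chunk` and `TransportChunk` structs here are simplified, hypothetical stand-ins (not the real `re_chunk` types), the empty-chunk failure is an invented condition, and `eprintln!` stands in for the logging macro.

```rust
use std::fmt;

/// Hypothetical stand-in for an in-memory chunk.
struct Chunk {
    rows: Vec<u32>,
}

/// Hypothetical stand-in for the transport form, which knows how to print itself.
struct TransportChunk {
    rows: Vec<String>,
}

impl Chunk {
    /// Fallible conversion. In this sketch it fails on an empty chunk,
    /// purely so that the error path can be exercised.
    fn to_transport(&self) -> Result<TransportChunk, String> {
        if self.rows.is_empty() {
            return Err("empty chunk".to_owned());
        }
        Ok(TransportChunk {
            rows: self.rows.iter().map(|row| row.to_string()).collect(),
        })
    }
}

impl fmt::Display for TransportChunk {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}]", self.rows.join(", "))
    }
}

impl fmt::Display for Chunk {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `fmt::Error` carries no payload, so the cause is reported
        // before it is discarded.
        let transport = self.to_transport().map_err(|err| {
            eprintln!("couldn't display Chunk: {err}");
            fmt::Error
        })?;
        // Delegate the actual formatting to the transport form.
        fmt::Display::fmt(&transport, f)
    }
}
```

One consequence of this design: `to_string()` and `format!` panic when a `Display` impl returns an error, so callers that want to handle a failed conversion should use `write!` and check its result, or call the conversion directly.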

2 changes: 2 additions & 0 deletions crates/re_chunk/src/lib.rs
@@ -6,9 +6,11 @@
mod chunk;
mod shuffle;
mod transport;
mod util;

pub use self::chunk::{Chunk, ChunkError, ChunkId, ChunkResult, ChunkTimeline};
pub use self::transport::TransportChunk;
pub use self::util::arrays_to_list_array;

pub mod external {
6 changes: 3 additions & 3 deletions crates/re_chunk/src/shuffle.rs
@@ -297,7 +297,7 @@ mod tests {
components.clone().into_iter().collect(),
)?;

-// eprintln!("{chunk_sorted}");
+eprintln!("{chunk_sorted}");

assert!(chunk_sorted.is_sorted());
assert!(chunk_sorted.is_sorted_uncached());
@@ -308,7 +308,7 @@ mod tests {
chunk_shuffled
};

-// eprintln!("{chunk_shuffled}");
+eprintln!("{chunk_shuffled}");

assert!(!chunk_shuffled.is_sorted());
assert!(!chunk_shuffled.is_sorted_uncached());
@@ -320,7 +320,7 @@ mod tests {
chunk_resorted
};

-// eprintln!("{chunk_resorted}");
+eprintln!("{chunk_resorted}");

assert!(chunk_resorted.is_sorted());
assert!(chunk_resorted.is_sorted_uncached());