Tracking issue: end-to-end batching #1619
Comments
Copy pasting the discord thread regarding

===

Hey folks,

In practice, this already translates into very significant compute & memory performance improvements across the stack starting today, and paves the way for even more of those in the future (ingestion speed, query speed, memory usage, network bandwidth, garbage collection throughput & latency...).

These changes are available right now on latest

The first big chunk of this work was the introduction of new core data types to abstract over raw Arrow data:

These new abstractions make it much more manageable to work efficiently with raw Arrow data across the entire stack (SDK, transport, datastore, query layer... all the way from the clients up to the renderer!), as well as guard against common Arrow pitfalls.

Then comes the new datastore itself (#1727, #1735, #1739, #1785, #1791, #1795, #1801), which builds upon these new types and gets the store's internals closer to the overarching data model. The result is much faster queries and drastically reduced memory usage. Applications that put the most stress on the store will of course be the ones benefiting the most from these changes.

To demonstrate all of this we can use our official

Running the simulation for 50'000 frames, then replaying it at 180x speed with 1000 frames of visible history buffer for the minute hand of the clock 👇

Before: ~15ms per frame / ~4.5GiB of RAM required: 23-04-13_144446.patched.mp4

After: ~7ms per frame / ~920MiB of RAM required: 23-04-13_144915.patched.mp4

So, roughly a 2x improvement in frame times and 5x in memory usage!
The win from not logging
RFC
Move `DataStore` sanity checks and formatting tools to separate files
`store.rs` is supposed to be the place where one can get an overview of all the data structures involved in the store, except it has slowly become a mess over time and is now pretty much unreadable.
Implement all the needed tests & benchmarks
We need to be able to check for regressions at every step, so make sure we have all the tests and benchmarks we need for that.
We should already be 95% of the way there at this point.
Replace `MsgBundle` & `ComponentBundle` with the new types (`DataCell`, `DataRow`, `DataTable`, `EventId`, `BatchId`...)
No actual batching features nor any kind of behavior changes: just define the new types and use them everywhere.
Pass entity path as a column rather than as metadata
Replace the current entity_path that is passed in the metadata map with an actual column instead. This will also require us to make `EntityPath` a proper Arrow datatype (datatype, not component!!).
Make sure implicit instance counts have been wiped everywhere #1892
Issue created; not blocking for batching.
Eliminate legacy splats #1893
Issue created; not blocking for batching.
Get rid of component buckets altogether
Update the store implementation to remove component tables, remove the `get` APIs, introduce slicing on the write path, etc. Still no batching in sight!
SDK-side log batching #1880
Implement the coalescing/accumulation logic in the SDK
Add the required logic/thread/timers/whatever-else in the SDKs to accumulate data and just send it all as many `LogMsg`s (i.e. no batching yet).
Implement full-on batching
End-to-end: transport, storage, the whole shebang.
Sort the batch before sending (`(event_id, entity_path)`)
Keep that in its own PR to keep track of the benchmarks.
Implement new GC
The complete implementation; should close all existing GC issues.
Dump directly from the store into an rrd file
No rebatching yet, just dump every event in its own `LogMsg`.
Remove `LogMsg`s from `LogDb`
We shouldn't need to keep track of events outside the store past this point: clean it all up.
Reminder: the timeline widget keeps track of timepoints directly, not events.
Rebatch aggressively while dumping the store to a stream of `LogMsg` #1894
Issue created; not blocking for batching.
Make `log_time` column implicit and potentially introduce `ingest_time` #1891
Issue created; not blocking for batching.
A `Component`'s `DataType` should embed its metadata #1696
Issue created; not blocking for batching.
re_datastore: replace anyhow::Error usage with a thiserror derived Error type #527
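For discussion purposes, here is a rough sketch of what the SDK-side accumulation and the "sort the batch before sending" steps from the RFC could look like when combined. This is not the actual implementation: `Row`, `run_batcher`, `flush`, `sort_batch`, `max_rows` and `max_delay` are all hypothetical names, and `Row` is a stand-in that only carries the two sort keys rather than real Arrow data. It assumes a dedicated thread that flushes either when enough rows have accumulated or when a timer expires.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for one logged row. The real types carry Arrow
/// data; only the two sort keys mentioned in the RFC are modeled here.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Row {
    event_id: u64,
    entity_path: String,
}

/// Sorts a batch by `(event_id, entity_path)`, as the RFC suggests doing
/// before a batch is sent.
fn sort_batch(batch: &mut [Row]) {
    batch.sort_by(|a, b| {
        (a.event_id, a.entity_path.as_str()).cmp(&(b.event_id, b.entity_path.as_str()))
    });
}

/// Sorts and sends the pending rows as one batch, leaving `batch` empty.
/// Does nothing if there is nothing to send.
fn flush(batch: &mut Vec<Row>, tx: &mpsc::Sender<Vec<Row>>) {
    if batch.is_empty() {
        return;
    }
    sort_batch(batch);
    // The receiving side may already be gone during shutdown; ignore that.
    let _ = tx.send(std::mem::take(batch));
}

/// Accumulates rows coming from `rx` and flushes them to `tx` when either
/// `max_rows` rows are pending or `max_delay` has elapsed since the first
/// row of the current batch arrived. Returns once all senders are dropped.
fn run_batcher(
    rx: mpsc::Receiver<Row>,
    tx: mpsc::Sender<Vec<Row>>,
    max_rows: usize,
    max_delay: Duration,
) {
    let mut batch: Vec<Row> = Vec::new();
    let mut deadline = Instant::now() + max_delay;
    loop {
        let timeout = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(timeout) {
            Ok(row) => {
                if batch.is_empty() {
                    // The timer starts with the first row of a new batch.
                    deadline = Instant::now() + max_delay;
                }
                batch.push(row);
                if batch.len() < max_rows {
                    continue;
                }
            }
            Err(RecvTimeoutError::Timeout) => {}
            Err(RecvTimeoutError::Disconnected) => {
                // No more producers: send whatever is left and stop.
                flush(&mut batch, &tx);
                return;
            }
        }
        flush(&mut batch, &tx);
        deadline = Instant::now() + max_delay;
    }
}
```

The size threshold bounds memory and latency under heavy logging, while the timer keeps a slow producer from holding rows back indefinitely. Keeping the sort inside `flush` means it can be benchmarked or removed in isolation, in line with the note about keeping that change in its own PR.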