POC/Arrow evaluation #61

chrisbc · 2024-04-12T05:34:02Z

As we delve deeper into EPIC #50, it becomes apparent that big-data tech like Arrow may be able to help. So, can we do this...

basic questions

convert THS objects into an Arrow/parquet dataset that can be worked on easily using just regular FileSystemLike storage (including local and S3)
compare performance when querying and processing a large task (e.g. hazard aggregation in THP)
enumerate the pros/cons
is parquet the preferred serialisation format?

can we use Arrow's in-memory features and/or IPC techniques (e.g. Plasma) to boost performance and minimise file IO?
can we do partitioning in Arrow (not just parquet), and how does that work?
can we easily reshape datasets to optimise for different use-cases (3rd party, internal heavy compute)
can we use SQL-like queries
also here SELECT ...

chrisbc · 2024-05-27T02:27:51Z

completed in #62

chrisbc added toshi-hazard-store LIB EPIC labels Apr 12, 2024

chrisbc self-assigned this Apr 12, 2024

chrisbc changed the title ~~POC/Arrow evaluatoin~~ POC/Arrow evaluation Apr 12, 2024

chrisbc closed this as completed May 27, 2024