-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrent streams + Object Store POC #400
Concurrent streams + Object Store POC #400
Conversation
…rn values for StreamExt::Buffered
…ed (somewhat overly forgiving) select_columns interface
There's definitely multiple dependencies that have crept in - a few are for more verbose interfaces (e.g. serde wasm bindgen for grabbing a JS manipulable map representation of the schema), others were blind alleys (I'm pretty sure satisfying the Send requirement by skipping the use of reqwest's bytes_stream method obviates the need for the feature). |
One further thing on the object_store front - to check if I was foregoing any performance wins from using byte streams (vs collecting and piping over a channel, to be shoved into a suitably Send stream), I systematically ripped out all the Send boundaries (including those hiding behind BoxFuture, BoxStream) in a local copy of the object_store repo - comparison PR (purely for academic interest) over at H-Plus-Time/arrow-rs#1. No apparent loss in the Send-compliant approach, at least not from the way the ParquetRecordBatchReader is using it (this I suspect is the crux of the apparent non-importance of streaming vs awaited responses - at some point, you need the entire buffer on hand, and the reader chooses to remain idle until that happens). |
object_store = { version = "0.8.0", default-features = false, features = [] } | ||
async-trait = "0.1.74" | ||
tokio = { version = "1.34.0", default-features = false } | ||
itertools = "0.12.0" | ||
url = "2.5.0" | ||
chrono = { version = "0.4.31", default-features = false, features = [ "wasmbind" ] } | ||
snafu = "0.7.5" | ||
async-std = "1.12.0" | ||
backon = "0.4.1" | ||
instant = { version = "0.1.12", features = ["wasm-bindgen"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is a lot of deps... do you know how much the bundle size changes by?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now it's sitting at 5.20MB, so about 100kB relative to main (5.09MB).
snafu, async-std and itertools could probably go, though because the latter two are unused, they're likely not contributing to the bundle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah interesting; that's less of an addition than I expected (brotli-encoding that should bring it down to about 1MB; try brotli -Z
). What's the compressed delta?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting - 1.22MB up from 1.15MB (1.16MB for the changes in kyle/async-arrow1), so a relatively incompressible increase.
This kind of makes me want to add PR package size reports 🤔 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of makes me want to add PR package size reports 🤔 .
That sounds awesome!
I've tended to lean towards a more class-based approach, e.g. with the options builder and such, rather than JS objects; also in the hope that they give better typing and API docs in the IDE |
Oh no, by that I meant for inspection purposes (spitting out schema descriptions for debugging purposes), the sort that I would err on the side of ripping out once all the kinks are ironed out. |
Oh in that case, you might want to leave them in and set the |
Is there anything left in this PR that we should look into? Or should we close this? |
Yep, nothing left to learn/integrate from it, definitely. |
Derived from the refactor branch, at the moment I've extended the file struct a bit for easy toggling between the two backing readers.