Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent streams + Object Store POC #400

Closed

Conversation

H-Plus-Time
Copy link
Contributor

Derived from the refactor branch, at the moment I've extended the file struct a bit for easy toggling between the two backing readers.

@H-Plus-Time
Copy link
Contributor Author

There's definitely multiple dependencies that have crept in - a few are for more verbose interfaces (e.g. serde wasm bindgen for grabbing a JS manipulable map representation of the schema), others were blind alleys (I'm pretty sure satisfying the Send requirement by skipping the use of reqwest's bytes_stream method obviates the need for the feature).

@H-Plus-Time
Copy link
Contributor Author

One further thing on the object_store front - to check if I was foregoing any performance wins from using byte streams (vs collecting and piping over a channel, to be shoved into a suitably Send stream), I systematically ripped out all the Send boundaries (including those hiding behind BoxFuture, BoxStream) in a local copy of the object_store repo - comparison PR (purely for academic interest) over at H-Plus-Time/arrow-rs#1.

No apparent loss in the Send-compliant approach, at least not from the way the ParquetRecordBatchReader is using it (this I suspect is the crux of the apparent non-importance of streaming vs awaited responses - at some point, you need the entire buffer on hand, and the reader chooses to remain idle until that happens).

Comment on lines +114 to +123
object_store = { version = "0.8.0", default-features = false, features = [] }
async-trait = "0.1.74"
tokio = { version = "1.34.0", default-features = false }
itertools = "0.12.0"
url = "2.5.0"
chrono = { version = "0.4.31", default-features = false, features = [ "wasmbind" ] }
snafu = "0.7.5"
async-std = "1.12.0"
backon = "0.4.1"
instant = { version = "0.1.12", features = ["wasm-bindgen"] }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is a lot of deps... do you know how much the bundle size changes by?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now it's sitting at 5.20MB, so about 100kB relative to main (5.09MB).

snafu, async-std and itertools could probably go, though because the latter two are unused, they're likely not contributing to the bundle.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah interesting; that's less of an addition than I expected (brotli-encoding that should bring it down to about 1MB; try brotli -Z). What's the compressed delta?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - 1.22MB up from 1.15MB (1.16MB for the changes in kyle/async-arrow1), so a relatively incompressible increase.

This kind of makes me want to add PR package size reports 🤔 .

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of makes me want to add PR package size reports 🤔 .

That sounds awesome!

@kylebarron
Copy link
Owner

a few are for more verbose interfaces

I've tended to lean towards a more class-based approach, e.g. with the options builder and such, rather than JS objects; also in the hope that they give better typing and API docs in the IDE

@H-Plus-Time
Copy link
Contributor Author

a few are for more verbose interfaces

I've tended to lean towards a more class-based approach, e.g. with the options builder and such, rather than JS objects; also in the hope that they give better typing and API docs in the IDE

Oh no, by that I meant for inspection purposes (spitting out schema descriptions for debugging purposes), the sort that I would err on the side of ripping out once all the kinks are ironed out.

@kylebarron
Copy link
Owner

Oh no, by that I meant for inspection purposes (spitting out schema descriptions for debugging purposes), the sort that I would err on the side of ripping out once all the kinks are ironed out.

Oh in that case, you might want to leave them in and set the debug feature flag on them

@kylebarron
Copy link
Owner

Is there anything left in this PR that we should look into? Or should we close this?

@H-Plus-Time
Copy link
Contributor Author

Yep, nothing left to learn/integrate from it, definitely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants