Polars benchmarks are very suboptimal. #268
Comments
There should also be Polars from Parquet and SQL like in DataFusion and use
@ritchie46 You are welcome to send a patch to correct the usage of Polars 😀
I don't know what you mean. What do we not support?
We can write the queries from Parquet. I can adapt the PR to do that. Our polars-cli is heavily outdated. I would recommend using Python Polars. That's our latest binary. We can also run SQL (though not everything, as SQL is more of an afterthought).
I mean for #222, it is a "Query on Pandas Dataframe benchmark".
It is not meant to query on pandas DataFrames. If pandas can query on its own DataFrames, I think Polars should be able to do that as well.
They benchmark the eager API which forces Polars to materialize every result and doesn't allow any optimizations. We state in our docs that if you benchmark Polars you should use the Lazy API.
I also wonder why data must be loaded via Pandas?
I could take a stab at writing correct Polars queries. Does IO time need to be included, or must queries load from DataFrames? I also wonder why pandas queries are mixed into the Polars queries file.