Polars benchmark are very suboptimal. #268

Closed
ritchie46 opened this issue Nov 25, 2024 · 6 comments · Fixed by #270

Comments

@ritchie46
Contributor

The benchmarks use the eager API, which forces Polars to materialize every result and prevents any query optimization. We state in our docs that anyone benchmarking Polars should use the Lazy API.

I also wonder why data must be loaded via Pandas?

I could take a stab at writing correct Polars queries. Does IO time need to be included, or must queries load from DataFrames? I also wonder why pandas queries are mixed into the Polars queries file.

@ritchie46 ritchie46 mentioned this issue Nov 25, 2024
@rluvaton

There should also be Polars benchmarks from Parquet and via SQL, as with DataFusion, using polars-cli for that.

@auxten
Member

auxten commented Nov 26, 2024

@ritchie46
This patch is a test of 'Query on DataFrame'. As Polars does not support querying a DataFrame directly, there is a load time.

You are welcome to send a patch correcting the usage of Polars 😀

@ritchie46
Contributor Author

> As Polars does not support querying a DataFrame directly, there is a load time.

I don't know what you mean? What do we not support?

@ritchie46
Contributor Author

> There should also be Polars benchmarks from Parquet and via SQL, as with DataFusion, using polars-cli for that.

We can write the queries from Parquet. I can adapt the PR to do that. Our polars-cli is heavily outdated. I would recommend using Python Polars. That's our latest binary. We can also run SQL (but not everything, as SQL is more of an afterthought).

@auxten
Member

auxten commented Nov 26, 2024

> As Polars does not support querying a DataFrame directly, there is a load time.

> I don't know what you mean? What do we not support?

I mean that #222 is a "Query on Pandas DataFrame" benchmark.
Forgive my ignorance, I don't know how Polars can query on a Pandas DataFrame directly without converting it to a Polars DataFrame.

@ritchie46
Contributor Author

It is not meant to query on pandas DataFrames. If pandas can query on its own DataFrames, I think Polars should be able to do that as well.
