[Enh]: Add Support for Snowpark #1419

rwhitten577 · 2024-11-21T20:55:35Z

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

I'm building a platform for feature engineering across several contexts (batch, stream, on-demand). Today I use pandas for on-demand and often Snowpark for batch, but I need to run large backfills of the on-demand feature logic. I do the heavy lifting (joins and aggregations) from the data sources in Snowpark before pulling the data into pandas, but I am still limited by RAM and have to switch between two APIs within the platform. Supporting Snowpark in Narwhals would let me use the same expressions for both on-demand computation and backfills, and would keep the platform's internal code much simpler by removing all the conditional checks between Snowpark and pandas.

Please describe the purpose of the new feature or describe the problem to solve.

I've been following the progress in #333 and would love to see Snowpark support added, building on the PySpark implementation. Snowpark follows the PySpark API, so it can hopefully leverage all the great work done adding PySpark.

Suggest a solution if possible.

No response

If you have tried alternatives, please describe them below.

Ibis promises to do this, but it has high overhead when converting from a native pyarrow table or pandas df to an Ibis table, especially when you have 1,000+ columns. It may take 10 seconds just to create one Ibis table, when I need the feature calculations completed in <50ms. Narwhals seems like a better fit because it works with the supplied native df directly, with minimal overhead.

I briefly tried DuckDB's Spark API, but it too had high overhead when mapping from a large pandas df to a DuckDB dataframe.

Additional information that may help us understand your needs.

No response

MarcoGorelli · 2024-11-22T08:45:24Z

thanks @rwhitten577 for the request

sure, if it follows the pyspark api then it shouldn't be a major lift once we get pyspark in

MarcoGorelli added the enhancement (New feature or request) and blocked labels on Nov 22, 2024
