We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?
I'm building a platform for feature engineering across several contexts (batch, stream, on-demand). Today I use pandas for on-demand and often Snowpark for batch, but I need to run large backfills of the on-demand feature logic. I do the heavy lifting (joins and aggregations) from the data sources in Snowpark before pulling into pandas, but I'm still limited by RAM and by switching between two APIs within the platform. Supporting Snowpark in Narwhals would allow using the same expressions for both on-demand and backfills, and would keep the platform's internal code much simpler by removing all the conditional checks between Snowpark and pandas.
Please describe the purpose of the new feature or describe the problem to solve.
I've been following the progress in #333 and would love to see Snowpark support added, building upon the PySpark implementation. Snowpark follows the PySpark API, so hopefully it can leverage all the great work done adding PySpark.
Suggest a solution if possible.
No response
If you have tried alternatives, please describe them below.
Ibis promises to do this, but it has high overhead when converting from a native PyArrow table or pandas DataFrame to an Ibis table, especially with 1000+ columns. It can take 10+ seconds just to create one Ibis table, when I need the feature calculations completed in <50ms. Narwhals seems like it may fit by using the supplied native DataFrame directly with minimal overhead.
I briefly tried DuckDB's Spark API, but it too had high overhead when mapping from a large pandas DataFrame to a DuckDB DataFrame.
Additional information that may help us understand your needs.
No response