[Enh]: Support polars.Expr.rank #1323

adamblake · 2024-11-05T02:47:19Z

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

I am abstracting a library for computing teaching metrics so that researchers can use their data processing library of choice. Narwhals seems like a good bet (also shout-out to @mikeckennedy for having you on the podcast!). I can't share the specific repository because it contains internal scripts, but this would be supporting CourseKata, a low-cost textbook platform dedicated to continuous improvement based on learning science principles.

Please describe the purpose of the new feature or describe the problem to solve.

I would like support for the polars.Expr.rank method. One example of how it could be used is to count how often an instructor teaches, given some grouping variable (window). In Polars it might look like this:

df.sort("academic_year").with_columns(
  years_taught=pl.col("academic_year")
    .rank(method="dense")
    .over("instructor_id")
)

This would window over instructor_id and get the rank by academic_year. Essentially, we will get a count of how many academic years an instructor has taught in, and because we are using the "dense" ranking, teaching multiple classes in a year counts as a single year taught.

Suggest a solution if possible.

No response

If you have tried alternatives, please describe them below.

I could probably achieve this by making an intermediate data frame where I filter down academic_year using unique(), and then make some kind of counter variable based on instructor_id, and then join() that back to the initial table.

Instead I would rather just go back to using Polars until this feature is supported (if it is on your roadmap!).

Additional information that may help us understand your needs.

No response

FBruzzesi · 2024-11-05T08:29:05Z

Hey @adamblake, thanks for the feature request. This is definitely in scope 👌 We are currently finalizing an integration, but we will soon get back to expanding the API 😁

mikeckennedy · 2024-11-07T16:49:05Z

> also shout-out to @mikeckennedy for having you on the podcast!

Thanks @adamblake.

FBruzzesi · 2024-11-07T22:31:09Z

Hey @adamblake, I started taking a look. For context, we will be able to fully support rank for pandas and Polars, while for PyArrow there could be some shortcomings. Namely:

The Polars default, method="average", is the only method not supported in Arrow.
PyArrow's TableGroupBy.aggregate does not support ranking in any form. I see in your example that you would like to use rank in an over context, which for pandas and PyArrow is equivalent to performing a group by and a join, so this won't be supported for PyArrow.
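As a rough illustration of the pandas side, a group-wise dense rank can be written with `groupby(...).rank(...)`, which returns values aligned to the original rows. This is a sketch on hypothetical toy data and is not necessarily how Narwhals implements it internally:

```python
import pandas as pd

# Hypothetical toy data, mirroring the use case in this issue.
df = pd.DataFrame(
    {
        "instructor_id": ["a", "a", "a", "b"],
        "academic_year": [2021, 2021, 2022, 2022],
    }
)

# Group-wise dense rank; the result is aligned with the original index,
# so it can be assigned straight back as a new column.
df["years_taught"] = df.groupby("instructor_id")["academic_year"].rank(
    method="dense"
)

years_taught = df["years_taught"].tolist()
print(years_taught)
```

Note that pandas returns the ranks as floats, so any cross-backend comparison would need to account for the dtype difference.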

adamblake · 2024-11-07T23:35:12Z

@FBruzzesi thanks for the context. We use polars / pandas so this would be great for us

FBruzzesi added the enhancement (New feature or request) label Nov 5, 2024

MarcoGorelli added the accepted label Nov 5, 2024

FBruzzesi self-assigned this Nov 6, 2024

FBruzzesi mentioned this issue Nov 9, 2024

feat: add Series|Expr.rank #1342

Draft

10 tasks
