[data] Default value of `key` in `sort(key=None)` raises `IndexError` #48926
Comments
Can reproduce this behavior, and I think the current API doc doesn't specify the exact behavior under `None`. I think it's more reasonable to follow the Pandas behavior of `sort_values(by=None)`, which simply raises an error; that is more explicit and more Pythonic.
Thanks for the quick PR, but I haven't thought through raising an error for `None` yet. I think the option was meant to sort by all columns. At least one other place uses this "conceptually". I am also curious whether it was working before or not.
The Ray Data groupby docs seem to be broken in formatting (https://docs.ray.io/en/latest/data/api/doc/ray.data.Dataset.groupby.html) and don't clarify how it works. But I assume grouping by all columns means grouping everything together, right? That is essentially equivalent to doing nothing. If we apply the same idea to sort, then it also means we do nothing when the key is set to `None`, wdyt? Both options are explicit, instead of implicitly grouping/sorting based on some hidden factors. It might be counterintuitive for Pandas users if we behave the opposite way from Pandas.
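To make the "do nothing" reading of `key=None` concrete, here is a minimal pure-Python sketch over rows represented as dicts. The function name `sort_rows_noop_default` and the list-of-dicts representation are hypothetical illustrations, not Ray Data's API or implementation. An explicit key (a column name or a list of names) sorts lexicographically, while `None` returns the rows in their original order.

```python
from typing import Any, Dict, List, Optional, Sequence, Union

Row = Dict[str, Any]


def sort_rows_noop_default(
    rows: List[Row], key: Optional[Union[str, Sequence[str]]] = None
) -> List[Row]:
    """Hypothetical sort where key=None leaves the row order untouched.

    Models the "do nothing" reading of key=None; this is not Ray Data code.
    """
    if key is None:
        # No key given: return a shallow copy in the original order.
        return list(rows)
    columns = [key] if isinstance(key, str) else list(key)
    # Compare rows lexicographically on the requested columns.
    # sorted() is stable, so ties keep their original relative order.
    return sorted(rows, key=lambda row: tuple(row[c] for c in columns))


rows = [{"a": 2, "b": "x"}, {"a": 1, "b": "y"}, {"a": 2, "b": "w"}]
print(sort_rows_noop_default(rows))           # original order
print(sort_rows_noop_default(rows, key="a"))  # ordered by column "a"
```

Under this reading, `None` is an explicit no-op rather than a hidden default ordering.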
After reading a little bit in …, let's ping the Ray team after the Thanksgiving week.
Hey @wingkitlee0, I think at this point we don't support `sort(None)`, and we probably don't need to support it (for example, PyArrow forces an explicit parameter, and so does Pandas).
Okay, it sounds like it was never working. Should we remove the default value of `None` then (and raise an error, update the docs, etc.)?
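A minimal sketch of the proposal in the previous comment: drop the `None` default and fail early with a clear message instead of an `IndexError` deep inside the sort. `normalize_sort_key` is a hypothetical helper, not part of Ray Data, and it assumes a sort key may be either a single column name or a list of column names.

```python
from typing import List, Sequence, Union


def normalize_sort_key(key: Union[str, Sequence[str], None]) -> List[str]:
    """Hypothetical validation for a required sort key.

    Rejects None (and empty keys) up front so the caller gets an
    explicit error rather than a later IndexError.
    """
    if key is None:
        raise ValueError(
            "sort() requires an explicit key: a column name or a list of column names"
        )
    if isinstance(key, str):
        # A single column name becomes a one-element list.
        return [key]
    columns = list(key)
    if not columns:
        raise ValueError("sort() requires at least one column in key")
    if not all(isinstance(c, str) for c in columns):
        raise TypeError("every sort key must be a column name (str)")
    return columns


print(normalize_sort_key("a"))         # ['a']
print(normalize_sort_key(["a", "b"]))  # ['a', 'b']
```

Calling `normalize_sort_key(None)` raises `ValueError`, which mirrors the "explicit parameter" behavior mentioned for PyArrow and Pandas.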
What happened + What you expected to happen
`Dataset.sort()` has a default `key` value of `None`, which should mean it sorts all columns. But it raises `IndexError`.
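To illustrate the expected "sorts all columns" behavior, here is a minimal pure-Python sketch: with no key, rows are ordered lexicographically by every column, taking the column order of the first row. `sort_rows_all_columns_default` and the list-of-dicts representation are hypothetical and not Ray Data's implementation; the sketch also assumes each column holds mutually comparable values.

```python
from typing import Any, Dict, List, Optional, Sequence, Union

Row = Dict[str, Any]


def sort_rows_all_columns_default(
    rows: List[Row], key: Optional[Union[str, Sequence[str]]] = None
) -> List[Row]:
    """Hypothetical sort where key=None means "sort by every column".

    Column order comes from the first row; this is not Ray Data code.
    """
    if not rows:
        return []
    if key is None:
        # No key: use all columns, in the first row's column order.
        columns = list(rows[0].keys())
    elif isinstance(key, str):
        columns = [key]
    else:
        columns = list(key)
    # Compare rows lexicographically on the chosen columns.
    return sorted(rows, key=lambda row: tuple(row[c] for c in columns))


rows = [{"a": 2, "b": "x"}, {"a": 1, "b": "y"}, {"a": 2, "b": "w"}]
print(sort_rows_all_columns_default(rows))  # ordered by ("a", "b")
```

One design question this raises is which column order counts as "all columns"; the sketch simply picks the first row's order.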
Also, the doc does not say what `None` does.

Versions / Dependencies
Reproduction script
It also does not work for `sort(None)`.
Issue Severity
Medium: It is a significant difficulty but I can work around it.