Implement `GroupBy.value_counts` to match pandas API #14114
/ok to test
/ok to test
Thanks for this feature! I have a few questions.
Can we make sure to cover errors raised by pandas?
- pandas raises an error about "clashing" if the subset of columns and groupby columns have a common element: https://github.com/pandas-dev/pandas/blob/f4f598fb36c0809da01cade2d5d832ee09564101/pandas/core/groupby/groupby.py#L2761-L2763
- pandas raises an error about "doesn't exist" if the subset keys don't exist in the dataframe. https://github.com/pandas-dev/pandas/blob/f4f598fb36c0809da01cade2d5d832ee09564101/pandas/core/groupby/groupby.py#L2767-L2769
- pandas raises if there is already a column named `count` or `proportion` in the dataframe. https://github.com/pandas-dev/pandas/blob/f4f598fb36c0809da01cade2d5d832ee09564101/pandas/core/groupby/groupby.py#L2851-L2852
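The three error cases above could be exercised with something like the sketch below. It runs against pandas itself (the reference behavior being matched), so it does not touch cudf. The helper `raises_value_error` and the sample frames are hypothetical, and the message fragments reflect recent pandas releases, so they may shift between versions. Note that the duplicate-name check only fires with `as_index=False`, where the counts become a column next to the grouping keys.

```python
import pandas as pd


def raises_value_error(fn, fragment):
    """Return True if fn() raises ValueError with `fragment` in its message.

    In a pytest suite, pytest.raises(ValueError, match=...) is the idiomatic form;
    plain try/except keeps this sketch free of a pytest dependency.
    """
    try:
        fn()
    except ValueError as err:
        return fragment in str(err)
    return False


df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 3]})

# Case 1: the subset overlaps the groupby keys ("clashing").
clashing = raises_value_error(
    lambda: df.groupby("a").value_counts(subset=["a"]),
    "cannot be in the groupby column keys",
)

# Case 2: the subset names a column that does not exist.
missing = raises_value_error(
    lambda: df.groupby("a").value_counts(subset=["zzz"]),
    "do not exist in the DataFrame",
)

# Case 3: the result column name "count" collides with an existing column.
df_count = pd.DataFrame({"a": [1, 1, 2], "count": [3, 4, 3]})
duplicate = raises_value_error(
    lambda: df_count.groupby("a", as_index=False).value_counts(),
    "duplicate of result column",
)

print(clashing, missing, duplicate)
```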
Can we cover cases like non-observed categorical values? https://github.com/pandas-dev/pandas/blob/f4f598fb36c0809da01cade2d5d832ee09564101/pandas/core/groupby/groupby.py#L2805
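For the non-observed categorical case, recent pandas versions report a zero count for declared-but-unused categories of a counted categorical column. A minimal sketch of a check follows; the data and the function name are hypothetical, and `observed=False` is passed explicitly because the default of `observed` has changed across pandas versions.

```python
import pandas as pd


def counts_with_unobserved_category():
    """Count (g, v) pairs where v is categorical with an unused category "y"."""
    df = pd.DataFrame(
        {
            "g": ["a", "a", "b"],
            # "y" is a declared category that never appears in the data.
            "v": pd.Categorical(["x", "x", "x"], categories=["x", "y"]),
        }
    )
    # Explicit observed=False avoids depending on the version-specific default.
    result = df.groupby("g", observed=False).value_counts()
    # Convert to a plain dict keyed by (g, v) tuples so row order does not matter.
    return result.to_dict()


print(counts_with_unobserved_category())
```

A cudf test would compare its own result against this pandas output rather than hard-coding the counts.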
This is a great first pass -- but I think I'd like to see some coverage of at least some of the above features before approving (it's probably okay to leave some as TODO).
Thank you for the review @bdice 😃
/ok to test
@stmio Awesome, thanks. Can you add tests for the error cases, too?
This looks fine to me!
/ok to test
LGTM
Tests show some errors. Try replacing `.columns` with `._column_names`.
/ok to test
/merge
Nice work, @stmio!
Thank you! 😄
Description
This PR implements `GroupBy.value_counts`, matching the equivalent pandas method. Tests currently ignore the name of the returned Series/DataFrame, because pandas only started setting it in v2.0.0. This workaround can be removed once tests run against `pandas>=2.0.0`.
Closes #12789
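Ignoring the name in tests could look like the sketch below. It uses pandas alone so it runs without a GPU. `assert_value_counts_equal` is a hypothetical helper, and the unnamed `actual` Series is a stand-in for a result that lacks the name pandas 2.0 adds. `check_names=False` in `pandas.testing.assert_series_equal` skips the Series name (and index level names) while still comparing values and index labels.

```python
import pandas as pd


def assert_value_counts_equal(actual, expected):
    """Compare two value_counts Series while ignoring their names.

    check_names=False also skips index level names, so only the values and
    the index labels are compared.
    """
    pd.testing.assert_series_equal(actual, expected, check_names=False)


df = pd.DataFrame({"a": [1, 1, 2, 2, 2], "b": ["x", "y", "x", "x", "y"]})

# Reference result from pandas (named "count" on pandas>=2.0, unnamed before).
expected = df.groupby("a").value_counts()

# Stand-in for a result that leaves the Series unnamed.
actual = expected.copy()
actual.name = None

assert_value_counts_equal(actual, expected)
```

Once the minimum supported pandas is 2.0 or later, the helper can drop `check_names=False` and the name becomes part of the comparison.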