-
Notifications
You must be signed in to change notification settings - Fork 867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(arrow-ord): support boolean in rank
and add tests for sorting lists of booleans
#6912
Conversation
Rather than sorting booleans directly, it would be orders of magnitude faster to count the number of bits and use this to compute the result. Nulls can be handled by first computing the bitwise and with the values. |
Great idea, did not think about it (I was just trying to implement before trying to make it fast as you said) |
Fair, I'll try to find some time over the next few weeks to review this, but I'm afraid I have very little time to review PRs, especially of this size |
@tustvold Updated to improve performance, please let me know if that's what you meant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to improve performance, please let me know if that's what you meant
I think he was referring to the sort logic for booleans, which doesn't actually seem to be implemented in this PR? It might be worth splitting this PR as its doing two separate things and might cause confusion.
Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look |
64864b4
to
1bcfdee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; I think just adding a few last test cases to get complete coverage should push this over the line.
Something interesting about the rank function in arrow-rs is that it seems to behave differently compared to others.
Postgres:
rank () → bigint
Returns the rank of the current row, with gaps; that is, the row_number of the first row in its peer group.
DuckDB:
Description The rank of the current row with gaps; same as row_number of its first peer.
SQL Server:
If two or more rows tie for a rank, each tied row receives the same rank. For example, if the two top salespeople have the same SalesYTD value, they are both ranked one.
It seems these other ones would have ties take the lower values, whereas for arrow-rs the existing rank function (and subsequently what is introduced here) takes the higher value; I guess this might be worth a broader discussion, as this PR itself currently follows the existing behaviour
🤔
rank
and sorting list of booleansrank
and add tests for sorting lists of booleans
@Jefffrey done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
I'll merge in next few days if no further comments
Thanks @rluvaton & co |
Raised #7024 |
Which issue does this PR close?
Part of #6911
Rationale for this change
I want to sort list of of booleans
What changes are included in this PR?
added rank function for boolean and added tests for sorting list of booleans and ranking booleans
Are there any user-facing changes?
Yes, the
rank
function now supportBooleanArray
and sort support sorting list of booleansthis build on top of:
can_rank
to therank
file #6910