Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-43985: [Python][C++] Fixed pyarrow table equality function not comparing table data when comparing against itself #44497

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

Yimche
Copy link

@Yimche Yimche commented Oct 22, 2024

Rationale for this change

This change is to address issue #43985 in which a bug in the behaviour of equality on pyarrow tables occurs when comparing a table containing a NaN value against itself.

What changes are included in this PR?

  • Removed if block in arrow/cpp/src/arrow/table.cc that forced a return true if memory address of the two items being compared were equal, no matter the data stored inside.
  • Added test case.

Are these changes tested?

Yes. I put a test in the arrow/python/pyarrow/tests/test_table.py under the test_table_from_pydict function, as I wasn't sure and couldn't find where else to put the test.

Are there any user-facing changes?

No

Comment on lines +2304 to +2306
# With dicts containing NaNs
table = cls.from_pydict({"foo": [float("nan")]})
assert table != table
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a C++ level test to detect this case only in C++ level?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Oct 22, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 26, 2024
@@ -2301,6 +2301,10 @@ def test_table_from_pydict(cls):
with pytest.raises(TypeError):
cls.from_pydict({'a': [1, 2, 3]}, schema={})

# With dicts containing NaNs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test isn't related to from_pydict, we should either make it a purely C++ test as @kou suggested, or add it to an equality related test case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants