gh-117657: fix race:sock_recv_impl suppressions for free-thread building #123697

Merged
merged 6 commits into from
Sep 6, 2024

Zheaoli
Contributor

@Zheaoli Zheaoli commented Sep 4, 2024

Signed-off-by: Manjusaka <[email protected]>
@mpage
Contributor

mpage commented Sep 4, 2024

LGTM!

@mpage mpage requested review from colesbury and DinoV September 4, 2024 21:48
@Zheaoli Zheaoli changed the title gh-123695: fix race:sock_recv_impl suppressions for free-thread building gh-117657: fix race:sock_recv_impl suppressions for free-thread building Sep 5, 2024
Contributor

@colesbury colesbury left a comment

I think you diagnosed the issue correctly, but this changes the test so that it no longer actually tests the intended behavior of close().

Here is the (migrated) bug report that led to this test:

The point is to test that conn.close() actually closes the underlying socket, so that a recv/select on the other end (self.cli) with a short timeout returns immediately with an empty buffer.

  • Restore the deleted conn.close()
  • Move the select/recv checks to the other thread (_testClose) so that each thread only accesses their own end of the connection.

We might need a few more tweaks after that.
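A minimal, standalone sketch of the behavior this test is meant to check, assuming plain loopback TCP sockets rather than the test_socket harness (the helper name peer_sees_close is hypothetical): once the server closes its accepted socket, select() on the client reports the socket as readable without waiting out the timeout, and recv() returns an empty bytes object, which signals EOF.

```python
import select
import socket


def peer_sees_close(timeout=5.0):
    """Hypothetical helper: return (readable, data) as seen by the client
    after the server closes its accepted socket."""
    with socket.create_server(("127.0.0.1", 0)) as serv:
        port = serv.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as cli:
            conn, _addr = serv.accept()
            # Closing the accepted socket sends FIN to the client.
            conn.close()
            # The timeout is only an upper bound; select() returns as soon
            # as the client socket becomes readable.
            readable, _, _ = select.select([cli], [], [], timeout)
            data = cli.recv(1)  # b'' means the peer closed the connection
            return cli in readable, data


ready, data = peer_sees_close()
print(ready, data)  # True b''
```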

@colesbury
Contributor

Something like:

    def testClose(self):
        conn, addr = self.serv.accept()
        conn.close()

        # Calling close() many times should be safe.
        conn.close()
        conn.close()

    def _testClose(self):
        self.cli.connect((HOST, self.port))
        read, write, err = select.select([self.cli], [], [], support.SHORT_TIMEOUT)
        self.assertEqual(read, [self.cli])
        self.assertEqual(self.cli.recv(1), b'')

        # The other end should be closed now, so select should immediately
        # return with the socket ready for reading.
        read, write, err = select.select([self.cli], [], [], 0)
        self.assertEqual(read, [self.cli])
        self.assertEqual(self.cli.recv(1), b'')
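The same structure can be sketched outside the unittest harness with two plain threads. This is an illustration under stated assumptions, not the code in the PR: run_close_test is a hypothetical name, and SHORT_TIMEOUT is a local stand-in for test.support.SHORT_TIMEOUT. As the review asks, each thread touches only its own end of the connection.

```python
import select
import socket
import threading

SHORT_TIMEOUT = 30.0  # assumption: local stand-in for test.support.SHORT_TIMEOUT


def run_close_test():
    serv = socket.create_server(("127.0.0.1", 0))
    port = serv.getsockname()[1]
    results = {}

    def server_side():
        # Mirrors testClose(): only the accepted socket is touched here.
        conn, _addr = serv.accept()
        conn.close()
        # Calling close() many times should be safe.
        conn.close()
        conn.close()

    def client_side():
        # Mirrors _testClose(): only the client socket is touched here.
        with socket.create_connection(("127.0.0.1", port)) as cli:
            # Wait (bounded by a generous timeout) for the server to close.
            r, _, _ = select.select([cli], [], [], SHORT_TIMEOUT)
            results["first"] = (r == [cli], cli.recv(1))
            # The other end is already closed, so a zero-timeout poll
            # should report the socket as readable immediately.
            r, _, _ = select.select([cli], [], [], 0)
            results["second"] = (r == [cli], cli.recv(1))

    threads = [threading.Thread(target=server_side),
               threading.Thread(target=client_side)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    serv.close()
    return results


print(run_close_test())
```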

@Zheaoli
Contributor Author

Zheaoli commented Sep 5, 2024

Thanks for the tips. I'll update the patch ASAP.

@Zheaoli Zheaoli requested a review from colesbury September 6, 2024 17:03
Contributor

@colesbury colesbury left a comment

This version of the PR has a few issues. The time.sleep(1.0) adds an unnecessary one second delay. The 0.1 timeout will make the test too sensitive to timing variations and lead to spurious failures on heavily loaded machines.

Please see the suggested code in my comment at: #123697 (comment)
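To illustrate why a fixed sleep is unnecessary, here is a rough sketch (wait_for_peer_close and GENEROUS_TIMEOUT are hypothetical names): select() returns as soon as the peer's close is observed, so a large timeout only bounds the worst case and does not slow down a passing run, whereas time.sleep(1.0) always costs the full second. A timer thread closes the server end after an arbitrary short delay.

```python
import select
import socket
import threading
import time

GENEROUS_TIMEOUT = 30.0  # hypothetical upper bound, not a fixed delay


def wait_for_peer_close(close_delay=0.05):
    with socket.create_server(("127.0.0.1", 0)) as serv:
        port = serv.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as cli:
            conn, _addr = serv.accept()
            # Close the server end from another thread after a short delay.
            closer = threading.Timer(close_delay, conn.close)
            start = time.monotonic()
            closer.start()
            # Returns shortly after the close, not after GENEROUS_TIMEOUT.
            readable, _, _ = select.select([cli], [], [], GENEROUS_TIMEOUT)
            elapsed = time.monotonic() - start
            closer.join()
            return cli in readable, elapsed


ready, elapsed = wait_for_peer_close()
print(ready, round(elapsed, 2))
```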

@Zheaoli
Contributor Author

Zheaoli commented Sep 6, 2024

Thanks for the review. I have updated this PR; I think this version is similar to your suggested code. I added an extra step that checks that the socket returned by the accept() call works before closing it. I'm not sure this is necessary, but I think it makes the test more robust.
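The comment does not show the extra step, so the following is only a guess at what such a sanity check could look like (close_after_sanity_check is hypothetical, and it runs single-threaded for brevity): a byte is sent by the client and read from the accepted socket before that socket is closed, after which the client observes EOF.

```python
import select
import socket


def close_after_sanity_check(timeout=5.0):
    with socket.create_server(("127.0.0.1", 0)) as serv:
        port = serv.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as cli:
            conn, _addr = serv.accept()
            # Sanity check: the accepted socket can carry data before close().
            cli.sendall(b"x")
            received = conn.recv(1)
            # Nothing is left unread on conn, so close() sends FIN, not RST.
            conn.close()
            readable, _, _ = select.select([cli], [], [], timeout)
            return received, cli in readable, cli.recv(1)


print(close_after_sanity_check())  # (b'x', True, b'')
```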

Contributor

@colesbury colesbury left a comment

LGTM

@colesbury colesbury self-assigned this Sep 6, 2024
@colesbury colesbury merged commit 8a46a2e into python:main Sep 6, 2024
34 checks passed
@Zheaoli Zheaoli deleted the manjusaka/fix-socket-tsan branch September 7, 2024 08:42
Development

Successfully merging this pull request may close these issues.

Make TSAN tests pass with the GIL disabled in free-threaded builds
4 participants