Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix busy-wait on client SSL connection #4588

Merged
merged 19 commits into from
Aug 8, 2024
Merged

Conversation

JavierJF
Copy link
Collaborator

@JavierJF JavierJF commented Jul 12, 2024

Main Fix

This PR fixes the following busy-loop scenario when handling SSL client connections, the description is in terms of reads handling within MySQL_Data_Stream::read_from_net function:

  1. A client connection was closed, leading to a return of 0 by 'recv', and 'errno' to be set with 'EAGAIN'.
  2. Error handling is performed taking into account a never performed call to 'SSL_read' (as if this '0' was a result for this call).
  3. The call to 'SSL_get_error' returns with 'SSL_ERROR_SYSCALL', as a result of using this invalid param of '0' (which for 'SSL_read' would mean non-retryable error), and the non-reset 'errno' would still be 'EAGAIN'.
  4. This would create the illusion of an interrupted read. So a new read is tried again, while in reality the socket is closed. This busy loop will continue until 'POLLHUP' is received when we attempt to write back to the client and a 'RST' is received as a response.

Minor Fix

This PR makes proxy_debug_func save/restore the errno value. This can prevent issues of logging influencing error handling.

Extra fixes

  1. Since this PR when a value supplied for an interval is rounded, a warning is printed in error log.
  2. Due to recent changes in the CI this PR packs multiple fixes for issues recently observed within some tests, and some improvements for the tooling.

JavierJF added 4 commits July 12, 2024 18:27
Errors and EOF returned by 'recv' are now properly handled, isolated
from the return value ('r') from the later and conditional 'SSL_read'.

Previously, a busy loop was possible in the following scenario:

1. A client connection was closed, leading to a return of 0 by 'recv',
   and 'errno' to be set with 'EAGAIN'.
2. Error handling is performed taking into account a never performed
   call to 'SSL_read' (as if this '0' was a result for this call).
3. The call to 'SSL_get_error' returns with 'SSL_ERROR_SYSCALL', as a
   result of using this invalid param of '0' (which for 'SSL_read' would
   mean non-retryable error), and the non-reset 'errno' would still be
   'EAGAIN'.
4. This would create the illusion of an interrupted read. So a new read
   is tried again, while in reality the socket is closed. This busy loop
   will continue until 'POLLHUP' is received when we attempt to write
   back to the client and a 'RST' is received as a response.
Several calls inside 'proxy_debug_func' (specially if database logging
is enabled) could alter 'errno', which could influence error handling.
@mirostauder
Copy link
Collaborator

retest this please

@mirostauder
Copy link
Collaborator

retest this please

@JavierJF JavierJF force-pushed the v2.x-client_ssl_busy_wait branch from 3010f72 to cf2163f Compare July 15, 2024 19:37
JavierJF added 3 commits July 16, 2024 21:19
- Ensure new values has been written to 'system_cpu'.
- Use the correct (runtime) interval for the computation.
- Issue warning when supplied interval != runtime (rounded value).
Variables specifying timing intervals, like 'stats_mysql_connections',
may get rounded to the closer, valid time interval. Now a warning is
issued for making the user aware of this rounding operation.
@mirostauder
Copy link
Collaborator

retest this please

JavierJF added 6 commits July 17, 2024 10:08
Some tests assume ProxySQL should be in an idle state, only affected by
either idle connections or monitoring actions. This idle state could be
perturbed by the continuous actions of the scheduler in Core Cluster
nodes.
- Simplified and unified condition waiting checks.
- Added new helper function for fetching cluster core nodes.
- Refactor nodes fetching using new helper functions.
- Simplified and documented 'plan' tests count computation.
- Other minor cleanups.
Test is now aware of 'cluster_mysql_servers_sync_algorithm'. In case of
'runtime_mysql_servers' propagation, the test disables monitoring since
it creates servers in hostgroups with active monitoring, if config
doesn't allows 'runtime' propagation, the test assumes everything will
just work. This should be a common pattern for tests creating fake
servers.
- Adds config option with microseconds resolution, ON by default.
- Preserves TAP convention for messages to prevent tooling breaking.
This list hold tracking errors that in practice are equivalent to
ER_UNKNOWN_SYSTEM_VARIABLE and that we want to handle in the same way.
@JavierJF JavierJF force-pushed the v2.x-client_ssl_busy_wait branch from 85c80f1 to b91b6b8 Compare July 18, 2024 10:39
@renecannao renecannao merged commit 01d1fc9 into v2.x Aug 8, 2024
50 of 51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants