Connect errors should invalidate `DNS` cache entries #4593

JavierJF · 2024-07-24T17:43:21Z

Current Behavior

Right now the refresh of DNS cache entries is determined solely by the variable monitor_local_dns_cache_refresh_interval. It sets the frequency at which DNS cache entries are checked for an expired TTL and placed in the resolver queue for renewal.
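To make the current behavior concrete, here is a minimal sketch of one refresh pass, written as a toy model and not taken from the ProxySQL source. The cache layout (hostname mapped to an IP and an expiry timestamp), the `resolver_queue`, and the function name are all assumptions made for illustration.

```python
from collections import deque


def queue_expired_entries(cache, resolver_queue, now):
    """Run one refresh pass over a toy DNS cache.

    cache: dict mapping hostname -> (ip, expires_at); hypothetical layout.
    resolver_queue: deque of hostnames waiting to be re-resolved.
    now: current time, in the same units as expires_at.
    """
    for hostname, (_ip, expires_at) in cache.items():
        # Only entries whose TTL has expired are queued for renewal.
        if expires_at <= now and hostname not in resolver_queue:
            resolver_queue.append(hostname)
    return resolver_queue


# One pass at t=20: only "db1.example" has an expired TTL.
cache = {
    "db1.example": ("10.0.0.5", 10.0),
    "db2.example": ("10.0.0.6", 50.0),
}
queue = queue_expired_entries(cache, deque(), now=20.0)
```

In this model nothing happens between passes, so an entry that becomes stale right after a pass is still served until the next one.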

Issue

Since cache entries are only refreshed at these intervals, if a server IP changes for any reason (e.g. an unplanned failover), all subsequent connection attempts to this server will fail until the entry's TTL expires and a new check (via refresh_interval) is triggered. One protection for these scenarios is to set a refresh interval smaller than the expected delay of the DNS update propagation. This is sufficient to reduce the expected downtime of the instance to that interval.

Improvement

A way to improve this situation would be to remove the cache entry corresponding to the server whenever a connection to a backend instance fails with a connect error. This invalidation would be immediate. It would serve as a generic protection mechanism that reduces downtime to the delay of the DNS update propagation itself. All subsequent connections to that server would perform DNS resolution until the next monitor_local_dns_cache_refresh_interval updates the cache with a new valid value.

Implementation Details

Whenever a connect error is detected for a backend connection:

If DNS was used for the connection attempt (the entry was retrieved from the DNS cache):
1. An exclusion list should be checked. If the error is found in it, nothing should be done. This exclusion list shall include errors that are unrelated to being unable to reach the server, such as Access denied errors.
2. If the error is not found in this list, the corresponding entry for this server in the DNS cache must be removed.

If DNS wasn't used (no entry, or the cache is disabled), nothing should be done.

This should be enough to make all subsequent connection attempts to the server perform DNS resolution until the next monitor_local_dns_cache_refresh_interval updates the cache with a new valid value.
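The decision flow above could be sketched roughly as follows. This is an illustration only, not the code from the PRs that resolved this issue: `DnsCache`, `on_connect_error`, and the contents of `EXCLUDED_ERRNOS` are hypothetical names and choices. The two codes in the list are the MySQL access-denied server errors (1044 and 1045), and 2003 in the usage lines is the MySQL client error for being unable to connect to the host.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical exclusion list: errors that show the server was reached,
# so the cached address is not the problem.
# 1044 = ER_DBACCESS_DENIED_ERROR, 1045 = ER_ACCESS_DENIED_ERROR.
EXCLUDED_ERRNOS = frozenset({1044, 1045})


@dataclass
class DnsCache:
    """Toy stand-in for the local DNS cache (hostname -> IP)."""
    entries: Dict[str, str] = field(default_factory=dict)

    def lookup(self, hostname: str) -> Optional[str]:
        return self.entries.get(hostname)

    def remove(self, hostname: str) -> bool:
        # True if an entry existed and was dropped.
        return self.entries.pop(hostname, None) is not None


def on_connect_error(cache: DnsCache, hostname: str, errno: int,
                     used_dns_cache: bool) -> bool:
    """Return True if the cache entry for hostname was invalidated."""
    if not used_dns_cache:
        # No entry was used, or the cache is disabled: nothing to do.
        return False
    if errno in EXCLUDED_ERRNOS:
        # Step 1: excluded error, keep the entry.
        return False
    # Step 2: any other connect error invalidates the entry immediately.
    return cache.remove(hostname)


cache = DnsCache({"db1.example": "10.0.0.5"})
kept = on_connect_error(cache, "db1.example", 1045, used_dns_cache=True)
dropped = on_connect_error(cache, "db1.example", 2003, used_dns_cache=True)
```

After the second call the entry is gone, so the next connection attempt to `db1.example` would fall back to a fresh DNS resolution.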


renecannao · 2024-09-26T22:24:54Z

Solved in #4662 and #4656

JavierJF added the enhancement label Jul 24, 2024

renecannao closed this as completed Sep 26, 2024
