Current Behavior
Right now the refresh of DNS entries is determined solely by the variable monitor_local_dns_cache_refresh_interval. This sets the frequency at which DNS cache entries are checked for an expired TTL and placed in the resolver queue for renewal.
Issue
Since cache entries are only refreshed at these intervals, if a server IP changes for any reason (e.g. unplanned failover), all subsequent connection attempts to this server will fail until the entry TTL expires and a new check (via refresh_interval) is triggered. One protection for these scenarios is to set a refresh interval smaller than the expected DNS update propagation delay. This is sufficient to reduce the expected downtime of the instance to that interval.
Improvement
A way to improve this situation would be to remove the cache entry corresponding to the server whenever a connection error to a backend instance is detected. This invalidation would be immediate, and would serve as a generic protection mechanism that reduces downtime to the delay of the DNS update propagation itself. All subsequent connections to that server will perform DNS resolution until the next monitor_local_dns_cache_refresh_interval updates the cache with a new valid value.
Implementation Details
Whenever a connect error is detected for a backend connection:
  If DNS was used for the connection attempt (entry was retrieved from DNS cache):
    An exclusion list should be checked; if the error is found in it, nothing should be done. This exclusion list shall include errors unrelated to server reachability, such as Access denied errors.
    If the error is not found in this list, the corresponding entry for this server in the DNS cache must be removed.
  If DNS wasn't used (no entry or disabled), nothing should be done.
This should be enough to make all subsequent connection attempts to the server perform DNS resolution until the next monitor_local_dns_cache_refresh_interval updates the cache with a new valid value.
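
The decision flow above can be sketched roughly as follows. This is only an illustration in Python, not the actual implementation: DnsCache, on_connect_error, EXCLUDED_ERRORS and the used_cache flag are hypothetical names invented for this sketch, and the exclusion list contains just one example code (1045, MySQL's ER_ACCESS_DENIED_ERROR). The real list would need to be decided as part of this work.

```python
from dataclasses import dataclass, field

# Hypothetical exclusion list: errors that prove the server was reached,
# so the cached IP is not the problem. 1045 is ER_ACCESS_DENIED_ERROR.
EXCLUDED_ERRORS = frozenset({1045})


@dataclass
class DnsCache:
    """Toy stand-in for the local DNS cache (hostname -> IP)."""
    entries: dict = field(default_factory=dict)
    enabled: bool = True

    def lookup(self, hostname):
        # Returns the cached IP, or None when the cache is disabled or empty.
        if not self.enabled:
            return None
        return self.entries.get(hostname)

    def remove(self, hostname):
        # Returns True only if an entry was actually dropped.
        return self.entries.pop(hostname, None) is not None


def on_connect_error(cache, hostname, errno, used_cache):
    """Invalidate the cache entry for hostname if the error warrants it.

    used_cache says whether the failed attempt took its IP from the cache.
    Returns True when an entry was removed.
    """
    if not used_cache:
        return False  # DNS cache wasn't used: nothing to do
    if errno in EXCLUDED_ERRORS:
        return False  # server was reachable: keep the entry
    return cache.remove(hostname)
```

With this shape, an error such as 2003 (the client-side "Can't connect to MySQL server" code) on a cached host drops the entry, so the next attempt falls back to regular resolution, while an Access denied error leaves the cache untouched.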