Do not connect to the database if the replication configuration is incorrect #8381
Currently, if replication.conf contains an error, replication is not initialized at all. This can desynchronize a replica that is already configured and working.
For example, suppose a synchronous replica is configured and working. The administrator decides to add asynchronous replication, stops the DBMS, adds
journal_directory = /path/to/journals
to the configuration, and starts the DBMS again. At first glance everything is fine, but OS-level access to the directory has not been granted. On the first connection, access to the directory is checked and fails. The error is written to replication.log, yet the user connects and keeps working with the database with no replication at all, without knowing it. Apart from the log message, the administrator has no way to tell that the configuration is broken. Worse, if the database has an ON CONNECT trigger, the synchronous replica will most likely become stale right after the first connection.
This patch addresses the problem as follows:
- Do not stop reading the configuration at the first error. Instead, combine all errors found into a single message, so they can all be fixed at once rather than one at a time.
- Log a message when a parameter name is given without a value. Previously such a parameter was silently ignored, but if the administrator wrote it, they presumably meant to use it. This is not a critical error, but the log message should prompt the administrator either to remove the parameter from the configuration or to set a value for it.
- Use the disable_on_error configuration parameter: when it is enabled, one or more replicas may be disabled during replication initialization while the user is still allowed to connect to the database. In the example above, the connection succeeds and asynchronous replication is disabled (because the journal directory is inaccessible), but the synchronous replica keeps working. If
disable_on_error = false
the user instead receives the error "One or more replicas configured with errors" when connecting.
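The three changes above can be illustrated with a minimal Python sketch. It is not the patch itself (Firebird's implementation is C++ and is not shown here) and rests on several assumptions: the configuration is reduced to flat `key = value` lines, only the two parameters named above are treated as known, a replica with a non-empty journal directory stands in for asynchronous replication, and `parse_config`, `check_config`, `init_replicas`, `Replica`, `ReplicationConfigError`, and the `can_access` callback are hypothetical names.

```python
from dataclasses import dataclass, field

# Illustrative subset only: the two parameters mentioned in the description.
KNOWN_PARAMS = {"journal_directory", "disable_on_error"}


class ReplicationConfigError(Exception):
    """Hypothetical error type for a configuration that cannot be used."""


@dataclass
class ParseResult:
    params: list = field(default_factory=list)    # accepted (key, value) pairs
    errors: list = field(default_factory=list)    # all errors, collected in one pass
    warnings: list = field(default_factory=list)  # e.g. parameters without a value


@dataclass
class Replica:
    name: str
    journal_directory: str = ""  # simplification: non-empty means asynchronous
    enabled: bool = True


def parse_config(text: str) -> ParseResult:
    """Read every line, recording problems instead of stopping at the first one."""
    result = ParseResult()
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not key:
            result.errors.append(f"line {line_no}: missing parameter name")
        elif key not in KNOWN_PARAMS:
            result.errors.append(f"line {line_no}: unknown parameter '{key}'")
        elif not sep or not value:
            # Not fatal, but no longer silently ignored.
            result.warnings.append(f"line {line_no}: parameter '{key}' has no value")
        else:
            result.params.append((key, value))
    return result


def check_config(result: ParseResult) -> None:
    """Report all collected errors together as a single message."""
    if result.errors:
        raise ReplicationConfigError("\n".join(result.errors))


def init_replicas(replicas, disable_on_error, can_access, log):
    """Disable only the failing replicas, or refuse the connection outright."""
    failed = []
    for replica in replicas:
        if replica.journal_directory and not can_access(replica.journal_directory):
            log.append(f"{replica.name}: cannot access {replica.journal_directory}")
            failed.append(replica)
    if failed and not disable_on_error:
        # disable_on_error = false: the connection attempt fails.
        raise ReplicationConfigError("One or more replicas configured with errors")
    for replica in failed:
        replica.enabled = False  # healthy replicas (e.g. the synchronous one) keep working
```

With `disable_on_error` enabled, the scenario from the description leaves the synchronous replica enabled and disables only the asynchronous one whose journal directory is unreachable. With it disabled, `init_replicas` raises instead, which the caller would surface as a connection error. The `log` list merely stands in for replication.log.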