You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As described in the document, ProxySQL should be able to change the status of backend servers automatically based on the detected replica lag.
max_lag_ms : replicas that have a replication greater or equal than max_lag_ms milliseconds are automatically disabled from the cluster until their lag returns below the configured threshold.
However it seems like the status change only happens once when mysql_aws_aurora_hostgroups got updated. The status never gets adjusted again regardless of the newly detected lag.
After looking into the codes of ProxySQL, I think there should be a code bug in MySQL_Monitor::evaluate_aws_aurora_results of MySQL_Monitor.cpp as shown below:
At line 6370
unsigned int current_lag_ms = estimate_lag(hse->server_id, lasts_ase, ase_idx, add_lag_ms, min_lag_ms, lag_num_checks);
hse->estimated_lag_ms = current_lag_ms;
if (current_lag_ms > max_latency_ms) {
enable = false;
}
......
unsigned int prev_lag_ms = estimate_lag(hse->server_id, lasts_ase, ase_idx, add_lag_ms, min_lag_ms, lag_num_checks);
if (prev_lag_ms > max_latency_ms) {
prev_enabled = false;
}
if (prev_enabled == enable) {
// the previous status should be the same
// do not run any action
run_action = false;
}
According to the code pieces, the "ase_idx" used for "estimate_lag" is not changed between the two calls, which results "current_lag_ms" and "prev_lag_ms" the same all the time. So it always falls into "prev_enabled == enable" true and takes no action.
After comparing to previous version, I tried to change the second "ase_idx" to "prev_ase_idx" and tested with the new build. As expected, everything worked well!
Please help check if the conclusion above is correct or not. This is really important to our project.
BTW, the explanation of "max_lag_ms" needs to be corrected as:
max_lag_ms : replicas that have a replication greater or equal than max_lag_ms milliseconds are automatically disabled from the cluster until their lag returns below or equal to the configured threshold.
first, thanks for the detailed report. This was indeed an issue, and was fixed by this commit as part of this PR. That mentioned PR has been recently merged as part of this bigger PR. So this issue is already fixed in the current development branch, and of course will be part of the next release.
@JavierJF Great to know!
And may I know what's the rough schedule for next release? I have checked the release history and the cadence was like once per 1~2 months. However it hasn't been updated since last August somehow.
As described in the document, ProxySQL should be able to change the status of backend servers automatically based on the detected replica lag.
However it seems like the status change only happens once when mysql_aws_aurora_hostgroups got updated. The status never gets adjusted again regardless of the newly detected lag.
After looking into the codes of ProxySQL, I think there should be a code bug in MySQL_Monitor::evaluate_aws_aurora_results of MySQL_Monitor.cpp as shown below:
proxysql/lib/MySQL_Monitor.cpp
Line 6326 in c41331a
According to the code pieces, the "ase_idx" used for "estimate_lag" is not changed between the two calls, which results "current_lag_ms" and "prev_lag_ms" the same all the time. So it always falls into "prev_enabled == enable" true and takes no action.
After comparing to previous version, I tried to change the second "ase_idx" to "prev_ase_idx" and tested with the new build. As expected, everything worked well!
Please help check if the conclusion above is correct or not. This is really important to our project.
BTW, the explanation of "max_lag_ms" needs to be corrected as:
max_lag_ms : replicas that have a replication greater
or equalthan max_lag_ms milliseconds are automatically disabled from the cluster until their lag returns below or equal to the configured threshold.proxysql/lib/MySQL_Monitor.cpp
Line 6372 in c41331a
Thanks,
Yuankai
The text was updated successfully, but these errors were encountered: