Deleted node is presented in resetup_status #47

nadzejam · 2023-11-09T11:59:00Z

I created a MySQL cluster with 1 master (stand-mysql-1) and 2 replicas (stand-mysql-2, stand-mysql-3),
and I used MySync to automate the cluster configuration.

I tried to remove one replica host with
mysync host remove stand-mysql-2
After that I destroyed stand-mysql-2 completely (it was a VM).

As a result, I get the following output for my cluster configuration:

root@stand-mysql-1:~# mysync info
active_nodes:
- stand-mysql-1
- stand-mysql-3
cascade_nodes: null
ha_nodes:
  stand-mysql-1:
    priority: 0
  stand-mysql-3:
    priority: 0
health:
  stand-mysql-1:
    check_at: "2023-11-09T08:28:55.529207845Z"
    check_by: stand-mysql-1
    daemon_state:
      crash_recovery: false
      recovery_time: "0001-01-01T00:00:01Z"
      start_time: "2023-11-09T08:02:13Z"
    disk_state:
      Total: 1.03944192e+10
      Used: 2.177769472e+09
    error: ""
    is_cascade: false
    is_file_system_readonly: false
    is_master: true
    is_offline: false
    is_readonly: false
    is_super_readonly: false
    master_state:
      executed_gtid_set: 29f6babb-7ed6-11ee-8f7e-fa163e84917e:1-17
    ping_dubious: false
    ping_ok: true
    semi_sync_state:
      master_enabled: true
      slave_enabled: false
      wait_slave_count: 1
    slave_state: null
  stand-mysql-3:
    check_at: "2023-11-09T08:28:54.558386224Z"
    check_by: stand-mysql-3
    daemon_state:
      crash_recovery: false
      recovery_time: "0001-01-01T00:00:01Z"
      start_time: "2023-11-09T08:15:57Z"
    disk_state:
      Total: 1.03944192e+10
      Used: 2.173104128e+09
    error: ""
    is_cascade: false
    is_file_system_readonly: false
    is_master: false
    is_offline: false
    is_readonly: true
    is_super_readonly: true
    master_state: null
    ping_dubious: false
    ping_ok: true
    semi_sync_state:
      master_enabled: false
      slave_enabled: true
      wait_slave_count: 1
    slave_state:
      executed_gtid_set: 29f6babb-7ed6-11ee-8f7e-fa163e84917e:1-17
      last_io_errno: 0
      last_sql_errno: 0
      master_host: stand-mysql-1
      master_log_file: mysql-bin-log.000005
      master_log_pos: 809
      replication_lag: 0
      replication_state: running
      retrieved_gtid_get: ""
last_shutdown_node_time: "2023-11-09T08:02:25.571185739Z"
low_space: false
manager:
  hostname: stand-mysql-1
  pid: 7026
master: stand-mysql-1
resetup_status:
  stand-mysql-1:
    Status: false
    UpdateTime: "2023-11-09T08:28:55.53501369Z"
  stand-mysql-2:
    Status: false
    UpdateTime: "2023-11-09T08:27:30.668844835Z"
  stand-mysql-3:
    Status: false
    UpdateTime: "2023-11-09T08:28:54.564428936Z"
root@stand-mysql-1:~#

As you can see, "stand-mysql-2" is still present in the "resetup_status" section.
Of course, the situation is the same in the ZooKeeper cluster:

[zk: localhost:2181(CONNECTED) 0] ls /mysql/cluster_id_1109/resetup_status
[stand-mysql-1, stand-mysql-2, stand-mysql-3]
[zk: localhost:2181(CONNECTED) 2] get /mysql/cluster_id_1109/resetup_status/stand-mysql-2
{"UpdateTime":"2023-11-09T08:27:30.668844835Z","Status":false}

I can't tell whether this is an error. Is there any reason to keep the record for the deleted host in resetup_status?


teem0n · 2023-11-09T12:03:40Z

Yes, this looks like a bug.
There is no reason to keep a removed host in the resetup section.

FactorT · 2024-02-26T09:10:13Z

@teem0n @secwall Unfortunately, the problem still exists even after PR #70.
All three servers are still in ZooKeeper:

[zk: 192.168.1.131:2181(CONNECTED) 10] ls /mysql/cluster_id_2fb4f574-2ce8-59fb-a8c7-0dfbe20793e3/resetup_status
[mysql-1, mysql-2, mysql-3]

[zk: 192.168.1.131:2181(CONNECTED) 11] get /mysql/cluster_id_2fb4f574-2ce8-59fb-a8c7-0dfbe20793e3/resetup_status/mysql-3
{"UpdateTime":"2024-02-26T12:09:03.782164957+03:00","Status":false}

This is the state after running the mysync host remove mysql-3 command.

secwall · 2024-02-26T16:46:54Z

I'm not able to reproduce it on the current master branch (05e1a51). Are you sure that you are running an up-to-date version?

FactorT · 2024-02-27T08:21:50Z

@secwall Yes, I have the current version. And I know where the problem is.
For example, we have a cluster of three MySQL servers, each running its own mysync: mysql-1, mysql-2, mysql-3.
First of all, app.dcs.Delete(dcs.JoinPath(pathResetupStatus, host)) works fine. It deletes the server from the resetup_status path in ZooKeeper.
We stopped MySQL on the mysql-3 server with systemctl stop mysql. After that we ran root@mysql-1:~# mysync host remove mysql-3 on the mysql-1 server. It deleted mysql-3 from resetup_status in ZooKeeper.
So far, so good.
But the other mysync, the one still running on the mysql-3 server, restores this path.
In the main application loop, func (app *App) Run(), there is a call to recoveryChecker.
recoveryChecker calls SetResetupStatus.
SetResetupStatus calls app.setResetupStatus.
And app.setResetupStatus re-creates the pathResetupStatus entry.

Maybe setResetupStatus should check whether the node exists in pathHANodes before restoring pathResetupStatus?

mialinx · 2024-02-27T09:23:12Z

@teem0n
Looks like we should check that the current host is in the cluster before setting anything in DCS (health status, resetup flag, etc.).

teem0n · 2024-02-27T09:23:27Z

Thanks for the detailed report!
We will check this and fix it.

teem0n · 2024-02-27T13:36:29Z

@FactorT when you stop the mysql service on a host, you should also stop mysync before removing the host.
At the moment, the local mysync doesn't know that its host has been removed.

FactorT · 2024-02-28T12:28:01Z

@teem0n
Yes, I tried this workaround and it worked. If "turn off mysql, turn off mysync, remove host" is the correct procedure and there is no logical mistake in mysync's logic, then I have no further questions.
Thank you!

secwall mentioned this issue Jan 22, 2024

Drop resetup status from dcs on host delete #70

Merged

teem0n closed this as completed in #70 Jan 22, 2024

mialinx reopened this Feb 27, 2024

teem0n closed this as completed Feb 28, 2024
