Deleted node is presented in resetup_status #47

Closed
nadzejam opened this issue Nov 9, 2023 · 8 comments · Fixed by #70

Comments

@nadzejam

nadzejam commented Nov 9, 2023

I created a MySQL cluster with 1 master (stand-mysql-1) and 2 replicas (stand-mysql-2, stand-mysql-3),
and I used MySync to automate the cluster configuration.

I tried to remove one replica host with
mysync host remove stand-mysql-2
After that I destroyed stand-mysql-2 completely (it was a VM).

As a result, I get the following output for my cluster configuration:

root@stand-mysql-1:~# mysync info
active_nodes:
- stand-mysql-1
- stand-mysql-3
cascade_nodes: null
ha_nodes:
  stand-mysql-1:
    priority: 0
  stand-mysql-3:
    priority: 0
health:
  stand-mysql-1:
    check_at: "2023-11-09T08:28:55.529207845Z"
    check_by: stand-mysql-1
    daemon_state:
      crash_recovery: false
      recovery_time: "0001-01-01T00:00:01Z"
      start_time: "2023-11-09T08:02:13Z"
    disk_state:
      Total: 1.03944192e+10
      Used: 2.177769472e+09
    error: ""
    is_cascade: false
    is_file_system_readonly: false
    is_master: true
    is_offline: false
    is_readonly: false
    is_super_readonly: false
    master_state:
      executed_gtid_set: 29f6babb-7ed6-11ee-8f7e-fa163e84917e:1-17
    ping_dubious: false
    ping_ok: true
    semi_sync_state:
      master_enabled: true
      slave_enabled: false
      wait_slave_count: 1
    slave_state: null
  stand-mysql-3:
    check_at: "2023-11-09T08:28:54.558386224Z"
    check_by: stand-mysql-3
    daemon_state:
      crash_recovery: false
      recovery_time: "0001-01-01T00:00:01Z"
      start_time: "2023-11-09T08:15:57Z"
    disk_state:
      Total: 1.03944192e+10
      Used: 2.173104128e+09
    error: ""
    is_cascade: false
    is_file_system_readonly: false
    is_master: false
    is_offline: false
    is_readonly: true
    is_super_readonly: true
    master_state: null
    ping_dubious: false
    ping_ok: true
    semi_sync_state:
      master_enabled: false
      slave_enabled: true
      wait_slave_count: 1
    slave_state:
      executed_gtid_set: 29f6babb-7ed6-11ee-8f7e-fa163e84917e:1-17
      last_io_errno: 0
      last_sql_errno: 0
      master_host: stand-mysql-1
      master_log_file: mysql-bin-log.000005
      master_log_pos: 809
      replication_lag: 0
      replication_state: running
      retrieved_gtid_get: ""
last_shutdown_node_time: "2023-11-09T08:02:25.571185739Z"
low_space: false
manager:
  hostname: stand-mysql-1
  pid: 7026
master: stand-mysql-1
resetup_status:
  stand-mysql-1:
    Status: false
    UpdateTime: "2023-11-09T08:28:55.53501369Z"
  stand-mysql-2:
    Status: false
    UpdateTime: "2023-11-09T08:27:30.668844835Z"
  stand-mysql-3:
    Status: false
    UpdateTime: "2023-11-09T08:28:54.564428936Z"
root@stand-mysql-1:~#

As you can see, "stand-mysql-2" is still present in the "resetup_status" section.
Of course, the same is true in the ZooKeeper cluster:

[zk: localhost:2181(CONNECTED) 0] ls /mysql/cluster_id_1109/resetup_status
[stand-mysql-1, stand-mysql-2, stand-mysql-3]
[zk: localhost:2181(CONNECTED) 2] get /mysql/cluster_id_1109/resetup_status/stand-mysql-2
{"UpdateTime":"2023-11-09T08:27:30.668844835Z","Status":false}

Is this an error? Is there any reason to keep the record for the deleted host in resetup_status?

@teem0n
Contributor

teem0n commented Nov 9, 2023

Yes, this looks like a bug.
There is no reason to keep a removed host in the resetup_status section.

@FactorT

FactorT commented Feb 26, 2024

@teem0n @secwall Unfortunately, the problem still exists even after PR #70.
All three servers are still present in ZooKeeper:

[zk: 192.168.1.131:2181(CONNECTED) 10] ls /mysql/cluster_id_2fb4f574-2ce8-59fb-a8c7-0dfbe20793e3/resetup_status
[mysql-1, mysql-2, mysql-3]

[zk: 192.168.1.131:2181(CONNECTED) 11] get /mysql/cluster_id_2fb4f574-2ce8-59fb-a8c7-0dfbe20793e3/resetup_status/mysql-3
{"UpdateTime":"2024-02-26T12:09:03.782164957+03:00","Status":false}

This is the state after running the mysync host remove mysql-3 command.

@secwall
Contributor

secwall commented Feb 26, 2024

I'm not able to reproduce it on the current master branch (05e1a51). Are you sure you are running an up-to-date version?

@FactorT

FactorT commented Feb 27, 2024

@secwall Yes, I have an up-to-date version, and I know where the problem is.
For example, take a cluster of three MySQL servers (mysql-1, mysql-2, mysql-3), each running its own mysync.
First of all, app.dcs.Delete(dcs.JoinPath(pathResetupStatus, host)) works fine: it deletes the server's entry under the resetup_status path in ZooKeeper.
We stopped MySQL on the mysql-3 server with systemctl stop mysql. After that we ran root@mysql-1:~# mysync host remove mysql-3 on the mysql-1 server. It deletes mysql-3 from resetup_status in ZooKeeper.
So far so good.
But the mysync instance still running on mysql-3 restores this path:
The main application loop, func (app *App) Run(), calls recoveryChecker.
recoveryChecker calls SetResetupStatus.
SetResetupStatus calls app.setResetupStatus.
And app.setResetupStatus recreates pathResetupStatus.

Maybe setResetupStatus should check whether the node still exists in pathHANodes before recreating pathResetupStatus?

@mialinx mialinx reopened this Feb 27, 2024
@mialinx
Contributor

mialinx commented Feb 27, 2024

@teem0n
Looks like we should check whether the current host is in the cluster before setting anything in DCS (health status, resetup flag, etc.).

@teem0n
Contributor

teem0n commented Feb 27, 2024

Thanks for the detailed report!
We will check this and fix it.

@teem0n
Contributor

teem0n commented Feb 27, 2024

@FactorT when you stop the MySQL service on a host, you should also stop mysync before removing the host.
At the moment, the local mysync doesn't know that its host has been removed.

@FactorT

FactorT commented Feb 28, 2024

@teem0n
Yes, I tried this workaround and it worked. If the correct sequence is "turn off MySQL, turn off mysync, remove the host" and there is no logical mistake in mysync itself, then I have no further questions.
Thank you!

@teem0n teem0n closed this as completed Feb 28, 2024