nopreempt doesn't seem to work for keepalived 1.4.5 #20

Open
saqib-ahmed opened this issue Jun 6, 2018 · 5 comments

@saqib-ahmed
Contributor

Problem

nopreempt works as expected when the Docker service stops or restarts, when my network interface goes down, and when I restart the keepalived container. However, when I reboot the machine with priority 51, it takes control back from the other node (it preempts). Following the discussion here, I added a 60s delay before the keepalived service starts inside my container (in process.sh), but it still preempts the lower-priority node after a minute. What could be wrong here? Obviously it isn't the network, because it doesn't take that long to initialize. This is a clone of this issue.

Configuration

My configuration file looks like below:

global_defs {
  default_interface enp0s3
}

vrrp_script chk_dockerd {
        script "pidof dockerd"      # verify the pid exists
        interval 2                  # check every 2 seconds
}

vrrp_instance VI_1 {
  interface enp0s3

  state BACKUP
  virtual_router_id 51
  priority 51                   # Second node has 50 priority, everything else is same
  nopreempt
  #advert_int 1

  unicast_peer {
    192.168.1.141
  }

  virtual_ipaddress {
    192.168.1.238
  }

  authentication {
    auth_type PASS
    auth_pass d0cker
  }

  track_script {
          chk_dockerd
  }

  notify "/container/service/keepalived/assets/notify.sh"
}

Logs

I also tried starting the container manually some time after the reboot, and it still preempts the lower-priority node. I get the following logs after rebooting the higher-priority node:

Mon Jun  4 21:16:14 2018: VRRP_Instance(VI_1) Transition to MASTER STATE
Mon Jun  4 21:16:15 2018: VRRP_Instance(VI_1) Entering MASTER STATE
Mon Jun  4 21:16:15 2018: VRRP_Instance(VI_1) setting protocol VIPs.
Mon Jun  4 21:16:15 2018: Sending gratuitous ARP on enp0s3 for 192.168.8.235
Mon Jun  4 21:16:15 2018: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on enp0s3 for 192.168.8.235
Mon Jun  4 21:16:15 2018: Sending gratuitous ARP on enp0s3 for 192.168.8.235
Mon Jun  4 21:16:15 2018: Sending gratuitous ARP on enp0s3 for 192.168.8.235
Mon Jun  4 21:16:15 2018: Sending gratuitous ARP on enp0s3 for 192.168.8.235
Mon Jun  4 21:16:15 2018: Sending gratuitous ARP on enp0s3 for 192.168.8.235
Mon Jun  4 21:16:15 2018: Opening script file /container/service/keepalived/assets/notify.sh
Mon Jun  4 21:16:20 2018: Sending gratuitous ARP on enp0s3 for 192.168.8.235
Mon Jun  4 21:16:20 2018: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on enp0s3 for 192.168.8.235
Mon Jun  4 21:16:20 2018: Sending gratuitous ARP on enp0s3 for 192.168.8.235
Mon Jun  4 21:16:20 2018: Sending gratuitous ARP on enp0s3 for 192.168.8.235
Mon Jun  4 21:16:20 2018: Sending gratuitous ARP on enp0s3 for 192.168.8.235
Mon Jun  4 21:16:20 2018: Sending gratuitous ARP on enp0s3 for 192.168.8.235

#########################
Preemption after a minute
#########################

Mon Jun  4 21:17:58 2018: VRRP_Instance(VI_1) Master received advert with higher priority 51, ours 50
Mon Jun  4 21:17:58 2018: VRRP_Instance(VI_1) Entering BACKUP STATE
Mon Jun  4 21:17:58 2018: VRRP_Instance(VI_1) removing protocol VIPs.
Mon Jun  4 21:17:58 2018: Opening script file /container/service/keepalived/assets/notify.sh
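
As a rough check of the "after a minute" timing, the gap between the two state changes in the log above can be computed from the timestamps. This is only a sketch: `parse_log_time` is a hypothetical helper, and it assumes every log line starts with a timestamp in the format shown above, followed by `: `.

```python
from datetime import datetime

# Timestamp layout used by the log lines above, e.g. "Mon Jun  4 21:16:15 2018".
LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_log_time(line: str) -> datetime:
    """Parse the timestamp that precedes the first ': ' in a log line."""
    stamp, _, _message = line.partition(": ")
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


became_master = parse_log_time(
    "Mon Jun  4 21:16:15 2018: VRRP_Instance(VI_1) Entering MASTER STATE"
)
preempted = parse_log_time(
    "Mon Jun  4 21:17:58 2018: VRRP_Instance(VI_1) Entering BACKUP STATE"
)

# Seconds the lower-priority node held MASTER before it was preempted.
gap_seconds = (preempted - became_master).total_seconds()
print(gap_seconds)  # 103.0
```

The 103 seconds cover the reboot itself plus the startup delay, so they do not isolate how long keepalived waited on the rebooted node.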

tcpdump

I captured a tcpdump while the higher-priority node was rebooting. Machine 1.89 has priority 51 and 1.141 has priority 50 (this is the machine I'm dumping on), both with the above-mentioned configuration.

13:13:13.474199 IP (tos 0xc0, ttl 255, id 1171, offset 0, flags [none], proto VRRP (112), length 40)
    192.168.1.89 > 192.168.1.141: vrrp 192.168.1.89 > 192.168.1.141: VRRPv2, Advertisement, vrid 51, prio 51, authtype simple, intvl 1s, length 20, addrs: 192.168.1.238 auth "d0cker^@^@"
	0x0000:  45c0 0028 0493 0000 ff70 31dc c0a8 0159  E..(.....p1....Y
	0x0010:  c0a8 018d 2133 3301 0101 bb25 c0a8 01ee  ....!33....%....
	0x0020:  6430 636b 6572 0000 0000 0000 0000       d0cker........
13:13:14.753181 IP (tos 0xc0, ttl 255, id 1172, offset 0, flags [none], proto VRRP (112), length 40)
    192.168.1.89 > 192.168.1.141: vrrp 192.168.1.89 > 192.168.1.141: VRRPv2, Advertisement, vrid 51, prio 0, authtype simple, intvl 1s, length 20, addrs: 192.168.1.238 auth "d0cker^@^@"
	0x0000:  45c0 0028 0494 0000 ff70 31db c0a8 0159  E..(.....p1....Y
	0x0010:  c0a8 018d 2133 0001 0101 ee25 c0a8 01ee  ....!3.....%....
	0x0020:  6430 636b 6572 0000 0000 0000 0000       d0cker........
13:13:15.489759 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.141 tell 192.168.1.89, length 46
	0x0000:  0001 0800 0604 0001 0050 56a7 bf4b c0a8  .........PV..K..
	0x0010:  0159 0000 0000 0000 c0a8 018d 0000 0000  .Y..............
	0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............

###############
1.89 shuts down
###############

13:13:15.805985 IP (tos 0xc0, ttl 64, id 54050, offset 0, flags [none], proto ICMP (1), length 68)
    192.168.1.89 > 192.168.1.141: ICMP 192.168.1.89 protocol 112 unreachable, length 48
	IP (tos 0xc0, ttl 255, id 370, offset 0, flags [none], proto VRRP (112), length 40)
    192.168.1.141 > 192.168.1.89: vrrp 192.168.1.141 > 192.168.1.89: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20, addrs: 192.168.1.238 auth "d0cker^@^@"
	0x0000:  45c0 0044 d322 0000 4001 22a0 c0a8 0159  E..D."..@."....Y
	0x0010:  c0a8 018d 0302 fcfd 0000 0000 45c0 0028  ............E..(
	0x0020:  0172 0000 ff70 34fd c0a8 018d c0a8 0159  .r...p4........Y
	0x0030:  2133 3201 0101 bc25 c0a8 01ee 6430 636b  !32....%....d0ck
	0x0040:  6572 0000                                er..

############
1.89 reboots
############

13:13:37.026592 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.89 is-at 00:50:56:a7:bf:4b (oui Unknown), length 46
	0x0000:  0001 0800 0604 0002 0050 56a7 bf4b c0a8  .........PV..K..
	0x0010:  0159 0050 56a7 bf4b c0a8 0159 0000 0000  .Y.PV..K...Y....
	0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
13:13:37.087711 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.92 tell 192.168.1.89, length 46
	0x0000:  0001 0800 0604 0001 0050 56a7 bf4b c0a8  .........PV..K..
	0x0010:  0159 0000 0000 0000 c0a8 015c 0000 0000  .Y.........\....
	0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
13:13:37.814611 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.141 tell 192.168.1.89, length 46
	0x0000:  0001 0800 0604 0001 0050 56a7 bf4b c0a8  .........PV..K..
	0x0010:  0159 0000 0000 0000 c0a8 018d 0000 0000  .Y..............
	0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
13:13:37.814993 IP (tos 0xc0, ttl 64, id 50728, offset 0, flags [none], proto ICMP (1), length 68)
    192.168.1.89 > 192.168.1.141: ICMP host 192.168.1.89 unreachable - admin prohibited, length 48
	IP (tos 0xc0, ttl 255, id 392, offset 0, flags [none], proto VRRP (112), length 40)
    192.168.1.141 > 192.168.1.89: vrrp 192.168.1.141 > 192.168.1.89: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20, addrs: 192.168.1.238 auth "d0cker^@^@"
	0x0000:  45c0 0044 c628 0000 4001 2f9a c0a8 0159  E..D.(..@./....Y
	0x0010:  c0a8 018d 030a fcf5 0000 0000 45c0 0028  ............E..(
	0x0020:  0188 0000 ff70 34e7 c0a8 018d c0a8 0159  .....p4........Y
	0x0030:  2133 3201 0101 bc25 c0a8 01ee 6430 636b  !32....%....d0ck
	0x0040:  6572 0000                                er..

...
...
...


13:18:50.916774 IP (tos 0xc0, ttl 64, id 6988, offset 0, flags [none], proto ICMP (1), length 68)
    192.168.1.89 > 192.168.1.141: ICMP host 192.168.1.89 unreachable - admin prohibited, length 48
	IP (tos 0xc0, ttl 255, id 705, offset 0, flags [none], proto VRRP (112), length 40)
    192.168.1.141 > 192.168.1.89: vrrp 192.168.1.141 > 192.168.1.89: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20, addrs: 192.168.1.238 auth "d0cker^@^@"
	0x0000:  45c0 0044 1b4c 0000 4001 da76 c0a8 0159  E..D.L..@..v...Y
	0x0010:  c0a8 018d 030a fcf5 0000 0000 45c0 0028  ............E..(
	0x0020:  02c1 0000 ff70 33ae c0a8 018d c0a8 0159  .....p3........Y
	0x0030:  2133 3201 0101 bc25 c0a8 01ee 6430 636b  !32....%....d0ck
	0x0040:  6572 0000                                er..
13:18:51.917283 IP (tos 0xc0, ttl 64, id 7758, offset 0, flags [none], proto ICMP (1), length 68)
    192.168.1.89 > 192.168.1.141: ICMP host 192.168.1.89 unreachable - admin prohibited, length 48
	IP (tos 0xc0, ttl 255, id 706, offset 0, flags [none], proto VRRP (112), length 40)
    192.168.1.141 > 192.168.1.89: vrrp 192.168.1.141 > 192.168.1.89: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20, addrs: 192.168.1.238 auth "d0cker^@^@"
	0x0000:  45c0 0044 1e4e 0000 4001 d774 c0a8 0159  E..D.N..@..t...Y
	0x0010:  c0a8 018d 030a fcf5 0000 0000 45c0 0028  ............E..(
	0x0020:  02c2 0000 ff70 33ad c0a8 018d c0a8 0159  .....p3........Y
	0x0030:  2133 3201 0101 bc25 c0a8 01ee 6430 636b  !32....%....d0ck
	0x0040:  6572 0000                                er..

###################################
Keepalived Starts and 1.89 preempts
###################################

13:18:52.633415 IP (tos 0xc0, ttl 255, id 1, offset 0, flags [none], proto VRRP (112), length 40)
    192.168.1.89 > 192.168.1.141: vrrp 192.168.1.89 > 192.168.1.141: VRRPv2, Advertisement, vrid 51, prio 51, authtype simple, intvl 1s, length 20, addrs: 192.168.1.238 auth "d0cker^@^@"
	0x0000:  45c0 0028 0001 0000 ff70 366e c0a8 0159  E..(.....p6n...Y
	0x0010:  c0a8 018d 2133 3301 0101 bb25 c0a8 01ee  ....!33....%....
	0x0020:  6430 636b 6572 0000 0000 0000 0000       d0cker........
13:18:53.635410 IP (tos 0xc0, ttl 255, id 2, offset 0, flags [none], proto VRRP (112), length 40)
    192.168.1.89 > 192.168.1.141: vrrp 192.168.1.89 > 192.168.1.141: VRRPv2, Advertisement, vrid 51, prio 51, authtype simple, intvl 1s, length 20, addrs: 192.168.1.238 auth "d0cker^@^@"
	0x0000:  45c0 0028 0002 0000 ff70 366d c0a8 0159  E..(.....p6m...Y
	0x0010:  c0a8 018d 2133 3301 0101 bb25 c0a8 01ee  ....!33....%....
	0x0020:  6430 636b 6572 0000 0000 0000 0000       d0cker........
13:18:54.635747 IP (tos 0xc0, ttl 255, id 3, offset 0, flags [none], proto VRRP (112), length 40)
    192.168.1.89 > 192.168.1.141: vrrp 192.168.1.89 > 192.168.1.141: VRRPv2, Advertisement, vrid 51, prio 51, authtype simple, intvl 1s, length 20, addrs: 192.168.1.238 auth "d0cker^@^@"
	0x0000:  45c0 0028 0003 0000 ff70 366c c0a8 0159  E..(.....p6l...Y
	0x0010:  c0a8 018d 2133 3301 0101 bb25 c0a8 01ee  ....!33....%....
	0x0020:  6430 636b 6572 0000 0000 0000 0000       d0cker........
13:18:55.636675 IP (tos 0xc0, ttl 255, id 4, offset 0, flags [none], proto VRRP (112), length 40)
    192.168.1.89 > 192.168.1.141: vrrp 192.168.1.89 > 192.168.1.141: VRRPv2, Advertisement, vrid 51, prio 51, authtype simple, intvl 1s, length 20, addrs: 192.168.1.238 auth "d0cker^@^@"
	0x0000:  45c0 0028 0004 0000 ff70 366b c0a8 0159  E..(.....p6k...Y
	0x0010:  c0a8 018d 2133 3301 0101 bb25 c0a8 01ee  ....!33....%....
	0x0020:  6430 636b 6572 0000 0000 0000 0000       d0cker........

In this dump, the machine with priority 51 (1.89) goes down at 13:13:15 and comes back up at 13:13:37. Keepalived is started after a 5-minute delay and then the preemption occurs; you can see it happening at 13:18:52. Let me know if any further information is needed to pin down the issue.
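
To cross-check the priorities that tcpdump prints, the VRRP payload bytes from the hex dump can be decoded by hand. The sketch below assumes the VRRPv2 header layout from RFC 3768 (version/type, VRID, priority, address count, auth type, advertisement interval, checksum, then the addresses); `decode_vrrpv2` and `VrrpV2Advert` are hypothetical names, and the input is the 20 bytes that follow the IP header in the dump.

```python
import ipaddress
import struct
from typing import List, NamedTuple


class VrrpV2Advert(NamedTuple):
    version: int
    type: int
    vrid: int
    priority: int
    advert_int: int
    checksum: int
    addresses: List[str]


def decode_vrrpv2(payload_hex: str) -> VrrpV2Advert:
    """Decode the fixed VRRPv2 header and the virtual IPv4 addresses."""
    raw = bytes.fromhex(payload_hex.replace(" ", ""))
    # Six single-byte fields followed by a 16-bit checksum, network byte order.
    ver_type, vrid, priority, count, _auth_type, advert_int, checksum = (
        struct.unpack("!BBBBBBH", raw[:8])
    )
    # Each virtual address is 4 bytes, starting right after the 8-byte header.
    addresses = [
        str(ipaddress.IPv4Address(raw[8 + 4 * i : 12 + 4 * i]))
        for i in range(count)
    ]
    return VrrpV2Advert(
        ver_type >> 4, ver_type & 0x0F, vrid, priority,
        advert_int, checksum, addresses,
    )


# Payload of the 13:18:52 advert from 1.89 (bytes after the 20-byte IP header).
preempting = decode_vrrpv2("2133 3301 0101 bb25 c0a8 01ee 6430 636b 6572 0000")
# Payload of the 13:13:14 advert that 1.89 sent while shutting down.
shutdown = decode_vrrpv2("2133 0001 0101 ee25 c0a8 01ee 6430 636b 6572 0000")

print(preempting.vrid, preempting.priority, preempting.addresses)
print(shutdown.priority)
```

The priority 0 advert at 13:13:14 is the one a master sends when it gives up the role, which matches 1.89 going down a moment later.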

@BertrandGouny
Member

Hello,
I noticed this several months ago too. I think it started long before 1.4.5.

It would be good to open a proper issue on the keepalived repo too.

@saqib-ahmed
Contributor Author

saqib-ahmed commented Jun 7, 2018

@BertrandGouny
I did ask about it in the keepalived repo: acassen/keepalived#512
The collaborator there said that it is a Docker-related issue and not related to keepalived. This is the response I got:

Your configuration appears to be using the default advert interval of 1 second. This means that when keepalived starts up, any VRRP instances (except those configured with priority 255) will wait approximately 3.8 seconds before transitioning to master (3 * advert_int + advert_int * (255 - priority)/255) (the 255 might be 256, I can't remember). Since 1.89 transitions to master at 13:18:52.6, it must have been in backup mode since 13:18:48.8, but from the tcpdump output we can see that during that time 1.89 is rejecting the vrrp adverts with ICMP messages unreachable/administratively down. keepalived on 1.89 therefore isn't receiving the adverts from 1.141 which is why 1.89 transitions to master. You should be able to see from the logs the time when 1.89 reports becoming backup, which I think in this case will have been either at 13:18:48 or 13:18:47.

This problem relates somehow to your local setup, and possibly the way the containers are handling the networking. It is not a keepalived issue.

You can continue the discussion over there.
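
For reference, the "approximately 3.8 seconds" in the quoted response can be reproduced with a short calculation. This is a minimal sketch, assuming the RFC 3768 style skew term with 256 in the denominator (the response was unsure between 255 and 256) and scaling the skew by the advert interval as the response does; `master_down_interval` is a hypothetical helper name, not a keepalived function.

```python
def master_down_interval(priority: int, advert_int: float = 1.0) -> float:
    """Seconds a backup waits without hearing adverts before becoming master."""
    # Higher priority gives a smaller skew, so the best backup times out first.
    skew_time = advert_int * (256 - priority) / 256
    return 3 * advert_int + skew_time


wait_51 = master_down_interval(51)  # priority of 1.89
wait_50 = master_down_interval(50)  # priority of 1.141
print(round(wait_51, 2), round(wait_50, 2))
```

Subtracting roughly 3.8 s from the first advert at 13:18:52.6 gives about 13:18:48.8 as the moment keepalived on 1.89 entered backup, which is the figure quoted above.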

@BertrandGouny
Member

Thanks,
sorry, I read your first message a bit too fast.

Does this also occur if the container is run with --privileged?

This may be related to a larger problem I'm also facing with keepalived 2.x: it can't find the network interface in the container :s This is also reported in 1.4.5 when keepalived starts, but VRRP manages to use the interface after a short period of time.

Not sure what is happening :/

@Peter-YAN

Peter-YAN commented Mar 2, 2020

keepalived 1.3.5 also has this problem.
After a system reboot, the VIP floats according to IP order.

server1 (192.168.1.222) config:

global_defs {
    router_id LVS_RABBITMQ_PROD1
    enable_script_security
}

vrrp_script chk_myscript {
    script "/usr/bin/pgrep sshd" ! "</dev/tcp/127.0.0.1/5672"
    interval 1
    fall 2
    rise 2
    user root
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens192
    virtual_router_id 51
    priority 66
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 123
    }
    unicast_src_ip 192.168.1.222
    unicast_peer {
        192.168.1.223
    }

    virtual_ipaddress {
        192.168.1.224/24
    }

    track_script {
        chk_myscript
    }
}

server2 (192.168.1.223) config:

global_defs {
    router_id LVS_RABBITMQ_PROD2
    enable_script_security
}

vrrp_script chk_myscript {
    script "/usr/bin/pgrep sshd" ! "</dev/tcp/127.0.0.1/5672"
    interval 1
    fall 2
    rise 2
    user root
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens192
    virtual_router_id 51
    priority 66
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 123
    }
    unicast_src_ip 192.168.1.223
    unicast_peer {
        192.168.1.222
    }

    virtual_ipaddress {
        192.168.1.224/24
    }

    track_script {
        chk_myscript
    }
}

systemd config on both servers:

[Unit]
Description=LVS and VRRP High Availability Monitor
After=syslog.target network-online.target
After=rabbitmq-server.service
Requires=rabbitmq-server.service

Reboot server 1:

Log on server 1:
Mar 2 08:16:43 FID Keepalived_vrrp[2765]: VRRP_Script(chk_myscript) succeeded
Mar 2 08:16:44 FID rabbitmq-server: completed with 4 plugins.
Mar 2 08:16:47 FID Keepalived_vrrp[2765]: VRRP_Instance(VI_1) Transition to MASTER STATE
Mar 2 08:16:47 FID Keepalived_vrrp[2765]: VRRP_Instance(VI_1) Received advert with higher priority 66, ours 66
Mar 2 08:16:47 FID Keepalived_vrrp[2765]: VRRP_Instance(VI_1) Entering BACKUP STATE
Log on server 2:
Mar 2 08:14:25 FID Keepalived_vrrp[2778]: VRRP_Instance(VI_1) Transition to MASTER STATE
Mar 2 08:14:26 FID Keepalived_vrrp[2778]: VRRP_Instance(VI_1) Entering MASTER STATE
Mar 2 08:14:26 FID Keepalived_vrrp[2778]: VRRP_Instance(VI_1) setting protocol VIPs.
Mar 2 08:14:26 FID Keepalived_vrrp[2778]: Sending gratuitous ARP on ens192 for 192.168.1.224
Mar 2 08:14:26 FID Keepalived_vrrp[2778]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens192 for 192.168.1.224
Mar 2 08:14:26 FID Keepalived_vrrp[2778]: Sending gratuitous ARP on ens192 for 192.168.1.224
Mar 2 08:14:26 FID Keepalived_vrrp[2778]: Sending gratuitous ARP on ens192 for 192.168.1.224
Mar 2 08:14:26 FID Keepalived_vrrp[2778]: Sending gratuitous ARP on ens192 for 192.168.1.224
Mar 2 08:14:26 FID Keepalived_vrrp[2778]: Sending gratuitous ARP on ens192 for 192.168.1.224
Mar 2 08:14:31 FID Keepalived_vrrp[2778]: Sending gratuitous ARP on ens192 for 192.168.1.224
Mar 2 08:14:31 FID Keepalived_vrrp[2778]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens192 for 192.168.1.224
Mar 2 08:14:31 FID Keepalived_vrrp[2778]: Sending gratuitous ARP on ens192 for 192.168.1.224
Mar 2 08:14:31 FID Keepalived_vrrp[2778]: Sending gratuitous ARP on ens192 for 192.168.1.224
Mar 2 08:14:31 FID Keepalived_vrrp[2778]: Sending gratuitous ARP on ens192 for 192.168.1.224
Mar 2 08:14:31 FID Keepalived_vrrp[2778]: Sending gratuitous ARP on ens192 for 192.168.1.224

Reboot server 2:

Log on server 1:
Mar 2 08:14:25 FID Keepalived_vrrp[2763]: VRRP_Instance(VI_1) Received advert with higher priority 66, ours 66
Mar 2 08:14:25 FID Keepalived_vrrp[2763]: VRRP_Instance(VI_1) Entering BACKUP STATE
Mar 2 08:14:25 FID Keepalived_vrrp[2763]: VRRP_Instance(VI_1) removing protocol VIPs.
Log on server 2:
Mar 2 08:16:47 FID Keepalived_vrrp[2778]: VRRP_Instance(VI_1) Received advert with lower priority 66, ours 66, forcing new election
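
One plausible reading of "higher priority 66, ours 66" in these logs is the VRRPv2 tie-break: RFC 3768 says that when the priorities are equal, the advert from the sender with the greater primary IP address wins. The sketch below only illustrates that rule; `advert_outranks_local` is a hypothetical helper, and keepalived's actual comparison may differ in detail.

```python
import ipaddress


def advert_outranks_local(
    adv_priority: int, adv_src_ip: str, local_priority: int, local_ip: str
) -> bool:
    """Return True if a received advert should beat the local instance."""
    if adv_priority != local_priority:
        return adv_priority > local_priority
    # Equal priorities: compare the primary IPv4 addresses numerically.
    return ipaddress.IPv4Address(adv_src_ip) > ipaddress.IPv4Address(local_ip)


# server1 (192.168.1.222) hears an advert from server2 (192.168.1.223).
server1_yields = advert_outranks_local(66, "192.168.1.223", 66, "192.168.1.222")
# server2 hears an advert from server1.
server2_yields = advert_outranks_local(66, "192.168.1.222", 66, "192.168.1.223")
print(server1_yields, server2_yields)
```

Under this rule server2 wins whenever both nodes believe they are master, which would match the VIP following the IP order after a reboot.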

@Peter-YAN

I have to follow these steps to stop the VIP from floating after an OS reboot:

  1. "systemctl disable keepalived" to disable keepalived auto start
  2. change the priority in the conf to a smaller number, e.g. 65
  3. "systemctl start keepalived" to start keepalived after the OS reboot
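
Step 2 was done by hand in this report. If it had to be scripted, a small text rewrite along these lines might do; this is only a sketch, `set_priority` is a hypothetical helper, and it assumes each `priority` directive sits on its own line as in the configs above.

```python
import re


def set_priority(config_text: str, new_priority: int) -> str:
    """Replace the value of every 'priority N' directive in a config string."""
    return re.sub(
        r"^([ \t]*priority[ \t]+)\d+",
        rf"\g<1>{new_priority}",
        config_text,
        flags=re.MULTILINE,
    )


sample = "vrrp_instance VI_1 {\n    state BACKUP\n    priority 66\n    nopreempt\n}\n"
updated = set_priority(sample, 65)
print(updated)
```

The function returns the new text and leaves writing it back to the config file, and restarting keepalived, to the caller.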
