Delete routing table gracefully #74

Open
axot opened this issue Sep 1, 2020 · 8 comments

axot commented Sep 1, 2020

We observed connection timeouts on the client side when scaling in or performing a rolling update of MIGs.

The workflow is:
Client -> ILB -> MIG instance.

Before the ACPI G2 soft-off signal was received [1], the forwarded IP (the ILB's IP) had already been deleted from the routing table. This caused long-lived connections to time out.

[1] (image)

Related source code:
https://github.com/GoogleCloudPlatform/guest-agent/blob/master/google_guest_agent/addresses.go#L407

Is there any way to perform this more gracefully? Any advice would be very helpful.
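
To observe the symptom from inside an instance, one option is to check whether the forwarded IP still has an entry in the local routing table. The sketch below is an illustration only, not the guest agent's code: it parses text in the format printed by `ip route list table local` and reports whether any `local` entry covers a given address. The function name `has_local_route`, the sample route lines, and the address 10.128.0.100 are all hypothetical.

```python
import ipaddress


def has_local_route(route_output: str, forwarded_ip: str) -> bool:
    """Return True if a `local` route entry covers `forwarded_ip`.

    `route_output` is assumed to be the text printed by
    `ip route list table local` (one route per line).
    """
    target = ipaddress.ip_address(forwarded_ip)
    for line in route_output.splitlines():
        fields = line.split()
        # Entries of interest start with the route type "local",
        # followed by an address or prefix.
        if len(fields) < 2 or fields[0] != "local":
            continue
        try:
            network = ipaddress.ip_network(fields[1], strict=False)
        except ValueError:
            continue  # second field is not an address or prefix
        if target in network:
            return True
    return False


# Illustrative input only; real output will differ per instance.
sample = (
    "local 10.128.0.100 dev eth0 proto 66 scope host\n"
    "local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1\n"
)
print(has_local_route(sample, "10.128.0.100"))
```

Calling this periodically with fresh command output and logging a timestamp when the result changes would show whether the entry disappears before or after the shutdown signal.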

Author

axot commented Sep 3, 2020

Hi @gaohannk @hopkiw, could you take a look at this case?

Contributor

hopkiw commented Sep 3, 2020

what does 'more graceful' mean? i don't follow your request at all

Author

axot commented Sep 4, 2020

Hi @hopkiw, thank you for following up on this issue.

Because the forwarded IP in the routing table is deleted before the shutdown script runs,
active connections are affected. Our shutdown script is designed to send FIN or RST,
but these packets cannot reach the client side.

My opinion is that if Google wants to delete the forwarded IP from the routing table before the shutdown script runs,
it should send an RST packet to all active connections, or let customers handle these connections
in the shutdown script. Does that make sense to you?

Thanks.
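
For reference, one common way for a process to make the kernel send an RST instead of a FIN when it closes its own socket is to enable `SO_LINGER` with a zero timeout. The sketch below shows that technique; it is not taken from the shutdown script discussed here, and `abortive_close` is a hypothetical helper name. It only applies to sockets owned by the calling process, and the RST still has to be routable to reach the client.

```python
import socket
import struct


def abortive_close(sock: socket.socket) -> None:
    """Close `sock` so that the kernel sends RST rather than FIN."""
    # struct linger { int l_onoff; int l_linger; }:
    # linger enabled with a 0 second timeout requests an abortive close.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
```

A server would call this on each accepted connection while shutting down, instead of a plain `close()`.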

Contributor

hopkiw commented Sep 8, 2020

We don't delete routes unless they've been removed from the metadata server. Does pressing the 'power button' cause routes to be deleted from the metadata server? I will attempt to reproduce this.

The guest agent has no way to issue an RST to the destination of existing socket connections owned by other processes on the system.

Author

axot commented Sep 9, 2020

We found that this behavior occurs with the following setup.

  1. Create MIGs.
  2. Add an L4 TCP ILB in front of the MIGs with connection draining enabled, e.g. a 60-second timeout.
  3. Establish long-lived TCP connections from a client to the MIG instances via the ILB.
  4. Scale down the MIGs, e.g. from 3 to 2 instances.

You will see that the forwarded IP (the ILB's IP) is deleted before the 'power button' event.

If you test by pressing the 'power button' directly on an instance, this issue does not occur.
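
Step 3 of the setup above can be approximated with a small client that holds one TCP connection open and reports how long it stayed usable. This is only a sketch under assumptions: `hold_connection` is a hypothetical name, and it assumes the service behind the ILB sends some reply to each `ping` line, since a silently dropped route can otherwise go unnoticed by the sender for a long time.

```python
import socket
import time


def hold_connection(host, port, interval=1.0, reply_timeout=5.0, max_seconds=None):
    """Hold one TCP connection open; return (seconds_alive, reason)."""
    start = time.monotonic()
    # The timeout also applies to later send/recv calls on the socket.
    with socket.create_connection((host, port), timeout=reply_timeout) as sock:
        while max_seconds is None or time.monotonic() - start < max_seconds:
            try:
                sock.sendall(b"ping\n")
                if not sock.recv(1024):
                    return time.monotonic() - start, "closed by peer"
            except socket.timeout:
                return time.monotonic() - start, "timed out waiting for reply"
            except OSError as exc:
                return time.monotonic() - start, f"error: {exc}"
            time.sleep(interval)
    return time.monotonic() - start, "stopped by max_seconds"
```

Running it against the ILB address during a scale-down and comparing the returned duration with the instance's shutdown timestamps would show when the connection stopped working relative to the 'power button' event.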

Author

axot commented Sep 23, 2020

Is there any workaround to improve this behavior?
How about disabling network_enabled once the agent has finished configuring the network?

Contributor

hopkiw commented Sep 24, 2020

MIG scale down sends the usual instances.terminate signal; the guest agent shuts down first, prior to running shutdown scripts. so i ran a shutdown script that shows the routes still existed after the shutdown of the guest agent.

Author

axot commented Sep 24, 2020

Thank you for the investigation. One thing I want to confirm: did you set up an L4 ILB in front of your MIG instances with connection draining enabled?

patelne pushed a commit to patelne/guest-agent that referenced this issue Feb 17, 2022
* new package build workflow

* change msg to request