Environment
OS: Ubuntu 20.04 LTS
Kernel: 5.4.0-1037-gcp #40-Ubuntu SMP Fri Feb 5 11:57:53 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
SystemD version: 245.4-4ubuntu3.5
Google Guest Agent version: 20201217.02-0ubuntu1~20.04.0
Problem
When Ubuntu releases a security upgrade for the systemd package, the SystemD service systemd-networkd is restarted. This can leave a GCP instance unable to serve traffic.
When the systemd-networkd.service unit is restarted, the operating system's local routing table is wiped. This causes the local host routes for Google Cloud regional TCP Load Balancers to disappear and produces the following behavior:
The health checks, originated from the TCP LB service IP, start failing because the node does not have a host route for it.
With all instances in a failed state, the TCP LB enters an always-open state. Traffic directed to the TCP LB service IP is dropped by the instances (they never answer the TCP SYN packet) because the host route is missing.
The workaround for this issue is to restart the google-guest-agent.service SystemD unit, so that the host routes are added back and both health checks and traffic start working again.
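The check and the workaround described above could be scripted roughly as follows. This is a minimal sketch, not part of the guest agent: the function names are hypothetical, `LB_IP` defaults to a placeholder address from the documentation range, and it assumes the host route shows up in the local table as a line of the form `local <IP> dev <interface> ...`.

```shell
# Placeholder address (RFC 5737 documentation range), not a real load balancer IP.
LB_IP="${LB_IP:-203.0.113.10}"

# Succeeds if the route listing read from stdin contains a local host route
# for the IP given in $1 (assumed line format: "local <IP> dev <iface> ...").
has_local_route() {
  grep -qF "local $1 dev "
}

# Hypothetical helper: restart the guest agent only when the route is missing.
restore_lb_route() {
  if ip route list table local | has_local_route "$1"; then
    echo "host route for $1 is present"
  else
    echo "host route for $1 is missing, restarting google-guest-agent"
    sudo systemctl restart google-guest-agent.service
  fi
}

# Usage on the affected instance:
#   restore_lb_route "$LB_IP"
```

Keeping the parsing in `has_local_route` separate from the restart makes it possible to try the check against saved output before running anything that needs root.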
Reproduction steps
Create a TCP regional LB in a given region (does not matter if the public IP is static or ephemeral)
Configure a GCP instance in the same region as a backend instance. Configure a basic TCP health check on a TCP port that is wide open
Configure a frontend listener on port 80 using an ephemeral IP
Wait for it to be created
SSH to the instance and verify that the TCP LB ephemeral IP is listed as a host route in the output of ip ro list table local
Restart systemd-networkd using systemctl restart systemd-networkd
Check the local route table again and verify the route is no longer there.
At this point, the route won't be re-added. You need to restart the google-guest-agent.service SystemD unit for the routes to be re-added.
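To make the before/after comparison in the last steps less error-prone, the two route listings can be saved to files and diffed. The helper below is a hedged sketch; `lost_routes` and the file names are made up for illustration.

```shell
# Print every route line that is present in the first listing ($1)
# but has no exact match in the second listing ($2).
lost_routes() {
  grep -vxFf "$2" "$1"
}

# Usage on the instance (file names are arbitrary):
#   ip route list table local > before.txt
#   sudo systemctl restart systemd-networkd
#   ip route list table local > after.txt
#   lost_routes before.txt after.txt
#   sudo systemctl restart google-guest-agent.service   # re-adds the routes
```

If the issue reproduces, the output should include the line for the load balancer IP; running the comparison again after restarting the guest agent should no longer list it.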
Solution
The systemd-networkd.service unit is not listed in the PartOf directive of the Google Guest Agent service unit configuration. See https://github.com/GoogleCloudPlatform/guest-agent/blob/master/google-guest-agent.service#L7
The PartOf directive does include networking.service, but that unit is managed by the ifupdown package. In this specific use case, networking is also managed by SystemD (through systemd-networkd), so the google-guest-agent.service configuration needs to account for it.
ricbartm changed the title from "Restart Agent when SystemD Networking service is restarted" to "Restart Agent when SystemD Network unit is restarted" on Mar 26, 2021
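Until the packaged unit file changes, the PartOf relationship proposed in the Solution section could be tried locally with a systemd drop-in. The sketch below is an assumption about how one might do that, not the upstream fix: the function name and the drop-in file name `10-networkd.conf` are arbitrary. It relies on PartOf being a list setting, so the drop-in adds to the values already in the unit, and on PartOf propagating a stop or restart of the listed unit to this one.

```shell
# Write a drop-in that ties the guest agent to systemd-networkd restarts.
# $1 = drop-in directory; the file name 10-networkd.conf is an arbitrary choice.
write_networkd_dropin() {
  mkdir -p "$1" || return 1
  cat > "$1/10-networkd.conf" <<'EOF'
[Unit]
PartOf=systemd-networkd.service
EOF
}

# Usage on an affected instance (as root):
#   write_networkd_dropin /etc/systemd/system/google-guest-agent.service.d
#   systemctl daemon-reload
```

After the reload, restarting systemd-networkd should also restart google-guest-agent.service, which can be checked by repeating the reproduction steps.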