Intermittent DNS name resolution failure, likely IPv6-related #29069
Comments
Hi! Any update on this issue? Any new workaround?
Hello! Same problem here. It would be nice to have a workaround for this ASAP.
As documented here, the workaround across our server fleet is a small change to the file
...and remove the "resolve" service. After the change, the line reads:
Note that this is only a workaround, not the final solution.
We have been experiencing this since upgrading to Debian 12 AMIs.
We're affected by this, using Debian 12 AWS EC2 AMIs. As well as agreeing that it is somehow IPv6 related, it also seems to only cause resolution failures for CNAMEs. At least in my testing with A and CNAME lookups, only CNAMEs exhibit the intermittent failures. Our workaround has been to simply purge the systemd-resolved package entirely, which has the effect of sorting out nsswitch on its way out of the door. <edit> DO NOT USE THIS WORKAROUND, IT BREAKS DNS NAME RESOLUTION AFTER A REBOOT, SORRY 🤡
Thanks to @schams-net for the detailed repro. I shamelessly grabbed the bash fragment [1] to easily reproduce the issue. Using the script to resolve a CNAME produced intermittent failures. Pointing it at an A record gave no failures. Telling netcat to use IPv4 (nc -4) also meant no failures, for A or CNAME lookups.
@ahmgithubahm You don't need to remove all of systemd-resolved. Just removing libnss-resolve will accomplish what you need without disabling resolv.conf management altogether. Debian 12.6 images, when they're published, will make this the default.
Thanks @nmeyerhans, that's good to know. I've edited my post now that I realise my workaround leaves you with no working /etc/resolv.conf after a reboot. Oops. 🤡
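The comparison described in the comment above (default lookups versus IPv4-only lookups) can be approximated without netcat. The following is a minimal Python sketch, not the bash fragment referenced as [1]: it assumes that restricting `socket.getaddrinfo` to `AF_INET` is a fair stand-in for `nc -4`, and the function names `count_failures` and `compare_families` are hypothetical helpers made up for this illustration.

```python
import socket
from typing import Callable, Dict


def count_failures(host: str, family: int, attempts: int,
                   resolver: Callable = socket.getaddrinfo) -> int:
    """Resolve `host` repeatedly; return how many attempts raised gaierror."""
    failures = 0
    for _ in range(attempts):
        try:
            # Port is None because only the name lookup matters here.
            resolver(host, None, family=family, type=socket.SOCK_STREAM)
        except socket.gaierror:
            failures += 1
    return failures


def compare_families(host: str, attempts: int = 100,
                     resolver: Callable = socket.getaddrinfo) -> Dict[str, int]:
    """Failure counts for unrestricted (IPv4 + IPv6) and IPv4-only lookups."""
    return {
        "unspec": count_failures(host, socket.AF_UNSPEC, attempts, resolver),
        "ipv4": count_failures(host, socket.AF_INET, attempts, resolver),
    }
```

Calling `compare_families("<your CNAME endpoint>")` on an affected instance would, if the observations above hold, be expected to show a non-zero `unspec` count and a zero `ipv4` count. The `resolver` parameter exists only so the loop can be exercised without real DNS traffic.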
@schams-net @ahmgithubahm @jackarias @levinse @wjsc Be aware that the Debian 12 cloud images just released (version 20240415-1718) no longer enable If somebody who has encountered this issue on Debian 12 would be able to test the Debian |
Thanks @nmeyerhans! Is the problem somehow specific to AWS? I can't reproduce it from a local VM with |
I don't see any reason to think it'd be specific to AWS, but it's possible. As the problem is intermittent, it could be that something about the AWS DNS infrastructure triggers it more frequently, but that's just speculation.
I can confirm that I'm no longer able to reproduce the issue with the latest AMIs at AWS 🥳 Previously (with "resolve"):
New(er) images (without "resolve"):
(As @nmeyerhans pointed out). I haven't had the chance to test the Debian sid images yet.
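To tell which of the two states an instance is in (with or without "resolve"), the name service switch configuration can be inspected. The sketch below is an illustration only: it assumes the relevant setting is the `hosts:` line of `/etc/nsswitch.conf`, as the nsswitch mention earlier in the thread suggests, and `hosts_services` and `uses_resolve` are hypothetical helper names.

```python
from typing import List, Optional


def hosts_services(nsswitch_text: str) -> Optional[List[str]]:
    """Return the tokens of the `hosts:` line, or None if no such line exists."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return None


def uses_resolve(nsswitch_text: str) -> bool:
    """True if the hosts line lists the `resolve` service as its own token."""
    services = hosts_services(nsswitch_text)
    return services is not None and "resolve" in services
```

A typical call would be `uses_resolve(open("/etc/nsswitch.conf").read())`. The check only reads the file; it does not change the configuration.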
systemd version the issue has been seen with
252
Used distribution
Debian 12
Linux kernel version used
6.1.0-10-cloud-amd64
CPU architectures issue was seen on
x86_64
Component
systemd-resolved
Expected behaviour you didn't see
DNS look-ups with a constant 100% success rate for IPv4 and IPv6 addresses.
Unexpected behaviour you saw
Intermittent DNS name resolution failures with a success rate of approx. 90% (approx. 10 errors out of 100 queries), likely IPv6-related.
Official Debian v12 AMIs on AWS show intermittent DNS name resolution failures when they perform DNS look-ups of an RDS Aurora endpoint. The instances use the Amazon-provided DNS server (VPC.2). While approx. 90 percent of the DNS queries succeed, 10 percent fail (name or service not known).
The issue occurs on Debian v12 but doesn't occur on Debian v11 or Amazon Linux 2 systems (same environment). No error occurs if the "systemd-resolved" service is disabled or bypassed on Debian v12. No error occurs if IPv6 is disabled on the Debian v12 system.
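The success rate quoted above can be measured with a short loop. This is a minimal Python sketch under stated assumptions, not the reproduction from the linked repository: it assumes that `socket.getaddrinfo` follows the same glibc lookup path as the failing applications, and `resolution_stats` is a hypothetical name. Failures are tallied by their `getaddrinfo` error code, so a "name or service not known" error would be counted under `socket.EAI_NONAME`.

```python
import socket
from collections import Counter
from typing import Callable, Tuple


def resolution_stats(host: str, attempts: int = 100,
                     resolver: Callable = socket.getaddrinfo) -> Tuple[float, Counter]:
    """Return (success rate, Counter of gaierror codes) over `attempts` lookups."""
    errors: Counter = Counter()
    succeeded = 0
    for _ in range(attempts):
        try:
            resolver(host, None)
            succeeded += 1
        except socket.gaierror as exc:
            # gaierror carries an (error code, message) pair in args.
            errors[exc.args[0]] += 1
    return succeeded / attempts, errors
```

On a system showing the behaviour described here, the returned rate would be expected to sit near 0.9 rather than 1.0.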
Steps to reproduce the problem
A detailed problem description, including analysis, steps to reproduce the problem, and a Terraform plan to provision a test infrastructure, is available in the following Git repository: https://github.com/schams-net/aws-ec2-rds-dns-lookup-issue
Additional program output to the terminal or log subsystem illustrating the issue
No response