
Add step to disable swap in ansible-playbooks for on-premise/bare-metal install of kubernetes 1.8+ #1127

Closed
ksatchit opened this issue Jan 17, 2018 · 2 comments


@ksatchit
Member

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

What happened:

The on-premise Kubernetes setup for versions 1.8 and above using the ansible-playbooks fails with the error below. Starting with 1.8, the kubelet refuses to run with swap enabled on the nodes (unless --fail-swap-on is set to false), so the kubeadm preflight check fails.

TASK [k8s-master : kubeadm init] ***************************************************************************************
task path: /home/ciuser/openebs/e2e/ansible/roles/k8s-master/tasks/main.yml:103
fatal: [kubemaster01]: FAILED! => {"changed": true, "failed": true, "rc": 1, "stderr": "Shared connection to 20.10.49.11 closed.\r\n", "stdout": "\r\n\r\nUpdating the host file...\r\nSetting up the Master using IPAddress: 20.10.49.11\r\n[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.\r\n[init] Using Kubernetes version: v1.8.2\r\n[init] Using Authorization modes: [Node RBAC]\r\n[preflight] Running pre-flight checks\r\n[preflight] WARNING: Running with swap on is not supported. Please disable swap or set kubelet's --fail-swap-on flag to false.\r\n[preflight] Starting the kubelet service\r\n[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)\r\n[certificates] Generated ca certificate and key.\r\n[certificates] Generated apiserver certificate and key.\r\n[certificates] apiserver serving cert is signed for DNS names [mayamaster kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 20.10.49.11]\r\n[certificates] Generated apiserver-kubelet-client certificate and key.\r\n[certificates] Generated sa key and public key.\r\n[certificates] Generated front-proxy-ca certificate and key.\r\n[certificates] Generated front-proxy-client certificate and key.\r\n[certificates] Valid certificates and keys now exist in \"/etc/kubernetes/pki\"\r\n[kubeconfig] Wrote KubeConfig file to disk: \"admin.conf\"\r\n[kubeconfig] Wrote KubeConfig file to disk: \"kubelet.conf\"\r\n[kubeconfig] Wrote KubeConfig file to disk: \"controller-manager.conf\"\r\n[kubeconfig] Wrote KubeConfig file to disk: \"scheduler.conf\"\r\n[controlplane] Wrote Static Pod manifest for component kube-apiserver to \"/etc/kubernetes/manifests/kube-apiserver.yaml\"\r\n[controlplane] Wrote Static Pod manifest for component kube-controller-manager to \"/etc/kubernetes/manifests/kube-controller-manager.yaml\"\r\n[controlplane] Wrote Static Pod manifest for component kube-scheduler to \"/etc/kubernetes/manifests/kube-scheduler.yaml\"\r\n[etcd] Wrote Static Pod manifest for a local etcd instance to \"/etc/kubernetes/manifests/etcd.yaml\"\r\n[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory \"/etc/kubernetes/manifests\"\r\n[init] This often takes around a minute; or longer if the control plane images have to be pulled.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or 
healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\r\n\r\nUnfortunately, an error has occurred:\r\n\ttimed out waiting for the condition\r\n\r\nThis error is likely caused by that:\r\n\t- The kubelet is not running\r\n\t- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)\r\n\t- There is no internet connection; so the kubelet can't pull the following control plane images:\r\n\t\t- gcr.io/google_containers/kube-apiserver-amd64:v1.8.2\r\n\t\t- gcr.io/google_containers/kube-controller-manager-amd64:v1.8.2\r\n\t\t- gcr.io/google_containers/kube-scheduler-amd64:v1.8.2\r\n\r\nYou can troubleshoot this for example with the following commands if you're on a systemd-powered system:\r\n\t- 'systemctl status kubelet'\r\n\t- 'journalctl -xeu kubelet'\r\ncouldn't initialize a Kubernetes cluster\r\n", "stdout_lines": ["", "", "Updating the host file...", "Setting up the Master using IPAddress: 20.10.49.11", "[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.", "[init] Using Kubernetes version: v1.8.2", "[init] Using Authorization modes: [Node RBAC]", "[preflight] Running pre-flight checks", "[preflight] WARNING: Running with swap on is not supported. 
Please disable swap or set kubelet's --fail-swap-on flag to false.", "[preflight] Starting the kubelet service", "[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)", "[certificates] Generated ca certificate and key.", "[certificates] Generated apiserver certificate and key.", "[certificates] apiserver serving cert is signed for DNS names [mayamaster kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 20.10.49.11]", "[certificates] Generated apiserver-kubelet-client certificate and key.", "[certificates] Generated sa key and public key.", "[certificates] Generated front-proxy-ca certificate and key.", "[certificates] Generated front-proxy-client certificate and key.", "[certificates] Valid certificates and keys now exist in \"/etc/kubernetes/pki\"", "[kubeconfig] Wrote KubeConfig file to disk: \"admin.conf\"", "[kubeconfig] Wrote KubeConfig file to disk: \"kubelet.conf\"", "[kubeconfig] Wrote KubeConfig file to disk: \"controller-manager.conf\"", "[kubeconfig] Wrote KubeConfig file to disk: \"scheduler.conf\"", "[controlplane] Wrote Static Pod manifest for component kube-apiserver to \"/etc/kubernetes/manifests/kube-apiserver.yaml\"", "[controlplane] Wrote Static Pod manifest for component kube-controller-manager to \"/etc/kubernetes/manifests/kube-controller-manager.yaml\"", "[controlplane] Wrote Static Pod manifest for component kube-scheduler to \"/etc/kubernetes/manifests/kube-scheduler.yaml\"", "[etcd] Wrote Static Pod manifest for a local etcd instance to \"/etc/kubernetes/manifests/etcd.yaml\"", "[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory \"/etc/kubernetes/manifests\"", "[init] This often takes around a minute; or longer if the control plane images have to be pulled.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It 
seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "", "Unfortunately, an error has occurred:", "\ttimed out waiting for the condition", "", "This error is likely caused by that:", "\t- The kubelet is not running", "\t- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)", "\t- There is no internet connection; so the kubelet can't pull the following control plane images:", "\t\t- gcr.io/google_containers/kube-apiserver-amd64:v1.8.2", "\t\t- gcr.io/google_containers/kube-controller-manager-amd64:v1.8.2", "\t\t- gcr.io/google_containers/kube-scheduler-amd64:v1.8.2", "", "You can troubleshoot this for example with the following commands if you're on a systemd-powered system:", "\t- 'systemctl status kubelet'", "\t- 'journalctl -xeu kubelet'", "couldn't initialize a Kubernetes cluster"]}

PLAY RECAP *************************************************************************************************************
kubemaster01               : ok=30   changed=23   unreachable=0    failed=1
localhost                  : ok=27   changed=23   unreachable=0    failed=0
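As the preflight warning in the log points out, there are two ways past this check: disable swap on the node, or start the kubelet with --fail-swap-on=false. A small guard task placed just before the kubeadm init task would surface the problem up front instead of via the kubelet-check timeouts; the sketch below is illustrative only and not part of the current role:

  # Illustrative pre-check, not the actual role content
  - name: Check whether swap is enabled
    command: swapon --show --noheadings
    register: swap_status
    changed_when: false

  - name: Abort early if swap is still on
    fail:
      msg: "Swap is enabled on this node; the kubeadm preflight check will fail."
    when: swap_status.stdout != ""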

What you expected to happen:

  • The Ansible-based Kubernetes cluster creation should complete without issues

How to reproduce it (as minimally and precisely as possible):

  • Run the current setup-kubernetes Ansible playbook against cluster nodes that have swap enabled

Anything else we need to know?:

Here are some good links on why swap has to be disabled for Kubernetes 1.8+:

Support for swap is non-trivial. Guaranteed pods should never require swap. Burstable pods should have their requests met without requiring swap. BestEffort pods have no guarantee. The kubelet right now lacks the smarts to provide the right amount of predictable behavior here across pods.
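A minimal sketch of the step this issue asks for, assuming standard /etc/fstab-managed swap (task names and layout are illustrative, not the actual role structure): turn swap off for the running system, then comment out the swap entries so it stays off after a reboot.

  # Illustrative tasks only; module options assume Ansible 2.4+
  - name: Disable swap for the current boot
    command: swapoff -a
    become: true

  - name: Comment out swap entries in /etc/fstab so swap stays off across reboots
    replace:
      path: /etc/fstab
      regexp: '^([^#].*\sswap\s.*)$'
      replace: '# \1'
    become: true

Note that the command task will always report "changed"; adding a changed_when condition or a pre-check like the one sketched earlier would make repeated runs cleaner.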

@ksatchit
Member Author

These steps have to be added to both the k8s-masters and k8s-hosts bring-up procedures.
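One way this could be wired in: keep the swap-disable tasks from the sketch above in a small tasks file inside each role and include it before the kubeadm init/join steps (the file name below is hypothetical).

  # In roles/k8s-master/tasks/main.yml and roles/k8s-hosts/tasks/main.yml,
  # before the kubeadm init / join steps:
  - include_tasks: disable-swap.yml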

@github-actions
github-actions bot commented Dec 4, 2019

Issues go stale after 90d of inactivity.
