
Add step to disable swap in ansible-playbooks for on-premise/bare-metal install of kubernetes 1.8+ #1127

Closed
ksatchit opened this issue Jan 17, 2018 · 2 comments


@ksatchit
Member

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

What happened:

The on-premise Kubernetes setup for versions 1.8 and above using the ansible-playbooks fails with the error below. Starting with 1.8, the kubelet refuses to run with swap enabled on the nodes (unless --fail-swap-on is set to false), so the kubeadm preflight check fails.

TASK [k8s-master : kubeadm init] ***************************************************************************************
task path: /home/ciuser/openebs/e2e/ansible/roles/k8s-master/tasks/main.yml:103
fatal: [kubemaster01]: FAILED! => {"changed": true, "failed": true, "rc": 1, "stderr": "Shared connection to 20.10.49.11 closed.\r\n", "stdout": "\r\n\r\nUpdating the host file...\r\nSetting up the Master using IPAddress: 20.10.49.11\r\n[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.\r\n[init] Using Kubernetes version: v1.8.2\r\n[init] Using Authorization modes: [Node RBAC]\r\n[preflight] Running pre-flight checks\r\n[preflight] WARNING: Running with swap on is not supported. Please disable swap or set kubelet's --fail-swap-on flag to false.\r\n[preflight] Starting the kubelet service\r\n[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)\r\n[certificates] Generated ca certificate and key.\r\n[certificates] Generated apiserver certificate and key.\r\n[certificates] apiserver serving cert is signed for DNS names [mayamaster kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 20.10.49.11]\r\n[certificates] Generated apiserver-kubelet-client certificate and key.\r\n[certificates] Generated sa key and public key.\r\n[certificates] Generated front-proxy-ca certificate and key.\r\n[certificates] Generated front-proxy-client certificate and key.\r\n[certificates] Valid certificates and keys now exist in \"/etc/kubernetes/pki\"\r\n[kubeconfig] Wrote KubeConfig file to disk: \"admin.conf\"\r\n[kubeconfig] Wrote KubeConfig file to disk: \"kubelet.conf\"\r\n[kubeconfig] Wrote KubeConfig file to disk: \"controller-manager.conf\"\r\n[kubeconfig] Wrote KubeConfig file to disk: \"scheduler.conf\"\r\n[controlplane] Wrote Static Pod manifest for component kube-apiserver to \"/etc/kubernetes/manifests/kube-apiserver.yaml\"\r\n[controlplane] Wrote Static Pod manifest for component kube-controller-manager to \"/etc/kubernetes/manifests/kube-controller-manager.yaml\"\r\n[controlplane] Wrote Static Pod manifest for component kube-scheduler to \"/etc/kubernetes/manifests/kube-scheduler.yaml\"\r\n[etcd] Wrote Static Pod manifest for a local etcd instance to \"/etc/kubernetes/manifests/etcd.yaml\"\r\n[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory \"/etc/kubernetes/manifests\"\r\n[init] This often takes around a minute; or longer if the control plane images have to be pulled.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or 
healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.\r\n[kubelet-check] It seems like the kubelet isn't running or healthy.\r\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.\r\n\r\nUnfortunately, an error has occurred:\r\n\ttimed out waiting for the condition\r\n\r\nThis error is likely caused by that:\r\n\t- The kubelet is not running\r\n\t- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)\r\n\t- There is no internet connection; so the kubelet can't pull the following control plane images:\r\n\t\t- gcr.io/google_containers/kube-apiserver-amd64:v1.8.2\r\n\t\t- gcr.io/google_containers/kube-controller-manager-amd64:v1.8.2\r\n\t\t- gcr.io/google_containers/kube-scheduler-amd64:v1.8.2\r\n\r\nYou can troubleshoot this for example with the following commands if you're on a systemd-powered system:\r\n\t- 'systemctl status kubelet'\r\n\t- 'journalctl -xeu kubelet'\r\ncouldn't initialize a Kubernetes cluster\r\n", "stdout_lines": ["", "", "Updating the host file...", "Setting up the Master using IPAddress: 20.10.49.11", "[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.", "[init] Using Kubernetes version: v1.8.2", "[init] Using Authorization modes: [Node RBAC]", "[preflight] Running pre-flight checks", "[preflight] WARNING: Running with swap on is not supported. 
Please disable swap or set kubelet's --fail-swap-on flag to false.", "[preflight] Starting the kubelet service", "[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)", "[certificates] Generated ca certificate and key.", "[certificates] Generated apiserver certificate and key.", "[certificates] apiserver serving cert is signed for DNS names [mayamaster kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 20.10.49.11]", "[certificates] Generated apiserver-kubelet-client certificate and key.", "[certificates] Generated sa key and public key.", "[certificates] Generated front-proxy-ca certificate and key.", "[certificates] Generated front-proxy-client certificate and key.", "[certificates] Valid certificates and keys now exist in \"/etc/kubernetes/pki\"", "[kubeconfig] Wrote KubeConfig file to disk: \"admin.conf\"", "[kubeconfig] Wrote KubeConfig file to disk: \"kubelet.conf\"", "[kubeconfig] Wrote KubeConfig file to disk: \"controller-manager.conf\"", "[kubeconfig] Wrote KubeConfig file to disk: \"scheduler.conf\"", "[controlplane] Wrote Static Pod manifest for component kube-apiserver to \"/etc/kubernetes/manifests/kube-apiserver.yaml\"", "[controlplane] Wrote Static Pod manifest for component kube-controller-manager to \"/etc/kubernetes/manifests/kube-controller-manager.yaml\"", "[controlplane] Wrote Static Pod manifest for component kube-scheduler to \"/etc/kubernetes/manifests/kube-scheduler.yaml\"", "[etcd] Wrote Static Pod manifest for a local etcd instance to \"/etc/kubernetes/manifests/etcd.yaml\"", "[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory \"/etc/kubernetes/manifests\"", "[init] This often takes around a minute; or longer if the control plane images have to be pulled.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It 
seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.", "", "Unfortunately, an error has occurred:", "\ttimed out waiting for the condition", "", "This error is likely caused by that:", "\t- The kubelet is not running", "\t- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)", "\t- There is no internet connection; so the kubelet can't pull the following control plane images:", "\t\t- gcr.io/google_containers/kube-apiserver-amd64:v1.8.2", "\t\t- gcr.io/google_containers/kube-controller-manager-amd64:v1.8.2", "\t\t- gcr.io/google_containers/kube-scheduler-amd64:v1.8.2", "", "You can troubleshoot this for example with the following commands if you're on a systemd-powered system:", "\t- 'systemctl status kubelet'", "\t- 'journalctl -xeu kubelet'", "couldn't initialize a Kubernetes cluster"]}

PLAY RECAP *************************************************************************************************************
kubemaster01               : ok=30   changed=23   unreachable=0    failed=1
localhost                  : ok=27   changed=23   unreachable=0    failed=0
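As the preflight warning in the log points out, there are two ways past this check: disable swap on the node, or start the kubelet with --fail-swap-on=false. A small guard task placed just before the kubeadm init task would surface the problem up front instead of via the kubelet-check timeouts; the sketch below is illustrative only and not part of the current role:

  # Illustrative pre-check, not the actual role content
  - name: Check whether swap is enabled
    command: swapon --show --noheadings
    register: swap_status
    changed_when: false

  - name: Abort early if swap is still on
    fail:
      msg: "Swap is enabled on this node; the kubeadm preflight check will fail."
    when: swap_status.stdout != ""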

What you expected to happen:

  • The Ansible-based Kubernetes cluster creation should complete without issues

How to reproduce it (as minimally and precisely as possible):

  • Run the current setup-kubernetes Ansible playbook against cluster nodes that have swap enabled

Anything else we need to know?:

Here are some good links on why swap has to be disabled for Kubernetes 1.8+:

Support for swap is non-trivial. Guaranteed pods should never require swap. Burstable pods should have their requests met without requiring swap. BestEffort pods have no guarantee. The kubelet right now lacks the smarts to provide the right amount of predictable behavior here across pods.
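A minimal sketch of the step this issue asks for, assuming standard /etc/fstab-managed swap (task names and layout are illustrative, not the actual role structure): turn swap off for the running system, then comment out the swap entries so it stays off after a reboot.

  # Illustrative tasks only; module options assume Ansible 2.4+
  - name: Disable swap for the current boot
    command: swapoff -a
    become: true

  - name: Comment out swap entries in /etc/fstab so swap stays off across reboots
    replace:
      path: /etc/fstab
      regexp: '^([^#].*\sswap\s.*)$'
      replace: '# \1'
    become: true

Note that the command task will always report "changed"; adding a changed_when condition or a pre-check like the one sketched earlier would make repeated runs cleaner.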

@ksatchit
Member Author

These steps have to be added to both the k8s-masters and k8s-hosts bring-up procedures.
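One way this could be wired in: keep the swap-disable tasks from the sketch above in a small tasks file inside each role and include it before the kubeadm init/join steps (the file name below is hypothetical).

  # In roles/k8s-master/tasks/main.yml and roles/k8s-hosts/tasks/main.yml,
  # before the kubeadm init / join steps:
  - include_tasks: disable-swap.yml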

@github-actions
github-actions bot commented Dec 4, 2019

Issues go stale after 90d of inactivity.
