
Operator installation with default containerd config file makes the K8s node NotReady. #139

Open
niteeshkd opened this issue Dec 5, 2022 · 6 comments

Comments

niteeshkd commented Dec 5, 2022

After installing containerd along with Docker (as suggested by the "install docker on ubuntu" guide) and creating a single-node K8s cluster with the 'worker' role, installing the operator makes the K8s node NotReady.

Here is the content of /etc/containerd/config.toml right after containerd is installed.

#   Copyright 2018-2022 Docker Inc.

#   Licensed under the Apache License, Version 2.0 (the "License");
#   you may not use this file except in compliance with the License.
#   You may obtain a copy of the License at

#       http://www.apache.org/licenses/LICENSE-2.0

#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.

disabled_plugins = ["cri"]

#root = "/var/lib/containerd"
#state = "/run/containerd"
#subreaper = true
#oom_score = 0

#[grpc]
#  address = "/run/containerd/containerd.sock"
#  uid = 0
#  gid = 0

#[debug]
#  address = "/run/containerd/debug.sock"
#  uid = 0
#  gid = 0
#  level = "info"

The K8s cluster could only be created after commenting out the line disabled_plugins = ["cri"] in the above /etc/containerd/config.toml; otherwise cluster creation fails with the following error.

[ERROR CRI]: container runtime is not running: output: E1205 20:16:43.815128 869946 remote_runtime.go:948] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
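That edit can be scripted; here is a sketch (the sed pattern is an assumption based on the config shown above, and the snippet operates on a copy rather than the real /etc/containerd/config.toml, which would also require a containerd restart):

```shell
# Comment out the line that disables containerd's CRI plugin.
# Demonstrated on a throwaway copy of the file; on a real node the target
# is /etc/containerd/config.toml and containerd must be restarted afterwards.
cfg=/tmp/containerd-config-copy.toml
printf 'disabled_plugins = ["cri"]\n' > "$cfg"
sed -i 's/^disabled_plugins/#disabled_plugins/' "$cfg"
grep '^#disabled_plugins' "$cfg"   # prints: #disabled_plugins = ["cri"]
```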

The status of the K8s node is as follows.

$ kubectl get node
NAME        STATUS   ROLES                  AGE   VERSION
amdmilan1   Ready    control-plane,worker   74s   v1.25.4

Then, the operator is installed as follows.

$ kubectl apply -k github.com/confidential-containers/operator/config/release?ref=v0.2.0

$ kubectl apply  -f https://raw.githubusercontent.com/confidential-containers/operator/main/config/samples/ccruntime.yaml

$ kubectl get runtimeclass
NAME                    HANDLER                 AGE
kata                    kata                    4s
kata-clh                kata-clh                4s
kata-clh-tdx            kata-clh-tdx            4s
kata-clh-tdx-eaa-kbc    kata-clh-tdx-eaa-kbc    4s
kata-qemu               kata-qemu               4s
kata-qemu-sev           kata-qemu-sev           4s
kata-qemu-tdx           kata-qemu-tdx           4s
kata-qemu-tdx-eaa-kbc   kata-qemu-tdx-eaa-kbc   4s

Within 60 seconds of installing the operator, the K8s node becomes NotReady.

$ sleep 60

$ kubectl get node
NAME        STATUS     ROLES                  AGE     VERSION
amdmilan1   NotReady   control-plane,worker   4m14s   v1.25.4
niteeshkd (author) commented:

We found a workaround to handle the above problem.

Instead of using the installed /etc/containerd/config.toml, if we generate it using the command containerd config default, then the above problem does not occur.
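In command form, the workaround described above amounts to something like the following sketch (the tee/restart steps are assumptions about a typical systemd-managed install, not stated in the comment):

```shell
# Regenerate the containerd config from its built-in defaults (which leave
# the CRI plugin enabled), then restart containerd so kubelet can reach it.
containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
sudo systemctl restart containerd
```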

niteeshkd (author) commented:

@fitzthum @bpradipt

@niteeshkd niteeshkd changed the title Operator installation with installed containerd config file makes the K8s node NotReady. Operator installation with default containerd config file makes the K8s node NotReady. Dec 5, 2022
fitzthum (member) commented Dec 6, 2022

Not sure if this is the same issue, but one thing I have seen many times is that containerd does not always install with a config file; I have been on a number of machines where containerd is installed and there is no config. If you run the operator in this state, you get an unusable installation. It might be a good idea to add a check somewhere that installs the default containerd config file (as @niteeshkd mentions above) if the current one is missing or empty.

I think in the case described here we initially have a config file, but it is malformed (some important settings are commented out), so that might be out of scope. Still, I think we should consider handling the empty-config case.
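A pre-flight check along these lines could look like the sketch below. This is purely hypothetical (the operator has no such check today per this discussion); the /tmp path and the `echo` stand-in are for illustration, with the real commands shown in comments:

```shell
# Hypothetical check: if the containerd config is missing or empty,
# regenerate it from the built-in defaults before deploying the operator.
CONFIG=/tmp/containerd-config-demo.toml   # real node: /etc/containerd/config.toml
rm -f "$CONFIG"                            # simulate the missing-config case
if [ ! -s "$CONFIG" ]; then
  # real node: containerd config default > "$CONFIG" && systemctl restart containerd
  echo 'version = 2' > "$CONFIG"           # stand-in for the generated default config
fi
[ -s "$CONFIG" ] && echo "containerd config present"
```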

wainersm (member) commented:
Hi @niteeshkd ,

We found a workaround to handle the above problem.

Instead of using the installed /etc/containerd/config.toml, if we generate it using the command containerd config default, then the above problem does not occur.

That's exactly what we do in CI. There is an explanation at https://github.com/confidential-containers/operator/blob/main/tests/e2e/ansible/install_containerd.yml#L16.

@fitzthum maybe it deserves a note on the quickstart's troubleshoot section?

Amulyam24 commented:
Hi, I recently faced this issue as well while installing the operator. Definitely +1 for adding a note regarding it. Or would it make sense to also handle it in the script container-engine-for-cc-deploy.sh?

fitzthum (member) commented Jul 5, 2023

I think it would be relatively straightforward to add a check for this in the operator. I guess we've been putting off containerd-related changes because we want to get rid of the containerd fork. This issue could affect upstream containerd as well though.
