Proposal: Sandbox API #4131
Seems like it should fit into containerd. Is sandbox going to be the official name/term for what this is?
So with this API, do you need a way to
Is this expected to be in v1.4 or in v1.5?
@crosbymichael Generally we'd want sandbox implementations to be responsible for resource allocations, while clients just specify the desired state. Due to the variety of resource types that different sandbox implementations might need, it could be challenging to define a generic API for that. So containerd doesn't manage resources directly. It rather manages sandbox instances, which manage the needed resources internally. So from a resource perspective there are 3 major API endpoints:
To specify the needed resources (desired state) we can use either the Spec field (for generic ones, similarly to the runtime spec) or Extensions (which allow specifying sandbox-specific configuration in a generic way).
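The desired-state split described above can be sketched roughly as follows: the client declares generic settings in a Spec plus opaque, implementation-specific Extensions, and the sandbox implementation (not containerd) decides how to satisfy them. All type and field names here are hypothetical illustrations, not the actual containerd API surface.

```go
package main

import "fmt"

// Sandbox is a hypothetical desired-state record handed to a sandbox controller.
type Sandbox struct {
	ID         string
	Spec       map[string]string // generic, runtime-spec-like settings
	Extensions map[string][]byte // opaque, implementation-specific blobs
}

// describe summarizes the declared desired state for logging/inspection.
func describe(s Sandbox) string {
	return fmt.Sprintf("%s cpus=%s", s.ID, s.Spec["cpus"])
}

func main() {
	// The client only declares what it wants...
	s := Sandbox{
		ID:   "sb-1",
		Spec: map[string]string{"cpus": "2", "memory": "512M"},
		Extensions: map[string][]byte{
			// e.g. a microVM controller could read its own kernel config here
			"vm.kernel": []byte("vmlinux-5.10"),
		},
	}
	// ...and the controller decides how to allocate the actual resources.
	fmt.Println(describe(s))
}
```
The point of the Extensions map is that containerd can store and forward it without understanding it, keeping the core API generic across runc-, VM-, and other sandbox implementations.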
@AkihiroSuda 1.4 looks feature complete and close to betas. I'd prefer to have this in 1.5, so we have more time for polishing.
@mxpv Excellent proposal! This is going to be very useful for Kata Containers as well. I have one question: how is the
Since CRI was designed more around containers, there is no right place to put this. In the longer perspective, with better support from CRI,
Do you have an idea of how higher levels will interact with this new API? Will cri-containerd need to be updated to take advantage of these APIs to replace that pause container?
Right,
Sounds good to me. Thanks!
What's the fundamental difference from CRI? Would it make sense to just enhance CRI? CRI does not need to be k8s specific; I personally would love to see CRI expanded beyond only the needs of k8s.
@ibuildthecloud Let's start from the high-level requirement here: "we're looking for a way to run microVM-based containers on Kubernetes (1) and/or via containerd (2)". We can do it together (via
This relates to (1). Yes, we still do need to enhance CRI either way for better microVM support, because CRI was designed mainly around "container" concept (as a random example:
So CRI is a Kubernetes specific concept. This is what you'd need to implement in order to make kubelet support custom runtime implementations in Kubernetes. Kubernetes team is very careful about API changes and backward compatibility, so it's going to be challenging to use CRI for something else beyond Kubernetes.
Same difference as in
I like the idea proposed to host pluggable sandbox types. The design concept shown in the right half of the diagram makes good sense. The left portion isn't really how CRI works today, or at least is an oversimplification (probably to save space :-). So I'll just ramble a bit. As stated, CRI includes a Sandbox API in the runtime service APIs, and that API group is already implemented by our code. The "notion of a group of containers in containerd - a
I suppose one could see it that way logically, but all that stuff is done by the CRI implementation with the aid of system services, CNI, etc. We do that stuff in containerd/cri when running pods/sandboxes and containers. The pause container is by and large just a process to hold the resources shared across the pod/sandbox. That process could be hosted any number of other ways. The sandbox API in CRI was never meant to be just for docker containers, but that certainly was the focus, and thus why the VM teams encountered issues. The sandbox API in CRI was always meant to be for VM runtimes, applications, compute resources... What may confuse/conflate the issue is that the k8s team went back and forth on naming: mostly pod in docs but sandbox in the code. Eventually, after much discussion, it was agreed to call them podsandboxes, thus the podsandbox API. ... The solution, at the CRI layer, for supporting alternative container runtimes was to add
Is this still planned for 1.5? |
For 1.6 |
Any MR for this proposal? It may help to support confidential computing too. |
Hi, I think with the sandbox plugin enabled, the "sandbox" will be a first-class citizen in containerd. For Kata, the architecture with the sandbox plugin looks like this: I have made a PoC project for Kata with this new architecture. With the modifications from @mxpv's draft, the container can already be started, and for Kata containers there will be no shim process on the host anymore, because the kata-agent in the VM serves the shimv2 task API; the containerd "tasks" plugin can call the task API through a vsock address, which is obtained from the sandbox plugin.
The sandbox plugin is like the snapshotter plugin. In the CRI RunPodSandbox method, we just need to add a WithNewSandbox opt to the opts. I think making the sandbox a first-class citizen will make the architecture much cleaner than it is now: the pause container can be removed, and for VM-based containers, no shim is needed.
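The "WithNewSandbox opt" idea above follows containerd's functional-options style for container creation. Here is a minimal, self-contained sketch of that pattern; `withNewSandbox`, `newContainer`, and the `container` struct are illustrative stand-ins, not the real containerd client types.

```go
package main

import "fmt"

// container is a toy stand-in for a client-side container record.
type container struct {
	ID        string
	SandboxID string
}

// NewContainerOpts mirrors containerd's functional-options style,
// where opts mutate the record before it is committed.
type NewContainerOpts func(*container)

// withNewSandbox is a hypothetical opt that attaches a freshly created
// sandbox to the container, analogous to how snapshotter opts attach
// a snapshot.
func withNewSandbox(sandboxID string) NewContainerOpts {
	return func(c *container) { c.SandboxID = sandboxID }
}

// newContainer applies each opt in order, as the containerd client does.
func newContainer(id string, opts ...NewContainerOpts) container {
	c := container{ID: id}
	for _, o := range opts {
		o(&c)
	}
	return c
}

func main() {
	c := newContainer("nginx", withNewSandbox("pod-1"))
	fmt.Println(c.ID, c.SandboxID)
}
```
With this shape, a CRI implementation of RunPodSandbox would only need to thread one extra opt through its existing container-creation path rather than managing a separate pause container.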
> the pause container can be removed

Music to my ears!!
A solution may be to split PodSandbox into "Pod" and "Sandbox". Then the sandbox is the environment to run the pod, and the pod is used to group several containers and share namespaces.
I think the "pause container" is much like one kind of implementation of a sandbox. Maybe for the new runc sandbox plugin, we can directly start the shimv2 server in the new namespace as a replacement for the old "pause" process, and listen on an abstract unix domain socket. @jiangliu For runc, it is the runc sandbox plugin's responsibility to provide an environment to run the pod; whether to start a "pause" process is up to the plugin. The creation of a pause container after the sandbox plugin has already created a sandbox seems to be a redundant step. At least for VM-based containers, the pause container is really redundant.
(We probably don't want to bring back CVE-2020-15257, right? 😅) |
With that in mind, maybe the shimv2 server should still start itself in the root namespace, and start a pause process itself in the pod's namespace. I think there is still no need for a "pause container": the sandbox plugin can manage the "sandbox", and creating a pause container is actually creating a sandbox, which duplicates the work of the sandbox plugin. @tianon
It depends on the implementation details. If there's nothing within the VM but outside the pod, the pause container is redundant. But if we run some services within the VM but outside the pod, we may still need the pause container, as with runC.
Yes, but actually whether the pause container can be removed is not the point. What I want to emphasize is that after the sandbox is started, the shimv2 API needs to be exposed, instead of waiting for the tasks plugin to start a shimv2 process.
Here is another example of an issue that would benefit from having more information from containerd: kata-containers/kata-containers#2071. The problem there is that Kata Containers cannot correctly compute the number of vCPUs because the information is not there. It can be inferred in the case of CRI-O through some annotations, but that's not robust.
The initial Sandbox API PR is merged. |
I’d like to bring up a discussion about Sandbox API in containerd.
Apparently we cannot deny the growing popularity of containers with various flavors, such as Pods (e.g. a group of containers with a shared namespace) or secure environments (aka microVMs), like
firecracker-containerd, EKS on Fargate, or Kata Containers.
On the other hand, we need a path forward for higher-level entities (container managers and orchestrators) to be able to run these container extensions transparently.
Today there is no defined way to do this, so everyone has to build their own solution and solve the same problems (how to manage microVM lifecycle? how to pass resource requirements? how to keep it orchestrator agnostic?).
This proposal introduces the notion of a group of containers in containerd - a "sandbox". It aims to add low-level lifecycle and resource management capabilities for containers that run inside of some environment (where "some" is defined and implemented by a client). The sandbox concept has the following properties in relation to the containers it hosts:
The API aims to be implementation and orchestrator independent, as low level as possible, and introduces no dependencies on other Go packages.
It adds one more proxy plugin type - "sandbox" - which implements the Controller interface (similarly to snapshotters).
Controller is responsible for the platform-specific sandbox implementation. It knows how to create/start/stop/delete a sandbox instance, check lifetime, report status, gather metadata, etc. containerd keeps track of running instances (in the metadata store), generates proper lifecycle events, and translates client calls to the proper proxy plugin. Clients can manage sandboxes via the client API (example). From an orchestrator's perspective, a "bridge" will be required that translates orchestrator calls to the sandbox API (e.g. cri-containerd, but implementation agnostic); sandbox controllers remain interchangeable.
Currently the sandbox API is implemented in this fork (master...mxpv:sandbox) and exists as a proof of concept in the sandbox branch of the firecracker-containerd repo.
cc: @samuelkarp @micahhausler @egernst
Also there is a good Twitter discussion about Kubernetes / Firecracker challenges working together: https://twitter.com/micahhausler/status/1238496944684597248