Unable to react to graceful shutdown of (Windows) container #25982
Comments
ping @jhowardmsft ptal |
The call into HCSShim will differ under the covers depending on whether this is a Windows Server Container or a Hyper-V container, and on whether the shutdown is forced or graceful (e.g. docker stop -f container vs docker stop container). In the forced case, for either container type, no notification is expected. It is possible that there is a kernel issue in the Windows Server Container case, but I'm told by the kernel folks that the way it works is that cexecsvc calls InitiateSystemShutdown, and the processes in the job object/silo should be notified as per a regular system shutdown. Can you confirm what container type this is, and whether you see the same for both types? If you run the app outside of a container, just on the host, and run "shutdown /t 0 /r" (effectively an InitiateSystemShutdown call), does your app get notified? |
(Note also this was on TP5 - I would strongly recommend seeing if the behaviour is different on 14393/RTM) |
@jhowardmsft This is a Windows Server container, and a graceful shutdown ( |
@jhowardmsft I gave it a shot this morning, and this is what I found. First, the versions I'm now working with:
My test application is a C# application, and I'm using the following methods to detect system shutdown:
Of these methods, when I run the application and shut it down using I am happy to provide the complete source to the app I'm using to test, if you'd like. |
Getting the app would be useful, although I'd probably just be a go-between for the kernel team who would need to dig into what is going on. |
Here's my test app: show_stop.txt Hopefully it provides some helpful insight. |
@darrenstahlmsft FYI |
Do I understand correctly that there is currently no way to get a notification that a Windows container is shutting down? How are containerized apps intended to avoid incomplete writes in such a situation? Or am I missing something here? |
@sandersaares That is my understanding with current versions, yes. |
This is still in progress. Changes are needed to the Windows platform. MS#8633377. Will share more info as things change. |
@PatrickLang is that MS#8633377 case publicly accessible? |
It's internal. I referenced it so my team could find it and you would know it's being worked on. It's still on backlog for a future release. |
@jhowardmsft @PatrickLang any news about this? I've tested the current behavior from plain C applications and posted the results at rgl/docker-windows-2016-vagrant. Essentially, Windows containers cannot be gracefully shut down: either there is no shutdown notification, or they are forcefully terminated after a while. The next table describes whether a
|
@PatrickLang will this feature be improved in RS3? |
@rgl I suspect the reason your services don't get notifications is because
Can you re-test with SERVICE_ACCEPT_PRESHUTDOWN? Should work and align it with the others (but still die after ~5 seconds). I presume this is still an open issue due to docker not supporting container shutdown deferral (for good reasons). Maybe an orchestrator-side flag to opt into this potentially unwanted behavior is in order (e.g. --allow-shutdown-deferral)? |
@riverar It's an open issue because Docker on Windows doesn't ask nicely to shut down processes, it simply kills them. This behavior does not match Docker on other platforms. This is (per @PatrickLang) due to issues in Windows. |
That seems to jibe with my experience, but runs counter to @jhowardmsft's statement about changes in Windows to call InitiateSystemShutdown. I'm running the latest insider bits, so if it's not in here, those changes probably never made it into RS3. Maybe John or @PatrickLang can update us. Update: I'm guessing this is a larger issue of non-Hyper-V containers not having a winlogon.exe to do all the maid work during shutdown. |
Sorry for the delay updating this thread (Thanks @riverar for pinging me via email). Here's the current status on this. In RS3 (currently available in insider preview builds) there is a partial fix available. The partial fix appears to be what @rgl is testing against. Processes started by Docker are able to register for console notifications via This means that most console and GUI applications will receive the notification, as most application runtimes register for these notifications and send them via the runtime specific shutdown mechanisms. Services will not currently get the exit notification without a kernel fix which did not make RS3. It is possible to work around this by using a shim application which acts as the container entrypoint and manages starting the service at container start, and stopping the service when the shim receives the CTRL_CLOSE_EVENT. I can write a proof of concept for this if someone would like a starting point to work from. I'm open to feedback on the current approach for future releases (though I hope to get kernel support for this so it works like regular Windows). Let me know what shutdown features are necessary for your application that are not possible to do with the console notification and 5 second timeout prior to kill. |
@godefroi This fix is available starting in RS3, so the testing is probably on an insider preview build. This works in both HyperV containers and Windows Server containers running both nanoserver and windowsservercore. @rgl The above chart looks right for the current RS3 state, except that in my testing nanoserver console apps correctly receive the notification exactly the same as windowsservercore. I'll try to take a look at your examples and see if I can see what is happening. |
@darrenstahlmsft - What's RS3, and when will it be around? |
@matt-richardson Sorry, I was using Microsoft internal language 👼 RS3 is the internal name for the Fall 2017 Windows Semi-Annual Channel release for Windows Server 2016, or the Windows 10 Fall Creators Update depending if you are on Server or Client SKU respectively. It is currently available for preview in the Windows Insider program for the public to download and test, and will be going public some time this fall (I don't think I can share a date yet). For more info on the Windows Server side, see this link. We generally refer to it as RS3 for brevity 😄 |
oh d'oh... I didn't actually implement the (PRE)SHUTDOWN handling when I originally tested this! I've updated the source, and the service now receives the PRESHUTDOWN (but not SHUTDOWN) notification. Here's the updated table (only the service rows have changed). The next table describes whether a
NG setting Do you know where that (approximate) 195 second timeout comes from? Anyway, I would only expect the container to be killed after the timeout specified by the docker --time argument expires. I'm testing this on a Windows Server 2016 (10.0.14393.1532) VM and with the following docker base images. microsoft/windowsservercore: microsoft/nanoserver: |
@godefroi I'm not using Hyper-V. Please see my previous comment to see what I'm using. |
@OnurGumus @swernli I just wanted to comment in case this helps others too. If you don't add an event handler to
Here is how I FINALLY achieved graceful shutdown on windows containers after 2 days of beating my head against the wall. Using the following...
Starting the container with
Stopping the container with
This was all in conjunction with using |
@cphillips83 @OnurGumus can you confirm whether this is still an issue, or whether you now have a workaround? Also, do people need it for both Nano Server and Server Core containers? Sorry @cphillips83 to see you had to beat your head against the wall for 2 days. ouch ... For Microsoft folks, I created a new bug 25695040. I couldn't find the one Patrick gave, 8633377. |
@weijuans-msft We need it in all windows images. So, yes in both Nano Server and in Server Core. |
I am trying something very similar to others here but am having an issue getting docker stop to wait longer than 5 minutes. I'm using the lines: (This is set to 45 minutes which is what I actually want to use when I get this working) I start it using: docker run -id I expect that it should wait 400 seconds (over 6 minutes) before killing the container. When I run it on my desktop, which is Windows 10 Professional version 2004, it works as expected. I upgraded both Docker engines to version 19.03.12 but observe the same behavior as before. Should this work as I am expecting? If so, any ideas what could be wrong? |
@olandese Did you ever manage to get this to work? |
Just a note here for anyone who finds this issue and was as confused as I was about all this:
I hope this helps someone! |
Are we sure that the ProcessShutdownTimeoutSeconds setting is actually doing anything? The first appearance of that key on the internet seems to be in a blog post from 3 September 2019, but I can't find any information about where it came from. It may not be doing anything; it doesn't seem to have the functionality described by @artlogic. And it doesn't make sense that it is working, otherwise you'd just see the delay before your program gets the signal, and then the "real" delay from WaitToKillServiceTimeout would start. The reason I'm finding this out is that I'm using a different mechanism for signalling graceful shutdown (a file appears), but it doesn't seem to work because the CTRL_SHUTDOWN_EVENT is sent immediately and not delayed by ProcessShutdownTimeoutSeconds. It seems SetConsoleCtrlHandler must be used to handle the CTRL_SHUTDOWN_EVENT, WaitToKillServiceTimeout provides the time between that and the service kill, and ProcessShutdownTimeoutSeconds isn't used. |
I didn't test |
I can confirm that on host 21H2, with base image nanoserver 1809 and later, the --time in Also, it seems that the |
from @darstahl comments:
I understand that this means the entrypoint has no control whatsoever over the shutdown sequence. If this is right, this design decision does not seem to allow for orchestration of shutdown. Let's say I have an entrypoint in a container running a MSSQL Server service, and I want to interact with MSSQL Server during the entrypoint shutdown. If they are both signaled at the same time, there is no way I can coordinate what happens inside the container from within the entrypoint shutdown logic. Would it not make sense to first wait for the entrypoint to fully exit/finish, and then signal all processes and services to shut down? |
@zleight1 I was also getting totally blocked trying to achieve this through PowerShell. Guess what. You cannot do it in a straightforward way.
Full working example here: https://github.com/david-garcia-garcia/dockernosigterm If you are preparing containers for Kubernetes, I recommend using LifeCycleHooks for this: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/ I am working on a Kubernetes-oriented container that must work on Docker so that devs can use it locally. Production is K8S, so this was already solved through LifeCycleHooks there. Just a word of WARNING about what I posted here and the other solutions in this thread; the key to chaos is this:
This effectively makes having any control over what is happening inside the container impossible. You can hold up shutdown in your entrypoint, but the whole system is actually shutting down, so you will face erratic behaviour as services start to shut down without your control. I have set up a Docker image that wraps up all of this container lifecycle management for Windows: https://github.com/david-garcia-garcia/windowscontainers Note that the script provided here was a POC iteration, and several considerations need to be made for this to be reliable and robust; see the linked repo for the final version and a better description of the approach used to handle the complete container lifecycle in both Docker and K8S. |
@david-garcia-garcia your solution works, at least at some level, but did you find how to make Windows containers return 0 exit code instead of 0xC000013A ? |
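For readers wondering what 0xC000013A is: it is the NTSTATUS value STATUS_CONTROL_C_EXIT, which a process reports when it is terminated through console control handling rather than exiting by itself. The same 32 bits may be displayed unsigned (3221225786) or signed (-1073741510) depending on the tool. The sketch below, in Go, only shows how to recognize the value; it does not answer how to make the container return 0. The describeExit helper is a hypothetical name used for illustration.

```go
package main

import "fmt"

// statusControlCExit is the NTSTATUS value STATUS_CONTROL_C_EXIT.
const statusControlCExit uint32 = 0xC000013A

// describeExit classifies an exit code. The int64 parameter accepts both the
// unsigned and the signed rendering of the same 32-bit value.
func describeExit(code int64) string {
	switch uint32(code) { // keep only the low 32 bits
	case 0:
		return "clean exit"
	case statusControlCExit:
		return "terminated by console control event (STATUS_CONTROL_C_EXIT)"
	default:
		return fmt.Sprintf("exit code 0x%08X", uint32(code))
	}
}

func main() {
	for _, c := range []int64{0, 3221225786, -1073741510, 1} {
		fmt.Printf("%d: %s\n", c, describeExit(c))
	}
}
```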
Output of docker version:
Output of docker info:
I am unable to react to a graceful shutdown of my application running inside a (Windows) container. I have tried SetConsoleCtrlHandler(), but my handler is never called. I have tried signal(), but no SIGTERM is received. I have tried running a message loop, but WM_CLOSE is never received.
The work of shutting down the container is (apparently) done by the ShutdownComputeSystem routine from vmcompute.dll (this is from zhcsshim.go), but I cannot find any documentation or other information on what ShutdownComputeSystem does. It has been suggested that @jhowardmsft would know what's going on.
Please help!