qBittorrent container crashing and getting into "zombie" state

With the latest v1.5.4 update I started running into issues with the qBittorrent container, where after working for 3~6 hours the container simply stops responding.

When killing the container, either via the UI or the CLI, it always fails with a "tried to kill container, but did not receive an exit event" error. The only way I have found so far to restore the container is a reboot. I have literally run out of things to try and the problem is still there; the only thing that works is rebooting ZimaOS altogether.

So far I have tried to:

  1. Change the image variant (problem happens with both Nox and Hotio).
  2. Change the image tag (latest, release-5.1.4, release-5.0.4)
  3. Change the container name (eg. from “qbittorrent” to “qbittorrent2”)
  4. Change resource allocations (eg. from 2GB to 16GB of RAM or from high CPU shares to low)
  5. Restarting Docker (sudo systemctl restart docker)
  6. Completely removing the container and installing from scratch (docker rm -f qbittorrent)

Zero meaningful logs are being generated; the container just runs into a “zombie” state and becomes completely unresponsive after a few hours.

Logs:

Zero network traffic when the container crashes:

I have also noticed an abnormal cache allocation in RAM: the system is only using 6GB of RAM, but for some reason ZimaOS is caching 4x that. The VM has 32GB of RAM allocated to it:

I wonder if the time might have come to use Docker v29 instead of v27? They have made a ton of robustness improvements recently and it might handle zombie containers better too.

What’s happening here isn’t a qBittorrent crash, and it isn’t a bad image or config either.

After a few hours of sustained network and disk activity, the qBittorrent container stops responding. Network traffic drops to zero instantly, but the container stays in a “running” state. When you try to stop or remove it, Docker fails with the familiar “tried to kill container, but did not receive an exit event” error.

That error is important: it means Docker is sending signals, but the kernel never delivers them. At this point the process is no longer responding to userspace at all. Restarting Docker, changing images, tags, resource limits, or even force-removing the container won’t work, because Docker has already lost control.

In this state the qBittorrent process is typically stuck in uninterruptible sleep (D-state), usually waiting on I/O. Once that happens, even kill -9 can’t terminate it, and the only way to recover is a full host reboot.

The commands below are not meant to fix the issue — they’re to confirm this is a kernel-level block rather than an application bug. Run them after the container freezes and before rebooting:

docker stop qbittorrent                                 # hangs, then fails with the exit-event error
PID=$(docker inspect -f '{{.State.Pid}}' qbittorrent)   # host PID of the container's main process
ps -o pid,stat,wchan,cmd -p $PID                        # STAT of "D" = uninterruptible sleep
kill -9 $PID                                            # a D-state process survives even SIGKILL

If STAT shows D and the process survives kill -9, that’s the smoking gun. It confirms a kernel / storage / network interaction issue in ZimaOS 1.5.4, not a qBittorrent problem.
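For a bit more detail on where the kernel has the process parked, /proc exposes the state, wait channel, and (with root) the kernel stack of any task. A generic sketch: `$PID` is assumed to come from the `docker inspect` command above, and it defaults to the current shell here only so the snippet runs standalone.

```shell
# PID of the stuck process, normally from:
#   PID=$(docker inspect -f '{{.State.Pid}}' qbittorrent)
# Defaults to the current shell here just so the snippet is runnable.
PID="${PID:-$$}"

# "D (disk sleep)" in this line confirms uninterruptible sleep.
grep '^State:' "/proc/$PID/status"

# wchan names the kernel function the task is blocked in ("0" if runnable).
cat "/proc/$PID/wchan"; echo

# The full kernel stack (root only) pinpoints the blocked code path.
cat "/proc/$PID/stack" 2>/dev/null || echo "stack: needs root"
```

If the wait channel and stack point into a filesystem or network driver, that's the subsystem to report to the ZimaOS team.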

Ah, Gemini gave me that suggestion too, but even the process can’t be killed. It keeps surviving the kill -9 command, and the only real solution to the problem ends up being to reboot ZimaOS.

Basically the “stop” command returns the same exit-event error message, and kill -9 $PID also “fails”: it returns 0 but the process survives. Gemini suggested restarting Docker, and since that doesn’t work either… well… reboot to the rescue.

Yep, that behaviour actually confirms the diagnosis, and the detail you added about kill -9 returning 0 is important.

When kill -9 $PID returns 0, it only means the kernel accepted the signal, not that it could deliver it. If the process is stuck in uninterruptible sleep (D-state), the kernel can’t wake it up to act on any signal, even SIGKILL. So the process just sits there, pending signal and all, until the I/O it’s waiting on completes, which in this case never happens.

That’s why:

  • docker stop fails with “did not receive an exit event”
  • kill -9 appears to “succeed” but the process survives
  • restarting Docker has zero effect
  • rebooting the host immediately clears it

At that point the process isn’t really “running” anymore, it’s blocked inside the kernel, usually waiting on I/O that will never complete. Docker, systemd, and userspace tools are powerless once that happens.
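A quick way to check whether qBittorrent is the only victim or just the first one is to list every task currently in uninterruptible sleep. If several unrelated processes share the same wait channel, that points at one stuck mount or device rather than the application. A generic sketch, nothing ZimaOS-specific:

```shell
# Print the ps header plus every task whose state starts with "D";
# the WCHAN column shows the kernel function each one is blocked in.
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'
```

On a healthy system this prints only the header; during the hang you would expect the qBittorrent process, and possibly other tasks stuck on the same storage path.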

So Gemini wasn’t wrong in theory, but in this failure mode restarting Docker can’t possibly work: it isn’t Docker that lost control of the process, it’s the kernel.

Reboot being the only recovery is expected for D-state hangs. This strongly points to a kernel + storage/network interaction regression in 1.5.4, not a qBittorrent or Docker image issue.
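If the kernel’s hung-task detector is enabled on ZimaOS, it should log the hang with a stack trace before any reboot, which would give the team something concrete to debug. Worth checking while the container is frozen; a generic sketch, and dmesg may need root on some systems:

```shell
# D-state tasks blocked longer than the hung-task timeout are reported
# as "task ... blocked for more than N seconds" with a kernel stack trace.
dmesg 2>/dev/null | grep -iE 'hung_task|blocked for more than' \
  || echo "no hung-task reports (or dmesg needs root)"

# The detector's timeout in seconds; 0 means detection is disabled.
sysctl kernel.hung_task_timeout_secs 2>/dev/null || true
```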

You’ve basically already hit the smoking gun.

Precisely.

cc @777-Spider to investigate with the team whether this is the first reported case and how to debug the problem further.