Every few days my ZimaCube locks up and becomes unresponsive. I have not been able to determine any reason for this behavior. I have to power down and restart to fix the issue.
I am running version 1.6.1 of the software.
Random full lockups are hard to diagnose without logs, but the first thing I would check is whether ZimaOS is recording anything before the freeze.
After the next reboot, can you SSH into the ZimaCube and run these non-destructive checks:
uptime
dmesg | tail -n 120
journalctl -b -1 -n 200 2>/dev/null || echo "journalctl previous boot log not available on this system"
Also worth checking:
free -h
df -h
docker ps
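To make these checks easier to attach in one go, they can be bundled into a small helper that writes everything to a single file on /DATA. This is only a sketch: the function name collect_diag and the default file name are made up for this example, and it assumes sudo works on your ZimaCube and that /DATA is writable through sudo.

```shell
# Hypothetical helper: run the read-only checks above and save them to one file.
# Assumption: sudo is available and /DATA is writable via sudo (as on ZimaOS).
collect_diag() {
    out="${1:-/DATA/diag-$(date +%Y%m%d-%H%M%S).txt}"
    {
        echo "== uptime"; uptime
        echo "== dmesg (last 120 lines)"; sudo dmesg | tail -n 120
        echo "== journal, previous boot (last 200 lines)"
        sudo journalctl -b -1 -n 200 --no-pager 2>/dev/null \
            || echo "previous boot log not available"
        echo "== free -h"; free -h
        echo "== df -h"; df -h
        echo "== docker ps"; sudo docker ps
    } 2>&1 | sudo tee "$out" > /dev/null   # root-owned tee does the writing
    echo "Saved to $out"
}
```

Paste the function into the SSH session, run collect_diag, and attach the file it reports. Nothing in it changes the system; it only reads state and writes one text file.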
If the system fully locks up every few days, it could be a kernel panic, memory issue, disk issue, overheating, or a container consuming all resources. The logs above should give IceWhale something useful to look at.
Also, are you running any heavy apps like Plex, Jellyfin, VMs, backup tasks, AI/Ollama, Nextcloud, or large Docker containers when it happens?
Hi, thanks for the feedback.
Can you check with lsmod whether the i915 driver is loaded? If it is, what version does modinfo i915 report?
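Both checks can be run together with a short sketch like the one below (the function name i915_info is hypothetical). It assumes that an in-tree i915 module may have no separate version field, in which case it falls back to the vermagic string, which shows the kernel build the module belongs to.

```shell
# Sketch: report whether the i915 (Intel graphics) module is loaded and
# which build it is. In-tree modules often lack a "version" field, so the
# vermagic (kernel version string) is printed as a fallback.
i915_info() {
    if lsmod | awk '$1 == "i915" { found = 1 } END { exit !found }'; then
        echo "i915 is loaded"
        ver=$(modinfo -F version i915 2>/dev/null)
        if [ -n "$ver" ]; then
            echo "version: $ver"
        else
            echo "vermagic: $(modinfo -F vermagic i915)"
        fi
    else
        echo "i915 is not loaded"
    fi
}
```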
If the system crashes, please run the following command after the next boot to collect the previous boot's system logs, then send us the generated file or email it to me (dina@icewhale.org):
sudo journalctl -b -1 -n 500 > /DATA/logs.txt
In addition, you can run the following command:
ln -s /var/log/journal/ /DATA/journal
After that, open the journal folder in Files and send us all the logs inside. These logs contain records from previous system shutdowns. Once the logs have been sent, you may delete the journal folder from Files.
It crashed again. When I run the "sudo journalctl -b -1 -n 500 > /DATA/logs.txt" command I get "-bash: /DATA/logs.txt: Permission denied".
When I run ln -s /var/log/journal/ /DATA/journal I get ln: failed to create symbolic link ‘/DATA/journal’: Permission denied
Uptime gives: 16:43:51 up 3 min, 0 users, load average: 0.72, 0.81, 0.38
dmesg response attached
journalctl response attached
free -h response:
               total        used        free      shared  buff/cache   available
Mem:            15Gi       1.9Gi        11Gi        17Mi       2.0Gi        13Gi
Swap:          7.1Gi          0B       7.1Gi
df -h response:
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       1.2G  1.2G     0 100% /
That could be the issue.
docker ps response:
WARNING: Error loading config file: open /DATA/.docker/config.json: permission denied
permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.47/containers/json": dial unix /var/run/docker.sock: connect: permission denied
sudo docker ps response:
CONTAINER ID   IMAGE                          COMMAND   CREATED       STATUS         PORTS     NAMES
f00d603033c1   homebridge/homebridge:latest   "/init"   4 weeks ago   Up 9 minutes             homebridge-homebridge-1
journalctl.txt (60.4 KB)
dmesg repsonse.txt (8.9 KB)
Thanks for the extra logs.
A few things stand out.
The /dev/root 100% result is normal on ZimaOS: the root filesystem is a small, read-only, appliance-style system image, so it always reports as full. I would not treat that as the cause by itself.
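If you want to check for filesystems that are genuinely filling up, a sketch like this skips the read-only root image. It assumes the root image is squashfs (as the mount output on ZimaOS shows) and that df is the GNU coreutils version with --output and -x; the function name full_writable_fs is made up for this example.

```shell
# Sketch: list filesystems at or above a usage threshold (default 90%),
# skipping squashfs (the read-only root, always 100% full) and tmpfs.
# Assumes GNU df (--output and repeatable -x options).
full_writable_fs() {
    threshold="${1:-90}"
    df --output=pcent,target -x squashfs -x tmpfs -x devtmpfs \
        | awk -v t="$threshold" 'NR > 1 { p = $1; sub("%", "", p); if (p + 0 >= t) print $2, $1 }'
}
```

Running full_writable_fs with no output means nothing writable is at or above 90%.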
The permission errors occur because the shell redirection does not run as root. This command:
sudo journalctl -b -1 -n 500 > /DATA/logs.txt
runs journalctl with sudo, but the > redirection to /DATA/logs.txt is performed by your own (non-root) shell before sudo even starts.
Try this instead:
sudo sh -c 'journalctl -b -1 -n 1000 > /DATA/logs.txt'
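Another common pattern for the same problem is to pipe into sudo tee, so a root-owned process does the writing. The sketch below wraps it in a function; the name save_prev_boot and its defaults are only for illustration.

```shell
# Alternative to sudo sh -c '... > file': both the read (journalctl) and
# the write (tee) run under sudo. "> /dev/null" stops tee from also
# printing the whole log to the terminal.
save_prev_boot() {
    out="${1:-/DATA/logs.txt}"
    lines="${2:-1000}"
    sudo journalctl -b -1 -n "$lines" --no-pager | sudo tee "$out" > /dev/null
}
```

For example, save_prev_boot /DATA/logs.txt 1000 produces the same file as the sh -c version.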
The same applies to the symlink; it needs sudo:
sudo ln -s /var/log/journal /DATA/journal
Also, docker ps failing without sudo is expected here: your user cannot open the Docker socket (/var/run/docker.sock), so sudo docker ps is the correct test on your system.
From the logs you attached, I do not see an obvious out-of-memory issue, and your RAM and swap output also look fine. The dmesg output shows the RAID array coming up with 6 out of 6 devices active and md0 mounting after recovery. It also shows the journal was “corrupted or uncleanly shut down”, which usually just means the system was not shut down cleanly after the lockup; it does not prove the original cause.
The next useful checks would be:
sudo journalctl -b -1 -p warning..alert
sudo dmesg -T | grep -iE 'error|fail|reset|timeout|nvme|ata|i/o|thermal|watchdog|oom|hung'
sudo mdadm --detail /dev/md0
At this stage I would not reinstall yet. I would first confirm whether the lockup is caused by disk/RAID errors, thermal/power issues, or a kernel/hardware hang.
Here is the output from each of the suggested commands.
logs.txt (304.4 KB)
mdadm output.txt (1.1 KB)
dmseg output.txt (15.3 KB)
warning.txt (155.8 KB)
Thanks for uploading the extra files.
From the mdadm output, the RAID array itself looks healthy:
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 0
So I would not blame the RAID array at this stage.
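The same reading of the mdadm output can be automated so it is quick to repeat after each crash. This is a sketch (the function name raid_check is hypothetical) that reads mdadm --detail output on stdin, prints the State and Failed Devices values, and exits non-zero if any device has failed or the state mentions degraded or FAILED.

```shell
# Sketch: summarise `mdadm --detail` output read on stdin.
# Exit status 0 = no failed devices and no degraded/FAILED state.
raid_check() {
    awk -F' : ' '
        BEGIN { failed = 0 }
        {
            key = $1; val = $2
            gsub(/^ +| +$/, "", key)
            gsub(/^ +| +$/, "", val)
        }
        key == "State"          { state = val }
        key == "Failed Devices" { failed = val + 0 }
        END {
            print "state=" state ", failed=" failed
            exit (failed > 0 || state ~ /degraded|FAILED/)
        }'
}
```

Usage: sudo mdadm --detail /dev/md0 | raid_check && echo "RAID OK" || echo "RAID needs a look". For the output above it would print state=clean, failed=0.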
I also do not see an obvious smoking gun in the logs provided, such as an out-of-memory kill, a kernel panic, or a failed disk.
What does stand out is the repeated permission problem around system services, especially cron and Samba:
unable to write timestamp to /var/spool/cron/cronstamps/...
and:
unable to lock file /etc/samba/smbpasswd. Error was Permission denied
Unable to open passdb database.
That looks more like a system/permission/state issue than a failed RAID issue.
At this point I would check whether the system files are mounted read-only or if permissions have become corrupted:
mount | grep -E ' / |/DATA|/etc|overlay'
ls -ld /var/spool/cron /var/spool/cron/cronstamps /etc/samba /etc/samba/smbpasswd
sudo journalctl -b -1 -p err..alert
If /DATA and the RAID are healthy but system services keep hitting permission denied on /etc and /var, this may need IceWhale to look at the OS layer/state rather than just the storage pool.
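If you only want the ro/rw answer for the paths that matter here, a sketch like this reads /proc/mounts directly, where the kernel lists ro or rw as the first mount option. The function name mount_mode is made up; the last matching entry wins, since that is the mount currently in effect when mounts are stacked.

```shell
# Sketch: print "ro" or "rw" for a mount point.
# $1 = mount point, $2 = mounts file (defaults to /proc/mounts)
mount_mode() {
    awk -v mp="$1" '
        $2 == mp { split($4, opts, ","); mode = opts[1] }
        END {
            if (mode == "") { print "not mounted"; exit 1 }
            print mode
        }' "${2:-/proc/mounts}"
}
```

Usage: for p in / /etc /DATA; do echo "$p: $(mount_mode "$p")"; done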
mount grep command output attached
ls command output:
ls: cannot access ‘/var/spool/cron/cronstamps’: No such file or directory
drwxr-xr-x 1 root root 1024 Apr 25 22:45 /etc/samba
-rw------- 1 root root 105 Jun 9 19:51 /etc/samba/smbpasswd
drwxr-xr-x 3 root root 60 Mar 15 22:37 /var/spool/cron
journalctl output attached
erralert.txt (26.9 KB)
mount grep output.txt (3.4 KB)
Thanks for the new outputs.
The mount output looks normal for ZimaOS:
/ is squashfs read-only
/etc is overlay read-write
/DATA is ext4 read-write
/DATA/.media/Raid-Storage is ext4 read-write
So I do not think the normal read-only root filesystem is the problem.
The new useful detail is this:
ls: cannot access '/var/spool/cron/cronstamps': No such file or directory
and the repeated Samba errors:
unable to lock file /etc/samba/smbpasswd. Error was Permission denied
Unable to open passdb database
ERROR executing command '/usr/libexec/samba/rpcd_witness': Permission denied
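Your RAID still looked clean earlier, and the mounts now show /DATA and the RAID storage mounted read-write. So at this stage I would treat this as two separate issues: the random lockups themselves, and the cron/Samba permission errors, which may or may not be related to the lockups.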
I would next check whether the Samba binaries and database file have the expected permissions:
ls -l /usr/libexec/samba/rpcd_witness /etc/samba/smbpasswd
sudo test -x /usr/libexec/samba/rpcd_witness; echo $?
sudo pdbedit -L
If test -x returns anything other than 0, then Samba cannot execute that helper. If pdbedit -L also fails with permission denied, then the Samba account database or OS overlay state is likely broken.
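The two checks and their interpretation can be combined into a short sketch that prints a plain verdict. The function name samba_check is hypothetical, and it only reports; it does not change any Samba files.

```shell
# Sketch: run the Samba checks above and print a plain-English verdict.
samba_check() {
    helper="${1:-/usr/libexec/samba/rpcd_witness}"
    if sudo test -x "$helper"; then
        echo "OK: $helper is executable"
    else
        echo "PROBLEM: $helper is not executable"
    fi
    if sudo pdbedit -L > /dev/null 2>&1; then
        echo "OK: pdbedit can read the Samba account database"
    else
        echo "PROBLEM: pdbedit -L failed (passdb or OS overlay state may be broken)"
    fi
}
```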
I would not wipe the RAID or rebuild the array. The storage array itself does not look like the cause from the outputs so far.
ls output:
-rw------- 1 root root 105 Jun 9 19:51 /etc/samba/smbpasswd
-rwxr-xr-x 1 root root 63504 Apr 21 07:29 /usr/libexec/samba/rpcd_witness
test output:
0
pdbedit output:
mcombe:999:
Thanks, that helps.
Those results are actually good signs.
rpcd_witness is executable:
-rwxr-xr-x 1 root root 63504 Apr 21 07:29 /usr/libexec/samba/rpcd_witness
and this returning 0 confirms the system can execute it:
sudo test -x /usr/libexec/samba/rpcd_witness; echo $?
0
Also, pdbedit -L returning:
mcombe:999:
means the Samba account database can be read.
So I would not treat Samba itself as definitely broken now. The earlier Samba permission messages may be runtime/service timing errors rather than the root cause of the lockups.
At this point, the important things we have confirmed are:
RAID looks clean.
/DATA and the RAID storage are mounted read-write.
The Samba database is readable.
No clear OOM, failed RAID, or obvious kernel panic has been shown yet.
The lockup cause is still not proven from the logs. I would now focus on hardware and firmware checks: temperature, power stability, BIOS/firmware version, and whether the crash happens only under load or also while idle.
One more useful command after the next crash would be:
sudo journalctl -b -1 --no-pager | tail -n 200
That will show the last 200 log lines before the previous boot ended, which may be more useful than scanning the whole journal.
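To capture that straight into a file you can attach, here is a sketch (the function name save_last_lines and the file name pattern are made up). It first checks that the journal actually contains a previous boot (index -1 in journalctl --list-boots), which assumes persistent journaling is enabled. It uses journalctl -n 200, which returns the newest 200 entries of that boot; this is close to piping through tail -n 200, except that tail counts output lines rather than entries.

```shell
# Sketch: save the last 200 entries of the previous boot to a
# timestamped file on /DATA. Assumes the journal is persistent.
save_last_lines() {
    out="${1:-/DATA/lastboot-$(date +%Y%m%d-%H%M%S).txt}"
    if ! sudo journalctl --list-boots --no-pager \
            | awk '$1 == "-1" { f = 1 } END { exit !f }'; then
        echo "No previous boot found in the journal" >&2
        return 1
    fi
    sudo journalctl -b -1 -n 200 --no-pager | sudo tee "$out" > /dev/null
    echo "Saved to $out"
}
```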