RAID/disk spindown (standby) not working as described - reasons and workaround

I tried out ZimaOS (v1.5.4) in a fairly simple setup:
PC, 4 disks: 2 SSD, 2 HDD, configured as 2 RAID 1 arrays (mirrors). The HDDs are used for backup, big storage etc. It is important that those go to sleep, so I set the disk standby time in ZimaOS settings.

The problem: the HDDs from the array wake every 30 minutes (at around 20 and 50 minutes past each hour), which means 48 times per day. That will finish off almost any disk in less than 3 years :frowning:

Others have reported the same or a similar problem, more than twice... examples: Hdd stanby not working, Disk Standby / Spin Down / Sleep doesn't work, disks being accessed 24/7 - no solution was proposed :frowning:

My findings: there are 2 separate problems.

  1. Wrong configuration of the SMART daemon. It unconditionally wakes the disks every 30 or 60 minutes.

The fix: add the parameter "-n standby" to the DEVICESCAN line in the config file /etc/smartd.conf, then restart the daemon:

sudo sed -i 's/^DEVICESCAN$/DEVICESCAN -n standby/' /etc/smartd.conf
sudo systemctl restart smartd

Now your disks will only wake every hour (the remaining hourly wakeup comes from the second problem, below).

  2. Bug in zimaos-local-storage.service

Background: the storage service checks every defined storage array once per hour. It executes a command like this:

mdadm -D /dev/md1

But this command WAKES the disks, so to prevent that the disk state needs to be checked first. And here is the bug: the storage service runs

hdparm -C /dev/md

There are 2 problems with this command. First, the device /dev/md does not exist; only md0 and md2 exist as storage arrays. Second, even when pointed at /dev/md1 it would not get the status of the drives, because an array is not a drive.

The correct way: extract the list of member devices of array md1 (the lsblk -e 7,252,43 -OJb call already executed by the storage service has this info), and check whether at least one of its drives is in "Standby" mode, like:

hdparm -C /dev/sda

Unfortunately, fixing this requires changing the compiled zimaos-local-storage binary - so please, ZimaOS developers, make it happen.

Workarounds:

You can replace the hdparm or mdadm command executed by zimaos-local-storage with a simple trick:

sudo mkdir -p /etc/sbin

sudo systemctl edit zimaos-local-storage

paste there:

[Service]

Environment=PATH=/etc/sbin:/usr/sbin:/usr/bin:/sbin:/bin

Reload and restart service:

sudo systemctl daemon-reload

sudo systemctl restart zimaos-local-storage

Now we need a "wrapper" for mdadm that will not wake the disks when not needed (please substitute your own array number and disk list). It could be extended to allow disk reports at least every 3 days etc., but I hope it is only temporary and ZimaOS will be fixed soon.

Create the file ( nano /etc/sbin/mdadm ) and make it executable ( chmod +x /etc/sbin/mdadm )

#!/bin/bash
set -euo pipefail

REAL_MDADM="/usr/sbin/mdadm"
REAL_HDPARM="/usr/sbin/hdparm"
CACHE="/etc/sbin/mdadm_md1.txt"
LOCK="/etc/sbin/mdadm_md1.lock"

ARRAY="/dev/md1"
DISKS=("/dev/sda" "/dev/sdb")

is_md1_detail_call() {
  # Accept: mdadm -D /dev/md1  OR  mdadm --detail /dev/md1
  # and optionally: mdadm -D --verbose /dev/md1 (with flags in between)
  local has_detail=0
  local has_md1=0

  for a in "$@"; do
    case "$a" in
      -D|--detail) has_detail=1 ;;
      "$ARRAY")    has_md1=1 ;;
    esac
  done

  [[ $has_detail -eq 1 && $has_md1 -eq 1 ]]
}

disk_is_standby() {
  # Returns 0 if standby, 1 if active/unknown
  local d="$1"
  local out
  out="$($REAL_HDPARM -C "$d" 2>/dev/null || true)"
  echo "$out" | grep -qi "drive state is:.*standby"
}

all_disks_standby() {
  # 0 if ALL disks are in standby
  for d in "${DISKS[@]}"; do
    if ! disk_is_standby "$d"; then
      return 1
    fi
  done
  return 0
}

run_real_and_cache() {
  # Runs the real mdadm, saves stdout+stderr to the cache and replays the result (output + exit code)
  local tmp
  tmp="$(mktemp /tmp/mdadm_md1.XXXXXX)"

  set +e
  "$REAL_MDADM" "$@" >"$tmp" 2>&1
  local rc=$?
  set -e

  # atomic cache write (important with concurrent invocations)
  install -m 0644 "$tmp" "$CACHE"

  cat "$tmp"
  rm -f "$tmp"
  return $rc
}

serve_cache_or_fallback() {
  # If the cache exists, return it. If not (first run), execute the real mdadm (yes, it will wake the disks)
  if [[ -f "$CACHE" ]]; then
    cat "$CACHE"
    return 0
  fi

  # no cache → we have nothing to serve, so make the real call
  run_real_and_cache "$@"
}

main() {
  # If this is not -D/--detail on /dev/md1 → pass through unchanged
  if ! is_md1_detail_call "$@"; then
    exec "$REAL_MDADM" "$@"
  fi

  # lock held while deciding + possibly refreshing the cache
  mkdir -p "$(dirname "$LOCK")" 2>/dev/null || true
  exec 9>"$LOCK"
  flock -x 9

  if all_disks_standby; then
    # disks are asleep → don't wake them, serve the cache
    serve_cache_or_fallback "$@"
    exit $?
  else
    # disks are active → safe to run the real mdadm and refresh the cache
    run_real_and_cache "$@"
    exit $?
  fi
}

main "$@"

Hi Rafit77,

First, thank you for the deep dive and for documenting your findings so clearly. Your analysis of the two wake sources is accurate and very helpful for the community.

You correctly identified:

  • SMART polling waking disks
  • Hourly mdadm -D calls from zimaos-local-storage waking RAID members

Both contribute to the 30-minute wake pattern several users are seeing in v1.5.x.

About the SMART tweak

Adding:

DEVICESCAN -n standby

to smartd.conf is a standard and safe smartmontools practice. It prevents SMART checks from waking disks unnecessarily and is absolutely worth applying.
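For reference, per the smartd.conf man page the -n directive also accepts a skip limit and a quiet flag, so you can keep standby-friendly behaviour while still guaranteeing an occasional check; a possible variant:

```conf
# /etc/smartd.conf fragment (smartmontools -n POWERMODE[,N][,q] syntax):
# skip checks while the disk is in standby, but never skip more than
# 24 consecutive checks, and don't log each skipped check (q):
DEVICESCAN -n standby,24,q
```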

However, it only reduces SMART-related wakeups. It does not solve the RAID polling behaviour.

About the mdadm wrapper workaround

Your wrapper approach is technically clever and can work in a controlled setup. That said, it is not ideal as a general long-term solution for most users:

  • It depends on PATH overriding, which can silently break if ZimaOS switches to absolute paths like /usr/sbin/mdadm
  • It hardcodes device names such as /dev/sda and /dev/sdb, which can change after reboot or hardware reordering
  • Serving cached mdadm output may delay visibility of a degraded array or resync state
  • ZimaOS is Buildroot-based and image update driven, so low-level manual patches are not guaranteed to survive upgrades
  • It adds maintenance burden whenever array membership changes

For advanced users who understand and accept those tradeoffs, it can be a temporary mitigation. But it should not be broadly recommended without clear warnings.

Cleanest low-wake architecture for backup disks on ZimaOS

If the real objective is backup or archival HDDs that should sleep most of the day, the most reliable solution is architectural rather than patch-based.

  1. Keep active workloads on SSD or NVMe
    AppData, Docker volumes, and frequently accessed data should live on solid state storage so background service activity does not touch HDDs.
  2. Avoid mdadm RAID for disks expected to sleep
    RAID monitoring and array status checks tend to wake member disks. Instead consider:

Single-disk backup

  • One HDD per backup job
  • Scheduled backup once per night
  • Disk remains asleep the rest of the day

SnapRAID plus mergerfs in Docker

  • Data disks store files normally
  • Parity is updated only during a scheduled sync window
  • Disks sleep outside that window
This matches an archive NAS design and aligns with ongoing community discussions around spin-down friendly storage.

Note that SnapRAID provides snapshot parity, not real-time redundancy. It is ideal for archival and large media libraries, but not for constantly modified shared folders that require instant mirror behaviour.
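As an illustration of that layout, a minimal SnapRAID configuration might look like this (mount points and file names are placeholders, not taken from any setup in this thread):

```conf
# Hypothetical /etc/snapraid.conf sketch: data disks hold files
# normally, and parity/content files are only written during a
# scheduled "snapraid sync", so all HDDs can sleep outside that window.
parity /mnt/parity1/snapraid.parity
content /mnt/disk1/snapraid.content
content /mnt/disk2/snapraid.content
data d1 /mnt/disk1
data d2 /mnt/disk2
```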

Versioned backup tools such as Restic or Borg

  • Source on primary storage
  • Target is a single mounted HDD
  • Scheduled once per day
  • No continuous polling
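That scheduled, single-target pattern can be expressed as one cron entry (repository path, password file, and source directory are placeholders; the -r, --password-file, and backup options are standard restic usage):

```conf
# crontab entry: nightly backup at 03:00 to a single mounted HDD;
# the disk is touched only during this window and can spin down
# for the rest of the day.
0 3 * * * restic -r /mnt/backup-hdd/repo --password-file /etc/restic.pass backup /DATA
```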

  3. Use a container-based scheduler
    Running backups through a scheduler container ensures disks are accessed only at defined times and avoids constant host-level polling.

The key principle is simple:

  • Do not fight the RAID monitoring layer
  • Design the backup workflow so disks are accessed intentionally and predictably

For archival or backup use cases, this results in disks sleeping most of the day instead of waking every 30 minutes.

If you need real-time redundancy rather than scheduled backup, that changes the recommendation slightly, so feel free to clarify your exact requirement.

Thanks again for the detailed investigation. It adds real value to the discussion.

I’m really captivated by simplicity and looks of ZimaOS, and I would like trying to use it. I’m simply trying to make it better and usable for everyone.

  1. In standard home use, you don't need your movie library up and running 24/7 - right? I only watch movies a few times a week at most. Why wake this disk?

  2. I want a NAS to have RAID - not single-drive solutions. I think it is the same for everyone selecting ZimaOS. What is the reason to have a NAS without RAID? You could just run a standard workstation / server instead.

  3. In a company environment no disk sleep is OK; at home - not at all. Constant disk spinning makes you pay more (energy and heat), wears hardware faster, and makes noise at night.

I know that this is a workaround and has flaws; it was a "proof of concept" created in 30 minutes. I don't want to make it a permanent "solution" or spend time extending it with "proper" functionality (like array discovery or cache timeout). I hope that the root cause of this problem (the bug in zimaos-local-storage) will be fixed soon. That is why I provided as detailed an explanation as I could of the problem and what is causing it.

Now I have two options (like many others with the exact same problem on this forum):

  • choose other NAS software that can put disks to sleep
  • use the "workaround" and wait for a proper fix in ZimaOS

Before, there was only one, and I can see that every other thread about this sleep problem ended with the first option :frowning:. Leaving things as they are now, with the ZimaOS "sleep after 20 min" setting for disks, causes my drives to take 48 spin up/down cycles every single day - not great. After the "fix" they spin up once in 2 days, as they should.

Besides all that, this bug means that everyone who has any RAID on spinning drives and uses the "disk standby" setting has a problem:

  • if >30 minutes - standby will never happen
  • if <30 minutes - their disks are tormented by 48 spin cycles daily (which will kill most standard drives in under 3 years)

PS your remark:

Is totally wrong.
Key principles:

  • NAS should have at least RAID 1 or better for every data stored on it.
  • Every backup should be protected as much as I can afford (especially RAID on NAS).
  • Home devices should be able to conserve power, be silent and cold.

We are not fighting RAID monitoring; it has a bug. The ZimaOS native solution already tries to check the disk state before the health check; only a bug in the software causes it to check the wrong thing (the non-existent device /dev/md instead of the real components of the correct array).

Let’s keep this technical and constructive.

First, your observation about the service checking /dev/md instead of the actual member disks is a sharp catch. If that logic is indeed incorrect, then yes, that is something that should be reviewed upstream. Your SMART tweak with -n standby is also a solid improvement and clearly reduces unnecessary wakeups. That kind of detailed testing is useful.

Now, stepping back.

RAID is not passive storage.
RAID arrays require active monitoring for integrity, metadata consistency, and failure detection. That introduces periodic access. This is true across mdadm-based systems in general, not unique to ZimaOS.

When you choose RAID1, you are choosing:

  • Continuous redundancy
  • Immediate failure visibility
  • Active health monitoring

That monitoring layer naturally limits deep standby behavior compared to single-disk setups.

Regarding the 48 spin cycles per day:

That is not ideal. But saying it will kill most drives in under 3 years is overstated. NAS-class drives are typically rated for hundreds of thousands of load/unload cycles. At 48 per day, that is far from catastrophic. It is suboptimal from a power/noise perspective, but not destructive in the way described.
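To put rough numbers on that (assuming an illustrative 300,000 load/unload cycle rating; the exact figure varies by model and is an assumption here, not taken from this thread):

```shell
# 48 load/unload cycles per day against a 300,000-cycle rating:
echo $(( 300000 / (48 * 365) ))   # years until the rating is reached
```

Even at the observed wake rate, the rating alone would last well over a decade, though fewer cycles are of course still better for power and noise.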

The broader architectural reality is this:

RAID and aggressive power saving are competing priorities.

ZimaOS clearly prioritizes:

  • Data integrity
  • Reliable monitoring
  • Predictable RAID state

over maximizing disk sleep in RAID mode. That is a conservative design choice, not negligence.

Also, the idea that “NAS must have RAID for every data” is not a universal rule. Many home users successfully run:

  • SSD for active workloads
  • Single HDD with proper backups
  • Snapshot parity systems
  • Replication to another device

RAID is redundancy. It is not backup. It is one valid design model, but not the only one.

If someone’s primary priority is:

  • Maximum disk sleep
  • Minimal power consumption
  • Silent operation

then mdadm RAID1 may simply not be the optimal architecture for that use case.

That said, if the storage service can be improved to check member disk state correctly before invoking mdadm detail checks, without compromising monitoring reliability, that would absolutely be a welcome enhancement. Your proof-of-concept demonstrates that smarter behavior is technically possible.

Framing it as “broken or leave” is not necessary. The current behavior reflects a reliability-first approach. But constructive proposals for refining standby handling in RAID mode are completely valid.

Appreciate you taking the time to test and document it, that’s how platforms improve.

That's a good point - let's do it.

ZimaOS does all of the above by executing "mdadm -D /dev/md1" every hour.
Short answer: No - mdadm -D does not run any test on the disks.

It does not:

  • :cross_mark: Run SMART tests

  • :cross_mark: Perform surface scans

  • :cross_mark: Read the whole disk

  • :cross_mark: Verify data blocks

  • :cross_mark: Trigger a RAID consistency check


What it actually does

mdadm -D /dev/md1:

  • Queries the kernel MD driver

  • Reads RAID metadata (superblocks) - that is what wakes the disks up.

  • Collects array status information

  • Prints what the kernel already knows

So it is informational only.

In my humble opinion, "cat /sys/block/md1/md/array_state" would probably do the trick. The first read/write will wake the array anyway, any error will be detected and reported by the kernel in mdstat, and an hour later you can read every detail about it via mdadm.
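A tiny sketch of that idea (the sysfs root is parameterized here only so the logic can be exercised without a real array; on a live system the values come straight from the kernel, e.g. "clean", "active", "degraded"):

```shell
#!/bin/bash
# Read the md array state from sysfs; this is served from kernel
# memory and does not touch the member disks.
array_state() {
  local md="$1" root="${2:-/sys/block}"
  cat "$root/$md/md/array_state"
}

# On a live system:
#   array_state md1
```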

And the architecture is otherwise close to perfect: md has no extra checks besides the forced mdadm -D (which is not exactly needed). I observed disks sleeping for days without waking up (more than 24h). The ZimaOS minimalistic approach is de facto perfect for such a setup! No unneeded services, no bloat, no extra background tasks. That is exactly what I like in this system.

But the fix to the current zimaos-local-storage service is as simple as replacing one line of code:

there is something that executes this:

"hdparm -C /dev/md" (where the md number is trimmed off)

That needs to be replaced with a command that gets the list of disks for this storage from ls -l /sys/block/md1/slaves/ (or even just the first one), like:

DISK="md1"
hdparm -C "/dev/$(ls /sys/block/$DISK/slaves | head -n1)"

and DONE. I only hope it will be implemented.

PS. If there is no "if" yet to check whether the device is an MD array or a physical disk, it needs to be added so the proposed change only runs for "/dev/md*" devices.
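Putting the PS and the snippet above together, the whole proposed check could look like this (a sketch only, not the actual service code; the sysfs root is parameterized so the md-resolution logic can be tested without real hardware):

```shell
#!/bin/bash
# For /dev/md* devices, resolve one real member disk via sysfs and
# check that instead; physical disks are checked directly as before.
resolve_check_target() {
  local dev="$1" root="${2:-/sys/block}"
  local name="${dev#/dev/}"
  if [[ "$name" == md* ]]; then
    # md arrays are virtual devices: hdparm -C on them is meaningless,
    # so pick the first member listed in sysfs
    local member
    member="$(ls "$root/$name/slaves" 2>/dev/null | head -n1)"
    [[ -n "$member" ]] || return 1
    echo "/dev/$member"
  else
    echo "$dev"
  fi
}

# On a live system:
#   hdparm -C "$(resolve_check_target /dev/md1)"
```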

Good, this is the right direction.

You’re correct that mdadm -D is informational. It does not:

  • Run SMART tests
  • Scan surfaces
  • Verify data blocks
  • Trigger consistency checks

It queries the kernel MD driver and reads metadata to report state. Agreed.

However, the key detail is this:

Even metadata access requires waking the member disks if they are in standby. That is not because mdadm is “testing” them, it is because the MD layer must access superblocks to guarantee accurate state reporting.

So while mdadm -D is informational, it is not zero-impact in a spun-down scenario.

Now regarding your proposal:

Reading /sys/block/md1/md/array_state is indeed lighter, and using /sys/block/md1/slaves/ to resolve a member disk for standby checking is technically reasonable.

Your suggested logic:

  • Detect if device is md*
  • Resolve one member disk via sysfs
  • Check standby on that physical device
  • Skip mdadm -D if sleeping

is structurally sound.

Where caution comes in is reliability guarantees.

The current service design likely assumes:

  • If we report array state, it must be fresh
  • If disks are asleep, wake them to confirm

Your change introduces conditional freshness, which is acceptable in home use, but may not align with conservative monitoring philosophy.

That said:

If implemented carefully, especially only for /dev/md* and without affecting physical disk checks, your approach would likely preserve integrity while allowing proper standby behavior.

And I agree with you on one important point:

ZimaOS is minimal and otherwise very clean. There are no unnecessary background services constantly touching disks. That is exactly why this single polling behavior stands out so much.

So I think the most constructive path forward is:

  • Propose a member-disk-aware standby check inside zimaos-local-storage
  • Keep integrity-first behavior
  • Avoid unconditional mdadm -D wakeups

That’s a refinement, not a redesign.

This is now a proper engineering discussion and your proposed logic is reasonable for upstream review.

There is no need to guess or assume anything. Please check the code (if you are part of the ZimaOS team) or read my first post, where I described exactly what I observed in real life and how the ZimaOS local-storage service works. The internal logic is TRYING to check whether the storage is sleeping - this code is already in place - but it has a bug, so it is not working as intended.

The most important answer is missing:

Will this bug be fixed, and when? To make it easier and faster, I even tried to describe how to do it in my post above, to the best of my knowledge.

At this point the situation is fairly clear.

There are multiple reports across ZimaOS versions where HDD standby does not behave as expected in RAID setups. In mdadm software RAID, /dev/md* is a virtual device, actual spindown happens on the physical member disks (/dev/sdX). If standby or polling logic is not handling the member disks correctly, that would be an implementation issue rather than a general RAID limitation.

I’m not part of the ZimaOS development team, so I cannot confirm internal logic or provide a timeline.

If this is a logic bug in the storage service, the proper next step is to raise it formally with full reproduction details so it can be reviewed by the maintainers.

That’s where it needs to go from here.


I am using a single HDD, not RAID, and I still have the issue of my HDD being constantly active. I don't entirely understand how to stop S.M.A.R.T. waking the disk up every 30 minutes, but I am going to try my best to implement your fix. Thank you!

Use just this part of the solution; it should be enough in your setup. Execute the 2 commands above from a terminal (ssh). The first changes the config file; the second restarts the SMART daemon so it starts using the new config.

You can check your SMART config file with:
sudo cat /etc/smartd.conf | grep DEVICESCAN
If it has a "DEVICESCAN -n standby" line in it, the config is fixed already.

Please report back if that helped in your case.

I’m using ssh to input the first command and terminal reports “sed: unsupported command” and I am not sure how to resolve it.

sudo sed -i 's/^DEVICESCAN$/DEVICESCAN -n standby/' /etc/smartd.conf

The problem is with the apostrophe character.

Before the s and after the final / there should be a plain apostrophe ' not the curly character that the forum rendering converted it into.

If it is still not working for you, use an editor:

sudo nano /etc/smartd.conf

to edit the smartd config; find the line with "DEVICESCAN" and change it to "DEVICESCAN -n standby".
PS. I tried to edit this post to make sure it has a plain apostrophe…

Thanks @Rafit77 for your work on this issue, and to @gelbuilding for the informative and constructive follow-on discussion. Has the issue been escalated to the Zima devs per @gelbuilding's suggestion?

I gave Zima 1.5.4 a try a few days ago and am seeing the exact same drive spindown issues (old AMD Steamroller system, OS on dedicated SSD, 12TB RAID 1). I have implemented and verified the SMART daemon fix; one RAID drive spins up every hour now (the second drive remains in standby/spun down).

Will try the second fix later; the current new/dealbreaker priority is tackling ZimaOS's high idle power consumption (powertop reports the CPU only reaching the C2 state on a C6-capable system)…