Hello everyone,
I’m reaching out to the community for help with a persistent issue I’m facing on my ZimaBoard 2. I’ve done quite a bit of troubleshooting already, but the problem still appears occasionally, and I’m hoping someone here might have experience with a similar setup.
Hardware Setup:
- Board: ZimaBoard 2 (16 GB RAM)
- Storage Adapter: Zima PCIe 3.0 x4 to Dual NVMe M.2 adapter card
- SSDs: 2x Crucial P3 1TB NVMe SSDs
- OS Drive: 128GB SATA SSD (for TrueNAS)
- Additional Disk: Crucial BX500 1TB SATA SSD (for local snapshots/backups)
Software:
- OS: TrueNAS Scale 25.10.2 (fresh install)
- Kernel: 6.12.33-production+truenas
The Problem:
I occasionally see the following errors in the console or logs:
nvme nvme0: controller is down: will reset CSTS=0x3, PCI_STATUS=0x10
nvme nvme0: resetting controller due to persistent internal error
After the error, the system usually recovers (the controller resets), but it’s clearly a sign of instability. The chip on the NVMe adapter gets very hot to the touch (can’t keep a finger on the heatsink), though the SSDs themselves remain cool.
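To get a feel for how often the reset actually happens, I've been counting the messages in a saved kernel log. A minimal sketch (the helper name and log path are my own, not anything from TrueNAS):

```shell
#!/bin/sh
# count_nvme_resets: count "controller is down" events in a saved log file.
# Hypothetical helper; dump the kernel log first with:
#   journalctl -k > /tmp/kernel.log
count_nvme_resets() {
    grep -c "controller is down" "$1"
}

# On a live system:
#   count_nvme_resets /tmp/kernel.log
```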
What I’ve Tried So Far:
- Sysctl/kernel parameters:
  - Added `pcie_aspm=off` (confirmed active in `/proc/cmdline`).
  - Tried adding `nvme_core.default_ps_max_latency_us=0` and `pcie_port_pm=off` via the TrueNAS web UI (Sysctl with UDEV type), but later discovered these are not proper sysctl variables.
  - Then applied them correctly as `kernel_extra_options` via the `midclt` command:
    `midclt call system.advanced.update '{"kernel_extra_options": "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off"}'`
  - After a reboot, I verified that all three parameters are now present in `/proc/cmdline` and that `cat /sys/module/nvme_core/parameters/default_ps_max_latency_us` returns `0`.
- Result: The error frequency has decreased significantly, but it hasn’t disappeared completely.
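For anyone repeating these steps, here's the quick sanity check I use after a reboot to confirm which parameters actually made it into the booted cmdline (the function name is mine, just a sketch):

```shell
#!/bin/sh
# check_params: report which of the expected kernel parameters
# appear in a cmdline string.
check_params() {
    cmdline=" $1 "
    for p in nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off; do
        case "$cmdline" in
            *" $p "*) echo "OK: $p" ;;
            *)        echo "MISSING: $p" ;;
        esac
    done
}

# On a live system:
#   check_params "$(cat /proc/cmdline)"
```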
What I Haven’t Tried Yet:
- Firmware update: I haven’t updated the Crucial P3 firmware. I plan to do it, but I need to find a Windows machine for that.
- Active cooling: Adding a small fan pointing at the adapter.
- Testing with a single NVMe to isolate a potential power delivery issue.
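Until I add a fan, I plan to watch temperatures manually with `nvme-cli`. A small sketch for pulling the composite temperature out of `nvme smart-log` output (the helper name is mine, and the text format it parses is an assumption about my nvme-cli version; note this only covers the drives, since the adapter chip itself likely has no sensor):

```shell
#!/bin/sh
# extract_temp: pull the composite temperature value out of `nvme smart-log`
# text output. Assumes a line like "temperature : 36 C (309 Kelvin)";
# adjust the pattern if your nvme-cli formats it differently.
extract_temp() {
    printf '%s\n' "$1" | awk -F: '/^temperature/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2; exit
    }'
}

# On a live system (requires nvme-cli, run as root):
#   extract_temp "$(nvme smart-log /dev/nvme0)"
```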
My Questions:
- Has anyone here experienced similar `nvme: controller is down` errors with a ZimaBoard + dual NVMe adapter + Crucial P3 combo?
- Could the adapter chip overheating be normal behavior, or is it a red flag? Should I prioritize active cooling?
- Does anyone know if there’s a known firmware issue with Crucial P3 drives that might cause this? (I’ll update it anyway, but curious if others have seen improvements after updating.)
- Could this be a power delivery limitation of the ZimaBoard’s PCIe slot? The adapter draws power from the slot, and two high-performance NVMe drives might be too much.
- Is there any other kernel parameter or BIOS setting I should try before concluding it’s a hardware issue?
Additional Info:
- The SSDs are brand new and pass `smartctl` long tests.
- I’m aiming for a mirror pool with these two drives for data redundancy.
Any insights, experiences, or suggestions would be greatly appreciated. Thanks in advance for your help!