Disk health (smartctl)

Monitor dedicated server disk health on HolyCloud with smartmontools and smartctl, interpret SMART data, and plan replacement.

Dedicated server disks eventually fail. SMART (Self-Monitoring, Analysis and Reporting Technology) reports reallocated sectors, read errors, and SSD wear, often before a disk fails completely. smartctl is part of the smartmontools package.

Prerequisites

- Linux dedicated server with root access
- SATA/SAS/NVMe disks recognized by the controller (hardware RAID may hide SMART data; see below)
- No destructive tests during production hours without a maintenance window

Installation

    sudo apt update
    sudo apt install -y smartmontools
    sudo systemctl enable --now smartd

Identify disks

    lsblk -d -o NAME,SIZE,MODEL,ROTA
    sudo smartctl --scan

Example paths:

| Type | Device |
|------|--------|
| SATA | /dev/sda |
| NVMe | /dev/nvme0 |

First SMART read

    sudo smartctl -a /dev/sda
    sudo smartctl -a /dev/nvme0

Critical attributes (HDD):

| Attribute | Meaning |
|----------|---------------|
| Reallocated_Sector_Ct | Bad sectors already remapped; any value above 0 deserves watching |
| Current_Pending_Sector | Unstable sectors waiting to be remapped |
| UDMA_CRC_Error_Count | Interface errors, often a faulty cable or connector |
| Temperature_Celsius | Drive temperature; watch for excessive heat |

NVMe SSD: check Percentage Used, Media and Data Integrity Errors, and Available Spare.

Short test (non-destructive)

    sudo smartctl -t short /dev/sda
    # wait about 2 minutes, then read the self-test log
    sudo smartctl -l selftest /dev/sda

Long test (HDD, several hours):

    sudo smartctl -t long /dev/sda

Plan a maintenance window: the long test reads the whole disk and adds sustained I/O load.

smartd: automatic alerts

Edit /etc/smartd.conf:

    /dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root@localhost
    /dev/nvme0 -a -m root@localhost

- -s: scheduled self-tests (short test daily at 02:00, long test every Saturday at 03:00)
- -m: email address for alerts (configure postfix or the HolyCloud relay)
- -o and -S apply to ATA disks only, so the NVMe line omits them

Restart the daemon, then confirm that SMART support is available and enabled:

    sudo systemctl restart smartd
    sudo smartctl -i /dev/sda | grep -i smart

Hardware RAID (MegaRAID, etc.)
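Where the controller exposes its physical disks to smartmontools, smartctl --scan usually prints each device together with the -d type it needs (for example megaraid,0). The sketch below reuses that output to run a health check on every reported device. The function names parse_scan and check_all are illustrative, and it assumes the scan lists the disks at all, which is not guaranteed on every controller.

```sh
#!/bin/sh
# parse_scan: read `smartctl --scan` output on stdin and print
# "DEVICE TYPE" pairs, e.g. "/dev/bus/0 megaraid,0".
parse_scan() {
    awk '$2 == "-d" { print $1, $3 }'
}

# check_all: run the overall health check on every scanned device.
# Requires root and smartmontools; not called automatically.
check_all() {
    smartctl --scan | parse_scan | while read -r dev dtype; do
        printf '== %s (%s)\n' "$dev" "$dtype"
        smartctl -H -d "$dtype" "$dev"
    done
}
```

After sourcing the file, run check_all as root. If a disk is missing from the scan, the controller tools or HolyCloud support remain the reference for the right -d value.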
Behind a hardware RAID controller, a physical disk may appear as /dev/bus/0 instead of /dev/sdX. Use the controller tools:

    # MegaRAID example (the package may come from a vendor or third-party repository)
    sudo apt install -y megacli   # or storcli from the vendor

Ask HolyCloud support for the RAID controller model to get the exact smartctl -d megaraid,N -a /dev/sda command, where N is the disk's device ID on the controller.

Quick interpretation

    sudo smartctl -H /dev/sda

| Result | Action |
|----------|--------|
| PASSED | Continue monitoring |
| FAILED | Back up immediately and open a disk replacement ticket |
| Inconsistent data | Check cable, backplane, and controller |

Logging

    # tee runs under sudo so the file can be written to /root
    sudo smartctl -a /dev/sda | sudo tee /root/smart-sda-$(date +%F).txt > /dev/null

Keep a monthly history to see how the counters drift over time.

Dedicated server best practices

- Monitor every physical disk in the RAID array, not only the visible logical volume.
- Pair SMART monitoring with off-server backups (S3, another DC).
- After HolyCloud replaces a disk, rerun smartctl -t short on the new disk.

Troubleshooting

| Problem | Approach |
|----------|-------|
| SMART Disabled | smartctl -s on /dev/sda |
| Device open failed | Disk is behind a RAID controller; add the -d option |
| NVMe "unknown" | Update smartmontools |

Need help?

Open a ticket with the full smartctl -a output, the disk serial (panel / IPMI), and the slot for warranty replacement.
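The details requested for a support ticket (full smartctl -a output and the disk serial) can be collected in one file. The following is a minimal sketch, not an official HolyCloud tool: the function names and the /root/smart-ticket-... path are hypothetical, and the serial is matched on the "Serial Number:" line that smartctl -i prints (spelled "Serial number:" on SAS disks).

```sh
#!/bin/sh
# extract_serial: read `smartctl -i` output on stdin, print the serial number.
extract_serial() {
    awk -F': *' '/^Serial [Nn]umber:/ { print $2; exit }'
}

# ticket_report DEVICE: write the serial and the full SMART output to a
# dated file and print its path. Requires root and smartmontools.
# The output location is an arbitrary choice for this example.
ticket_report() {
    dev=$1
    out="/root/smart-ticket-$(basename "$dev")-$(date +%F).txt"
    {
        printf 'Device: %s\n' "$dev"
        printf 'Serial: %s\n' "$(smartctl -i "$dev" | extract_serial)"
        smartctl -a "$dev"
    } > "$out"
    printf '%s\n' "$out"
}
```

Run as root, ticket_report /dev/sda prints the path of the file to attach to the ticket. The slot still has to be read from the panel or IPMI.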