Disk health (smartctl)

Monitor dedicated server disk health on HolyCloud with smartmontools and smartctl, interpret SMART attributes, and plan disk replacement.

Dedicated server disks eventually fail. SMART (Self-Monitoring, Analysis and Reporting Technology) reports reallocated sectors, read errors, and SSD wear before total failure. smartctl is part of the smartmontools package.

Prerequisites

  • Linux dedicated server with root access
  • SATA/SAS/NVMe disks recognized by the controller (hardware RAID may hide SMART data; see below)
  • No destructive tests during production hours without a maintenance window

Installation

sudo apt update
sudo apt install -y smartmontools
sudo systemctl enable --now smartd

Identify disks

lsblk -d -o NAME,SIZE,MODEL,ROTA
sudo smartctl --scan

Example paths:

| Type | Device |
|------|--------|
| SATA | /dev/sda |
| NVMe | /dev/nvme0 |
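To run the same check on every detected disk, the scan output can be turned into per-device arguments. The following is a minimal sketch, assuming each line of smartctl --scan has the form "DEVICE -d TYPE # comment"; the helper name scan_to_args is hypothetical.

```shell
# Hypothetical helper: turn `smartctl --scan` lines into "DEVICE TYPE" pairs.
# Assumes each line looks like: /dev/sda -d sat # /dev/sda [SAT], ATA device
scan_to_args() {
  # Keep only lines whose second field is -d; print the device and its type.
  awk '$2 == "-d" { print $1, $3 }'
}

# Usage (needs root and smartmontools):
#   sudo smartctl --scan | scan_to_args | while read -r dev type; do
#     sudo smartctl -H -d "$type" "$dev"
#   done
```

Passing the type back with -d keeps the health check on the same access method the scan reported.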

First SMART read

sudo smartctl -a /dev/sda
sudo smartctl -a /dev/nvme0

Critical attributes (HDD):

| Attribute | Meaning |
|-----------|---------|
| Reallocated_Sector_Ct | Bad sectors already remapped; watch any value above 0 |
| Current_Pending_Sector | Unstable sectors waiting to be remapped |
| UDMA_CRC_Error_Count | Interface CRC errors, often a faulty cable or SAS link |
| Temperature_Celsius | Drive temperature; watch for excessive heat |

NVMe SSDs: check Percentage Used, Media and Data Integrity Errors, and Available Spare.
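The HDD counters above can be checked automatically. This is a minimal sketch, assuming the usual ATA attribute table printed by smartctl -A, where the raw value is the 10th column; the helper name flag_bad_attrs is hypothetical, and some drives format raw values differently.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# watched counters whose raw value (10th column) is above zero.
flag_bad_attrs() {
  awk '
    $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|UDMA_CRC_Error_Count)$/ && $10 + 0 > 0 {
      print $2 "=" $10
      bad = 1
    }
    END { exit (bad ? 1 : 0) }   # non-zero exit when something was flagged
  '
}

# Usage: sudo smartctl -A /dev/sda | flag_bad_attrs || echo "check this disk"
```

The non-zero exit status makes the helper easy to call from cron or a monitoring agent.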

Short test (non-destructive)

sudo smartctl -t short /dev/sda
# wait ~2 min, then read the self-test log
sudo smartctl -l selftest /dev/sda

Long test (HDD, several hours):

sudo smartctl -t long /dev/sda

Plan a maintenance window: the long test reads the whole disk and competes with production I/O.
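Once a test has finished, its result can be checked from a script. This is a minimal sketch, assuming the ATA self-test log layout printed by smartctl -l selftest, where the newest entry starts with "# 1"; the helper name last_selftest_ok is hypothetical, and NVMe logs are laid out differently.

```shell
# Hypothetical helper: read `smartctl -l selftest` output on stdin and
# succeed only if the newest entry ("# 1") completed without error.
last_selftest_ok() {
  awk '
    /^# *1 / { found = 1; ok = ($0 ~ /Completed without error/) }
    END { exit (found && ok ? 0 : 1) }
  '
}

# Usage: sudo smartctl -l selftest /dev/sda | last_selftest_ok && echo "last test clean"
```

An empty log also counts as a failure here, so a disk that has never been tested does not pass silently.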

smartd: automatic alerts

Edit /etc/smartd.conf:

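/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root@localhost
/dev/nvme0 -a -o on -S on -m root@localhost

  • -s: scheduled self-tests; this pattern runs a short test daily between 02:00 and 03:00 and a long test every Saturday between 03:00 and 04:00
  • -m: email address for alerts (configure postfix or the HolyCloud relay)

Restart smartd, then confirm that SMART is available and enabled on the disk:

sudo systemctl restart smartd
sudo smartctl -i /dev/sda | grep -i smart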
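The -s pattern is an extended regular expression matched against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour). The sketch below imitates that match with grep so that a pattern can be tried before it goes into smartd.conf. The helper name schedule_matches is hypothetical, and grep only approximates smartd's own matching.

```shell
# Hypothetical helper: does a smartd -s REGEXP match a given schedule string?
# Schedule strings have the form T/MM/DD/d/HH (d = day of week, 1 = Monday).
schedule_matches() {
  # $1 = regex from smartd.conf, $2 = schedule string
  # -x requires the regex to match the whole string.
  printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# Usage: would a long test start in the current hour?
#   schedule_matches '(S/../.././02|L/../../6/03)' "L/$(date +%m/%d/%u/%H)" && echo "long test due"
```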

Hardware RAID (MegaRAID, etc.)

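Behind a hardware RAID controller, the operating system sees only the logical volume. smartctl --scan may list the physical disks under a path such as /dev/bus/0. Use the controller tools: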

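# MegaRAID example (megacli may not be in the default Debian/Ubuntu
# repositories; a vendor or third-party repository may be required)
sudo apt install -y megacli
# or storcli from the vendor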

Ask HolyCloud support for the RAID controller model to get the exact command, which has the form smartctl -d megaraid,N -a /dev/sda, where N is the physical disk number on the controller.
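When smartctl --scan already reports the MegaRAID disks, the N values can be read from its output instead of guessed. This is a minimal sketch, assuming scan lines contain "-d megaraid,N"; the helper name megaraid_ids is hypothetical, and the device node in the usage lines is only an example.

```shell
# Hypothetical helper: list the N values from "-d megaraid,N" entries
# in `smartctl --scan` output read on stdin.
megaraid_ids() {
  sed -n 's/.*-d megaraid,\([0-9][0-9]*\).*/\1/p'
}

# Usage (the device node is an assumption; use the one your scan reports):
#   for n in $(sudo smartctl --scan | megaraid_ids); do
#     sudo smartctl -H -d megaraid,"$n" /dev/sda
#   done
```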

Quick interpretation

sudo smartctl -H /dev/sda

| Result | Action |
|--------|--------|
| PASSED | Continue monitoring |
| FAILED | Back up immediately and open a disk replacement ticket |
| Inconsistent data | Check the cable, backplane, and controller |
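Scripts can also use the smartctl exit status, which is a bitmask rather than a simple pass/fail code. The sketch below decodes it using the bit meanings from the EXIT STATUS section of the smartctl manual page; the helper name decode_smartctl_status is hypothetical.

```shell
# Hypothetical helper: decode the smartctl exit status bitmask.
# Bit meanings follow the EXIT STATUS section of smartctl(8).
decode_smartctl_status() {
  status=$1
  if [ "$status" -eq 0 ]; then echo "no problems reported"; return 0; fi
  [ $((status & 1)) -ne 0 ]   && echo "bit 0: command line did not parse"
  [ $((status & 2)) -ne 0 ]   && echo "bit 1: device open failed"
  [ $((status & 4)) -ne 0 ]   && echo "bit 2: a SMART command failed or returned a checksum error"
  [ $((status & 8)) -ne 0 ]   && echo "bit 3: SMART status is DISK FAILING"
  [ $((status & 16)) -ne 0 ]  && echo "bit 4: prefail attribute at or below threshold"
  [ $((status & 32)) -ne 0 ]  && echo "bit 5: attribute was at or below threshold in the past"
  [ $((status & 64)) -ne 0 ]  && echo "bit 6: device error log contains errors"
  [ $((status & 128)) -ne 0 ] && echo "bit 7: self-test log contains errors"
  return 0
}

# Usage: sudo smartctl -H /dev/sda >/dev/null; decode_smartctl_status $?
```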

Logging

sudo smartctl -a /dev/sda > /root/smart-sda-$(date +%F).txt

Keep a monthly history to spot counter drift.
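Two saved snapshots can be compared to surface that drift. This is a minimal sketch, assuming both files contain the ATA attribute table with the raw value in the 10th column; the helper name attr_drift and the file names in the usage line are hypothetical.

```shell
# Hypothetical helper: compare two saved `smartctl -a` snapshots and print
# every attribute whose raw value (10th column) changed.
attr_drift() {
  # $1 = older snapshot, $2 = newer snapshot (the two paths must differ)
  awk '
    $1 ~ /^[0-9]+$/ && NF >= 10 {
      if (FILENAME == ARGV[1]) old[$2] = $10
      else if (($2 in old) && old[$2] != $10) print $2 ": " old[$2] " -> " $10
    }
  ' "$1" "$2"
}

# Usage: attr_drift /root/smart-sda-2024-05-01.txt /root/smart-sda-2024-06-01.txt
```

Counters such as power-on hours change in every snapshot, so the output is most useful when filtered down to the error counters.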

Dedicated server best practices

  • Monitor all RAID disks, not only the visible logical volume.
  • Pair SMART with off-server backups (S3, another DC).
  • After HolyCloud replaces a disk, run smartctl -t short on the new disk.

Troubleshooting

| Problem | Approach |
|---------|----------|
| SMART Disabled | Run sudo smartctl -s on /dev/sda |
| Device open failed | The disk is behind a RAID controller; add the matching -d option (for example -d megaraid,N) |
| NVMe "unknown" | Update smartmontools |

Need help?

Open a ticket with full smartctl -a output, disk serial (panel / IPMI), and slot for warranty replacement.