Disk issues in production are rarely fixed by “one magic command”. You’re usually dealing with a whole stack: hardware behavior (HDD vs SSD), block devices and partition tables, RAID/LVM layering, and finally filesystem semantics (inodes, links, deletion, and why space doesn’t come back). This post walks the end-to-end workflow — identify a new disk, partition it, format it, mount it, make it persistent, expand capacity with minimal downtime, and debug the common failure modes — while also explaining the underlying mechanisms so you can reason about what the system is doing.
Storage basics: what you’re really buying (latency vs throughput vs safety)
Before you touch a single command, it helps to have the right mental model.
Hot vs cold storage (SSD vs HDD) and the “random I/O tax”
SSD (hot storage) is great when you need low latency and fast random reads/writes (databases, caches, indexes). HDD (cold storage) is great when you need cheap capacity and large sequential throughput (archives, backups, large logs).
Where the big difference comes from:
- HDD random I/O pays two mechanical waits: seek time (move head) + rotational latency (wait for the sector to rotate under the head).
- SSD is electronic; random I/O is much closer to sequential, but writes have their own complexity (erase blocks, garbage collection, write amplification).
Practical takeaway:
- If a workload becomes random-I/O heavy on HDD, performance can collapse even if “MB/s” looks fine for sequential tests.
- If you saturate SSD writes, you may see latency spikes due to internal garbage collection.

What is a “sector”, what is a filesystem “block”, and why small files waste space
Disks store data in sectors (historically 512B; many drives are 4K physical sectors). Filesystems allocate in blocks (allocation units). A file cannot occupy “half a block”, so a 1-byte file still consumes at least one block plus metadata.
This explains real-world surprises:
- “My directory of tiny files is huge on disk.”
- “`du` and `ls -l` report different sizes.”
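You can see the block rounding directly with GNU `stat` (a small sketch; the exact block count depends on the filesystem):

```shell
# A 1-byte file still consumes at least one allocation unit.
# %s = apparent size in bytes; %b = allocated 512-byte blocks; %B = block unit.
f=$(mktemp)
printf 'x' > "$f"
stat --format 'apparent: %s bytes, allocated: %b blocks of %B bytes' "$f"
rm "$f"
```

On a typical ext4 filesystem this reports 8 × 512B blocks, i.e. one full 4KiB block for a single byte of data.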
TRIM on SSD and “can deleted data be recovered?”
On HDD, deletion typically only removes directory entries and metadata; the old data may remain until overwritten. On SSD, after deletion the OS may issue TRIM/discard, and the device may reclaim blocks quickly. That’s why recovery assumptions differ.
Object storage is a different abstraction (S3/OSS)
If your “disk problem” is really “I have too many blobs to manage on one VM”, you often want object storage instead of endlessly growing a filesystem.
Block devices in Linux: how disks show up (and how not to shoot yourself in the foot)
The core commands to identify hardware and mapping
```
lsblk -f
```
What you’re looking for:
- device name: `/dev/sda`, `/dev/nvme0n1`, etc.
- partitions: `/dev/sda1`, `/dev/nvme0n1p1`
- filesystem type and UUID (for persistent mounts)
Naming pitfalls: why `/dev/sdb` can “change”
Device names can change across reboots (especially with multiple disks). For persistence:
- mount by UUID
- or use stable paths like `/dev/disk/by-uuid/` and `/dev/disk/by-id/`
Partition tables: GPT vs MBR (and what tools to use)
MBR vs GPT (decision guide)
- MBR: legacy, limited partitioning model, historically painful for large disks in old BIOS setups.
- GPT: modern standard (UEFI-friendly), more partitions, better metadata and robustness.
In practice: use GPT unless you are constrained by old hardware/boot modes.
Tools: fdisk vs gdisk vs parted
- `fdisk`: common, works for MBR and (on modern distros) GPT too
- `gdisk`: GPT-focused
- `parted`: convenient for some scripted workflows
Example: create a partition (high-level)
```
sudo fdisk /dev/sdb
```
Typical flow inside fdisk:
- create a new partition
- write changes
- re-read partition table (or reboot if required)
Afterwards verify:
```
lsblk -f
```
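If you want to rehearse partitioning without risking a real device, `sfdisk` also accepts a plain image file; a small sketch (the `disk.img` name and the 100M size are arbitrary):

```shell
# Create a GPT label and one partition spanning the whole "disk",
# using a regular file instead of /dev/sdb -- no root, no risk.
truncate -s 100M disk.img
printf 'label: gpt\n,,\n' | sfdisk disk.img
sfdisk --dump disk.img    # verify: prints "label: gpt" and the partition line
rm disk.img
```

The `,,` line means “default start, default size, default type”, so you get one partition covering the image.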
Filesystems: format, mount, and persist with fstab
Choose a filesystem: ext4 vs xfs
- ext4: common default, solid general-purpose filesystem
- xfs: strong for large files and parallel I/O; excellent tooling; must be grown online and cannot be shrunk easily
Format
```
sudo mkfs.ext4 /dev/sdb1
```
Mount
```
sudo mkdir -p /mnt/data
sudo mount /dev/sdb1 /mnt/data
```
Make the mount persistent: `/etc/fstab`
Always prefer UUID:
```
sudo blkid /dev/sdb1
```
Example fstab entry:
```
UUID=<uuid> /mnt/data ext4 defaults 0 2
```
Safety tip: after editing fstab, test without rebooting:
```
sudo mount -a
```
If this errors, fix it before rebooting.
RAID: redundancy and performance, with real trade-offs
RAID is about two knobs:
- availability (tolerate disk failures)
- performance (especially read throughput)
RAID levels (what people actually choose)
- RAID 0: fastest, no redundancy (one disk fails → everything fails)
- RAID 1: mirroring (capacity ~50%), simple redundancy
- RAID 5: parity, tolerate 1 disk failure; write penalty; rebuild risk on large arrays
- RAID 6: double parity, tolerate 2 disk failures; more write overhead
- RAID 10: mirror + stripe; high performance + redundancy; higher cost
When in doubt in production:
- prefer RAID 10 for latency-sensitive workloads
- prefer RAID 6 for large HDD arrays where rebuild risk matters
Software RAID on Linux (mdadm)
Create a RAID 1 array:
```
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
```
Persist array definition (file varies by distro):
```
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm.conf
```
Fail/remove a device (example):
```
sudo mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
```
Operational rule: always verify rebuild status in `/proc/mdstat` before assuming you’re safe again.
LVM: how to expand disks without re-partitioning pain
LVM is the layer that makes capacity changes manageable. The mental model:
- PV: a disk/partition enrolled into LVM
- VG: a pool of capacity built from one or more PVs
- LV: virtual block devices carved from a VG
Typical expansion workflow (the “minimal downtime” playbook)
```
# (vg0, data, and /dev/sdc are example names -- adjust to your layout)
# 1) Prepare a new disk (or partition) as PV
sudo pvcreate /dev/sdc
# 2) Add the PV to the volume group
sudo vgextend vg0 /dev/sdc
# 3) Extend the LV and grow the filesystem in one step (-r = resize fs)
sudo lvextend -r -l +100%FREE /dev/vg0/data
```
Why this works operationally:
- you can add capacity without moving the old blocks first
- expansion is often online (service can stay up if filesystem supports it)
A safer “data migration” variant (when you really need to move)
If you must migrate data to a new mount, do it in a controlled window:
- stop writes (or stop the service)
- snapshot/backup
- copy with `rsync`, preserving permissions
- switch mount points
- verify, then reopen traffic
```
sudo rsync -aHAX --delete /old/ /new/
```
/dev special devices you’ll see in disk work
These are not “real disks”, but they matter for ops:
- `/dev/null`: discard output
- `/dev/zero`: infinite zeros (create files, test throughput)
- `/dev/random` / `/dev/urandom`: randomness sources
Examples:
```
# create a 1GB file for testing (fast on many systems)
dd if=/dev/zero of=testfile bs=1M count=1024
```
Inodes, hard links, symlinks: filesystem semantics that explain weird incidents
Inodes: why “file name” is not “the file”
A filename is a directory entry pointing to an inode. The inode points to data blocks.
This helps explain:
- why hard links work
- why deleting a file doesn’t always reclaim space immediately
- why inode exhaustion can happen even with free disk space
Check inode usage:
```
df -i
```
Hard link vs symlink (what’s the real difference)
- Hard link: another directory entry pointing to the same inode (cannot cross filesystems; usually not for directories)
- Symlink: its own inode containing a path (can cross filesystems; can become dangling)
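The difference is easy to see in a scratch directory (a minimal demo, safe to run anywhere):

```shell
cd "$(mktemp -d)"
echo hello > original
ln original hardlink          # second directory entry for the same inode
ln -s original symlink        # new inode that stores the path "original"
ls -li                        # original and hardlink share one inode number
rm original
cat hardlink                  # still readable: the inode lives while any name remains
cat symlink || true           # dangling: the stored path no longer resolves
```

Note that removing `original` does not affect `hardlink` at all, while `symlink` instantly breaks.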
“I deleted files but disk space didn’t come back”: the real cause and the fix
The classic root cause is: a process still has the file open.
Find deleted-but-open files:
```
sudo lsof | grep '(deleted)'
```
Fix options:
- restart the holding process (common for log files)
- or rotate logs properly (avoid truncation pitfalls)
This is one of those incidents where understanding filesystem semantics saves hours of guessing.
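You can reproduce the whole incident in one shell, without even needing `lsof` (this uses Linux’s `/proc` directly):

```shell
# Deleting removes the name; the inode (and its blocks) survive while an fd is open.
tmp=$(mktemp)
exec 3>"$tmp"                 # hold the file open on descriptor 3
echo 'still here' >&3
rm "$tmp"                     # the name is gone, the space is not
readlink /proc/$$/fd/3        # path ends in "(deleted)" -- this is what lsof reports
exec 3>&-                     # closing the last fd finally frees the blocks
```

This is exactly the log-file scenario: the daemon is descriptor 3, and restarting it is what closes the fd.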
End-to-end checklist: new disk → usable space → expandable setup
If you want a compact “ do it right ” path:
- Identify disk: `lsblk -f`
- Partition (GPT preferred): `fdisk`/`gdisk`
- Format: `mkfs.ext4` or `mkfs.xfs`
- Mount and verify: `mount`, `df -h`
- Persist mount by UUID: `/etc/fstab` + `mount -a`
- If you expect growth: plan RAID/LVM from day 1 (don’t paint yourself into a corner)
If you can run this checklist confidently, most disk incidents become systematic rather than stressful.
A deeper performance model: why “MB/s looks fine” but the service is slow
In production, disk complaints usually show up as one of these:
- requests timing out even though CPU is low
- high load average with low CPU utilization
- periodic latency spikes that correlate with log rotation or backups
A useful lens is to separate throughput from latency:
- Throughput answers: “how many MB per second can I stream?”
- Latency answers: “how long does one small read/write take?”
Databases and many web workloads care far more about latency than bulk throughput.
Random I/O and IOPS
IOPS (I/O operations per second) is a better metric than MB/s when operations are small (4K–16K). HDD can have decent MB/s sequentially but terrible random IOPS because each random access pays mechanical latency.
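A back-of-envelope calculation makes the gap concrete (the ~12ms figure is an assumed typical 7200rpm seek plus rotational latency, not a measurement):

```shell
# ~8ms average seek + ~4ms rotational latency ≈ 12ms per random access
awk 'BEGIN {
  iops = 1000 / 12                 # random operations per second
  mbps = iops * 4096 / 1e6         # resulting throughput at 4KiB per op
  printf "~%.0f random IOPS, ~%.2f MB/s at 4KiB\n", iops, mbps
}'
# prints: ~83 random IOPS, ~0.34 MB/s at 4KiB
```

Compare that 0.34 MB/s of random I/O against the 100+ MB/s the same drive streams sequentially, and the “MB/s looks fine” confusion disappears.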
Page cache: why reads can be fast until they aren’t
Linux aggressively caches file data in memory. This is good. But it can mislead you if you benchmark without clearing cache or if your workload suddenly exceeds memory.
Quick sanity checks:
```
free -h
```
If you see most “free” memory sitting in buff/cache, that is normal and reclaimable.
I/O wait and load average
High load average with low CPU often points to I/O wait. Tools:
```
top
vmstat 1
iostat -x 1
```
Look for:
- high `wa` (I/O wait) in `vmstat`
- high `await` and low `svctm` / high utilization in `iostat -x`
Partition alignment and 4K sectors (the silent performance killer)
Modern disks often have 4K physical sectors even if they expose 512B logical sectors. If partitions are misaligned, a single filesystem write can turn into multiple physical reads/writes.
Practical rule:
- align partitions to 1MiB boundaries (most modern tools do this by default)
Check alignment (roughly):
```
sudo fdisk -l /dev/sdb
```
If a partition starts at sector 2048 on a disk with 512B logical sectors, you’re typically aligned (2048 × 512B = 1MiB).
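The arithmetic is worth checking once yourself:

```shell
# start sector 2048 × 512B logical sectors = exactly a 1MiB boundary
echo $((2048 * 512))    # prints 1048576
```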
Filesystem selection and tuning (ext4 vs xfs vs “what knobs matter”)
ext4: safe defaults, broad compatibility
ext4 is often a good default because:
- tooling is mature (`fsck`, `tune2fs`)
- it behaves predictably across workloads
xfs: strong for large volumes and parallel I/O
xfs shines with:
- large files
- parallel access patterns
- big filesystems
Operational note: shrinking xfs is not supported in the usual way; plan capacity accordingly.
Mount options: small changes, big behavior differences
Some options you’ll actually care about:
- `noatime`: reduce metadata writes from access-time updates (common for read-heavy workloads)
- `discard`: continuous TRIM (can add overhead); many setups prefer periodic `fstrim` instead
Example (conceptual):
```
UUID=<uuid> /mnt/data ext4 defaults,noatime 0 2
```
For SSD TRIM on a schedule:
```
sudo fstrim -av
```
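To make the periodic trim actually periodic, one sketch (assumes a systemd distro shipping util-linux’s `fstrim.timer`; the cron binary path is illustrative and varies by distro):

```shell
# systemd: run fstrim weekly via the shipped timer unit
sudo systemctl enable --now fstrim.timer

# cron equivalent (illustrative path), weekly on Sunday 03:00:
# 0 3 * * 0  /usr/sbin/fstrim -av
```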
RAID in production: rebuild risk, write penalties, and what people forget
Rebuild windows are dangerous
During rebuild:
- performance often degrades
- the array is in a more fragile state (another disk failure can be catastrophic depending on RAID level)
This is why large HDD arrays often prefer RAID 6 over RAID 5.
RAID is not a backup
RAID protects against disk failure, not:
- accidental deletion
- ransomware
- application bugs that corrupt data
You still need backups and restore drills.
Monitoring RAID health
You should be able to answer at any moment:
- Is the array degraded?
- Is a rebuild happening?
- How far along is it?
Commands:
```
cat /proc/mdstat
sudo mdadm --detail /dev/md0
```
LVM in production: snapshots, rescue workflows, and practical patterns
Snapshots (conceptual)
LVM snapshots can help with:
- short maintenance windows
- consistency points before risky operations
But snapshots are not free; they consume space as changes accumulate. If the snapshot fills, it becomes invalid. The safe mindset is: snapshots help you roll back quickly, but they do not replace backups.
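A minimal sketch of the snapshot-before-risky-change pattern, assuming a VG named `vg0` with an LV named `data` (the names and the 5G change budget are placeholders, not from the post):

```shell
# reserve 5G of copy-on-write space for changes made after the snapshot;
# if more than 5G changes, the snapshot becomes invalid
sudo lvcreate --snapshot --size 5G --name data-pre-upgrade /dev/vg0/data

# ...perform the risky operation on /dev/vg0/data...

# happy path: drop the snapshot once you are confident
sudo lvremove -y /dev/vg0/data-pre-upgrade

# rollback path instead: merge the snapshot back into the origin
# sudo lvconvert --merge /dev/vg0/data-pre-upgrade
```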
Growing vs shrinking
Growing is often safe if the filesystem supports it; shrinking is harder:
- ext4 can be shrunk offline (carefully)
- xfs cannot be shrunk (the typical approach is to migrate data to a new LV)
This is one reason people prefer to “grow only” and plan headroom.
Filesystem repair and “read-only remount” incidents
Sometimes the kernel remounts a filesystem as read-only to prevent further corruption. Symptoms:
- writes fail with “Read-only file system”
- services crash on writes
First check logs:
```
dmesg | tail -n 200
```
Then consider a controlled repair:
- ext4: `fsck` (offline; requires an unmounted filesystem)
- xfs: `xfs_repair` (offline; requires an unmounted filesystem)
Be careful: repair tools can change data structures. If this is production data, take snapshots/backups first.
Disk health: SMART, bad sectors, and when to replace hardware
If you see intermittent I/O errors, timeouts, or “hung task” warnings, don’t assume it’s software. Check disk health.
Install tools (varies by distro) and inspect SMART:
```
sudo smartctl -a /dev/sda
```
Things that matter:
- reallocated sector count (HDD)
- media errors
- device temperature
If the trend is worsening, replacement is often the correct fix.
Real-world troubleshooting playbook (what to do when something breaks)
“Disk full” but you deleted files
This is almost always:
- deleted file still open by a process
Confirm:
```
sudo lsof | grep '(deleted)'
```
Fix: restart the process holding the file, or rotate logs correctly.
“Device or resource busy” on unmount
Find who is using the mount:
```
sudo lsof +D /mnt/data | head
```
“Mount fails after reboot”
Common causes:
- wrong UUID in `/etc/fstab`
- missing filesystem driver/module
- ordering: trying to mount before RAID/LVM is ready
Use `mount -a` to test, and review boot logs.
“Performance suddenly got worse”
Checklist:
- `iostat -x 1` (is the disk saturated?)
- `vmstat 1` (is there I/O wait / swapping?)
- `dmesg` (are there I/O errors?)
- Is RAID rebuilding?
- Did a backup/log job start?
A worked example: minimal-downtime capacity expansion for a growing service
Scenario:
- a service writes to `/data`
- disk usage is approaching 80%
- you want to expand with minimal downtime
One practical pattern:
- Attach a new disk.
- Enroll it into LVM as a PV.
- Extend the VG, then extend the LV backing `/data`.
- Grow the filesystem online (if supported).
- Verify with `df -h` and run a small write test.
Example commands (adjust to your VG/LV names):
```
sudo pvcreate /dev/sdb
sudo vgextend <vg> /dev/sdb
sudo lvextend -l +100%FREE /dev/<vg>/<lv>
sudo resize2fs /dev/<vg>/<lv>   # ext4
```
If you’re using xfs:
```
sudo xfs_growfs /data
```
The key operational idea is to remove “migration” from the critical path. With LVM, you can often expand in-place.
How disk space is reported (df vs du) and why the numbers disagree
This is a recurring ops confusion, and it matters when you are debugging “where did my space go?”
df answers: how full is the filesystem?
df reports filesystem-level allocation (blocks reserved, metadata, etc.):
```
df -h
```
du answers: how much space do these paths account for?
du walks directories and sums file sizes (as seen by directory entries):
```
sudo du -h -d 1 /var | sort -h
```
Why df says “full” but du can’t find the culprit
Common causes:
- Deleted-but-open files (logs are the most common)
- Mount confusion (you are looking at a directory that is no longer the mount point you think it is)
- Reserved blocks (e.g., ext4 reserves a percentage for root to keep the system alive)
For deleted-but-open files:
```
sudo lsof | grep '(deleted)'
```
For mount confusion:
```
mount | grep ' /var '
```
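`findmnt` resolves which mounted filesystem actually holds a given path, which is often quicker than grepping `mount` output:

```shell
# -T: report the filesystem containing the target path,
# even if the path itself is not a mount point
findmnt -T /var
```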
For ext4 reserved blocks:
```
sudo tune2fs -l /dev/sdb1 | grep -i 'reserved'
```
You can reduce reserved blocks on non-root volumes (carefully):
```
sudo tune2fs -m 1 /dev/sdb1
```
Swap and “disk pressure masquerading as memory pressure”
Sometimes the user experience feels like “the disk is slow”, but the root cause is memory pressure leading to swapping, which then produces heavy disk I/O.
Check swap usage:
```
free -h
```
If swap is actively used and the system is thrashing, you will see high I/O wait and high latency. The correct fix is usually:
- add memory
- reduce memory footprint
- tune workload
Swap is a safety net, not a performance plan.
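One quick knob to inspect while you are at it (Linux-specific):

```shell
# vm.swappiness controls how eagerly the kernel swaps:
# 0 = avoid swap as long as possible, higher values swap sooner
cat /proc/sys/vm/swappiness
```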
Common mount topologies for web stacks (why layouts matter)
For a typical web/app server, a reasonable layout often separates:
- `/` (OS + core binaries)
- `/var` (logs, package caches, some DBs depending on layout)
- `/data` or `/srv` (application data)
Why this helps:
- logs can’t fill the root filesystem and break boot/login
- you can snapshot or expand data volumes independently
- permissions and ownership can be scoped more cleanly
In cloud environments, this often maps naturally to separate block volumes attached to the instance.
A short “decision tree” for day-2 operations
If you want a quick mental flow:
- Need redundancy? → RAID 1/10 (latency) or RAID 6 (big HDD arrays)
- Need flexible growth? → put data under LVM (VG/LV), plan for grow-only
- Need predictable behavior? → ext4; need huge scale/parallel I/O → xfs
- Debugging space issues? → `df` + `du` + `lsof` (look for “(deleted)”) + `findmnt`
The point isn’t to memorize more commands; it’s to know which layer you’re operating on (hardware → block → RAID/LVM → filesystem → application).
Practical command appendix (small but complete)
This section is deliberately “boring”: it’s a compact list you can copy when you’re on-call.
Discover and inspect
```
lsblk -f
sudo blkid
```
Partitioning
```
sudo fdisk -l
```
Filesystem creation and checks
```
sudo mkfs.ext4 /dev/sdb1
```
Mounting and persistence
```
sudo mount /dev/sdb1 /mnt/data
sudo mount -a   # validate /etc/fstab before rebooting
```
RAID (mdadm)
```
cat /proc/mdstat
```
LVM
```
sudo pvs
sudo vgs
sudo lvs
```
“Space not reclaimed”
```
sudo lsof | grep '(deleted)'
```
Kernel messages for I/O issues
```
dmesg | tail -n 200
```
SMART health (if available)
```
sudo smartctl -a /dev/sda | head -n 60
```
Two “save you at 3am” reminders
- Always double-check the target device before destructive operations. If you are unsure, stop and re-run `lsblk -f`.
- After any change to partitioning/RAID/LVM, verify the layer you just changed is visible before moving to the next layer (block → md → lvm → filesystem → mount).
- Post title: Linux Disk Management: From Hardware to Filesystems (RAID, LVM, GPT/MBR, Mounting, and Recovery)
- Post author: Chen Kai
- Create time: 2022-12-25 00:00:00
- Post link: https://www.chenk.top/en/linux-disk-management/
- Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stated otherwise.