Linux Disk Management: From Hardware to Filesystems (RAID, LVM, GPT/MBR, Mounting, and Recovery)
Chen Kai

Disk issues in production are rarely fixed by "one magic command". You're usually dealing with a whole stack: hardware behavior (HDD vs SSD), block devices and partition tables, RAID/LVM layering, and finally filesystem semantics (inodes, links, deletion, and why space doesn't come back). This post walks through the end-to-end workflow: identify a new disk, partition it, format it, mount it, make it persistent, expand capacity with minimal downtime, and debug the common failure modes. Along the way it explains the underlying mechanisms so you can reason about what the system is doing.

Storage basics: what you're really buying (latency vs throughput vs safety)

Before you touch a single command, it helps to have the right mental model.

Hot vs cold storage (SSD vs HDD) and the "random I/O tax"

SSD (hot storage) is great when you need low latency and fast random reads/writes (databases, caches, indexes). HDD (cold storage) is great when you need cheap capacity and large sequential throughput (archives, backups, large logs).

Where the big difference comes from:

  • HDD random I/O pays two mechanical waits: seek time (move head) + rotational latency (wait for the sector to rotate under the head).
  • SSD is electronic; random I/O is much closer to sequential, but writes have their own complexity (erase blocks, garbage collection, write amplification).

Practical takeaway:

  • If a workload becomes random-I/O heavy on HDD, performance can collapse even if "MB/s" looks fine for sequential tests.
  • If you saturate SSD writes, you may see latency spikes due to internal garbage collection.

What is a "sector", what is a filesystem "block", and why small files waste space

Disks store data in sectors (historically 512B; many modern drives use 4K physical sectors). Filesystems allocate in blocks (allocation units). A file cannot occupy "half a block", so a 1-byte file still consumes at least one block plus metadata.

This explains real-world surprises:

  • "My directory of tiny files is huge on disk."
  • "du and ls -l report different sizes."
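You can see the rounding yourself with stat. The sketch below (allocation sizes vary by filesystem; 4K blocks are typical for ext4/xfs) creates a 1-byte file and compares its apparent size with the space actually allocated:

```shell
# create a 1-byte file in a throwaway directory
dir=$(mktemp -d)
printf 'x' > "$dir/tiny"

# %s = apparent size in bytes, %b = allocated 512B units
stat -c 'size=%s bytes, allocated=%b x 512B units' "$dir/tiny"
# on a 4K-block filesystem this typically reports allocated=8 (4096 bytes)
```

Multiply that waste by a million tiny files and the du-vs-ls gap stops being mysterious.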

TRIM on SSD and "can deleted data be recovered?"

On HDD, deletion typically only removes directory entries and metadata; the old data may remain until overwritten. On SSD, after deletion the OS may issue TRIM/discard, and the device may reclaim blocks quickly. That's why recovery assumptions differ.
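Whether a device supports discard at all can be checked from userspace; lsblk can print the per-device discard granularity (zero values in DISC-GRAN/DISC-MAX generally mean TRIM is unsupported or not passed through):

```shell
# DISC-GRAN / DISC-MAX are zero when discard is unsupported
lsblk --discard
```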

Object storage is a different abstraction (S3/OSS)

If your "disk problem" is really "I have too many blobs to manage on one VM", you often want object storage instead of endlessly growing a filesystem.


Block devices in Linux: how disks show up (and how not to shoot yourself in the foot)

The core commands to identify hardware and mapping

lsblk -f
sudo fdisk -l
sudo blkid

What you're looking for:

  • device name: /dev/sda, /dev/nvme0n1, etc.
  • partitions: /dev/sda1, /dev/nvme0n1p1
  • filesystem type and UUID (for persistent mounts)

Naming pitfalls: why /dev/sdb can "change"

Device names can change across reboots (especially with multiple disks). For persistence:

  • mount by UUID
  • or use stable paths like /dev/disk/by-uuid/ and /dev/disk/by-id/

Partition tables: GPT vs MBR (and what tools to use)

MBR vs GPT (decision guide)

  • MBR: legacy, limited partitioning model, historically painful for large disks in old BIOS setups.
  • GPT: modern standard (UEFI-friendly), more partitions, better metadata and robustness.

In practice: use GPT unless you are constrained by old hardware/boot modes.

Tools: fdisk vs gdisk vs parted

  • fdisk: common, works for MBR and (on modern distros) GPT too
  • gdisk: GPT-focused
  • parted: convenient for some scripted workflows

Example: create a partition (high-level)

sudo fdisk /dev/sdb

Typical flow inside fdisk:

  • create a new partition
  • write changes
  • re-read partition table (or reboot if required)

Afterwards verify:

lsblk -f

Filesystems: format, mount, and persist with fstab

Choose a filesystem: ext4 vs xfs

  • ext4: common default, solid general-purpose filesystem
  • xfs: strong for large files and parallel I/O; excellent tooling; can be grown online but cannot be shrunk

Format

sudo mkfs.ext4 /dev/sdb1
# or
sudo mkfs.xfs /dev/sdb1

Mount

sudo mkdir -p /mnt/data
sudo mount /dev/sdb1 /mnt/data
df -h

Make mount persistent: /etc/fstab

Always prefer UUID:

sudo blkid /dev/sdb1

Example fstab entry:

UUID=<uuid>  /mnt/data  ext4  defaults  0  2
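The six fields all matter. An annotated version of the same entry (with <uuid> left as a placeholder):

```
# <source>    <mountpoint>  <type>  <options>  <dump>  <pass>
UUID=<uuid>   /mnt/data     ext4    defaults   0       2
# dump: 0 = not backed up by the legacy dump(8) tool
# pass: fsck order at boot (1 = root filesystem, 2 = other filesystems, 0 = skip)
```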

Safety tip: after editing fstab, test without reboot:

sudo mount -a

If this errors, fix it before rebooting.
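util-linux also ships a dry-run validator: findmnt --verify checks fstab entries for parse errors and missing filesystems, and --tab-file lets you point it at a trial file before committing the change. A sketch (fstab.trial and the tmpfs-on-/tmp entry are stand-ins):

```shell
# write a trial fstab and validate it without touching /etc/fstab
cat > fstab.trial <<'EOF'
tmpfs  /tmp  tmpfs  defaults,noatime  0  0
EOF
findmnt --verify --tab-file fstab.trial
```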


RAID: redundancy and performance, with real trade-offs

RAID is about two knobs:

  • availability (tolerate disk failures)
  • performance (especially read throughput)

RAID levels (what people actually choose)

  • RAID 0: fastest, no redundancy (one disk fails → everything fails)
  • RAID 1: mirroring (capacity ~50%), simple redundancy
  • RAID 5: parity, tolerate 1 disk failure; write penalty; rebuild risk on large arrays
  • RAID 6: double parity, tolerate 2 disk failures; more write overhead
  • RAID 10: mirror + stripe; high performance + redundancy; higher cost

When in doubt in production:

  • prefer RAID 10 for latency-sensitive workloads
  • prefer RAID 6 for large HDD arrays where rebuild risk matters

Software RAID on Linux (mdadm)

Create a RAID 1 array:

sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
cat /proc/mdstat
sudo mdadm --detail /dev/md0

Persist array definition (file varies by distro):

sudo mdadm --detail --scan | sudo tee -a /etc/mdadm.conf

Fail/remove a device (example):

sudo mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1

Operational rule: always verify rebuild status in /proc/mdstat before assuming you're safe again.


LVM: how to expand disks without re-partitioning pain

LVM is the layer that makes capacity changes manageable. The mental model:

  • PV: a disk/partition enrolled into LVM
  • VG: a pool of capacity built from one or more PVs
  • LV: virtual block devices carved from a VG

Typical expansion workflow (the "minimal downtime" playbook)

# 1) Prepare a new disk (or partition) as PV
sudo pvcreate /dev/sdb

# 2) Add it into an existing VG
sudo vgextend vg0 /dev/sdb

# 3) Extend the LV (example: +100G)
sudo lvextend -L +100G /dev/vg0/data

# 4) Grow the filesystem
sudo resize2fs /dev/vg0/data # ext4
sudo xfs_growfs /mount/point # xfs (must be mounted)

Why this works operationally:

  • you can add capacity without moving the old blocks first
  • expansion is often online (service can stay up if filesystem supports it)

A safer "data migration" variant (when you really need to move)

If you must migrate data to a new mount, do it in a controlled window:

  • stop writes (or stop the service)
  • snapshot/backup
  • copy with rsync preserving permissions
  • switch mount points
  • verify, then reopen traffic
sudo rsync -aHAX --delete /old/ /new/

/dev special devices you'll see in disk work

These are not "real disks", but they matter for ops:

  • /dev/null: discard output
  • /dev/zero: infinite zeros (create files, test throughput)
  • /dev/random / /dev/urandom: randomness sources

Examples:

# create a 1GB file for testing (fast on many systems)
dd if=/dev/zero of=test.bin bs=1M count=1024

Inodes: why "file name" is not "the file"

A filename is a directory entry pointing to an inode. The inode points to data blocks.

This helps explain:

  • why hard links work
  • why deleting a file doesn't always reclaim space immediately
  • why inode exhaustion can happen even with free disk space

Check inode usage:

df -i

  • Hard link: another directory entry pointing to the same inode (cannot cross filesystems; usually not for directories)
  • Symlink: its own inode containing a path (can cross filesystems; can become dangling)
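A quick sandbox demo makes the distinction concrete: the hard link shares the original's inode (link count 2), while the symlink gets its own inode and merely stores a path.

```shell
cd "$(mktemp -d)"
echo "data" > original
ln original hard          # hard link: a second name for the same inode
ln -s original soft       # symlink: a separate inode that stores the path

ls -li original hard soft # original and hard show the same inode number

rm original
cat hard                  # still prints "data": the inode survives while any link remains
cat soft 2>/dev/null || echo "soft is dangling"
```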

"I deleted files but disk space didn't come back": the real cause and the fix

The classic root cause is: a process still has the file open.

Find deleted-but-open files:

sudo lsof | grep '(deleted)'

Fix options:

  • restart the holding process (common for log files)
  • or rotate logs properly (avoid truncation pitfalls)

This is one of those incidents where understanding filesystem semantics saves hours of guessing.
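You can reproduce the whole phenomenon safely in a sandbox; here tail -f stands in for a log-writing daemon:

```shell
f=$(mktemp)
echo "log line" > "$f"

tail -f "$f" & pid=$!   # a process holding the file open
sleep 1                 # give tail a moment to open the file
rm "$f"                 # directory entry gone, inode still alive

# the kernel shows the open handle as "(deleted)"
ls -l /proc/$pid/fd | grep deleted

kill $pid               # closing the last handle finally frees the blocks
```

This is exactly what lsof | grep '(deleted)' finds at scale on a full filesystem.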


End-to-end checklist: new disk → usable space → expandable setup

If you want a compact “ do it right ” path:

  1. Identify disk: lsblk -f
  2. Partition (GPT preferred): fdisk/gdisk
  3. Format: mkfs.ext4 or mkfs.xfs
  4. Mount and verify: mount, df -h
  5. Persist mount by UUID: /etc/fstab + mount -a
  6. If you expect growth: plan RAID/LVM from day 1 (don't paint yourself into a corner)

If you can run this checklist confidently, most disk incidents become systematic rather than stressful.


A deeper performance model: why "MB/s looks fine" but the service is slow

In production, disk complaints usually show up as one of these:

  • requests timing out even though CPU is low
  • high load average with low CPU utilization
  • periodic latency spikes that correlate with log rotation or backups

A useful lens is to separate throughput from latency:

  • Throughput answers: "how many MB per second can I stream?"
  • Latency answers: "how long does one small read/write take?"

Databases and many web workloads care far more about latency than bulk throughput.

Random I/O and IOPS

IOPS (I/O operations per second) is a better metric than MB/s when operations are small (4K to 16K). HDD can post decent sequential MB/s but terrible random IOPS, because each random access pays the mechanical latency described earlier.
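dd can make the distinction visible. A sketch (numbers vary wildly by device): many small synchronous writes are bounded by per-operation latency, while one large buffered stream is bounded by throughput.

```shell
cd "$(mktemp -d)"

# latency-bound: 256 x 4K writes, each synced to the device (oflag=dsync)
dd if=/dev/zero of=lat.bin bs=4k count=256 oflag=dsync 2>&1 | tail -n1

# throughput-bound: one buffered 64 MiB stream
dd if=/dev/zero of=thr.bin bs=1M count=64 2>&1 | tail -n1
```

On an HDD the first number collapses; on an SSD the gap shrinks but never disappears.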

Page cache: why reads can be fast until they aren't

Linux aggressively caches file data in memory. This is good. But it can mislead you if you benchmark without clearing cache or if your workload suddenly exceeds memory.

Quick sanity checks:

free -h
cat /proc/meminfo | head

If you see most "free" memory in buff/cache, that is normal and reclaimable.

I/O wait and load average

High load average with low CPU often points to I/O wait. Tools:

top
vmstat 1
iostat -x 1

Look for:

  • high %wa in vmstat
  • high await together with high %util in iostat -x (svctm is deprecated on modern kernels and should not be trusted)

Partition alignment and 4K sectors (the silent performance killer)

Modern disks often have 4K physical sectors even if they expose 512B logical sectors. If partitions are misaligned, a single filesystem write can turn into multiple physical reads/writes.

Practical rule:

  • align partitions to 1MiB boundaries (most modern tools do this by default)

Check alignment (roughly):

sudo fdisk -l /dev/sdb

If a partition starts at sector 2048 on a disk with 512B logical sectors, you're typically aligned (2048 * 512B = 1MiB).
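The arithmetic is easy to script. This sketch checks whether a given start sector (taken from fdisk -l output) is aligned to both 4K and 1MiB:

```shell
start=2048      # partition start in logical sectors (from fdisk -l)
logical=512     # logical sector size in bytes

offset=$(( start * logical ))
echo "start offset: $offset bytes"                # 2048 * 512 = 1048576
echo "4K aligned:   $(( offset % 4096 == 0 ))"    # 1 means aligned
echo "1MiB aligned: $(( offset % 1048576 == 0 ))" # 1 means aligned
```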


Filesystem selection and tuning (ext4 vs xfs vs "what knobs matter")

ext4: safe defaults, broad compatibility

ext4 is often a good default because:

  • tooling is mature (fsck, tune2fs)
  • it behaves predictably across workloads

xfs: strong for large volumes and parallel I/O

xfs shines with:

  • large files
  • parallel access patterns
  • big filesystems

Operational note: shrinking xfs is not supported in the usual way; plan capacity accordingly.

Mount options: small changes, big behavior differences

Some options you'll actually care about:

  • noatime: reduce metadata writes from access-time updates (common for read-heavy workloads)
  • discard: continuous TRIM (can add overhead); many setups prefer periodic fstrim instead

Example (conceptual):

UUID=<uuid> /mnt/data ext4 defaults,noatime 0 2

For SSD TRIM on a schedule:

sudo fstrim -av

RAID in production: rebuild risk, write penalties, and what people forget

Rebuild windows are dangerous

During rebuild:

  • performance often degrades
  • the array is in a more fragile state (another disk failure can be catastrophic depending on RAID level)

This is why large HDD arrays often prefer RAID 6 over RAID 5.

RAID is not a backup

RAID protects against disk failure, not:

  • accidental deletion
  • ransomware
  • application bugs that corrupt data

You still need backups and restore drills.

Monitoring RAID health

You should be able to answer at any moment:

  • Is the array degraded?
  • Is a rebuild happening?
  • How far along is it?

Commands:

cat /proc/mdstat
sudo mdadm --detail /dev/md0

LVM in production: snapshots, rescue workflows, and practical patterns

Snapshots (conceptual)

LVM snapshots can help with:

  • short maintenance windows
  • consistency points before risky operations

But snapshots are not free; they consume space as changes accumulate. If the snapshot fills, it becomes invalid. The safe mindset is: snapshots help you roll back quickly, but they do not replace backups.

Growing vs shrinking

Growing is often safe if filesystem supports it; shrinking is harder:

  • ext4 can be shrunk offline (carefully)
  • xfs cannot be shrunk (typical approach is migrate data to a new LV)

This is one reason people prefer to "grow only" and plan headroom.


Filesystem repair and "read-only remount" incidents

Sometimes the kernel remounts a filesystem as read-only to prevent further corruption. Symptoms:

  • writes fail with "Read-only file system"
  • services crash on writes

First check logs:

dmesg | tail -n 200
journalctl -k --since "1 hour ago"

Then consider a controlled repair:

  • ext4: fsck (offline; requires unmounted filesystem)
  • xfs: xfs_repair (offline; requires unmounted filesystem)

Be careful: repair tools can change data structures. If this is production data, take snapshots/backups first.


Disk health: SMART, bad sectors, and when to replace hardware

If you see intermittent I/O errors, timeouts, or "hung task" warnings, don't assume it's software. Check disk health.

Install tools (varies by distro) and inspect SMART:

sudo smartctl -a /dev/sda

Things that matter:

  • reallocated sector count (HDD)
  • media errors
  • device temperature

If the trend is worsening, replacement is often the correct fix.


Real-world troubleshooting playbook (what to do when something breaks)

"Disk full" but you deleted files

This is almost always:

  • deleted file still open by a process

Confirm:

sudo lsof | grep '(deleted)'

Fix: restart the process holding the file, or rotate logs correctly.

"Device or resource busy" on unmount

Find who is using the mount:

sudo lsof +D /mnt/data | head
sudo fuser -vm /mnt/data

"Mount fails after reboot"

Common causes:

  • wrong UUID in /etc/fstab
  • missing filesystem driver/module
  • ordering: trying to mount before RAID/LVM is ready

Use mount -a to test, and review boot logs.

"Performance suddenly got worse"

Checklist:

  1. iostat -x 1 (is the disk saturated?)
  2. vmstat 1 (is there I/O wait / swapping?)
  3. dmesg (are there I/O errors?)
  4. Is RAID rebuilding?
  5. Did a backup/log job start?

A worked example: minimal-downtime capacity expansion for a growing service

Scenario:

  • a service writes to /data
  • disk usage is approaching 80%
  • you want to expand with minimal downtime

One practical pattern:

  1. Attach a new disk.
  2. Enroll it into LVM as a PV.
  3. Extend the VG, then extend the LV backing /data.
  4. Grow the filesystem online (if supported).
  5. Verify with df -h and run a small write test.

Example commands (adjust to your VG/LV names):

sudo pvcreate /dev/sdb
sudo vgextend vg0 /dev/sdb
sudo lvextend -l +100%FREE /dev/vg0/data
sudo resize2fs /dev/vg0/data
df -h /data

If you're using xfs:

sudo xfs_growfs /data

The key operational idea is to remove "migration" from the critical path. With LVM, you can often expand in-place.


How disk space is reported (df vs du) and why the numbers disagree

This is a recurring ops confusion, and it matters when you are debugging "where did my space go?"

df answers: how full is the filesystem?

df reports allocation at the filesystem level, including reserved blocks and metadata:

df -h

du answers: how much space do these paths account for?

du walks directories and sums file sizes (as seen by directory entries):

sudo du -h -d 1 /var | sort -h

Why df says "full" but du can't find the culprit

Common causes:

  1. Deleted-but-open files (logs are the most common)
  2. Mount confusion (you are looking at a directory that is no longer the mount point you think it is)
  3. Reserved blocks (e.g., ext4 reserves a percentage for root to keep the system alive)

For deleted-but-open files:

sudo lsof | grep '(deleted)'

For mount confusion:

mount | grep ' /var '
findmnt /var

For ext4 reserved blocks:

sudo tune2fs -l /dev/sdb1 | grep -i 'reserved'

You can reduce reserved blocks on non-root volumes (carefully):

sudo tune2fs -m 1 /dev/sdb1

Swap and "disk pressure masquerading as memory pressure"

Sometimes the user experience feels like "the disk is slow", but the root cause is memory pressure leading to swapping, which then produces heavy disk I/O.

Check swap usage:

free -h
swapon --show

If swap is actively used and the system is thrashing, you will see high I/O wait and high latency. The correct fix is usually:

  • add memory
  • reduce memory footprint
  • tune workload

Swap is a safety net, not a performance plan.


Common mount topologies for web stacks (why layouts matter)

For a typical web/app server, a reasonable layout often separates:

  • / (OS + core binaries)
  • /var (logs, package caches, some DBs depending on layout)
  • /data or /srv (application data)

Why this helps:

  • logs can't fill the root filesystem and break boot/login
  • you can snapshot or expand data volumes independently
  • permissions and ownership can be scoped more cleanly

In cloud environments, this often maps naturally to separate block volumes attached to the instance.


A short "decision tree" for day-2 operations

If you want a quick mental flow:

  1. Need redundancy? → RAID 1/10 (latency) or RAID 6 (big HDD arrays)
  2. Need flexible growth? → put data under LVM (VG/LV), plan for grow-only
  3. Need predictable behavior? → ext4; need huge scale/parallel I/O → xfs
  4. Debugging space issues? → df + du + lsof (deleted) + findmnt

The point isn't to memorize more commands; it's to know which layer you're operating on (hardware → block → RAID/LVM → filesystem → application).


Practical command appendix (small but complete)

This section is deliberately "boring": it's a compact list you can copy when you're on-call.

Discover and inspect

lsblk -f
findmnt
mount
df -h
df -i
sudo blkid

Partitioning

sudo fdisk -l
sudo fdisk /dev/sdb
sudo gdisk /dev/sdb

Filesystem creation and checks

sudo mkfs.ext4 /dev/sdb1
sudo mkfs.xfs /dev/sdb1

# ext4 info
sudo tune2fs -l /dev/sdb1 | head

Mounting and persistence

sudo mount /dev/sdb1 /mnt/data
sudo umount /mnt/data
sudo mount -a

RAID (mdadm)

cat /proc/mdstat
sudo mdadm --detail /dev/md0

LVM

sudo pvs
sudo vgs
sudo lvs

sudo pvcreate /dev/sdb
sudo vgextend vg0 /dev/sdb
sudo lvextend -L +100G /dev/vg0/data

"Space not reclaimed"

sudo lsof | grep '(deleted)'

Kernel messages for I/O issues

dmesg | tail -n 200
journalctl -k --since "1 hour ago" | tail -n 200

SMART health (if available)

sudo smartctl -a /dev/sda | head -n 60

Two "save you at 3am" reminders

  • Always double-check the target device before destructive operations. If you are unsure, stop and re-run lsblk -f.
  • After any change to partitioning/RAID/LVM, verify the layer you just changed is visible before moving to the next layer (block → md → lvm → filesystem → mount).
  • Post link: https://www.chenk.top/en/linux-disk-management/
  • Create time: 2022-12-25
  • Copyright notice: All articles in this blog are licensed under BY-NC-SA unless stated otherwise.