Linux Pipelines and Text Processing: Composing Tools into Data Flows
Chen Kai

The real productivity jump on Linux isn't memorizing more commands; it's learning to compose small tools into clear data flows. The pipe operator | embodies the core Unix philosophy: make each tool do one thing well (grep only filters, awk only extracts fields, sort only sorts), then chain them into a readable, debuggable pipeline. This post starts from the data flow model (stdin/stdout/stderr), explains the semantics of pipes and redirection (what >, >>, 2>, 2>&1, and < each do), fills in typical patterns for log triage, text filtering, statistical aggregation, and batch processing (when to reach for grep/awk/sed/sort/uniq/wc/cut/tr, and how to progressively narrow scope), and uses practical cases (Nginx log analysis, batch file operations, safe deletion) to cover pitfalls such as spaces and newlines in filenames (the correct find -print0 + xargs -0 pattern). After reading, you should be able to replace many "this needs a script" tasks with one or two readable command lines, and find it easier to understand other people's one-liners.

Data Flow Model: stdin/stdout/stderr and File Descriptors

Three Standard Streams

Every Linux process has three standard streams:

Stream   FD   Default Behavior                Example
stdin    0    Reads input from the keyboard   cat (waits for input when run with no args)
stdout   1    Writes output to the screen     echo "hello"
stderr   2    Writes errors to the screen     ls /nonexistent

Why separate stdout and stderr?

  • Normal output and error output can be handled separately (like normal output saved to file, error output displayed on screen)
  • Pipe | only passes stdout (doesn't pass stderr), so error messages don't pollute data flow

Example:

ls /nonexistent  # Error message outputs to stderr (screen)
ls /nonexistent 2> err.log # Error message redirected to err.log
ls /nonexistent 2>&1 # stderr redirected to stdout (merged into same stream)

File Descriptors (FD)

File descriptors are a process's "handles" for open files, represented by small integers:

  • 0: stdin
  • 1: stdout
  • 2: stderr
  • 3+: Files opened by process itself

View process's open file descriptors:

ls -l /proc/$$/fd  # $$ is the current shell's PID

Example output:

lrwx------ 1 user user 0 /proc/12345/fd/0 -> /dev/pts/0  # stdin
lrwx------ 1 user user 0 /proc/12345/fd/1 -> /dev/pts/0 # stdout
lrwx------ 1 user user 0 /proc/12345/fd/2 -> /dev/pts/0 # stderr
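Descriptors beyond 0/1/2 can also be opened and used directly from the shell. A minimal sketch (the path /tmp/fd3demo.txt is just an illustrative choice): open FD 3 for writing with exec, write through it, then close it:

```shell
# Open FD 3 for writing; it now points at the file
exec 3> /tmp/fd3demo.txt
echo "written via fd 3" >&3   # redirect echo's stdout to FD 3
exec 3>&-                     # close FD 3
cat /tmp/fd3demo.txt          # → written via fd 3
```

While FD 3 is open, it would appear as an extra entry under /proc/$$/fd alongside 0, 1, and 2.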


Redirection: Controlling Data Flow Direction

Output Redirection (stdout)

echo "hello" > file.txt  # Overwrite (file cleared if exists)
echo "world" >> file.txt # Append (add to end of file)

Common usage:

ls -l > filelist.txt  # Save file list
date >> log.txt # Append timestamp to log

Error Output Redirection (stderr)

ls /nonexistent 2> err.log  # Error output redirected to err.log
ls /nonexistent 2>> err.log # Error output appended to err.log

Redirect Both stdout and stderr

Method 1: 2>&1 (Traditional)

command > output.log 2>&1  # Both stdout and stderr redirected to output.log

Order matters:

  • > output.log first redirects stdout to output.log
  • 2>&1 then redirects stderr to stdout's location (also output.log)

Wrong way:

command 2>&1 > output.log  # Wrong! 2>&1 points stderr at stdout's current target (the screen); only stdout then goes to the file

Method 2: &> (Bash Shorthand)

command &> output.log  # Both stdout and stderr redirected to output.log
command &>> output.log # Append mode

Discard Output (/dev/null)

/dev/null is a special "black hole" file; data written to it is discarded.

command > /dev/null  # Discard stdout
command 2> /dev/null # Discard stderr
command &> /dev/null # Discard both stdout and stderr

Use cases:

  • Don't want to see command output (like scripts in cron jobs)
  • Only care if command succeeded (check exit code via $?)
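For instance, a cron-style check might discard all output and branch only on the exit code (the pattern and file here are just an illustration):

```shell
# Run the check silently; branch only on the exit status
if grep "root" /etc/passwd > /dev/null 2>&1; then
    echo "match found"
else
    echo "no match"
fi
```

grep -q achieves the same silencing of stdout, but the redirection form works for any command.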

Input Redirection (stdin)

sort < input.txt  # Read input from input.txt

Here-document (multi-line input):

cat <<EOF > config.txt
line 1
line 2
line 3
EOF

Here-string (single-line input):

grep "error" <<< "ERROR: something bad"


Pipe Operator: Chaining Commands

Core Concept of Pipes

Unix Philosophy: Each tool does one thing, does it well, then combine them via pipes.

Example:

cat access.log | grep "404" | wc -l

Breakdown:

  1. cat access.log: Output the log content (stdout)
  2. grep "404": Read from stdin, keep lines containing "404" (stdout)
  3. wc -l: Read from stdin, count lines (stdout)

Why this design?

  • Avoids temporary files (data flows in memory, not written to disk)
  • Strong readability (each step is clear)
  • Easy debugging (can add pipes step by step, see each step's output)

Debugging Pipes: Using tee

tee can output data to both screen and file simultaneously (like a "T-junction pipe").

cat access.log | grep "404" | tee filtered.log | wc -l
  • tee filtered.log: Saves grep output to filtered.log, while passing to next command
  • This lets you see intermediate results, helpful for debugging

Text Processing Toolchain: grep/awk/sed/cut/tr/sort/uniq

grep: Filter Lines

grep is the most commonly used text filtering tool for finding matching lines.

Basic usage:

grep "pattern" file  # Find lines matching pattern in file
command | grep "pattern" # Find in command output

Common parameters:

  • -i: Ignore case
  • -v: Invert match (only show lines NOT containing pattern)
  • -n: Show line numbers
  • -A N: Show N lines after match (After)
  • -B N: Show N lines before match (Before)
  • -C N: Show N lines before and after match (Context)
  • -E: Extended regex (supports |, +, ?, etc.)
  • -r: Recursively search directory

Practical examples:

1. View Errors in Logs

grep -i "error" /var/log/syslog  # Case-insensitive search for error
grep -E "error|fail|timeout" /var/log/syslog # Search multiple keywords

2. View Error Context

grep -C 3 "OutOfMemoryError" app.log  # Show 3 lines before and after error

3. Recursively Search Directory

grep -r "TODO" /srv/project  # Recursively find TODO in project directory
grep -rn "import numpy" /srv/project # Find and show line numbers

4. Count Matches

grep -c "ERROR" app.log  # Count lines containing ERROR
grep "ERROR" app.log | wc -l # Same (more common)

awk: Extract Fields and Aggregate

awk is a powerful tool for processing columnar text (like logs, CSV, tables).

Basic concepts:

  • awk processes text line by line, splitting each line by whitespace (or specified delimiter) into fields
  • $1 is first field, $2 is second field, $0 is entire line

Common examples:

1. Extract Fields

# Nginx log format: IP - - [time] "GET /path HTTP/1.1" 200 1234
awk '{print $1}' access.log  # Extract IP address (column 1)
awk '{print $7}' access.log  # Extract request path (column 7)
awk '{print $9}' access.log  # Extract status code (column 9)

2. Filter Lines (Like grep)

awk '/404/ {print $0}' access.log     # Only show lines containing 404
awk '$9 >= 400 {print $0}' access.log # Only show lines with status code >= 400

3. Statistics and Aggregation

# Count occurrences of each status code
awk '{count[$9]++} END {for (k in count) print k, count[k]}' access.log

# Count requests per IP
awk '{count[$1]++} END {for (k in count) print k, count[k]}' access.log | sort -nr -k2

4. Custom Delimiter

# Comma-separated CSV file
awk -F',' '{print $2}' data.csv  # -F specifies the delimiter

sed: Text Replacement and Editing

sed is a stream editor for text replacement, deletion, insertion, etc.

Common examples:

1. Replace Text

sed 's/foo/bar/' file.txt  # Replace first foo with bar on each line
sed 's/foo/bar/g' file.txt # Replace all foo with bar on each line (g=global)
sed 's/foo/bar/gi' file.txt # Replace ignoring case

2. Delete Lines

sed '/pattern/d' file.txt  # Delete lines containing pattern
sed '/^$/d' file.txt # Delete empty lines
sed '1,10d' file.txt # Delete first 10 lines

3. Insert and Append

sed '1i\First Line' file.txt  # Insert text before first line
sed '$a\Last Line' file.txt # Append text after last line
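By default sed writes the edited stream to stdout and leaves the file untouched. GNU sed's -i edits the file in place; giving it a suffix (here .bak, an illustrative choice) keeps a backup. Note that BSD/macOS sed uses a slightly different -i syntax:

```shell
# Create a throwaway file, then edit it in place with a backup
printf 'foo one\nfoo two\n' > /tmp/sed-demo.txt
sed -i.bak 's/foo/bar/g' /tmp/sed-demo.txt
cat /tmp/sed-demo.txt       # bar one / bar two
cat /tmp/sed-demo.txt.bak   # original content preserved
```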

cut/tr/sort/uniq: Simple Efficient Text Tools

cut: Extract Fields (Simple Cases)

cut -d',' -f1 data.csv  # Extract comma-separated first column
cut -d':' -f1,7 /etc/passwd # Extract username and shell (columns 1 and 7)

tr: Character Replacement/Deletion

echo "HELLO" | tr 'A-Z' 'a-z'  # Convert to lowercase
echo "a b c" | tr ' ' '\n' # Replace spaces with newlines
echo "abc123" | tr -d '0-9' # Delete numbers

sort: Sorting

sort file.txt  # Sort alphabetically
sort -n file.txt # Sort numerically
sort -r file.txt # Reverse sort
sort -k2 file.txt # Sort by second column
sort -u file.txt # Sort and remove duplicates (equivalent to sort + uniq)

uniq: Remove Duplicates (Only Adjacent Duplicates)

sort file.txt | uniq  # Sort first, then remove duplicates
sort file.txt | uniq -c # Count occurrences of each line
sort file.txt | uniq -d # Only show duplicate lines

Important: uniq only removes adjacent duplicate lines, so you usually need to sort first.
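A quick demonstration of why the sort matters:

```shell
# uniq only collapses adjacent duplicates
printf 'a\nb\na\n' | uniq          # prints a, b, a — the two a's are not adjacent
printf 'a\nb\na\n' | sort | uniq   # prints a, b — sorting makes duplicates adjacent
```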


Practical Case: Nginx Log Analysis

Suppose you have an Nginx log file access.log, each line formatted like:

192.168.1.100 - - [28/Jan/2025:12:00:00 +0000] "GET /api/users HTTP/1.1" 200 1234
192.168.1.101 - - [28/Jan/2025:12:00:01 +0000] "POST /api/login HTTP/1.1" 404 567

1. Count Top Visiting IPs

awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10

Breakdown:

  1. awk '{print $1}': Extract the IP address (column 1)
  2. sort: Sort (make identical IPs adjacent)
  3. uniq -c: Collapse duplicates and count occurrences
  4. sort -nr: Sort by count, descending (-n numeric, -r reverse)
  5. head -10: Show only the top 10

2. Count Most Visited URLs

awk '{print $7}' access.log | sort | uniq -c | sort -nr | head -10
  • $7 is request path (like /api/users)

3. Count Each Status Code's Occurrences

awk '{print $9}' access.log | sort | uniq -c | sort -nr
  • $9 is status code (like 200, 404, 500)

Example output:

1234 200
567 404
123 500

4. Find Errors in Last Hour

grep "28/Jan/2025:12:" access.log | grep -E " (4|5)[0-9]{2} " | tail -n 100
  • First grep filters time
  • Second grep filters 4xx and 5xx status codes
  • tail -n 100 shows only last 100 lines
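Building on the same assumed field layout ($9 = status code), a single awk pass can also report the overall error rate:

```shell
# Share of 4xx/5xx responses across the whole log (assumes status in column 9)
awk '{total++; if ($9 >= 400) err++} END {printf "%.1f%%\n", 100 * err / total}' access.log
```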

xargs: Batch File Processing

xargs converts the previous command's output (usually a file list) into command-line arguments for the next command.

Why xargs Is Needed

Problem: Some commands (like rm, cp, mv) don't support reading arguments from stdin.

find . -name "*.tmp"  # Outputs file list
find . -name "*.tmp" | rm # ❌ Wrong! rm doesn't read from stdin

Solution: Use xargs to convert file list to arguments

find . -name "*.tmp" | xargs rm  # ✅ Correct

Basic Usage

echo "file1 file2 file3" | xargs rm  # Delete three files

Advanced Usage: -I and Placeholder {}

find . -name "*.log" | xargs -I {} cp {} {}.bak  # Copy each file to a .bak backup
  • -I {}: Defines {} as the placeholder for each input item (the older -i form is deprecated in GNU xargs)
  • {}: Represents each input filename
  • {}.bak: The filename with .bak appended

Handle Filenames with Spaces (Important!)

Problem: Filenames with spaces cause xargs to treat them as multiple arguments.

Wrong example:

find . -name "*.txt" | xargs rm  # If filename is "my file.txt", treated as "my" and "file.txt"

Correct approach: Use find -print0 + xargs -0

find . -name "*.txt" -print0 | xargs -0 rm

  • -print0: Use null character (\0) to separate filenames (instead of newline)
  • -0: xargs uses null character as delimiter

Or use find -exec (simpler):

find . -name "*.txt" -exec rm {} +
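The trailing \; runs the command once per matched file, while + batches many filenames into a single invocation (fewer process launches, so cheaper on large trees). Substituting echo for rm makes the difference visible; /tmp/exec-demo is just a scratch directory:

```shell
mkdir -p /tmp/exec-demo && touch /tmp/exec-demo/a.txt /tmp/exec-demo/b.txt
find /tmp/exec-demo -name "*.txt" -exec echo {} \;   # echo runs once per file: two output lines
find /tmp/exec-demo -name "*.txt" -exec echo {} +    # echo runs once with both names: one output line
```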


Practical Cases: Batch File Operations

Case 1: Batch Rename Files

Suppose you have files img_001.jpg, img_002.jpg, want to rename to photo_001.jpg, photo_002.jpg.

for file in img_*.jpg; do
    mv "$file" "${file/img/photo}"
done

Or use rename command (needs installation):

rename 's/img/photo/' img_*.jpg

Case 2: Batch Modify File Permissions

find /var/www/html -type f -exec chmod 644 {} +  # Files to 644
find /var/www/html -type d -exec chmod 755 {} + # Directories to 755

Case 3: Batch Delete Empty Files

find /tmp -type f -empty -delete  # Delete all empty files

Case 4: Batch Compress Log Files

find /var/log -name "*.log" -mtime +7 -exec gzip {} \;
  • -mtime +7: Files modified more than 7 days ago
  • -exec gzip {} \;: Execute gzip compression on each file

Advanced Techniques

Process Substitution

Syntax: <(command)

Purpose: Treat command output as a temporary file.

Example: Compare two sorted files (without creating temp files)

diff <(sort file1.txt) <(sort file2.txt)

Equivalent to:

sort file1.txt > /tmp/sorted1
sort file2.txt > /tmp/sorted2
diff /tmp/sorted1 /tmp/sorted2
rm /tmp/sorted1 /tmp/sorted2

Parallel Processing (xargs -P)

If you have multiple CPU cores, you can process several files in parallel.

find . -name "*.json" -print0 | xargs -0 -P 8 -n 1 jq -c . > /dev/null
  • -P 8: Run max 8 processes simultaneously
  • -n 1: Pass 1 argument to command each time

Safety and Best Practices

1. Never Parse ls Output

Wrong example:

ls | xargs rm  # ❌ Filenames with spaces will error

Correct approach:

find . -maxdepth 1 -type f -print0 | xargs -0 rm

2. Preview Before Deletion

find . -name "*.tmp" -print  # First see which files to delete
find . -name "*.tmp" -delete # Confirm correct then delete

3. Use set -e and set -o pipefail (In Scripts)

#!/bin/bash
set -e # Exit if any command fails
set -o pipefail # Exit if any command in pipeline fails

# Now if any step fails, script immediately exits
cat file.log | grep "error" | process_errors.sh
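The effect of pipefail is easy to demonstrate with a pipeline whose first stage fails (run this in bash):

```shell
false | true; echo "without pipefail: $?"   # 0: only the last command's status counts
set -o pipefail
false | true; echo "with pipefail: $?"      # 1: any failing stage fails the pipeline
```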

4. Importance of Quotes

Wrong example:

dir="my documents"
rm -rf $dir  # ❌ Unquoted: the shell splits this into two arguments, "my" and "documents"

Correct approach:

rm -rf "$dir"  # ✅ Correctly deletes "my documents" directory


Summary and Further Reading

This article covers the core content of Linux pipelines and text processing:

  1. ✅ Data flow model (stdin/stdout/stderr, file descriptors)
  2. ✅ Redirection (>, >>, 2>, 2>&1, <)
  3. ✅ The pipe operator (| principles and debugging techniques)
  4. ✅ Text processing toolchain (grep/awk/sed/cut/tr/sort/uniq)
  5. ✅ Practical cases (Nginx log analysis, batch file operations)
  6. ✅ Correct xargs usage (handling spaces, parallel processing)
  7. ✅ Safety and best practices (don't parse ls, preview before delete, correct quoting)

Further Reading:

  • The Art of Command Line: Command-line tips encyclopedia
  • man bash: View detailed Bash manual (redirection, pipes, etc.)
  • man 1 awk: View detailed awk manual

Next Steps:

  • "Linux User Management": Learn how to manage users/groups, /etc/passwd, /etc/shadow, and sudo configuration

By this point, you should have upgraded from "can use pipes" to "can write readable debuggable one-liners, can quickly analyze logs, can safely batch-process files." Pipes and text processing are core Linux capabilities; mastering them makes you much more efficient at ops tasks.

  • Post title: Linux Pipelines and Text Processing: Composing Tools into Data Flows
  • Post author: Chen Kai
  • Create time: 2023-01-06 00:00:00
  • Post link: https://www.chenk.top/en/linux-pipelines/
  • Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stated otherwise.