Sparsity is a ubiquitous feature in nature and data. Compressed sensing exploits this property to achieve the "less is more" miracle.
Introduction: From JPEG to MRI — Sparsity is Everywhere
Have you ever wondered why a 10MB raw photo can be compressed into a few hundred KB JPEG file with almost no visible difference? Why can MRI scan time be reduced from 30 minutes to 5 minutes?
The answer is sparsity: signals, when represented in an appropriate basis, have most coefficients equal to zero or close to zero.
Sparsity in Daily Life:
- Images: Natural images are sparse in the wavelet basis (most wavelet coefficients are small)
- Audio: Speech signals are sparse in the Fourier basis (only a few frequencies carry energy)
- Text: An article uses only a small fraction of the vocabulary
- Social Networks: A person knows only a tiny fraction of all people
The Revolutionary Idea of Compressed Sensing: Since signals are sparse, can we recover them using fewer measurements than traditional methods require?
The answer is yes! This is the core content of this chapter.
Learning Objectives
- Sparse Representation: Understanding signal sparse decomposition in dictionaries
- L0 and L1 Norms: Why does L1 promote sparsity?
- Compressed Sensing Theory: RIP conditions, measurement matrix design
- Algorithm Implementation: LASSO, OMP, ISTA
- Applications: MRI imaging, single-pixel cameras, image compression
Mathematical Foundations of Sparse Representation
What is Sparsity?
Definition: A vector $x \in \mathbb{R}^n$ is called k-sparse if it has at most $k$ nonzero entries.

Mathematical Notation: $\|x\|_0 := |\{i : x_i \neq 0\}| \le k$, where $\|\cdot\|_0$ is the "L0 norm" that counts nonzero entries. (Strictly speaking it is not a norm, since it is not homogeneous.)

Examples:
- $x = (0, 3, 0, 0, -1, 0)$ is 2-sparse
- A one-hot indicator vector is 1-sparse
Approximately Sparse: In practice, signals are usually approximately sparse: most coefficients are small but not exactly zero.
Analogy: Imagine a class's grade distribution. If only a few students got high scores and most got zero or low scores, this grade vector is sparse.
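The definitions above are easy to check numerically. A minimal NumPy sketch (the vector here is made up for illustration):

```python
import numpy as np

# A 6-dimensional vector with only 2 nonzero entries: it is 2-sparse.
x = np.array([0.0, 3.0, 0.0, 0.0, -1.0, 0.0])

l0 = np.count_nonzero(x)   # "L0 norm": number of nonzero entries
l1 = np.sum(np.abs(x))     # L1 norm: sum of absolute values
l2 = np.linalg.norm(x)     # L2 norm: Euclidean length

print(l0, l1, l2)  # -> 2 4.0 3.1622776601683795
```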
![](稀疏矩阵与压缩感知/fig1.png)
Sparse Representation and Dictionaries
Most signals are not sparse themselves, but may be sparse in some transform domain or dictionary.
Mathematical Expression: Given a signal $y \in \mathbb{R}^n$ and a dictionary $D \in \mathbb{R}^{n \times p}$ whose columns $d_1, \dots, d_p$ are called atoms, we seek a coefficient vector $x \in \mathbb{R}^p$ such that $y = Dx$ and $x$ is sparse.
Common Dictionaries:
| Dictionary Type | Suitable Signals | Characteristics |
|---|---|---|
| Fourier basis | Periodic signals, audio | Frequency-domain sparse |
| Wavelet basis | Images, non-stationary signals | Time-frequency localization |
| DCT basis | Images (JPEG) | Similar to Fourier, real-valued |
| Learned dictionary | Domain-specific data | Data-driven |
Overcomplete Dictionary: When $p > n$, the dictionary has more atoms than the signal dimension and is called overcomplete. Representations are then non-unique, which is exactly what leaves room to pick the sparsest one.
The Sparse Coding Problem
Problem: Given signal $y$ and dictionary $D$, find the sparsest representation:
$$\min_x \|x\|_0 \quad \text{s.t.} \quad y = Dx$$

Difficulty: This is an NP-hard problem!

Why NP-hard?
- We would need to search over all possible combinations of nonzero element positions (supports)
- For a dictionary with $p$ atoms, there are $\binom{p}{k}$ candidate size-$k$ supports, a number that grows combinatorially in $k$ and $p$
Analogy: It's like finding a book in a library when you only know the general content, and the book could be in any location. You'd need to check every possible combination of locations, which is infeasible when there are many books.
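To make the combinatorial blow-up concrete, here is a brute-force L0 solver (a toy sketch; the dictionary and signal are synthetic). Even this tiny 2-sparse instance already requires $\binom{20}{2} = 190$ least-squares solves, and the count explodes as $k$ grows:

```python
import itertools
import numpy as np

def l0_brute_force(D, y, k, tol=1e-8):
    """Exhaustive L0 search: try every size-k support, C(p, k) in total."""
    p = D.shape[1]
    for support in itertools.combinations(range(p), k):
        cols = D[:, list(support)]
        coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
        if np.linalg.norm(cols @ coef - y) < tol:   # exact fit found
            x = np.zeros(p)
            x[list(support)] = coef
            return x
    return None

# Tiny instance: 20 atoms, 2-sparse ground truth.
rng = np.random.default_rng(0)
D = rng.standard_normal((10, 20))
x_true = np.zeros(20)
x_true[[3, 11]] = [2.0, -1.5]
y = D @ x_true
x_hat = l0_brute_force(D, y, k=2)
```

For $p = 10{,}000$ atoms and $k = 20$, the same loop would need roughly $10^{61}$ iterations, which is why the convex relaxation of the next section matters.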
Uniqueness of Sparse Representation
Question: Even if we find a sparse solution, is it unique?
Theorem (Uniqueness Condition): If $x$ satisfies $y = Dx$ with
$$\|x\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(D)}\right),$$
then $x$ is the unique sparsest representation of $y$.

Here $\mu(D) = \max_{i \neq j} \dfrac{|\langle d_i, d_j \rangle|}{\|d_i\|_2 \|d_j\|_2}$ is the mutual coherence of the dictionary.
Intuition: Mutual coherence measures the maximum similarity between atoms in the dictionary. If atoms are very different from each other (low mutual coherence), sparse solutions are more likely to be unique.
L1 Regularization: Convex Relaxation of Sparsity
From L0 to L1
Since L0 optimization is NP-hard, we need a computable alternative.
Key Observation: The L1 norm $\|x\|_1 = \sum_i |x_i|$ is the best convex relaxation of the L0 norm.

Convex Relaxation Problem (basis pursuit):
$$\min_x \|x\|_1 \quad \text{s.t.} \quad y = Dx$$
Why Does L1 Promote Sparsity? — Geometric Explanation
This is one of the most important intuitions in this chapter.
Shape of the L1 Ball: In 2D, the L1 unit ball $\{x : |x_1| + |x_2| \le 1\}$ is a diamond (a square rotated 45 degrees).

Shape of the L2 Ball: In 2D, the L2 unit ball $\{x : x_1^2 + x_2^2 \le 1\}$ is a circle.
Key Difference: The L1 ball has "sharp corners" on the coordinate axes, while the L2 ball is smooth.
Geometric Intuition:
1. The constraint set $\{x : y = Dx\}$ is an affine subspace (a line in 2D)
2. Inflate the L1 ball from the origin until it first touches this subspace
3. Because of the corners, first contact typically happens on a coordinate axis, where some coordinates are exactly zero
Analogy: Imagine you're in a room and need to touch a wall (constraint). If you're holding a diamond-shaped balloon (L1 ball) that's slowly inflating, the balloon will most likely first touch the wall at a corner. But if you're holding a round balloon (L2 ball), it could touch the wall anywhere.
Comparison of L1 and L2 Regularization
| Property | L1 Regularization | L2 Regularization |
|---|---|---|
| Unit ball shape | Diamond (has corners) | Sphere (smooth) |
| Solution characteristics | Sparse (many exact zeros) | Small but nonzero |
| Geometric intuition | Tends to be on coordinate axes | Tends to shrink uniformly |
| Optimization | Non-smooth, needs special algorithms | Smooth, gradient descent works |
| Applications | Feature selection, compressed sensing | Prevent overfitting, ridge regression |
Elastic Net: Combining L1 and L2
Elastic Net combines the advantages of both:
$$\min_\beta \frac{1}{2}\|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2$$
Advantages:
- Inherits L1's sparsity
- The L2 component provides stability, especially when features are highly correlated
- Can select groups of correlated features together (grouping effect)
LASSO Regression
Definition of LASSO
LASSO (Least Absolute Shrinkage and Selection Operator) is the most important sparsity method in statistics:
$$\hat\beta = \arg\min_\beta \frac{1}{2}\|y - X\beta\|_2^2 + \lambda \|\beta\|_1$$

Equivalent Form (constrained form):
$$\min_\beta \|y - X\beta\|_2^2 \quad \text{s.t.} \quad \|\beta\|_1 \le t$$
The LASSO Solution
Subgradient Condition: The optimal solution $\hat\beta$ satisfies $X^\top(y - X\hat\beta) = \lambda s$, where $s_j = \operatorname{sign}(\hat\beta_j)$ if $\hat\beta_j \neq 0$, and $s_j \in [-1, 1]$ if $\hat\beta_j = 0$.
Soft Thresholding Operation
For the simple case of an orthogonal design ($X^\top X = I$), the LASSO solution is obtained by soft thresholding the least-squares estimate:
$$\hat\beta_j = S_\lambda\big(\hat\beta_j^{\text{OLS}}\big) = \operatorname{sign}\big(\hat\beta_j^{\text{OLS}}\big)\,\max\big(|\hat\beta_j^{\text{OLS}}| - \lambda,\, 0\big)$$
Comparison with Hard Thresholding:
- Soft thresholding: smoothly shrinks all coefficients toward zero; small ones become exactly zero
- Hard thresholding: sets small coefficients to zero and leaves large ones unchanged
Visualization: The soft thresholding function acts like a "squeeze" operation, pushing all coefficients toward zero, with small coefficients becoming exactly zero.
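Both operators are one-liners in NumPy. A small sketch (the test vector is made up for illustration):

```python
import numpy as np

def soft_threshold(x, tau):
    """Shrink every entry toward zero by tau; magnitudes below tau become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def hard_threshold(x, tau):
    """Zero out entries with |x_i| <= tau; keep the rest unchanged."""
    return np.where(np.abs(x) > tau, x, 0.0)

x = np.array([-3.0, -0.5, 0.2, 1.0, 2.5])
print(soft_threshold(x, 1.0))  # [-2.  -0.   0.   0.   1.5]
print(hard_threshold(x, 1.0))  # [-3.   0.   0.   0.   2.5]
```

Note how soft thresholding biases the surviving coefficients ($-3 \to -2$, $2.5 \to 1.5$), while hard thresholding keeps them intact but is discontinuous at the threshold.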
![](稀疏矩阵与压缩感知/fig2.png)
Properties of LASSO
Sparsity: LASSO automatically sets unimportant feature coefficients to exactly zero.
Feature Selection: Nonzero coefficients correspond to selected features — this is automatic feature selection.
Bias-Variance Trade-off:
- Large $\lambda$: more coefficients forced to zero, lower variance, higher bias
- Small $\lambda$: fewer zeros, higher variance, lower bias

Choosing $\lambda$: usually by cross-validation, or with information criteria such as AIC/BIC.
The LASSO Path
The LASSO path is the trajectory of the solution $\hat\beta(\lambda)$ as $\lambda$ varies from large to small.

Characteristics:
- When $\lambda \ge \lambda_{\max} = \|X^\top y\|_\infty$, all coefficients are zero
- As $\lambda$ decreases, features enter the model one at a time
- The path is piecewise linear in $\lambda$
The LARS algorithm (Least Angle Regression) can efficiently compute the entire LASSO path.
Compressed Sensing Theory
Core Idea
Traditional Sampling: The Shannon-Nyquist theorem requires sampling rate at least twice the signal's highest frequency.
Compressed Sensing: If the signal is sparse, we can accurately recover it using far fewer measurements than the Nyquist rate!
Revolutionary Implications:
- MRI scan time can be dramatically reduced
- Sensors can be simpler and cheaper
- Data storage and transmission become more efficient
Measurement Model:
$$y = \Phi x$$
where:
- $x \in \mathbb{R}^n$ is the unknown signal, assumed k-sparse
- $\Phi \in \mathbb{R}^{m \times n}$ is the measurement matrix, with $m \ll n$
- $y \in \mathbb{R}^m$ is the measurement vector

Problem: Recover the $n$-dimensional sparse signal $x$ from only $m \ll n$ measurements. The system is underdetermined, so without further assumptions it has infinitely many solutions.
Key Insight: Sparsity provides additional constraints, making recovery possible.
Restricted Isometry Property (RIP)
Definition: Matrix $\Phi$ satisfies the RIP of order $k$ with constant $\delta_k \in (0, 1)$ if, for every k-sparse vector $x$:
$$(1 - \delta_k)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_k)\|x\|_2^2$$

Intuition: $\Phi$ acts almost like an isometry on sparse vectors: it approximately preserves their lengths, and, applied to differences of sparse vectors, their distances.

Geometric Meaning: When $\delta_{2k} < 1$, no two distinct k-sparse vectors are mapped to the same measurement, so recovery is possible in principle.
Analogy: Imagine projecting a high-dimensional object onto lower dimensions. RIP ensures the projection doesn't map different sparse vectors to the same point (otherwise they couldn't be distinguished).
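This near-isometry is easy to observe numerically. The sketch below (dimensions chosen arbitrarily for illustration) draws a Gaussian matrix scaled by $1/\sqrt{m}$ and checks that $\|\Phi x\|_2 / \|x\|_2$ stays close to 1 over many random 10-sparse vectors:

```python
import numpy as np

rng = np.random.default_rng(42)
n, m, k = 1000, 200, 10

# Gaussian measurement matrix, scaled so that E[||Phi x||^2] = ||x||^2.
Phi = rng.standard_normal((m, n)) / np.sqrt(m)

ratios = []
for _ in range(500):
    x = np.zeros(n)
    x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    ratios.append(np.linalg.norm(Phi @ x) / np.linalg.norm(x))

ratios = np.array(ratios)
print(ratios.min(), ratios.max())  # both concentrate near 1
```

This is only a sanity check on random sparse vectors, not a proof of RIP (which quantifies over all k-sparse vectors), but it shows the concentration that makes the theory work.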
Recovery Theorem
Theorem (Candès-Tao): If $\Phi$ satisfies RIP with $\delta_{2k} < \sqrt{2} - 1$, then L1 minimization exactly recovers every k-sparse signal $x$ from $y = \Phi x$.

Number of Measurements Required: Random matrices satisfying RIP exist, requiring only
$$m = O\!\left(k \log \frac{n}{k}\right)$$
measurements.

Significance:
- Traditional methods need $n$ samples (Nyquist-rate sampling)
- Compressed sensing needs only on the order of $k \log(n/k)$, a dramatic saving when $k \ll n$
![](稀疏矩阵与压缩感知/fig3.png)
Measurement Matrix Design
Random Matrices (most common):
- Gaussian random matrix: $\Phi_{ij} \sim \mathcal{N}(0, 1/m)$ i.i.d.
- Bernoulli random matrix: $\Phi_{ij} = \pm 1/\sqrt{m}$ with equal probability
- Random partial Fourier matrix: randomly selected rows of the DFT matrix

Advantages of Random Matrices:
- Satisfy RIP with high probability
- Signal-independent (universality)
- Easy to generate
Deterministic Matrices: Exist but construction is complex, commonly used for specific applications.
Noisy Case
Basis Pursuit Denoising (BPDN):
$$\min_x \|x\|_1 \quad \text{s.t.} \quad \|y - \Phi x\|_2 \le \epsilon$$

Recovery Guarantee: If $\Phi$ satisfies RIP with $\delta_{2k} < \sqrt{2} - 1$ and the noise level is at most $\epsilon$, the BPDN solution $\hat x$ obeys
$$\|\hat x - x\|_2 \le C_0 \frac{\sigma_k(x)_1}{\sqrt{k}} + C_1 \epsilon,$$
where $\sigma_k(x)_1$ is the L1 error of the best k-term approximation of $x$, and $C_0, C_1$ are modest constants.
Algorithms
Iterative Shrinkage-Thresholding Algorithm (ISTA)
ISTA is a classic algorithm for solving LASSO problems.
Update Rule:
$$x^{(k+1)} = S_{\lambda t}\!\left(x^{(k)} - t\,\Phi^\top\big(\Phi x^{(k)} - y\big)\right)$$
where $t \le 1/\|\Phi\|_2^2$ is a step size and $S_\tau$ is the soft thresholding operator.

Algorithm Interpretation:
1. Gradient descent step on the smooth part: $x^{(k)} - t\,\Phi^\top(\Phi x^{(k)} - y)$
2. Soft thresholding step: promotes sparsity

Convergence Rate: $O(1/k)$ in objective value.
Fast Iterative Shrinkage-Thresholding Algorithm (FISTA)
FISTA uses Nesterov acceleration: the gradient and thresholding step is applied at an extrapolated point $z^{(k)} = x^{(k)} + \frac{t_{k-1} - 1}{t_k}\big(x^{(k)} - x^{(k-1)}\big)$ rather than at $x^{(k)}$ itself.

Convergence Rate: $O(1/k^2)$ in objective value, with essentially the same per-iteration cost as ISTA.
Orthogonal Matching Pursuit (OMP)
OMP is a greedy algorithm:
Algorithm Steps:
1. Initialize: residual $r^{(0)} = y$, support set $S = \emptyset$
2. Select the atom most correlated with the residual: $j^* = \arg\max_j |\langle \phi_j, r \rangle|$, and add it to $S$
3. Solve the least-squares problem restricted to the columns in $S$
4. Update the residual and repeat from step 2 until $k$ atoms have been chosen
Coordinate Descent
Idea: Optimize one coordinate at a time.
For LASSO, the coordinate-wise minimizer has a closed form:
$$\beta_j \leftarrow \frac{S_\lambda\!\left(x_j^\top \big(y - \sum_{l \neq j} x_l \beta_l\big)\right)}{\|x_j\|_2^2}$$
where $x_j$ is the $j$-th column of $X$ and $S_\lambda$ is soft thresholding.
Advantages: Very efficient for large-scale sparse problems.
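The update can be sketched as cyclic coordinate descent for $\frac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$ (a minimal implementation; the problem sizes and data are illustrative):

```python
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X**2, axis=0)    # ||x_j||^2 for each column
    r = y - X @ beta                 # residual, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]   # partial residual: remove j's contribution
            rho = X[:, j] @ r
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * beta[j]   # restore residual with the new beta_j
    return beta

# Synthetic sparse regression: 3 of 20 features are truly active.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[[2, 7, 15]] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.01 * rng.standard_normal(100)
beta_hat = lasso_cd(X, y, lam=1.0)
```

Maintaining the residual incrementally is what makes each coordinate update $O(n)$ instead of $O(np)$; this is the main reason coordinate descent scales so well for LASSO.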
Applications
Medical Imaging: Compressed Sensing MRI
Background: MRI scanning requires acquiring large amounts of k-space data, which is time-consuming.
Compressed Sensing MRI:
- Acquire only partial k-space data (random undersampling)
- Exploit image sparsity in the wavelet domain
- Reconstruct the image via L1 optimization
Result: Scan time reduced by 2-8x, especially useful for motion-sensitive imaging.
Practical Systems: Since 2017, FDA has approved multiple compressed sensing MRI systems.
Single-Pixel Camera
Principle: Capture images with a single photodetector.
How It Works:
1. Use a digital micromirror device (DMD) to generate random patterns
2. Each pattern produces one measurement (a weighted sum of all pixels)
3. Reconstruct the image from $m \ll n$ such measurements via sparse recovery
Applications:
- Infrared imaging (where detectors are expensive)
- Terahertz imaging
- High-speed imaging
Image Compression and Recovery
Image Inpainting:
- Known partial pixels, recover the complete image
- Exploit sparse priors of images (wavelet, DCT, learned dictionary)
Super-Resolution:
- Recover a high-resolution image from low-resolution observations
- Assume the high-resolution image is sparsely representable
![](稀疏矩阵与压缩感知/fig4.png)
Genomics
Problem: Identify disease-related genes.
Data: expression levels of $p$ genes measured for $n$ patients, typically with far more genes than patients ($p \gg n$).
Method: LASSO regression
- Sparsity assumption: only a few genes are related to the disease
- Automatically selects the relevant genes
Finance
Sparse Portfolio:
- Invest in only a few assets (reduces transaction costs)
- L1 regularization promotes sparse holdings
High-Frequency Trading Signals: select key indicators from a large set of candidate features.
Sparse Matrix Storage and Computation
Sparse Matrix Formats
COO Format (Coordinate): Store one (row, column, value) triple per nonzero entry. Simple and convenient for incremental construction.
CSR Format (Compressed Sparse Row):
- data: nonzero element values
- indices: column indices of those values
- indptr: row pointers (row $i$'s nonzeros occupy positions indptr[i] to indptr[i+1])
CSC Format (Compressed Sparse Column): Similar to CSR, but stored by column.
Sparse Matrix Operations
Matrix-Vector Multiplication:
- Dense: $O(mn)$ time and storage
- Sparse: $O(\text{nnz})$, where nnz is the number of nonzero entries
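A hand-rolled CSR matrix-vector product makes the format concrete (in practice one would use `scipy.sparse`; this sketch uses a small made-up matrix, shown in the comments):

```python
import numpy as np

# CSR representation of
#   A = [[1, 0, 2],
#        [0, 0, 3],
#        [4, 5, 0]]
data    = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # nonzero values, row by row
indices = np.array([0, 2, 2, 0, 1])            # column index of each value
indptr  = np.array([0, 2, 3, 5])               # row i owns data[indptr[i]:indptr[i+1]]

def csr_matvec(data, indices, indptr, x):
    """Compute y = A @ x touching only the stored nonzeros: O(nnz) work."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]
        y[i] = data[start:end] @ x[indices[start:end]]
    return y

x = np.array([1.0, 1.0, 1.0])
print(csr_matvec(data, indices, indptr, x))  # [3. 3. 9.]
```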
Deep Understanding: Why Does Compressed Sensing Work?
Counter-Intuitive High-Dimensional Geometry
In high-dimensional space, our intuition often fails:
Volume of High-Dimensional Spheres:
- Most of the volume concentrates near the surface (not at the center)
- The volume of the unit ball goes to zero as the dimension grows
Structure of Sparse Vectors: k-sparse vectors live in the union of $\binom{n}{k}$ k-dimensional coordinate subspaces, a vanishingly small subset of $\mathbb{R}^n$.
Johnson-Lindenstrauss Lemma
JL Lemma: Any set of $N$ points in $\mathbb{R}^n$ can be linearly embedded into $\mathbb{R}^m$ with $m = O(\epsilon^{-2} \log N)$ while distorting all pairwise distances by at most a factor of $1 \pm \epsilon$.
Connection to Compressed Sensing: RIP is a strengthened version of the JL lemma for sparse vectors.
Information-Theoretic Perspective
Information Content of Sparse Signals: specifying a k-sparse vector requires choosing its support, about $\log_2 \binom{n}{k} \approx k \log_2(n/k)$ bits, plus $k$ coefficient values.

Lower Bound on Measurements: any recovery method therefore needs at least on the order of $k \log(n/k)$ measurements.
Conclusion: The number of measurements in compressed sensing is information-theoretically optimal (up to constants)!
Python Implementation
LASSO Regression
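A minimal LASSO sketch, assuming scikit-learn is available (the synthetic data, dimensions, and `alpha` value are illustrative assumptions, not the author's original script):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic sparse regression: 3 of 50 features are truly active.
rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[0, 10, 25]] = [2.0, -3.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

model = Lasso(alpha=0.1)   # alpha plays the role of the regularization strength
model.fit(X, y)

print("nonzero coefficients:", np.count_nonzero(model.coef_))
```

Note that scikit-learn's `Lasso` minimizes $\frac{1}{2n}\|y - X\beta\|_2^2 + \alpha\|\beta\|_1$, so `alpha` corresponds to $\lambda/n$ in the notation used above.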
![](稀疏矩阵与压缩感知/fig5.png)
OMP Algorithm
```python
def omp(Phi, y, k):
    ...  # remainder of the original listing lost
```
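A possible implementation matching the `omp(Phi, y, k)` signature, following the greedy steps described earlier (a sketch, not necessarily the original listing; the test data is synthetic):

```python
import numpy as np

def omp(Phi, y, k):
    """Greedy sparse recovery: pick the atom most correlated with the
    residual, re-fit by least squares on the support, repeat k times."""
    m, n = Phi.shape
    residual = y.copy()
    support = []
    x = np.zeros(n)
    for _ in range(k):
        correlations = np.abs(Phi.T @ residual)
        correlations[support] = 0.0              # never pick an atom twice
        support.append(int(np.argmax(correlations)))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef    # orthogonal to chosen atoms
    x[support] = coef
    return x

# Noiseless demo: 3-sparse signal, 80 Gaussian measurements.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((80, 100))
x_true = np.zeros(100)
x_true[[5, 40, 77]] = [2.0, -1.5, 1.0]
y = Phi @ x_true
x_hat = omp(Phi, y, k=3)
```

On a noiseless problem like this, once the support is identified the least-squares step makes the recovery exact.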
ISTA Algorithm
```python
def soft_threshold(x, tau):
    ...  # remainder of the original listing lost
```
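A sketch of ISTA built on the `soft_threshold(x, tau)` helper (not necessarily the original listing; the measurement setup is synthetic):

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(Phi, y, lam, max_iter=2000):
    """Proximal gradient for (1/2)||y - Phi x||^2 + lam * ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2   # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(max_iter):
        grad = Phi.T @ (Phi @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Underdetermined system: 80 measurements, 100 unknowns, 3-sparse truth.
rng = np.random.default_rng(0)
m, n = 80, 100
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[[5, 37, 62]] = [2.0, -1.5, 1.0]
y = Phi @ x_true
x_hat = ista(Phi, y, lam=0.01)
```

With a small $\lambda$ on noiseless data, the recovered vector lands close to the true sparse signal, up to the usual soft-thresholding bias.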
FISTA Algorithm
```python
def fista(Phi, y, lambda_param, max_iter=1000, tol=1e-6):
    ...  # remainder of the original listing lost
```
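A sketch matching the `fista(Phi, y, lambda_param, max_iter=1000, tol=1e-6)` signature (an illustrative implementation, not necessarily the original listing):

```python
import numpy as np

def fista(Phi, y, lambda_param, max_iter=1000, tol=1e-6):
    """ISTA plus Nesterov momentum: O(1/k^2) convergence in objective."""
    def soft_threshold(v, tau):
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    L = np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    z = x.copy()          # extrapolated point
    t = 1.0
    for _ in range(max_iter):
        x_new = soft_threshold(z - Phi.T @ (Phi @ z - y) / L,
                               lambda_param / L)
        t_new = (1 + np.sqrt(1 + 4 * t**2)) / 2
        z = x_new + (t - 1) / t_new * (x_new - x)   # momentum step
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x, t = x_new, t_new
    return x

# Same synthetic setup as the ISTA demo above.
rng = np.random.default_rng(0)
m, n = 80, 100
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[[5, 37, 62]] = [2.0, -1.5, 1.0]
y = Phi @ x_true
x_hat = fista(Phi, y, lambda_param=0.01)
```

The per-iteration cost is the same as ISTA; only the extrapolation point $z$ and the scalar momentum sequence $t_k$ are new.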
Compressed Sensing Experiment
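One way to set up the experiment (a self-contained sketch: Gaussian measurements of a synthetic sparse signal, greedy recovery, and a plot; all parameters are illustrative assumptions, not the author's original script):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n, m, k = 256, 100, 5

# Synthetic k-sparse signal and scaled Gaussian measurement matrix.
x_true = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
x_true[idx] = rng.uniform(2.0, 3.0, size=k) * rng.choice([-1.0, 1.0], size=k)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = Phi @ x_true                 # only m = 100 measurements of 256 samples

# Greedy recovery (OMP).
support, residual = [], y.copy()
for _ in range(k):
    corr = np.abs(Phi.T @ residual)
    corr[support] = 0.0
    support.append(int(np.argmax(corr)))
    coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
    residual = y - Phi[:, support] @ coef
x_hat = np.zeros(n)
x_hat[support] = coef

fig, (ax0, ax1) = plt.subplots(2, 1, sharex=True)
ax0.stem(x_true)
ax0.set_title("original 5-sparse signal (n = 256)")
ax1.stem(x_hat)
ax1.set_title("recovered from m = 100 random measurements")
fig.tight_layout()
# fig.savefig("cs_demo.png")  # or plt.show() in an interactive session
plt.close(fig)
```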
Summary and Outlook
Key Points of This Chapter
- Sparsity is a universal feature of natural signals
- The L1 norm is the convex relaxation of L0 and geometrically promotes sparsity
- LASSO simultaneously achieves sparse estimation and feature selection
- Compressed sensing recovers k-sparse signals from $m = O(k \log(n/k))$ measurements
- The RIP condition guarantees recovery correctness
- Algorithms: ISTA, FISTA, OMP, coordinate descent
Connections to Other Chapters
- Chapter 9 (SVD): Low-rank is another form of "sparsity"
- Chapter 10 (Norms): Properties of L1/L2 norms
- Chapter 11 (Optimization): LASSO is a convex optimization problem
- Chapter 15 (Machine Learning): LASSO used for feature selection
Further Learning
- Structured Sparsity: Group sparsity, tree sparsity
- Dictionary Learning: Data-driven dictionary design
- Matrix Completion: Recovery of low-rank matrices
- Deep Compressed Sensing: Neural networks + sparse recovery
Preview of Next Chapter
"Tensors and Multilinear Algebra"
- Tensors are generalizations of vectors and matrices
- CP decomposition, Tucker decomposition
- Applications: Recommender systems, chemometrics
Exercises
Basic Problems
Prove that the L1 ball has corners on the coordinate axes.
Compute the L0, L1, and L2 norms of a given vector.

Explain why L0 optimization is NP-hard.
Derive the formula for the soft thresholding operation.
Write out the KKT conditions for LASSO and derive the subgradient conditions.
Advanced Problems
Prove: If $\Phi$ satisfies RIP of order $2k$ with $\delta_{2k} < 1$, then different k-sparse vectors are mapped to different measurements.

Implement FISTA and compare convergence speed with ISTA on a synthetic problem.

Prove that the condition number of $A^\top A$ is the square of the condition number of $A$, and explain why this makes normal equations less stable for sparse recovery.

Derive the update formula for coordinate descent applied to LASSO.

Prove that OMP correctly recovers any k-sparse signal when $\Phi$ satisfies certain coherence conditions.
Programming Problems
Use LASSO for feature selection on the MNIST dataset and visualize selected features.
Implement compressed sensing image reconstruction:
- Take random Fourier measurements of an image
- Reconstruct using total variation minimization
- Compare reconstruction quality vs. number of measurements
Compare recovery performance of OMP and LASSO:
- Generate signals with varying sparsity levels
- Plot success rate vs. sparsity
- Analyze computational time trade-offs
Implement a simple dictionary learning algorithm:
- Alternate between sparse coding (fix dictionary, update coefficients)
- And dictionary update (fix coefficients, update dictionary)
Application Problems
- MRI Reconstruction Simulation:
- Generate a phantom image
- Create random k-space undersampling mask
- Reconstruct using L1-minimization in wavelet domain
- Compare with zero-filled reconstruction
- Time Series Anomaly Detection:
- Model normal behavior as sparse in some basis
- Detect anomalies as deviations from sparse representation
- Apply to real sensor data
Theoretical Problems
Prove the relationship between mutual coherence and spark of a dictionary.
Show that Gaussian random matrices satisfy RIP with high probability when $m = O(k \log(n/k))$.

Derive the dual problem of LASSO and interpret the dual variables geometrically.
Prove that the LASSO path is piecewise linear and derive the conditions under which a coefficient becomes nonzero.
References
- Candès, E. & Wakin, M. (2008). "An Introduction to Compressive Sampling". IEEE Signal Processing Magazine.
- Accessible introduction to compressed sensing theory
- Tibshirani, R. (1996). "Regression Shrinkage and
Selection via the Lasso". JRSS-B.
- Original LASSO paper
- Elad, M. (2010). Sparse and Redundant
Representations. Springer.
- Comprehensive treatment of sparse representation
- Hastie, T., Tibshirani, R., & Wainwright, M.
(2015). Statistical Learning with Sparsity. CRC Press.
- Modern treatment of sparsity in statistics and machine learning
- Foucart, S. & Rauhut, H. (2013). A
Mathematical Introduction to Compressive Sensing. Springer.
- Rigorous mathematical foundations
- Beck, A. & Teboulle, M. (2009). "A Fast
Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems".
SIAM J. Imaging Sciences.
- Original FISTA paper
This is Chapter 12 of the 18-part "Essence of Linear Algebra" series.
- Post title:Essence of Linear Algebra (12): Sparse Matrices and Compressed Sensing
- Post author:Chen Kai
- Create time:2019-03-04 11:30:00
- Post link:https://www.chenk.top/chapter-12-sparse-matrices-and-compressed-sensing/
- Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stated otherwise.