When you fill a huge matrix with random numbers and compute its eigenvalues, something magical happens: the distribution of these eigenvalues exhibits stunning regularity. It is like finding order in chaos, hearing music in noise. Random matrix theory tells us that when dimensions are high enough, randomness itself gives rise to profound mathematical structure.
Starting from Intuition: Why Aren't Random Matrices "Random"?
Imagine you are in a huge concert hall where ten thousand people are simultaneously randomly hitting keyboards. Intuitively, this should produce pure noise. But if you analyze the frequency distribution of these sounds using Fourier analysis, you will find certain statistical patterns always emerge — not because people are coordinating, but because of the magical manifestation of the law of large numbers and central limit theorem in high-dimensional space.
Random matrices work the same way. A single entry tells you nothing, but once the dimension is large, the eigenvalues of the matrix as a whole obey precise, predictable laws.
Definition and Classification of Random Matrices
What is a Random Matrix?
A random matrix is a matrix whose elements are random variables. This sounds simple, but this definition conceals rich mathematical structure.
Let $X = (X_{ij})_{1 \le i, j \le n}$ be an $n \times n$ matrix whose entries $X_{ij}$ are random variables defined on a common probability space.
The simplest example: generate a matrix where each element is independently drawn from the standard normal distribution $N(0, 1)$:

```python
import numpy as np

n = 1000
X = np.random.randn(n, n)   # n x n matrix of i.i.d. N(0, 1) entries
```
Core Questions
The core question in random matrix theory is: what are the statistical properties of the eigenvalues as the matrix dimension $n \to \infty$?
The answer to this question is surprisingly universal — regardless of what distribution generates the matrix elements, as long as certain basic conditions are satisfied, the eigenvalue distribution converges to the same limiting shape.
Main Random Matrix Models

Wigner Matrices (Symmetric/Hermitian Random Matrices)
Wigner matrices are the most classical random matrix model. Let $W$ be an $n \times n$ real symmetric matrix whose entries on and above the diagonal are independent random variables with mean zero and variance $\sigma^2$. Because $W$ is symmetric, all its eigenvalues are real; the object of study is the normalized matrix $W/\sqrt{n}$.
Real-life analogy: Imagine a social network where $W_{ij}$ measures the (random) strength of the relationship between persons $i$ and $j$. Since relationships are mutual, $W_{ij} = W_{ji}$, and the matrix is naturally symmetric — exactly a Wigner matrix.
Wishart Matrices (Sample Covariance Matrices)
Let $X$ be an $n \times p$ matrix whose entries are i.i.d. with mean zero and variance $\sigma^2$ ($n$ samples, $p$ variables). The sample covariance matrix $S = \frac{1}{n} X^\top X$ is called a Wishart matrix.
Real-life analogy: Suppose you are a fund manager tracking 500 stocks, recording their daily returns. After a year you have about 250 trading days of data. The covariance matrix you compute is a $500 \times 500$ Wishart-type matrix estimated from only 250 observations — fewer samples than dimensions, so its eigenvalues are heavily distorted by noise.
Other Important Models
- Gaussian Unitary Ensemble (GUE): Complex entries satisfying the Hermitian condition $H = H^\dagger$
- Gaussian Symplectic Ensemble (GSE): Quaternion elements
- Circular Ensembles: Eigenvalues distributed on the unit circle
Wigner Semicircle Law: The "Central Limit Theorem" of Random Matrices
Statement of the Theorem
The Wigner semicircle law is the most fundamental and beautiful result in random matrix theory. It states:
Let $W$ be an $n \times n$ Wigner matrix whose entries have mean zero and variance $\sigma^2$. As $n \to \infty$, the empirical eigenvalue distribution of $W/\sqrt{n}$ converges to the semicircle distribution with density

$$\rho_{sc}(x) = \frac{1}{2\pi\sigma^2}\sqrt{4\sigma^2 - x^2}, \qquad |x| \le 2\sigma.$$
Intuitive Understanding
Why a semicircle? Here are several intuitive explanations:
Intuition 1: Mechanical Equilibrium
Imagine eigenvalues as charged particles on a line that repel each other (because eigenvalues don't like to "cluster"). At the same time, an external force pulls them toward the origin (the normalization effect). When repulsion and attraction balance, the particle density distribution becomes semicircular.
Intuition 2: High-dimensional Geometry
In high-dimensional space, the "volume" of a unit ball concentrates near the equator. The eigenvalue distribution of random matrices reflects this high-dimensional geometric property — most eigenvalues are neither too large nor too small, distributed in the "middle zone."
Intuition 3: Method of Moments
Mathematically, the classical proof of the semicircle law uses the method of moments. Computing the moments of the eigenvalue distribution reveals they exactly equal the moments of the semicircle distribution. This is like determining a normal distribution through its mean and variance.
Numerical Verification
Let us verify the Wigner semicircle law with code:

```python
import numpy as np
import matplotlib.pyplot as plt

n = 2000
A = np.random.randn(n, n)
W = (A + A.T) / np.sqrt(2)          # symmetric (GOE-type) Wigner matrix
eigs = np.linalg.eigvalsh(W / np.sqrt(n))

# Theoretical semicircle density for unit entry variance: support [-2, 2]
x = np.linspace(-2, 2, 400)
rho = np.sqrt(4 - x**2) / (2 * np.pi)

plt.hist(eigs, bins=60, density=True, alpha=0.6, label='empirical')
plt.plot(x, rho, 'r', label='semicircle law')
plt.legend()
plt.show()
```

Running this code, you will see the empirical distribution closely match the theoretical semicircle curve. The precision of this match never fails to amaze.
Universality: Why the Distribution Doesn't Matter
A stunning feature of the Wigner semicircle law is universality: regardless of what distribution generates matrix elements (Gaussian, uniform, discrete...), as long as basic conditions like zero mean, finite variance, and independence are satisfied, the limiting distribution is always semicircular.
This is like the central limit theorem — no matter what the original distribution is, the sum of enough independent random variables tends toward a normal distribution. The semicircle law is the "central limit theorem" of random matrix theory.
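Universality can be checked directly. A minimal sketch, assuming NumPy; the matrix size and the three entry distributions (each normalized to unit variance) are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1500

def wigner_spectrum(sampler):
    """Eigenvalues of a normalized symmetric matrix with i.i.d. entries."""
    A = sampler((n, n))
    W = np.triu(A) + np.triu(A, 1).T   # symmetrize the upper triangle
    return np.linalg.eigvalsh(W / np.sqrt(n))

samplers = {
    'gaussian': lambda shape: rng.standard_normal(shape),
    'uniform': lambda shape: rng.uniform(-np.sqrt(3), np.sqrt(3), shape),  # variance 1
    'rademacher': lambda shape: rng.choice([-1.0, 1.0], shape),            # variance 1
}

results = {}
for name, sampler in samplers.items():
    eigs = wigner_spectrum(sampler)
    results[name] = (eigs.min(), eigs.max())
    print(f'{name:>10}: spectrum in [{eigs.min():.2f}, {eigs.max():.2f}]')
```

Despite the very different entry laws, all three spectra fill essentially the same interval $[-2, 2]$.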
Marchenko-Pastur Distribution: The Limit for Sample Covariance Matrices
Background
In statistics and data science, we often need to estimate covariance matrices. Suppose we have $n$ samples of a $p$-dimensional random vector, arranged in an $n \times p$ data matrix $X$.
Key question: When both $n$ and $p$ tend to infinity with a fixed aspect ratio $\gamma = p/n$, how do the eigenvalues of the sample covariance matrix behave?
Statement of the Theorem
Let $X$ be an $n \times p$ matrix with i.i.d. entries of mean zero and variance $\sigma^2$, and let $S = \frac{1}{n} X^\top X$.
As $n, p \to \infty$ with $p/n \to \gamma \in (0, 1]$, the empirical eigenvalue distribution of $S$ converges to the Marchenko-Pastur distribution with density

$$\rho_{MP}(x) = \frac{1}{2\pi\sigma^2\gamma x}\sqrt{(\lambda_+ - x)(x - \lambda_-)}, \qquad \lambda_\pm = \sigma^2(1 \pm \sqrt{\gamma})^2.$$

When $\gamma > 1$, the sample covariance matrix is singular, and the limiting distribution additionally carries a point mass of $1 - 1/\gamma$ at zero.
Intuitive Understanding
Why don't the eigenvalues concentrate around 1?
If the population covariance matrix is the identity $I$, every true eigenvalue equals 1. Yet with finitely many samples per dimension, the sample eigenvalues spread over the whole interval $[\lambda_-, \lambda_+]$ — the spread is pure estimation noise.
The aspect ratio $\gamma = p/n$ controls how wide this interval is: the fewer samples per dimension, the wider the spread.
Real-life analogy: This is like estimating a complex system with limited observations. Fewer observations and more system complexity lead to larger estimation errors. The Marchenko-Pastur distribution precisely quantifies this error.
Numerical Verification
```python
import numpy as np
import matplotlib.pyplot as plt

n, p = 2000, 1000                    # aspect ratio gamma = 0.5
gamma = p / n
X = np.random.randn(n, p)
S = X.T @ X / n                      # sample covariance matrix

eigs = np.linalg.eigvalsh(S)
lam_minus = (1 - np.sqrt(gamma))**2
lam_plus = (1 + np.sqrt(gamma))**2
x = np.linspace(lam_minus, lam_plus, 400)
rho = np.sqrt((lam_plus - x) * (x - lam_minus)) / (2 * np.pi * gamma * x)

plt.hist(eigs, bins=60, density=True, alpha=0.6, label='empirical')
plt.plot(x, rho, 'r', label='Marchenko-Pastur')
plt.legend()
plt.show()
```
Fine Structure of Eigenvalues
Empirical Eigenvalue Distribution
Let matrix $A$ be $n \times n$ with eigenvalues $\lambda_1, \dots, \lambda_n$. Its empirical eigenvalue distribution is the probability measure $\mu_n = \frac{1}{n}\sum_{i=1}^n \delta_{\lambda_i}$.
Intuitively, the empirical eigenvalue distribution treats each eigenvalue as a "point mass" of weight $1/n$ and averages them.
Eigenvalue Spacing Distribution
Beyond the overall distribution, the spacing between eigenvalues also follows profound patterns.
Define the adjacent eigenvalue spacing $s_i = \lambda_{i+1} - \lambda_i$ (after sorting, rescaled so the mean spacing is 1).
Key observation: As $n \to \infty$, the spacing distribution of GOE matrices is well approximated by the Wigner surmise $p(s) = \frac{\pi s}{2} e^{-\pi s^2/4}$, which vanishes at $s = 0$: eigenvalues repel each other.

This is completely different from independent random variables! If eigenvalues were independent (Poisson statistics), spacings would follow an exponential distribution $p(s) = e^{-s}$, which is maximal at $s = 0$ — near-collisions would be common, not rare.
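Eigenvalue repulsion is easy to observe numerically. A minimal sketch (the matrix size, bulk window, and the 0.1 threshold are arbitrary illustrative choices): for independent points, the fraction of normalized spacings below 0.1 would be about $1 - e^{-0.1} \approx 0.095$; for a GOE matrix it is far smaller.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2)               # GOE-type matrix
eigs = np.linalg.eigvalsh(W)             # returned in ascending order

# Use only bulk eigenvalues (the spectral edges behave differently),
# and normalize so the mean spacing equals 1
bulk = eigs[n // 4 : 3 * n // 4]
s = np.diff(bulk)
s = s / s.mean()

frac_small = np.mean(s < 0.1)
print(f'fraction of spacings below 0.1: {frac_small:.3f}')
```

The printed fraction is an order of magnitude below the Poisson value 0.095, a direct signature of level repulsion.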
Tracy-Widom Distribution: The Limit of the Largest Eigenvalue
For standard Wigner matrices, the limiting distribution of the largest eigenvalue $\lambda_{\max}$ is not Gaussian but the Tracy-Widom distribution.
Specifically, for unit entry variance, the largest eigenvalue of $W/\sqrt{n}$ concentrates near 2, with fluctuations of order $n^{-2/3}$:

$$n^{2/3}\left(\lambda_{\max}\!\left(W/\sqrt{n}\right) - 2\right) \;\xrightarrow{d}\; TW_\beta,$$

where $\beta = 1$ for GOE and $\beta = 2$ for GUE.
The Tracy-Widom distribution is highly asymmetric: its left tail decays very fast (super-exponentially), while the right tail decays more slowly. This reflects that the largest eigenvalue has a small probability of being anomalously large.
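The concentration of $\lambda_{\max}$ near 2 can be checked with a few lines (the matrix sizes and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

for n in (200, 800, 3200):
    A = rng.standard_normal((n, n))
    W = (A + A.T) / np.sqrt(2)               # GOE-type Wigner matrix
    lam_max = np.linalg.eigvalsh(W / np.sqrt(n))[-1]
    print(f'n = {n:5d}: largest eigenvalue = {lam_max:.4f}')
```

As $n$ grows, the largest eigenvalue approaches 2, and its run-to-run fluctuations shrink like $n^{-2/3}$.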

Applications in Wireless Communications
Introduction to MIMO Systems
MIMO (Multiple-Input Multiple-Output) is a core technology in modern wireless communications. The transmitter has $n_t$ antennas and the receiver has $n_r$ antennas; the link between them is described by an $n_r \times n_t$ channel matrix $H$, whose entry $H_{ij}$ is the complex gain from transmit antenna $j$ to receive antenna $i$.
In rich scattering environments (like urban areas), the channel matrix $H$ is well modeled by i.i.d. complex Gaussian entries (Rayleigh fading) — a random matrix.
Channel Capacity and Eigenvalues
The capacity of a MIMO channel (theoretical maximum information rate) under equal power allocation is given by:

$$C = \sum_i \log_2\!\left(1 + \frac{\rho}{n_t}\,\lambda_i\right),$$

where $\rho$ is the signal-to-noise ratio and the $\lambda_i$ are the eigenvalues of $H H^\dagger$.
Key insight: The channel capacity is completely determined by the eigenvalues of $H H^\dagger$!
Application of Random Matrix Theory
When antenna numbers are large, we can use the Marchenko-Pastur distribution to analyze channel eigenvalue distribution and predict system capacity.
Let $n_t, n_r \to \infty$ with $n_r/n_t \to \gamma$; then the eigenvalues of $\frac{1}{n_t} H H^\dagger$ follow the Marchenko-Pastur law, and the capacity per antenna converges to a deterministic integral against the MP density.
Practical Design Implications
- Antenna configuration: The aspect ratio $n_r/n_t$ affects the "shape" of the eigenvalue distribution, which in turn affects capacity
- Power allocation: Knowing the eigenvalue distribution allows optimizing power allocation across "channel modes"
- Massive MIMO: As antenna numbers approach infinity, random matrix theory gives precise performance predictions
```python
import numpy as np

def mimo_capacity(n_t, n_r, snr, trials=200):
    """Average MIMO capacity (bits/s/Hz) over random Rayleigh channels."""
    caps = []
    for _ in range(trials):
        H = (np.random.randn(n_r, n_t) + 1j * np.random.randn(n_r, n_t)) / np.sqrt(2)
        eigs = np.linalg.eigvalsh(H @ H.conj().T)
        caps.append(np.sum(np.log2(1 + snr / n_t * eigs)))
    return np.mean(caps)

for n in (2, 4, 8):
    print(f'{n}x{n} MIMO at SNR 10: {mimo_capacity(n, n, 10.0):.2f} bits/s/Hz')
```
Applications in Finance
Portfolios and Covariance Matrices
Modern portfolio theory (Markowitz theory) centers on the covariance matrix. Let there be $p$ assets with return covariance matrix $\Sigma$; the optimal portfolio weights depend on $\Sigma^{-1}$.
Problem: We don't know the true $\Sigma$ — we can only estimate it from historical data, and the estimate is noisy.
The Curse of Noise
Suppose you track 500 stocks with 5 years (about 1250 trading days) of data. The sample covariance matrix is $500 \times 500$, estimated from 1250 observations.
The aspect ratio is $\gamma = 500/1250 = 0.4$. By the Marchenko-Pastur theorem:
- When the true eigenvalue = 1, the sample eigenvalues spread over $[(1-\sqrt{0.4})^2,\,(1+\sqrt{0.4})^2] \approx [0.135,\,2.66]$
- This means the largest eigenvalue is overestimated by about 166%, and the smallest is underestimated by about 86%!
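The interval above is a two-line computation:

```python
import numpy as np

p, n = 500, 1250
gamma = p / n                         # aspect ratio 0.4
lam_minus = (1 - np.sqrt(gamma))**2   # lower Marchenko-Pastur edge
lam_plus = (1 + np.sqrt(gamma))**2    # upper Marchenko-Pastur edge
print(f'sample eigenvalues spread over [{lam_minus:.3f}, {lam_plus:.3f}]')
```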

Eigenvalue Cleaning
Random matrix theory provides a method to "clean" noisy eigenvalues:
Step 1: Compute the eigenvalue decomposition of the sample covariance matrix $S = V \Lambda V^\top$
Step 2: Compute the Marchenko-Pastur bounds $\lambda_\pm = \sigma^2(1 \pm \sqrt{\gamma})^2$ and flag every eigenvalue falling in $[\lambda_-, \lambda_+]$ as noise
Step 3: Replace the noise eigenvalues (for example, by their average, which preserves the trace)
Step 4: Reconstruct the covariance matrix from the cleaned eigenvalues and the original eigenvectors

```python
import numpy as np

def clean_covariance_matrix(returns, method='average'):
    """
    Clean covariance matrix using random matrix theory

    Parameters:
        returns: n x p return matrix (n samples, p assets)
        method: cleaning method

    Returns:
        Cleaned covariance matrix and the Marchenko-Pastur bounds
    """
    n, p = returns.shape
    gamma = p / n

    # Compute sample covariance matrix
    S = np.cov(returns, rowvar=False)

    # Eigenvalue decomposition
    eigenvalues, eigenvectors = np.linalg.eigh(S)

    # Marchenko-Pastur boundaries
    sigma_sq = np.mean(eigenvalues)  # Estimate noise variance
    lambda_minus = sigma_sq * (1 - np.sqrt(gamma))**2
    lambda_plus = sigma_sq * (1 + np.sqrt(gamma))**2

    # Identify noise eigenvalues
    noise_mask = (eigenvalues >= lambda_minus) & (eigenvalues <= lambda_plus)

    # Clean
    cleaned_eigenvalues = eigenvalues.copy()
    if method == 'average':
        # Replace noise eigenvalues with their average
        noise_avg = np.mean(eigenvalues[noise_mask])
        cleaned_eigenvalues[noise_mask] = noise_avg

    # Reconstruct covariance matrix
    cleaned_S = eigenvectors @ np.diag(cleaned_eigenvalues) @ eigenvectors.T
    return cleaned_S, lambda_minus, lambda_plus
```
Empirical Results
Portfolios constructed using cleaned covariance matrices typically perform better in out-of-sample tests:
- Sharpe ratio improvement: About 10-30%
- More stable volatility: Reduced extreme fluctuations
- Lower turnover: More stable portfolios
Applications in Machine Learning
Challenges of High-dimensional Statistics
Modern machine learning often deals with "high-dimensional small-sample" problems: the feature count $p$ is comparable to, or even larger than, the sample count $n$.
In these situations, traditional statistical methods fail. For example:
- The sample covariance matrix is singular when $p > n$, so it cannot be inverted
Random matrix theory provides a theoretical framework and practical tools for these problems.
PCA and Random Matrices
Principal Component Analysis (PCA) is the most common dimensionality reduction method. It extracts directions corresponding to the largest eigenvalues of the covariance matrix.
Problem: In high dimensions, which principal components are "real" and which are just noise?
Random matrix answer: Use the Marchenko-Pastur distribution as a "null hypothesis." Eigenvalues exceeding the upper edge $\lambda_+ = \sigma^2(1+\sqrt{\gamma})^2$ carry genuine signal; eigenvalues inside the MP bulk are indistinguishable from noise.
```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 500, 200
gamma = p / n

# Pure-noise data plus 3 planted high-variance directions
X = rng.standard_normal((n, p))
for k in range(3):
    X[:, k] += 3.0 * rng.standard_normal(n)   # inflate variance of feature k

S = np.cov(X, rowvar=False)
eigs = np.sort(np.linalg.eigvalsh(S))[::-1]

lam_plus = (1 + np.sqrt(gamma))**2            # MP upper edge for unit noise
n_signal = np.sum(eigs > lam_plus)            # components above the noise bulk
print(f'components above the MP edge: {n_signal}')
```
Neural Network Initialization
In deep learning, weight matrix initialization is crucial. Random matrix theory helps understand the effects of different initialization strategies.
The principle of Xavier initialization: Keep output variance stable across layers to avoid gradient vanishing/explosion.
Let the weight matrix $W \in \mathbb{R}^{n_{out} \times n_{in}}$ have i.i.d. entries with variance $\sigma^2$. For the output variance to match the input variance, one needs $n_{in}\,\sigma^2 = 1$, i.e. $\sigma^2 = 1/n_{in}$ (Xavier/Glorot initialization uses the compromise $\sigma^2 = 2/(n_{in}+n_{out})$).
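A quick sanity check of the variance-preservation argument (layer widths, batch size, and seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n_in, n_out, batch = 1000, 1000, 2000

x = rng.standard_normal((batch, n_in))                   # unit-variance inputs
W = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)   # entries with variance 1/n_in
y = x @ W.T                                              # linear layer output

print(f'input variance:  {x.var():.3f}')
print(f'output variance: {y.var():.3f}')
```

With $\sigma^2 = 1/n_{in}$, the output variance stays close to 1; scaling by a constant instead would make it grow or shrink geometrically with depth.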
Theoretical Understanding of Overfitting
Random matrix theory reveals the essence of overfitting in high-dimensional statistics:
As $\gamma = p/n \to 1$, the smallest eigenvalue of the sample covariance matrix approaches 0, so inverting it amplifies noise without bound — the model fits noise rather than signal.
Solutions:
1. Regularization: Add a ridge term, working with $S + \lambda I$ instead of $S$, which lifts the small eigenvalues away from zero
2. Eigenvalue cleaning: Shrink the noisy bulk of the spectrum, as in the covariance-cleaning procedure above
3. More data or fewer features: Reduce the aspect ratio $\gamma$
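The effect of a ridge term on conditioning can be illustrated directly (the sizes and the value of $\lambda$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 220, 200                      # gamma = p/n close to 1
X = rng.standard_normal((n, p))
S = X.T @ X / n                      # nearly singular sample covariance

lam = 0.1                            # ridge strength (arbitrary)
cond_raw = np.linalg.cond(S)
cond_ridge = np.linalg.cond(S + lam * np.eye(p))
print(f'condition number without ridge: {cond_raw:.1f}')
print(f'condition number with ridge:    {cond_ridge:.1f}')
```

The ridge term bounds the smallest eigenvalue below by $\lambda$, shrinking the condition number by orders of magnitude.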

Core Mathematical Tools
Stieltjes Transform
The Stieltjes transform is a powerful tool for studying eigenvalue distributions. Let $\mu$ be a probability measure on $\mathbb{R}$; its Stieltjes transform is

$$m_\mu(z) = \int \frac{d\mu(x)}{x - z}, \qquad z \in \mathbb{C}^+.$$
Why is it useful?
- Distribution recovery: The underlying density can be recovered from $m_\mu$ via the inversion formula $\rho(x) = \frac{1}{\pi}\lim_{\eta \to 0^+} \operatorname{Im}\, m_\mu(x + i\eta)$
- Equation simplification: Many random matrix problems become concise in Stieltjes transform language; for example, the semicircle law is equivalent to the self-consistent equation $m(z)^2 + z\,m(z) + 1 = 0$
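As a sanity check, applying the inversion formula to the empirical Stieltjes transform of a Wigner matrix recovers the semicircle density (the matrix size and the smoothing parameter $\eta$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2)
eigs = np.linalg.eigvalsh(W / np.sqrt(n))

def stieltjes_empirical(z):
    """m_n(z) = (1/n) * sum_i 1 / (lambda_i - z)"""
    return np.mean(1.0 / (eigs - z))

# Recover the density at x = 0 via Im m(x + i*eta) / pi
eta = 0.05
x = 0.0
rho_est = stieltjes_empirical(x + 1j * eta).imag / np.pi
rho_true = np.sqrt(4 - x**2) / (2 * np.pi)   # semicircle density at x = 0
print(f'estimated density at 0:  {rho_est:.4f}')
print(f'semicircle density at 0: {rho_true:.4f}')
```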
Free Probability Theory
Free probability theory was developed by Voiculescu in the 1980s to study random variables in "non-commutative probability spaces."
Similar to "independence" in classical probability, free probability introduces the concept of free independence. Two random matrices are freely independent if and only if they "don't commute as much as possible."
Key theorem (Voiculescu): Independent large random matrices become asymptotically freely independent as their dimension grows.
This allows us to treat eigenvalue distributions of sums and products of random matrices like independent random variables.
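A sketch of this in action (sizes and seed are arbitrary): the sum of two independent GOE matrices has a spectrum supported on $[-2\sqrt{2}, 2\sqrt{2}]$ — the free convolution of two semicircles is a semicircle of doubled variance, not the interval $[-4, 4]$ that naive addition of supports would suggest.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 1500

def goe(n):
    """GOE-type symmetric matrix with unit-variance off-diagonal entries."""
    A = rng.standard_normal((n, n))
    return (A + A.T) / np.sqrt(2)

# Spectrum of the (normalized) sum of two independent Wigner matrices
eigs = np.linalg.eigvalsh((goe(n) + goe(n)) / np.sqrt(n))
print(f'support of the sum: [{eigs.min():.2f}, {eigs.max():.2f}]')
```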
Proof Sketch of the Semicircle Law
The method of moments proves the Wigner semicircle law:
Step 1: Compute the expected trace moments $\mathbb{E}\!\left[\frac{1}{n}\operatorname{tr}(W/\sqrt{n})^k\right]$, expanding the trace as a sum over index sequences
Step 2: Using independence and zero mean, only "paired" summands are nonzero
Step 3: Counting the nonzero terms gives the Catalan numbers $C_k = \frac{1}{k+1}\binom{2k}{k}$ for the $2k$-th moment (odd moments vanish)
Step 4: Catalan numbers are exactly the moments of the semicircle distribution
This completes the proof skeleton. Rigor requires handling error terms and convergence.
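The moment matching can be observed numerically: the even trace moments of a normalized Wigner matrix approach the Catalan numbers $1, 2, 5, 14, \dots$ (the matrix size and seed are arbitrary choices):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(7)
n = 3000
A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2)
eigs = np.linalg.eigvalsh(W / np.sqrt(n))

moments = {}
for k in range(1, 5):
    emp = np.mean(eigs ** (2 * k))           # empirical 2k-th moment
    catalan = comb(2 * k, k) // (k + 1)      # k-th Catalan number
    moments[k] = (emp, catalan)
    print(f'2k = {2*k}: empirical moment {emp:.3f}, Catalan C_{k} = {catalan}')
```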
Deep Understanding: Why is Random Matrix Theory So Universal?
Universality Phenomenon
The most magical feature of random matrix theory is universality:
- Regardless of the element distribution, the limiting eigenvalue distribution is the same
- Different physical systems (atomic nuclei, quantum chaos, etc.) exhibit the same statistical patterns
This universality stems from a profound fact about high-dimensional probability: as dimension approaches infinity, details are "averaged out," and only macroscopic structure remains.
Connection to Physics
Random matrix theory was originally developed by Wigner in the 1950s to study atomic nucleus energy level statistics.
Observation: Energy level spacing distributions of complex atomic nuclei are strikingly similar to eigenvalue spacing distributions of GOE random matrices!
Explanation: The Hamiltonian of complex quantum systems "looks like" a random matrix because it is so complex we cannot track every detail.
This embodies statistical mechanics thinking in quantum physics.
Connection to Information Theory
From an information theory perspective, random matrices can be viewed as "maximum entropy" matrices — matrix distributions with maximum entropy under given constraints.
The semicircle distribution is the "least informative" distribution satisfying certain constraints, similar to the normal distribution in one dimension.
Exercises
Basic Conceptual Problems
Exercise 1: Let
Exercise 2: Explain why Wigner matrices need the normalization factor $1/\sqrt{n}$.
Exercise 3: Let
Computation and Proof Problems
Exercise 4: Verify that the semicircle distribution $\rho_{sc}(x) = \frac{1}{2\pi}\sqrt{4 - x^2}$ is a valid probability density, i.e. that it integrates to 1.
Exercise 5: Compute the second moment $\int x^2 \rho_{sc}(x)\,dx$ of the semicircle distribution.
Exercise 6: Let
Exercise 7: For a
Programming Problems
Exercise 8: Write a program to verify eigenvalue repulsion.
- Generate many GOE matrices
- Compute adjacent eigenvalue spacings
- Plot the spacing distribution histogram
- Compare with the theoretical Wigner surmise
Application Problems
Exercise 11: An investor tracks 100 stocks with 200 days of return data.
1. Calculate the aspect ratio $\gamma$
2. According to the Marchenko-Pastur theorem, what interval will the sample eigenvalues distribute in?
3. If a sample eigenvalue is 3.5, what might this represent?
Exercise 12: In an
Exercise 13: You have a dataset with 1000 samples, each with 500 features.
1. How many principal components should be retained? Use the Marchenko-Pastur criterion.
2. If you want to retain more components, how should the experimental design be adjusted?
Advanced Research Problems
Exercise 14: Research the Tracy-Widom distribution.
1. Look up the literature and write down the definition of the Tracy-Widom distribution
2. Explain why the limit distribution of the largest eigenvalue is not Gaussian
3. What applications does the Tracy-Widom distribution have in statistical hypothesis testing?
Exercise 15: Explore free probability theory.
1. What is free independence? How does it differ from classical independence?
2. Let $A$ and $B$ be freely independent large random matrices; describe how the eigenvalue distribution of $A + B$ is obtained from those of $A$ and $B$ (free convolution)
Exercise 16: Random matrices and quantum chaos.
1. What is quantum chaos? How does it relate to classical chaos?
2. Why do the energy level statistics of quantum chaotic systems follow the GOE distribution?
3. How can you tell from the energy level spacing distribution whether a quantum system is "integrable" or "chaotic"?
Chapter Summary
Random matrix theory is a beautiful intersection of linear algebra and probability theory. We learned:
Core Concepts
- Definition and main models of random matrices (Wigner, Wishart)
- Concept of the empirical eigenvalue distribution

Fundamental Theorems
- Wigner semicircle law: Limiting distribution of symmetric random matrix eigenvalues
- Marchenko-Pastur distribution: Limiting distribution of sample covariance matrix eigenvalues
- Tracy-Widom distribution: Limiting distribution of the largest eigenvalue
- Universality: Different models share the same limiting behavior

Application Areas
- Wireless communications: MIMO system capacity analysis
- Finance: Covariance matrix denoising, portfolio optimization
- Machine learning: High-dimensional statistics, PCA, neural network initialization

Core Insights
- High-dimensional randomness gives rise to precisely predictable structure
- Noise can be systematically identified and removed
- Universality gives the theory broad applicability
Random matrix theory remains an active research area with new discoveries and applications emerging constantly. Mastering these basic concepts provides a solid foundation for deeper research.
References
- Bai, Z., & Silverstein, J. W. Spectral Analysis of Large Dimensional Random Matrices. Springer, 2010.
- Anderson, G. W., Guionnet, A., & Zeitouni, O. An Introduction to Random Matrices. Cambridge University Press, 2010.
- Mehta, M. L. Random Matrices. Academic Press, 2004.
- Tulino, A. M., & Verdú, S. "Random Matrix Theory and Wireless Communications." Foundations and Trends in Communications and Information Theory, 2004.
- Bouchaud, J. P., & Potters, M. "Financial Applications of Random Matrix Theory: A Short Review." arXiv:0910.1205, 2009.
- Couillet, R., & Debbah, M. Random Matrix Methods for Wireless Communications. Cambridge University Press, 2011.
This is Chapter 14 of the "Essence of Linear Algebra" series.
- Post title: Essence of Linear Algebra (14): Random Matrix Theory
- Post author: Chen Kai
- Create time: 2019-03-14 15:00:00
- Post link: https://www.chenk.top/chapter-14-random-matrix-theory/
- Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stated otherwise.