Orthogonality is one of the most beautiful concepts in linear algebra. When two vectors "don't interfere with each other," computations become simple and understanding becomes clear. From GPS positioning to noise-canceling headphones, from image compression to recommendation systems, applications of orthogonality are everywhere. This chapter will guide you from intuition to understanding the deep meaning of orthogonality and why it forms the cornerstone of modern scientific computing.
What is Orthogonality? Starting from Intuition
Orthogonality in Daily Life
Before diving into mathematical definitions, let's feel what "orthogonality" means in everyday life.
City Streets: Manhattan's streets are arranged in a grid pattern, with north-south streets perpendicular to east-west streets. If you walk 3 blocks east, this has absolutely no effect on your north-south position. This is orthogonality: two directions are independent and don't affect each other.
TV Remote Control: Your remote has volume buttons and channel buttons. Pressing volume doesn't change the channel, and pressing channel doesn't change the volume. Volume and channel are "orthogonal" control dimensions.
Seasonings: When cooking, salt controls saltiness and sugar controls sweetness. Within a certain range, adding salt doesn't make the dish sweeter, and adding sugar doesn't make it saltier (though culinary experts might dispute this simplification).
![](正交性与投影/fig1.png)
This property of "not interfering with each other" corresponds mathematically to a simple condition: dot product equals zero.
Mathematical Definition of Orthogonality
Two vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal (written $\mathbf{u} \perp \mathbf{v}$) if their dot product is zero: $\mathbf{u} \cdot \mathbf{v} = 0$.

Why does zero dot product mean perpendicular? Recall the geometric formula for the dot product:

$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta$$

For nonzero vectors, this is zero exactly when $\cos\theta = 0$, that is, when $\theta = 90°$.
Several important special cases:
- The zero vector is orthogonal to every vector: $\mathbf{0} \cdot \mathbf{v} = 0$ holds for all $\mathbf{v}$
- The standard basis vectors are pairwise orthogonal: $\mathbf{e}_i \cdot \mathbf{e}_j = 0$ whenever $i \neq j$
- A vector is orthogonal to itself if and only if it is the zero vector: $\mathbf{v} \cdot \mathbf{v} = \|\mathbf{v}\|^2 = 0 \iff \mathbf{v} = \mathbf{0}$
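To make the definition concrete, here is a minimal check in NumPy (the two vectors are made up for illustration; the second is the first rotated by 90°):

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([-2.0, 1.0])  # u rotated by 90 degrees

print(u @ v)  # 0.0 — the dot product vanishes, so u and v are orthogonal
```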
The Deep Meaning of Orthogonality: Information Independence
The essence of orthogonality is information independence. When two vectors are orthogonal, knowing the component of a vector in one direction tells you nothing about its component in the other direction.
Imagine you're describing someone's appearance:
- "Height" and "weight" are not completely orthogonal (taller people tend to be heavier)
- "Height" and "eye color" are closer to orthogonal (knowing height doesn't predict eye color)
In data analysis, we often want to find "orthogonal" features because they provide independent information without redundancy. This is the core idea behind Principal Component Analysis (PCA).
Orthogonal Sets and Orthogonal Bases
Definition of Orthogonal Sets
A set of vectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is called an orthogonal set if every pair of distinct vectors in it is orthogonal: $\mathbf{v}_i \cdot \mathbf{v}_j = 0$ for all $i \neq j$.

Example: The standard basis in three-dimensional space, $\mathbf{e}_1 = (1,0,0)$, $\mathbf{e}_2 = (0,1,0)$, $\mathbf{e}_3 = (0,0,1)$, is an orthogonal set.
Key Property of Orthogonal Sets: Automatic Linear Independence
Theorem: An orthogonal set that doesn't contain the zero vector is necessarily linearly independent.
Intuitive Explanation: Imagine you have three mutually perpendicular sticks. You cannot combine two of them to "simulate" the third — they point in completely different, non-interfering directions.
Proof: Suppose $c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{0}$. Take the dot product of both sides with $\mathbf{v}_i$: by orthogonality every term vanishes except the $i$-th, leaving $c_i \|\mathbf{v}_i\|^2 = 0$. Since $\mathbf{v}_i \neq \mathbf{0}$, we get $c_i = 0$. This holds for every $i$, so the set is linearly independent.
Orthonormal Bases
If every vector in an orthogonal set is a unit vector (length equals 1), it's called an orthonormal set.
If an orthonormal set also spans the entire space (i.e., forms a basis), it's called an orthonormal basis.
Characteristics of orthonormal bases: Let $\{\mathbf{q}_1, \ldots, \mathbf{q}_n\}$ be an orthonormal basis. Then $\mathbf{q}_i \cdot \mathbf{q}_j = 1$ when $i = j$ and $0$ when $i \neq j$.
![](正交性与投影/fig2.png)
Computational Advantage of Orthogonal Bases
Why do we favor orthogonal bases so much? Because coordinate computation becomes extremely simple.
General basis case: Given a basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$, finding the coordinates of $\mathbf{x}$ requires solving the linear system $c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n = \mathbf{x}$.

Orthogonal basis case: If the basis is orthogonal, coordinates can be computed directly using dot products:

$$c_i = \frac{\mathbf{x} \cdot \mathbf{v}_i}{\mathbf{v}_i \cdot \mathbf{v}_i}$$

The orthonormal basis case is even simpler: $c_i = \mathbf{x} \cdot \mathbf{q}_i$.
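The advantage is easy to see in code: with an orthonormal basis, each coordinate is a single dot product, and no linear system needs to be solved. (The basis and vector below are made up for illustration.)

```python
import numpy as np

# An orthonormal basis of R^2: the standard basis rotated by 45 degrees
q1 = np.array([1.0, 1.0]) / np.sqrt(2)
q2 = np.array([-1.0, 1.0]) / np.sqrt(2)
x = np.array([3.0, 1.0])

c1, c2 = x @ q1, x @ q2            # coordinates via dot products
reconstructed = c1 * q1 + c2 * q2  # rebuild x from its coordinates
print(np.allclose(reconstructed, x))  # True
```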
Vector Projection: Finding the Closest Point
One-Dimensional Projection: The Mathematics of Shadows
Imagine sunlight shining straight down, and a tilted stick casts a shadow on the ground. This "shadow" is the projection of the stick onto the ground direction.
![](正交性与投影/fig3.png)
The orthogonal projection of vector $\mathbf{b}$ onto vector $\mathbf{a}$ is

$$\text{proj}_{\mathbf{a}} \mathbf{b} = \frac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}}\,\mathbf{a}$$

Reading the formula piece by piece:
- $\mathbf{a} \cdot \mathbf{b}$: measures how much "component" $\mathbf{b}$ has in the direction of $\mathbf{a}$
- $\mathbf{a} \cdot \mathbf{a} = \|\mathbf{a}\|^2$: the squared length of $\mathbf{a}$, used for normalization
- $\dfrac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}}$: this is a scalar, representing the "scaling coefficient" of the projection
- Multiply by $\mathbf{a}$: get a vector in the same direction as $\mathbf{a}$

Length of the projection (scalar projection): $\dfrac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\|}$. Note: the scalar projection can be negative (when $\mathbf{b}$ points opposite to $\mathbf{a}$).
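The formula translates directly into a few lines of NumPy (vectors chosen arbitrarily for illustration):

```python
import numpy as np

def project(b, a):
    """Orthogonal projection of b onto the line spanned by a."""
    return (a @ b) / (a @ a) * a

a = np.array([1.0, 0.0])
b = np.array([2.0, 3.0])
p = project(b, a)
print(p)            # [2. 0.]
print((b - p) @ a)  # 0.0 — the error is orthogonal to a
```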
Geometric Essence of Projection: Shortest Distance
Projection has a profound geometric meaning: among all points on the line spanned by $\mathbf{a}$, the projection $\mathbf{p} = \text{proj}_{\mathbf{a}} \mathbf{b}$ is the one closest to $\mathbf{b}$.

Why? Let $\mathbf{p}$ be the projection. The error $\mathbf{e} = \mathbf{b} - \mathbf{p}$ is orthogonal to $\mathbf{a}$.

Now, take any other point $t\mathbf{a}$ on the line. Since $\mathbf{p} - t\mathbf{a}$ lies along $\mathbf{a}$, it is orthogonal to $\mathbf{e}$, and the Pythagorean theorem gives

$$\|\mathbf{b} - t\mathbf{a}\|^2 = \|\mathbf{b} - \mathbf{p}\|^2 + \|\mathbf{p} - t\mathbf{a}\|^2$$

This expression is minimized exactly when $t\mathbf{a} = \mathbf{p}$: the projection is the closest point.
Orthogonal Decomposition
Every vector $\mathbf{b}$ can be decomposed relative to a direction $\mathbf{a}$ into two orthogonal parts: $\mathbf{b} = \mathbf{p} + \mathbf{e}$, where $\mathbf{p} = \text{proj}_{\mathbf{a}} \mathbf{b}$ is parallel to $\mathbf{a}$ and $\mathbf{e} = \mathbf{b} - \mathbf{p}$ is orthogonal to $\mathbf{a}$.
Application: Force Decomposition in Physics. Decomposing gravity into components along and perpendicular to an inclined plane is a typical application of orthogonal decomposition.
Subspace Projection: From Lines to Planes
Projecting onto Subspaces
What if we're not projecting onto a line, but onto a plane or higher-dimensional subspace?
Let $W$ be a subspace with orthogonal basis $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$. The projection of $\mathbf{b}$ onto $W$ is simply the sum of its projections onto each basis vector:

$$\text{proj}_W \mathbf{b} = \frac{\mathbf{u}_1 \cdot \mathbf{b}}{\mathbf{u}_1 \cdot \mathbf{u}_1}\mathbf{u}_1 + \cdots + \frac{\mathbf{u}_k \cdot \mathbf{b}}{\mathbf{u}_k \cdot \mathbf{u}_k}\mathbf{u}_k$$

(This term-by-term formula works only because the basis is orthogonal.)
![](正交性与投影/fig4.png)
Projection Matrix
When the subspace is described by an arbitrary (not necessarily orthogonal) basis, the projection can be written in matrix form.

Let $A$ be the matrix whose columns form a basis of the subspace $W$, so that $W = C(A)$, the column space of $A$.

Projection Matrix:

$$P = A(A^TA)^{-1}A^T, \qquad \text{proj}_W \mathbf{b} = P\mathbf{b}$$
Properties of the Projection Matrix:
- Idempotent: $P^2 = P$ (the projection of a projection is still itself)
- Symmetric: $P^T = P$
- Rank: $\text{rank}(P) = \text{rank}(A) = \dim W$
Intuition for Idempotence: Imagine projecting an object's shadow again — the shadow doesn't change — it's already "on the ground."
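These properties are easy to confirm numerically; the plane below (spanned by two arbitrarily chosen columns in $\mathbb{R}^3$) is just an example:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])            # columns span a plane in R^3
P = A @ np.linalg.inv(A.T @ A) @ A.T  # projection matrix onto C(A)

print(np.allclose(P @ P, P))  # True: idempotent
print(np.allclose(P.T, P))    # True: symmetric
print(round(np.trace(P)))     # 2: the trace of a projection equals its rank
```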
Normal Equations
The coefficient vector $\hat{\mathbf{x}}$ of the projection (i.e. $\text{proj}_W \mathbf{b} = A\hat{\mathbf{x}}$) satisfies the normal equations:

$$A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$$

Derivation: The key condition for projection is that the error $\mathbf{b} - A\hat{\mathbf{x}}$ is orthogonal to every column of $A$, i.e. $A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$, which rearranges to the normal equations.
Orthogonal Complement
An important concept related to a subspace $W$ is its orthogonal complement $W^\perp$: the set of all vectors orthogonal to every vector in $W$.

Important Property: $\dim W + \dim W^\perp = n$, and every vector decomposes uniquely as $\mathbf{v} = \mathbf{w} + \mathbf{w}^\perp$ with $\mathbf{w} \in W$ and $\mathbf{w}^\perp \in W^\perp$.

Orthogonal Relationships of the Four Fundamental Subspaces:

- $N(A) = C(A^T)^\perp$: the null space is the orthogonal complement of the row space
- $N(A^T) = C(A)^\perp$: the left null space is the orthogonal complement of the column space
Gram-Schmidt Orthogonalization: Manufacturing Orthogonal Bases
Problem Statement
Suppose you have a set of linearly independent vectors, but they're not orthogonal. Can you "adjust" them to become orthogonal while still spanning the same space?
The answer is yes, and the method is Gram-Schmidt orthogonalization.
Algorithm Idea
The core idea is to progressively remove components from previous vectors.
Let the original vectors be $\mathbf{a}_1, \ldots, \mathbf{a}_n$ and the orthogonalized vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$.

Steps:

1. First vector: directly take $\mathbf{v}_1 = \mathbf{a}_1$
2. Second vector: subtract the projection of $\mathbf{a}_2$ onto $\mathbf{v}_1$: $\mathbf{v}_2 = \mathbf{a}_2 - \dfrac{\mathbf{v}_1 \cdot \mathbf{a}_2}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1$
3. Third vector: subtract the projections of $\mathbf{a}_3$ onto both $\mathbf{v}_1$ and $\mathbf{v}_2$
4. General formula: $\mathbf{v}_k = \mathbf{a}_k - \displaystyle\sum_{j=1}^{k-1} \frac{\mathbf{v}_j \cdot \mathbf{a}_k}{\mathbf{v}_j \cdot \mathbf{v}_j}\mathbf{v}_j$
5. Normalization: if an orthonormal basis is needed, finally divide each vector by its length: $\mathbf{q}_k = \mathbf{v}_k / \|\mathbf{v}_k\|$
Intuitive Explanation
Imagine you're building an orthogonal coordinate system:
- First axis: pick any direction ($\mathbf{v}_1 = \mathbf{a}_1$)
- Second axis: $\mathbf{a}_2$ roughly points in a second direction, but might not be perpendicular to the first axis. We subtract its component along the first axis from $\mathbf{a}_2$, and what's left is perpendicular to the first axis
- Third axis: subtract the components along the first two axes from $\mathbf{a}_3$, and what's left is perpendicular to both previous axes
Each step "clears" the "contamination" from previous directions, keeping only the new, independent information.
![](正交性与投影/fig5.png)
Detailed Example
Perform Gram-Schmidt orthogonalization on
Step 1:
Step 3: Compute projections of
Numerical Stability Issues
The classical Gram-Schmidt algorithm can accumulate errors in numerical computation, causing later vectors to not be sufficiently orthogonal. Modified Gram-Schmidt updates the remaining vectors immediately after each new direction is found, rather than always projecting the original vectors:

```python
import numpy as np

def modified_gram_schmidt(A):
    """Return Q with orthonormal columns spanning the column space of A."""
    Q = np.array(A, dtype=float)
    m, n = Q.shape
    for k in range(n):
        Q[:, k] /= np.linalg.norm(Q[:, k])
        # Immediately remove the new direction from all remaining columns
        for j in range(k + 1, n):
            Q[:, j] -= (Q[:, k] @ Q[:, j]) * Q[:, k]
    return Q
```
QR Decomposition: Matrix Representation of Orthogonalization
Definition of QR Decomposition
Any $m \times n$ matrix $A$ with linearly independent columns can be factored as $A = QR$, where $Q$ is $m \times n$ with orthonormal columns and $R$ is $n \times n$, upper triangular, and invertible.
Relationship with Gram-Schmidt
The columns of $Q$ are exactly the orthonormal vectors $\mathbf{q}_1, \ldots, \mathbf{q}_n$ produced by Gram-Schmidt.

The elements of $R$ record the projection coefficients: $r_{ij} = \mathbf{q}_i \cdot \mathbf{a}_j$.

Why is $R$ upper triangular? Because in Gram-Schmidt, $\mathbf{a}_j$ is a combination of only $\mathbf{q}_1, \ldots, \mathbf{q}_j$ (each new vector involves only the directions found so far), so $r_{ij} = 0$ for $i > j$.
QR Decomposition Example
Perform QR decomposition on matrix
After Gram-Schmidt orthogonalization (and normalization):
Applications of QR Decomposition
Improved Least Squares: The normal equations $A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$ can be rewritten using $A = QR$.

From $A = QR$ and $Q^TQ = I$, the normal equations become $R^TR\hat{\mathbf{x}} = R^TQ^T\mathbf{b}$; since $R$ is invertible, this simplifies to $R\hat{\mathbf{x}} = Q^T\mathbf{b}$, solved by back substitution.

Why is QR decomposition better? Directly computing $A^TA$ squares the condition number of the problem and amplifies rounding errors; working with $Q$ and $R$ avoids forming $A^TA$ altogether.
Least Squares: When Equations Have No Solution
Problem Statement
In reality, data often comes with noise. Suppose you measure 5 data points and want to fit a line to them. You need to solve an overdetermined system $A\mathbf{x} = \mathbf{b}$ with more equations (5) than unknowns (2); in general, no exact solution exists.
Least Squares Solution
The idea of least squares: Since we can't find an exact solution, find one that minimizes the sum of squared errors.
The error vector is $\mathbf{e} = \mathbf{b} - A\mathbf{x}$, and we minimize $\|\mathbf{e}\|^2 = \|\mathbf{b} - A\mathbf{x}\|^2$.
Geometric Interpretation: As $\mathbf{x}$ varies, $A\mathbf{x}$ ranges over the column space of $A$; minimizing $\|\mathbf{b} - A\mathbf{x}\|$ means finding the point in the column space closest to $\mathbf{b}$, which is exactly the projection of $\mathbf{b}$ onto it.

Therefore, the least squares solution $\hat{\mathbf{x}}$ satisfies $A\hat{\mathbf{x}} = \text{proj}_{C(A)}\mathbf{b}$.
![](正交性与投影/fig6.png)
Derivation of Normal Equations
The least squares solution satisfies the normal equations:

$$A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$$

Derivation Method 1 (Geometric): The projection error $\mathbf{b} - A\hat{\mathbf{x}}$ is orthogonal to every column of $A$, so $A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$.

Derivation Method 2 (Calculus): Let $f(\mathbf{x}) = \|\mathbf{b} - A\mathbf{x}\|^2$. Setting the gradient $\nabla f = 2A^TA\mathbf{x} - 2A^T\mathbf{b}$ to zero yields the same equations.
Linear Regression Example
Problem: Fit data points
Build the matrix:
Compute normal equations:
Solve:
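The fitting recipe can be sketched in a few lines of NumPy; the data points below are made up for illustration, not the example's original numbers:

```python
import numpy as np

# Hypothetical data: fit y = c0 + c1 * t
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.9, 5.1, 7.0])

A = np.column_stack([np.ones_like(t), t])  # each row is [1, t_i]
c = np.linalg.solve(A.T @ A, A.T @ y)      # solve the normal equations
print(c)  # intercept c0 and slope c1 of the best-fit line
```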
Weighted Least Squares
Sometimes different data points have different reliability. Weighted least squares gives each data point a weight $w_i$ and minimizes $\sum_i w_i\,e_i^2$; with $W = \text{diag}(w_1, \ldots, w_m)$, the normal equations become $A^TWA\hat{\mathbf{x}} = A^TW\mathbf{b}$.
Orthogonal Matrices: Preserving Distance and Angle
Definition and Properties
A square matrix $Q$ is orthogonal if $Q^TQ = I$, or equivalently $Q^{-1} = Q^T$.

Column vector property: the columns of $Q$ form an orthonormal set.

Row vector property: the rows of $Q$ also form an orthonormal set.
Orthogonal Matrices Preserve Geometry
Orthogonal matrices are "rigid transformations": they preserve length, angle, and orientation (if $\det Q = 1$).

Preserving length: $\|Q\mathbf{x}\| = \|\mathbf{x}\|$, since $\|Q\mathbf{x}\|^2 = (Q\mathbf{x})^T(Q\mathbf{x}) = \mathbf{x}^TQ^TQ\mathbf{x} = \mathbf{x}^T\mathbf{x}$.

Preserving inner product: $(Q\mathbf{x}) \cdot (Q\mathbf{y}) = \mathbf{x} \cdot \mathbf{y}$.

Preserving angle: since inner products and lengths are preserved, and the angle is defined by $\cos\theta = \dfrac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|}$, angles are preserved too.
Determinant of Orthogonal Matrices
Theorem: If $Q$ is orthogonal, then $\det Q = \pm 1$.

- $\det Q = 1$: rotation (preserves orientation)
- $\det Q = -1$: reflection (reverses orientation)
Common Orthogonal Matrices
2D Rotation Matrix:

$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

2D Reflection Matrix: reflection across the line through the origin along unit vector $\mathbf{u}$ is $H = 2\mathbf{u}\mathbf{u}^T - I$.

Householder Reflection: another implementation of QR decomposition, with better numerical stability than Gram-Schmidt.

Permutation Matrix: exactly one 1 in each row and column, all other entries 0. For example:

$$P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$$
Numerical Advantage of Orthogonal Matrices
The condition number of an orthogonal matrix is 1: $\kappa(Q) = \|Q\|\,\|Q^{-1}\| = 1$, the best possible value. Multiplying by orthogonal matrices does not amplify numerical errors, which is why algorithms built from them (Householder QR, Givens rotations) are so stable.
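A quick numerical confirmation with a rotation matrix (the angle and test vector are chosen arbitrarily):

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, 4.0])
print(np.allclose(Q.T @ Q, np.eye(2)))          # True: Q^T Q = I
print(np.allclose(np.linalg.norm(Q @ x), 5.0))  # True: length 5 is preserved
print(round(np.linalg.det(Q)))                  # 1: a rotation
```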
Applications in Signal Processing
Fourier Basis: The Most Important Orthogonal Basis
The core tool of signal processing —Fourier Transform— is essentially a coordinate transformation in an orthogonal basis.
Consider a discrete signal with period $N$, written as a vector $\mathbf{x} = (x_0, x_1, \ldots, x_{N-1})$.

The Discrete Fourier Transform (DFT) decomposes the signal $\mathbf{x}$ into a linear combination of $N$ pairwise orthogonal sinusoidal basis vectors; the coefficient of the $k$-th basis vector measures how strongly frequency $k$ is present in the signal.
Signal Decomposition and Filtering
Because the Fourier basis vectors are orthogonal, a signal can be losslessly decomposed into its frequency components, and each component can be manipulated independently.

Low-pass filter: keep only the low-frequency components (small $k$), which smooths the signal and removes high-frequency noise.
High-pass filter: Keep only high-frequency components, extract edges or sudden changes
Band-pass filter: Keep only a specific frequency range
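A minimal low-pass filter sketch using NumPy's FFT (the signal and the cutoff frequency are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
clean = np.sin(2 * np.pi * 3 * t / n)         # slow oscillation at frequency 3
noisy = clean + 0.5 * rng.standard_normal(n)  # add broadband noise

coeffs = np.fft.fft(noisy)            # coordinates in the Fourier basis
freqs = np.fft.fftfreq(n, d=1 / n)    # integer frequencies 0..127, -128..-1
coeffs[np.abs(freqs) > 10] = 0        # zero out the high-frequency components
filtered = np.fft.ifft(coeffs).real

# Filtering removes most of the noise while keeping the frequency-3 signal
print(np.mean((filtered - clean) ** 2) < np.mean((noisy - clean) ** 2))  # True
```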
How Noise-Canceling Headphones Work
Noise-canceling headphones use microphones to capture external noise, then:
- Perform Fourier transform on the noise signal
- Generate a signal with opposite phase
- Play it through the headphones
Because orthogonal components can be processed independently, noise can be "precisely canceled" without affecting the music you're listening to.
Image Compression: DCT in JPEG
JPEG image compression uses Discrete Cosine Transform (DCT), the real-number version of Fourier transform.
Images are divided into $8 \times 8$ pixel blocks, and each block's 64 pixel values are re-expressed as coordinates in the orthogonal DCT basis. Most of the energy concentrates in a few low-frequency coefficients, so the many near-zero high-frequency coefficients can be quantized aggressively or discarded.
When reconstructing images, only a few low-frequency components are needed to approximate the original — this is the power of orthogonal decomposition.
![](正交性与投影/fig7.png)
CDMA in Mobile Communications
In mobile communications, multiple users use the same frequency simultaneously. How do you distinguish different users' signals?
CDMA solution: Give each user a "code," and these codes are pairwise orthogonal.
User A's signal is A's bit multiplied by A's code, and the received signal is the sum of all users' signals. To recover A's bit, the receiver takes the dot product of the received signal with A's code: orthogonality makes every other user's contribution vanish.
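A toy sketch of the idea (the codes and bits below are invented; real CDMA uses much longer codes):

```python
import numpy as np

# Pairwise orthogonal spreading codes, one row per user (Walsh-style)
codes = np.array([[1.0,  1.0,  1.0,  1.0],
                  [1.0, -1.0,  1.0, -1.0],
                  [1.0,  1.0, -1.0, -1.0]])
bits = np.array([1.0, -1.0, 1.0])  # users A, B, C each send +1 or -1

received = bits @ codes            # all signals add up on the shared channel

# Each user despreads with their own code; orthogonality cancels the others
decoded = codes @ received / 4
print(decoded)  # [ 1. -1.  1.]
```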
Principal Component Analysis (PCA): Finding the Most Important Directions
Problem Statement
Suppose you have high-dimensional data (say 1000 features) and want to describe it with just a few "principal components." How do you find these principal components?
Basic Idea of PCA
PCA looks for directions with maximum variance in the data.
First principal component: The direction where data projection has maximum variance
Second principal component: Among directions orthogonal to the first, the one with maximum variance
...and so on
Why require orthogonality? Orthogonality ensures principal components "don't interfere with each other"— each component captures independent information.
Mathematical Formulation
Let the data matrix be $X$ ($m$ samples, $n$ features), centered so that each column has mean zero.

Covariance matrix: $C = \dfrac{1}{m-1}X^TX$. Since $C$ is symmetric, it has an orthogonal eigendecomposition $C = V\Lambda V^T$.

The columns of $V$ are the principal components, and the eigenvalues in $\Lambda$ give the variance captured along each component.
Dimensionality Reduction
Keep the first $k$ principal components (those with the largest eigenvalues) and project the data: $X_{\text{reduced}} = XV_k$, where $V_k$ contains the first $k$ columns of $V$.

This is the "optimal" linear dimensionality reduction: among all $k$-dimensional projections, it preserves the maximum variance (information) of the original data.
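A compact sketch of PCA via the covariance eigendecomposition (the 2D data is synthetic, stretched along one axis so the first component is obvious):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic centered 2D data: std 3 along x, std 0.5 along y
X = rng.standard_normal((200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
X = X - X.mean(axis=0)

C = X.T @ X / (len(X) - 1)      # covariance matrix
eigvals, V = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]  # largest variance first

X_reduced = X @ V[:, :1]        # project onto the first principal component
print(eigvals)                  # variance captured by each component
```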
Python Implementation Examples
Gram-Schmidt Orthogonalization
```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: return an orthogonal basis for span(vectors)."""
    basis = []
    for a in vectors:
        v = np.array(a, dtype=float)
        for u in basis:
            v -= (u @ v) / (u @ u) * u  # subtract the projection onto u
        basis.append(v)
    return basis
```
QR Decomposition
```python
import numpy as np

def qr_decomposition(A):
    """QR via Gram-Schmidt: A = QR, Q orthonormal columns, R upper triangular."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]  # projection coefficient q_i . a_j
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R
```
Least Squares
```python
import numpy as np

def least_squares(A, b):
    """Least squares solution via the normal equations A^T A x = A^T b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.linalg.solve(A.T @ A, A.T @ b)
```
Projection Visualization
```python
import numpy as np
import matplotlib.pyplot as plt

a = np.array([3.0, 1.0])
b = np.array([1.0, 2.0])
p = (a @ b) / (a @ a) * a  # projection of b onto a

fig, ax = plt.subplots()
for v, color, label in [(a, "C0", "a"), (b, "C1", "b"), (p, "C2", "proj_a b")]:
    ax.annotate("", xy=v, xytext=(0, 0),
                arrowprops=dict(arrowstyle="->", color=color))
    ax.text(v[0], v[1], label)
ax.plot([b[0], p[0]], [b[1], p[1]], "k--")  # error vector b - p
ax.set_aspect("equal")
ax.set_xlim(-1, 4)
ax.set_ylim(-1, 3)
plt.show()
```
Exercises
Basic Problems
Exercise 1: Verify whether vectors
Exercise 2: Compute the projection of vector
Exercise 3: Perform Gram-Schmidt orthogonalization
on vectors
Exercise 4: Let
Exercise 5: Is the rotation matrix
Advanced Problems
Exercise 6: Prove that if
Exercise 7: Prove that projection matrix
Exercise 8: Perform QR decomposition on matrix
Exercise 9: Use least squares to fit data
points
Exercise 10: Let
Exercise 11: Prove that the product of two orthogonal matrices is still an orthogonal matrix.
Exercise 12: Let
Application Problems
Exercise 13: An experiment measured 5 data
pairs
Data:
Exercise 15: Explain why JPEG image compression uses Discrete Cosine Transform (DCT) instead of directly storing pixel values. What role does orthogonal transformation play?
Programming Problems
Exercise 16: Implement the Modified Gram-Schmidt algorithm and compare numerical stability with the classical version. Hint: Test with nearly linearly dependent vectors.
Exercise 17: Implement projection matrix computation in Python and visualize vector projection onto a plane.
Exercise 18: Implement a simple Principal Component Analysis (PCA) for dimensionality reduction and visualization of 2D data.
Exercise 19: Implement least squares using QR
decomposition and compare results with NumPy's
np.linalg.lstsq.
Exercise 20: Simulate a simple CDMA system: - Generate 3 orthogonal codes (length 4) - 3 users each send bit 0 or 1 - Simulate receiving mixed signal - Use orthogonality to separate each user's signal
Chapter Summary
Core Concepts
- Orthogonality: $\mathbf{u} \cdot \mathbf{v} = 0$, meaning vectors "don't interfere"
- Projection:
  - 1D: $\text{proj}_{\mathbf{a}} \mathbf{b} = \dfrac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}}\mathbf{a}$
  - Subspace: $\mathbf{p} = A(A^TA)^{-1}A^T\mathbf{b}$
- Normal equations: $A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$, the core of least squares problems
- Gram-Schmidt: transforms any linearly independent vector set into an orthogonal basis
- QR decomposition: $A = QR$, $Q$ orthogonal, $R$ upper triangular, numerically stable
- Orthogonal matrix: $Q^TQ = I$, preserves length and angle
Application Areas
| Field | Application | Concepts Used |
|---|---|---|
| Data Analysis | Linear Regression | Least Squares |
| Machine Learning | PCA Dimensionality Reduction | Orthogonal Decomposition |
| Signal Processing | FFT, Filtering | Orthogonal Bases |
| Image Processing | JPEG Compression | DCT Transform |
| Communications | CDMA | Orthogonal Codes |
| Numerical Computing | Solving Linear Systems | QR Decomposition |
Next Chapter Preview
"Symmetric Matrices and Quadratic Forms" will explore:
- Spectral theorem for symmetric matrices
- Real symmetric matrices have real eigenvalues
- Determining and applying positive definite matrices
- Geometric meaning of quadratic forms
- Principal axis theorem
- Hessian matrices in optimization problems
References
- Strang, G. (2019). Introduction to Linear Algebra. Chapters 4, 10.
- Trefethen, L. N. & Bau, D. (1997). Numerical Linear Algebra. Lectures 7-11.
- 3Blue1Brown. Essence of Linear Algebra, Chapters 9, 11.
- Golub, G. H. & Van Loan, C. F. (2013). Matrix Computations. Chapter 5.
Next Chapter: Symmetric Matrices and Quadratic Forms →
Previous Chapter: ← Eigenvalues and Eigenvectors
This is Chapter 7 of the 18-part "Essence of Linear Algebra" series.
- Post title: Essence of Linear Algebra (7): Orthogonality and Projections
- Post author: Chen Kai
- Create time: 2019-02-06 15:30:00
- Post link: https://www.chenk.top/chapter-07-orthogonality-and-projections/
- Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stated otherwise.