Essence of Linear Algebra (7): Orthogonality and Projections
Chen Kai BOSS

Orthogonality is one of the most beautiful concepts in linear algebra. When two vectors "don't interfere with each other," computations become simple and understanding becomes clear. From GPS positioning to noise-canceling headphones, from image compression to recommendation systems, applications of orthogonality are everywhere. This chapter will guide you from intuition to understanding the deep meaning of orthogonality and why it forms the cornerstone of modern scientific computing.

What is Orthogonality? Starting from Intuition

Orthogonality in Daily Life

Before diving into mathematical definitions, let's feel what "orthogonality" means in everyday life.

City Streets: Manhattan's streets are arranged in a grid pattern, with north-south streets perpendicular to east-west streets. If you walk 3 blocks east, this has absolutely no effect on your north-south position. This is orthogonality: two directions are independent and don't affect each other.

TV Remote Control: Your remote has volume buttons and channel buttons. Pressing volume doesn't change the channel, and pressing channel doesn't change the volume. Volume and channel are "orthogonal" control dimensions.

Seasonings: When cooking, salt controls saltiness and sugar controls sweetness. Within a certain range, adding salt doesn't make the dish sweeter, and adding sugar doesn't make it saltier (though culinary experts might dispute this simplification).

This property of "not interfering with each other" corresponds mathematically to a simple condition: dot product equals zero.

Mathematical Definition of Orthogonality

Two vectors $\vec{u}$ and $\vec{v}$ are orthogonal if and only if: $\vec{u} \cdot \vec{v} = 0$. Written in component form: $u_1 v_1 + u_2 v_2 + \cdots + u_n v_n = 0$.

Why does zero dot product mean perpendicular? Recall the geometric formula for the dot product: $\vec{u} \cdot \vec{v} = \|\vec{u}\|\,\|\vec{v}\| \cos\theta$. When $\theta = 90°$, $\cos\theta = 0$, so the dot product is zero.

Several important special cases:

  1. The zero vector is orthogonal to any vector: $\vec{0} \cdot \vec{v} = 0$ holds for all $\vec{v}$
  2. Standard basis vectors are pairwise orthogonal: $\vec{e}_i$ and $\vec{e}_j$ are orthogonal whenever $i \neq j$
  3. A vector is orthogonal to itself if and only if it's the zero vector: $\vec{v} \cdot \vec{v} = 0 \iff \vec{v} = \vec{0}$
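A quick numerical check of the definition and these special cases (the example vectors are made up for illustration):

```python
import numpy as np

u = np.array([3.0, 1.0])
v = np.array([-1.0, 3.0])

# u · v = 3·(-1) + 1·3 = 0, so u and v are orthogonal
print(np.dot(u, v))            # 0.0

# The zero vector is orthogonal to everything
print(np.dot(np.zeros(2), v))  # 0.0

# v · v = 0 only for the zero vector; here it is nonzero
print(np.dot(v, v))            # 10.0
```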

The Deep Meaning of Orthogonality: Information Independence

The essence of orthogonality is information independence. When two vectors are orthogonal, knowing the component of a vector in one direction tells you nothing about its component in the other direction.

Imagine you're describing someone's appearance:

  • "Height" and "weight" are not completely orthogonal (taller people tend to be heavier)
  • "Height" and "eye color" are closer to orthogonal (knowing height doesn't predict eye color)

In data analysis, we often want to find "orthogonal" features because they provide independent information without redundancy. This is the core idea behind Principal Component Analysis (PCA).

Orthogonal Sets and Orthogonal Bases

Definition of Orthogonal Sets

A set of vectors $\{\vec{u}_1, \vec{u}_2, \dots, \vec{u}_k\}$ forms an orthogonal set if any two different vectors in the set are orthogonal: $\vec{u}_i \cdot \vec{u}_j = 0$ for all $i \neq j$.

Example: The standard basis in three-dimensional space $\{\vec{e}_1, \vec{e}_2, \vec{e}_3\}$ is an orthogonal set: $\vec{e}_i \cdot \vec{e}_j = 0$ whenever $i \neq j$.

Key Property of Orthogonal Sets: Automatic Linear Independence

Theorem: An orthogonal set that doesn't contain the zero vector is necessarily linearly independent.

Intuitive Explanation: Imagine you have three mutually perpendicular sticks. You cannot combine two of them to "simulate" the third — they point in completely different, non-interfering directions.

Proof: Suppose $c_1\vec{u}_1 + c_2\vec{u}_2 + \cdots + c_k\vec{u}_k = \vec{0}$. For any $i$, take the dot product of both sides with $\vec{u}_i$: $c_1(\vec{u}_1 \cdot \vec{u}_i) + \cdots + c_k(\vec{u}_k \cdot \vec{u}_i) = 0$. By orthogonality, $\vec{u}_j \cdot \vec{u}_i = 0$ (when $j \neq i$), so only one term remains: $c_i(\vec{u}_i \cdot \vec{u}_i) = 0$. Since $\vec{u}_i \neq \vec{0}$, we have $\vec{u}_i \cdot \vec{u}_i = \|\vec{u}_i\|^2 > 0$, therefore $c_i = 0$. This holds for all $i$, hence linear independence.

Orthonormal Bases

If every vector in an orthogonal set is a unit vector (length equals 1), it's called an orthonormal set.

If an orthonormal set also spans the entire space (i.e., forms a basis), it's called an orthonormal basis.

Characteristics of orthonormal bases: Let $\{\vec{q}_1, \dots, \vec{q}_n\}$ be an orthonormal basis, then: $\vec{q}_i \cdot \vec{q}_j = \delta_{ij}$. Here $\delta_{ij}$ is the Kronecker delta ($1$ if $i = j$, and $0$ otherwise).

Computational Advantage of Orthogonal Bases

Why do we favor orthogonal bases so much? Because coordinate computation becomes extremely simple.

General basis case: Given a basis $\{\vec{b}_1, \dots, \vec{b}_n\}$, to find vector $\vec{v}$'s coordinates $(c_1, \dots, c_n)$ in this basis, we need to solve a linear system: $c_1\vec{b}_1 + \cdots + c_n\vec{b}_n = \vec{v}$. This typically requires Gaussian elimination with complexity $O(n^3)$.

Orthogonal basis case: If the basis $\{\vec{u}_1, \dots, \vec{u}_n\}$ is orthogonal, coordinates can be computed directly using dot products: $c_i = \dfrac{\vec{v} \cdot \vec{u}_i}{\vec{u}_i \cdot \vec{u}_i}$

Orthonormal basis case is even simpler: $c_i = \vec{v} \cdot \vec{q}_i$, so each coordinate costs just one dot product, $O(n)$ work.

Life Analogy: Decomposing a vector using an orthogonal basis is like weighing luggage: you don't need to put all the luggage together and solve equations; you can weigh each piece separately.
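A sketch of this advantage: with an orthonormal basis, each coordinate is a single dot product, with no linear system to solve (the basis and vector below are chosen for illustration):

```python
import numpy as np

# An orthonormal basis of R^2 (the standard basis rotated 45 degrees)
q1 = np.array([1.0, 1.0]) / np.sqrt(2)
q2 = np.array([-1.0, 1.0]) / np.sqrt(2)
v = np.array([2.0, 4.0])

# Coordinates via dot products: c_i = v · q_i
c1, c2 = v @ q1, v @ q2

# Reconstruct v from its coordinates
v_rebuilt = c1 * q1 + c2 * q2
print(np.allclose(v, v_rebuilt))  # True
```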

Vector Projection: Finding the Closest Point

One-Dimensional Projection: The Mathematics of Shadows

Imagine sunlight shining straight down, and a tilted stick casts a shadow on the ground. This "shadow" is the projection of the stick onto the ground direction.

The orthogonal projection of vector $\vec{b}$ onto vector $\vec{a}$ is defined as: $\text{proj}_{\vec{a}}\vec{b} = \dfrac{\vec{a} \cdot \vec{b}}{\vec{a} \cdot \vec{a}}\,\vec{a}$. Let's break down this formula:

  1. $\vec{a} \cdot \vec{b}$: Measures how much "component" $\vec{b}$ has in the direction of $\vec{a}$
  2. $\vec{a} \cdot \vec{a} = \|\vec{a}\|^2$: The squared length of $\vec{a}$, used for normalization
  3. $\dfrac{\vec{a} \cdot \vec{b}}{\vec{a} \cdot \vec{a}}$: This is a scalar, representing the "scaling coefficient" of the projection
  4. Multiply by $\vec{a}$: Get a vector in the same direction as $\vec{a}$

Length of projection (scalar projection): $\text{comp}_{\vec{a}}\vec{b} = \dfrac{\vec{a} \cdot \vec{b}}{\|\vec{a}\|}$. Note: the scalar projection can be negative (when $\vec{b}$ points opposite to $\vec{a}$).

Geometric Essence of Projection: Shortest Distance

Projection has a profound geometric meaning: $\text{proj}_{\vec{a}}\vec{b}$ is the point on the line $\text{span}\{\vec{a}\}$ closest to $\vec{b}$.

Why? Let $\vec{p} = \text{proj}_{\vec{a}}\vec{b}$, and let the error vector be $\vec{e} = \vec{b} - \vec{p}$. The definition of projection guarantees that $\vec{e} \perp \vec{a}$.

Now, take any other point $\vec{q}$ on the line. The squared distance is: $\|\vec{b} - \vec{q}\|^2 = \|(\vec{b} - \vec{p}) + (\vec{p} - \vec{q})\|^2 = \|\vec{b} - \vec{p}\|^2 + \|\vec{p} - \vec{q}\|^2$ (the second step uses the Pythagorean theorem, since $(\vec{b} - \vec{p}) \perp (\vec{p} - \vec{q})$).

This expression is minimized exactly when $\vec{q} = \vec{p}$, because $\|\vec{p} - \vec{q}\|^2 \geq 0$.

Orthogonal Decomposition

Every vector $\vec{b}$ can be decomposed into a component parallel to $\vec{a}$ and a component perpendicular to $\vec{a}$: $\vec{b} = \vec{b}_{\parallel} + \vec{b}_{\perp}$. Written out: $\vec{b} = \text{proj}_{\vec{a}}\vec{b} + \left(\vec{b} - \text{proj}_{\vec{a}}\vec{b}\right)$. This decomposition is unique, and the two components are orthogonal.

Application: Force Decomposition in Physics. Decomposing gravity into components along and perpendicular to an inclined plane is a typical application of orthogonal decomposition.
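The decomposition above is easy to check numerically; a minimal sketch with made-up vectors (think of $\vec{a}$ as the incline direction and $\vec{b}$ as a force):

```python
import numpy as np

a = np.array([1.0, 0.0])        # direction to project onto
b = np.array([3.0, 4.0])        # vector to decompose

b_par = (a @ b) / (a @ a) * a   # component parallel to a
b_perp = b - b_par              # component perpendicular to a

print(b_par, b_perp)            # [3. 0.] [0. 4.]
print(np.dot(b_par, b_perp))    # 0.0, the two components are orthogonal
```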

Subspace Projection: From Lines to Planes

Projecting onto Subspaces

What if we're not projecting onto a line, but onto a plane or higher-dimensional subspace?

Let $W$ be a subspace of $\mathbb{R}^n$. The projection $\hat{b}$ of a vector $\vec{b}$ onto $W$ satisfies: $(\vec{b} - \hat{b}) \perp W$. This means the error vector $\vec{e} = \vec{b} - \hat{b}$ is orthogonal to every vector in $W$.

Projection Matrix

When $W = C(A)$ (the column space of a matrix $A$), the projection has an elegant matrix representation.

Let $A$ be an $m \times n$ matrix with linearly independent columns. The formula for projecting $\vec{b}$ onto $C(A)$ is: $\hat{b} = A(A^TA)^{-1}A^T\vec{b}$

Projection Matrix: $P = A(A^TA)^{-1}A^T$

Properties of the Projection Matrix:

  1. Idempotent: $P^2 = P$ (the projection of a projection is still itself)

  2. Symmetric: $P^T = P$

  3. Rank: $\text{rank}(P) = n$, the dimension of the subspace being projected onto

Intuition for Idempotence: Imagine projecting an object's shadow again; the shadow doesn't change, because it's already "on the ground."
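These properties are easy to verify numerically; a sketch with an assumed matrix $A$ whose columns span a 2D subspace of $\mathbb{R}^3$:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

# Projection matrix onto the column space of A
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P @ P, P))   # True: idempotent
print(np.allclose(P.T, P))     # True: symmetric
print(round(np.trace(P)))      # 2: trace = rank = subspace dimension
```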

Normal Equations

The coefficient vector $\hat{x}$ in the projection formula $\hat{b} = A\hat{x}$ satisfies the normal equations: $A^TA\hat{x} = A^T\vec{b}$

Derivation: The key condition for projection is that $\vec{b} - A\hat{x}$ is perpendicular to $C(A)$, meaning $\vec{b} - A\hat{x}$ is orthogonal to every column of $A$. In matrix language: $A^T(\vec{b} - A\hat{x}) = \vec{0}$. Expanding gives the normal equations.

Orthogonal Complement

An important concept related to a subspace $W$ is its orthogonal complement: $W^{\perp} = \{\vec{v} \in \mathbb{R}^n : \vec{v} \cdot \vec{w} = 0,\ \forall \vec{w} \in W\}$. The orthogonal complement collects all vectors orthogonal to $W$.

Important Property: $\mathbb{R}^n = W \oplus W^{\perp}$. That is, every vector can be uniquely decomposed into a component in $W$ and a component in $W^{\perp}$.

Orthogonal Relationships of the Four Fundamental Subspaces:

  • $C(A)^{\perp} = N(A^T)$ (the orthogonal complement of the column space is the left nullspace)
  • $N(A)^{\perp} = C(A^T)$ (the orthogonal complement of the nullspace is the row space)

Gram-Schmidt Orthogonalization: Manufacturing Orthogonal Bases

Problem Statement

Suppose you have a set of linearly independent vectors, but they're not orthogonal. Can you "adjust" them to become orthogonal while still spanning the same space?

The answer is yes, and the method is Gram-Schmidt orthogonalization.

Algorithm Idea

The core idea is to progressively remove the components along the previously constructed vectors.

Let the original vectors be $\vec{v}_1, \dots, \vec{v}_n$, and suppose we want to construct orthogonal vectors $\vec{u}_1, \dots, \vec{u}_n$ spanning the same space.

Steps:

  1. First vector: Directly take $\vec{u}_1 = \vec{v}_1$

  2. Second vector: Subtract the projection of $\vec{v}_2$ onto $\vec{u}_1$: $\vec{u}_2 = \vec{v}_2 - \text{proj}_{\vec{u}_1}\vec{v}_2$

  3. Third vector: Subtract the projections of $\vec{v}_3$ onto both $\vec{u}_1$ and $\vec{u}_2$: $\vec{u}_3 = \vec{v}_3 - \text{proj}_{\vec{u}_1}\vec{v}_3 - \text{proj}_{\vec{u}_2}\vec{v}_3$

  4. General formula: $\vec{u}_k = \vec{v}_k - \displaystyle\sum_{i=1}^{k-1} \frac{\vec{v}_k \cdot \vec{u}_i}{\vec{u}_i \cdot \vec{u}_i}\,\vec{u}_i$

  5. Normalization: If an orthonormal basis is needed, finally divide each vector by its length: $\vec{q}_k = \vec{u}_k / \|\vec{u}_k\|$

Intuitive Explanation

Imagine you're building an orthogonal coordinate system:

  • First axis: Pick any direction ($\vec{u}_1 = \vec{v}_1$)
  • Second axis: $\vec{v}_2$ roughly points in a second direction, but might not be perpendicular to the first axis. We "subtract" from $\vec{v}_2$ its component along the first axis, and what's left is perpendicular to the first axis
  • Third axis: Subtract from $\vec{v}_3$ its components along the first two axes, and what's left is perpendicular to both previous axes

Each step "clears" the "contamination" from previous directions, keeping only the new, independent information.

Detailed Example

Perform Gram-Schmidt orthogonalization on three linearly independent vectors $\vec{v}_1$, $\vec{v}_2$, $\vec{v}_3$.

Step 1: Take $\vec{u}_1 = \vec{v}_1$.

Step 2: Compute the projection of $\vec{v}_2$ onto $\vec{u}_1$ and subtract it: $\vec{u}_2 = \vec{v}_2 - \dfrac{\vec{v}_2 \cdot \vec{u}_1}{\vec{u}_1 \cdot \vec{u}_1}\vec{u}_1$. Verification: $\vec{u}_1 \cdot \vec{u}_2 = 0$.

Step 3: Compute the projections of $\vec{v}_3$ onto $\vec{u}_1$ and $\vec{u}_2$ and subtract both: $\vec{u}_3 = \vec{v}_3 - \dfrac{\vec{v}_3 \cdot \vec{u}_1}{\vec{u}_1 \cdot \vec{u}_1}\vec{u}_1 - \dfrac{\vec{v}_3 \cdot \vec{u}_2}{\vec{u}_2 \cdot \vec{u}_2}\vec{u}_2$.

Numerical Stability Issues

The classical Gram-Schmidt algorithm can accumulate rounding errors in floating-point computation, causing later vectors to lose orthogonality. Modified Gram-Schmidt updates each remaining vector immediately after every projection rather than always projecting against the original vectors.

import numpy as np

def modified_gram_schmidt(A):
    """Modified Gram-Schmidt orthogonalization"""
    m, n = A.shape
    Q = A.copy().astype(float)
    R = np.zeros((n, n))

    for j in range(n):
        R[j, j] = np.linalg.norm(Q[:, j])
        Q[:, j] = Q[:, j] / R[j, j]
        # Immediately orthogonalize all remaining columns against q_j
        for k in range(j + 1, n):
            R[j, k] = Q[:, j] @ Q[:, k]
            Q[:, k] = Q[:, k] - R[j, k] * Q[:, j]

    return Q, R

QR Decomposition: Matrix Representation of Orthogonalization

Definition of QR Decomposition

Any $m \times n$ matrix $A$ with linearly independent columns can be decomposed as $A = QR$, where:

  • $Q$ is an $m \times n$ matrix whose columns are orthonormal vectors ($Q^TQ = I$)
  • $R$ is an $n \times n$ upper triangular matrix with positive diagonal elements

Relationship with Gram-Schmidt

The columns of $Q$ are the orthonormal vectors obtained by Gram-Schmidt orthogonalization of $A$'s columns.

The entries of $R$ record the projection coefficients of the original columns onto the orthonormal vectors: $r_{ij} = \vec{q}_i \cdot \vec{a}_j$ for $i \leq j$.

Why is $R$ upper triangular? Because $\vec{a}_j$ can be expressed using only $\vec{q}_1, \dots, \vec{q}_j$; it doesn't need the later orthonormal vectors.

QR Decomposition Example

Perform QR decomposition on a matrix $A$ with linearly independent columns: applying Gram-Schmidt orthogonalization (and normalization) to the columns of $A$ yields the orthonormal columns of $Q$, and the coefficients $r_{ij} = \vec{q}_i \cdot \vec{a}_j$ fill in the upper triangular matrix $R$.

Applications of QR Decomposition

Improved Least Squares: The normal equations $A^TA\hat{x} = A^T\vec{b}$ can be simplified using QR decomposition.

From $A = QR$: the normal equations become $R^TQ^TQR\hat{x} = R^TQ^T\vec{b}$, i.e. $R^TR\hat{x} = R^TQ^T\vec{b}$. Multiplying both sides by $(R^T)^{-1}$: $R\hat{x} = Q^T\vec{b}$. This is an upper triangular system that can be efficiently solved using back substitution!

Why is QR decomposition better? Directly forming $A^TA$ squares the condition number ($\kappa(A^TA) = \kappa(A)^2$), leading to numerical instability. QR decomposition avoids this problem.

Least Squares: When Equations Have No Solution

Problem Statement

In reality, data often comes with noise. Suppose you measure 5 data points $(x_i, y_i)$ and want to fit a line $y = c_0 + c_1x$ to them. You need to solve: $c_0 + c_1x_i = y_i$ for $i = 1, \dots, 5$. This is 5 equations with 2 unknowns, more equations than unknowns, called an overdetermined system. Unless the 5 points are exactly collinear, no exact solution exists.

Least Squares Solution

The idea of least squares: Since we can't find an exact solution, find one that minimizes the sum of squared errors.

The error vector is $\vec{e} = \vec{b} - A\vec{x}$, and we want to minimize: $\|\vec{e}\|^2 = \|\vec{b} - A\vec{x}\|^2$

Geometric Interpretation: $A\vec{x}$ is a vector in the column space $C(A)$. Minimizing $\|\vec{b} - A\vec{x}\|$ means finding the point in $C(A)$ closest to $\vec{b}$, and this is exactly projection!

Therefore, the least squares solution $\hat{x}$ satisfies $A\hat{x} = \hat{b}$, where $\hat{b}$ is the projection of $\vec{b}$ onto $C(A)$.

Derivation of Normal Equations

The least squares solution satisfies the normal equations: $A^TA\hat{x} = A^T\vec{b}$

Derivation Method 1 (Geometric): The projection error $\vec{b} - A\hat{x}$ must be orthogonal to the column space, i.e. $A^T(\vec{b} - A\hat{x}) = \vec{0}$, which expands to the normal equations.

Derivation Method 2 (Calculus): Let $f(\vec{x}) = \|\vec{b} - A\vec{x}\|^2$ and expand: $f(\vec{x}) = \vec{x}^TA^TA\vec{x} - 2\vec{b}^TA\vec{x} + \vec{b}^T\vec{b}$. Take the gradient and set it to zero: $\nabla f = 2A^TA\vec{x} - 2A^T\vec{b} = \vec{0}$. This gives the same normal equations.

Linear Regression Example

Problem: Fit data points $(x_i, y_i)$ to the line $y = c_0 + c_1x$.

Build the matrix: each row of $A$ is $[1,\ x_i]$, the vector $\vec{b}$ stacks the values $y_i$, and the unknowns are $\hat{x} = (c_0, c_1)$.

Solve: compute the normal equations $A^TA\hat{x} = A^T\vec{b}$ and solve for $(c_0, c_1)$; these coefficients give the best fit line $y = c_0 + c_1x$.

Weighted Least Squares

Sometimes different data points have different reliability. Weighted least squares gives each data point a weight $w_i$ and minimizes $\sum_i w_i e_i^2$. The normal equations become: $A^TWA\hat{x} = A^TW\vec{b}$, where $W = \text{diag}(w_1, \dots, w_m)$ is the diagonal weight matrix.
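A minimal weighted least squares sketch, with made-up data in which the last point is an outlier assigned a small weight:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 20.0])  # last point is an outlier
w = np.array([1.0, 1.0, 1.0, 0.01])  # small weight = low trust

A = np.column_stack([np.ones_like(x), x])
W = np.diag(w)

# Weighted normal equations: A^T W A x = A^T W y
coeffs = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
print(coeffs)  # close to [1, 2], the trend of the three trusted points
```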

Orthogonal Matrices: Preserving Distance and Angle

Definition and Properties

A square matrix $Q$ is an orthogonal matrix if: $Q^TQ = QQ^T = I$. Equivalently, $Q^{-1} = Q^T$, i.e. the transpose is the inverse.

Column vector property: The columns of $Q$ form an orthonormal set.

Row vector property: The rows of $Q$ also form an orthonormal set (because $QQ^T = I$).

Orthogonal Matrices Preserve Geometry

Orthogonal matrices are "rigid transformations": they preserve length, angle, and orientation (if $\det Q = 1$).

Preserving length: $\|Q\vec{x}\| = \|\vec{x}\|$, since $\|Q\vec{x}\|^2 = \vec{x}^TQ^TQ\vec{x} = \vec{x}^T\vec{x} = \|\vec{x}\|^2$

Preserving inner product: $(Q\vec{x}) \cdot (Q\vec{y}) = \vec{x}^TQ^TQ\vec{y} = \vec{x} \cdot \vec{y}$

Preserving angle: Since the inner product is preserved, and the angle is defined by $\cos\theta = \dfrac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|\,\|\vec{y}\|}$, angles are preserved as well.

Determinant of Orthogonal Matrices

Theorem: $\det Q = \pm 1$. Proof: $1 = \det I = \det(Q^TQ) = \det(Q^T)\det(Q) = (\det Q)^2$. Therefore $\det Q = \pm 1$.

  • $\det Q = +1$: Rotation matrix (proper orthogonal), preserves handedness
  • $\det Q = -1$: Reflection matrix (improper orthogonal), reverses handedness

Common Orthogonal Matrices

2D Rotation Matrix: $R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. Verification: $R_\theta^TR_\theta = I$ follows from $\cos^2\theta + \sin^2\theta = 1$.

2D Reflection Matrix: Reflection across the line through the unit vector $(\cos\theta, \sin\theta)$: $H = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}$. Verification: $H^TH = I$ and $\det H = -1$, so $H$ is an improper orthogonal matrix.

Householder Reflection: $H = I - 2\vec{v}\vec{v}^T$ for a unit vector $\vec{v}$; another way to implement QR decomposition, with better numerical stability than Gram-Schmidt.

Permutation Matrix: Exactly one 1 in each row and column, all other entries 0. For example: $P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$. Its action is to permute the order of vector components.

Numerical Advantage of Orthogonal Matrices

The condition number of an orthogonal matrix is 1: $\kappa(Q) = \|Q\|_2\,\|Q^{-1}\|_2 = 1$. This means computations with orthogonal matrices are numerically stable: errors are not amplified.
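A quick check with a rotation matrix (the angle is chosen arbitrarily):

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q is orthogonal
print(np.linalg.cond(Q))                # 1.0 (up to rounding): perfectly conditioned

x = np.array([2.0, -1.0])
# Lengths are preserved, so errors in x are not amplified by Q
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # True
```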

Applications in Signal Processing

Fourier Basis: The Most Important Orthogonal Basis

The core tool of signal processing —Fourier Transform— is essentially a coordinate transformation in an orthogonal basis.

Consider a discrete signal with period $N$. Define the complex exponential vectors $\vec{f}_k = \left(1, \omega^k, \omega^{2k}, \dots, \omega^{(N-1)k}\right)$, where $\omega = e^{2\pi i/N}$ and $k = 0, 1, \dots, N-1$. These vectors form an orthogonal basis (under the complex inner product): $\langle \vec{f}_k, \vec{f}_l \rangle = 0$ for $k \neq l$.

The Discrete Fourier Transform (DFT) decomposes a signal $\vec{x}$ onto this orthogonal basis: $X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i kn/N}$.
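The orthogonality of this basis can be verified directly; a sketch for $N = 8$:

```python
import numpy as np

N = 8
n = np.arange(N)

# Rows of F are the complex exponential basis vectors f_k
F = np.array([np.exp(2j * np.pi * k * n / N) for k in range(N)])

# Gram matrix of complex inner products: <f_k, f_l> = N if k == l, else 0
G = F.conj() @ F.T
print(np.allclose(G, N * np.eye(N)))  # True
```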

Signal Decomposition and Filtering

Because Fourier bases are orthogonal, signals can be losslessly decomposed into different frequency components:

Low-pass filter: Keep only low-frequency components (small $k$), remove high-frequency noise

High-pass filter: Keep only high-frequency components, extract edges or sudden changes

Band-pass filter: Keep only a specific frequency range

How Noise-Canceling Headphones Work

Noise-canceling headphones use microphones to capture external noise, then:

  1. Perform Fourier transform on the noise signal
  2. Generate a signal with opposite phase
  3. Play it through the headphones

Because orthogonal components can be processed independently, noise can be "precisely canceled" without affecting the music you're listening to.

Image Compression: DCT in JPEG

JPEG image compression uses Discrete Cosine Transform (DCT), the real-number version of Fourier transform.

Images are divided into $8 \times 8$ blocks, and each block undergoes DCT. Since high-frequency components of natural images are usually small, they can be discarded or coarsely quantized to achieve compression.

When reconstructing images, only a few low-frequency components are needed to approximate the original — this is the power of orthogonal decomposition.

CDMA in Mobile Communications

In mobile communications, multiple users use the same frequency simultaneously. How do you distinguish different users' signals?

CDMA solution: Give each user a "code," and these codes are pairwise orthogonal.

User A's signal: $b_A\vec{c}_A$. User B's signal: $b_B\vec{c}_B$. Received mixed signal: $\vec{r} = b_A\vec{c}_A + b_B\vec{c}_B$. To extract user A's signal: $\vec{r} \cdot \vec{c}_A = b_A(\vec{c}_A \cdot \vec{c}_A) + b_B(\vec{c}_B \cdot \vec{c}_A) = b_A\|\vec{c}_A\|^2$. Because $\vec{c}_A \cdot \vec{c}_B = 0$, user B's signal is completely filtered out!
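This separation can be sketched in a few lines, using two rows of a Walsh-Hadamard matrix as assumed spreading codes and $\pm 1$ data bits:

```python
import numpy as np

# Orthogonal spreading codes (rows of a 4x4 Walsh-Hadamard matrix)
cA = np.array([1, 1, 1, 1])
cB = np.array([1, -1, 1, -1])
assert cA @ cB == 0            # codes are orthogonal

bA, bB = 1, -1                 # each user's data bit, encoded as +1/-1
received = bA * cA + bB * cB   # the signals overlap on the channel

# Despread: project the mixed signal onto each user's code
print(received @ cA / (cA @ cA))  # 1.0  -> recovers bA
print(received @ cB / (cB @ cB))  # -1.0 -> recovers bB
```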

Principal Component Analysis (PCA): Finding the Most Important Directions

Problem Statement

Suppose you have high-dimensional data (say 1000 features) and want to describe it with just a few "principal components." How do you find these principal components?

Basic Idea of PCA

PCA looks for directions with maximum variance in the data.

First principal component: The direction where data projection has maximum variance

Second principal component: Among directions orthogonal to the first, the one with maximum variance

...and so on

Why require orthogonality? Orthogonality ensures principal components "don't interfere with each other": each component captures independent information.

Mathematical Formulation

Let the data matrix be $X$ (size $m \times n$, each row a sample, already centered).

Covariance matrix: $C = \frac{1}{m-1}X^TX$. PCA is equivalent to the eigenvalue decomposition of $C$: $C = V\Lambda V^T$, where $V$ is an orthogonal matrix (its columns are eigenvectors) and $\Lambda$ is the diagonal matrix of eigenvalues.

The columns of $V$ are the principal component directions, and the eigenvalues give the variance in each direction.

Dimensionality Reduction

Keep the first $k$ principal components, reducing the data from $n$ dimensions to $k$ dimensions: $X_{\text{reduced}} = XV_k$, where $V_k$ consists of the first $k$ columns of $V$.

This is the "optimal" linear dimensionality reduction: among all projections onto a $k$-dimensional space, it preserves the most variance (information) of the original data.
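The eigendecomposition route above can be sketched on synthetic 2D data (all values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2D data, stretched far more along the first axis
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
X = X - X.mean(axis=0)                 # center the data

C = X.T @ X / (len(X) - 1)             # covariance matrix
eigvals, V = np.linalg.eigh(C)         # V is orthogonal; eigh sorts ascending

Vk = V[:, -1:]                         # keep the top principal component
X_reduced = X @ Vk                     # project the data onto it
print(X_reduced.shape)                 # (200, 1)
```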

Python Implementation Examples

Gram-Schmidt Orthogonalization

import numpy as np

def gram_schmidt(A):
    """
    Classical Gram-Schmidt orthogonalization
    Input: A - matrix with columns as vectors to orthogonalize
    Output: Q - matrix with orthonormal column vectors
    """
    m, n = A.shape
    Q = np.zeros((m, n))

    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            v = v - (Q[:, i] @ A[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)

    return Q

# Test
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]], dtype=float).T
Q = gram_schmidt(A)
print("Orthogonal matrix Q:")
print(Q)
print("\nVerify Q^T Q = I:")
print(Q.T @ Q)

QR Decomposition

def qr_decomposition(A):
    """
    QR Decomposition
    Input: A - m × n matrix with linearly independent columns
    Output: Q - m × n matrix with orthonormal columns, R - n × n upper triangular matrix
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))

    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]
            v = v - R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]

    return Q, R

# Test
A = np.array([[1, 1],
              [1, 0],
              [0, 1]], dtype=float)
Q, R = qr_decomposition(A)
print("Q =\n", Q)
print("\nR =\n", R)
print("\nVerify A = QR:\n", Q @ R)

Least Squares

def least_squares(A, b):
    """
    Solve least squares problem using normal equations
    """
    return np.linalg.solve(A.T @ A, A.T @ b)

def least_squares_qr(A, b):
    """
    Solve least squares problem using QR decomposition (more stable)
    """
    Q, R = np.linalg.qr(A)
    return np.linalg.solve(R, Q.T @ b)

# Linear fitting example
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Build design matrix
A = np.column_stack([np.ones(len(x)), x])

# Solve
coeffs = least_squares_qr(A, y)
print(f"Fitting result: y = {coeffs[0]:.4f} + {coeffs[1]:.4f}x")

Projection Visualization

import numpy as np
import matplotlib.pyplot as plt

def visualize_projection():
    """Visualize vector projection"""
    fig, ax = plt.subplots(figsize=(8, 8))

    # Define vectors
    a = np.array([3, 1])
    b = np.array([1, 3])

    # Compute projection
    proj = (np.dot(a, b) / np.dot(a, a)) * a
    error = b - proj

    # Plot
    ax.quiver(0, 0, a[0], a[1], angles='xy', scale_units='xy', scale=1,
              color='blue', label=r'$\vec{a}$', width=0.02)
    ax.quiver(0, 0, b[0], b[1], angles='xy', scale_units='xy', scale=1,
              color='green', label=r'$\vec{b}$', width=0.02)
    ax.quiver(0, 0, proj[0], proj[1], angles='xy', scale_units='xy', scale=1,
              color='red', label=r'proj$_{\vec{a}}\vec{b}$', width=0.02)
    ax.quiver(proj[0], proj[1], error[0], error[1], angles='xy',
              scale_units='xy', scale=1, color='orange',
              label='Error vector', width=0.02)

    ax.set_xlim(-1, 5)
    ax.set_ylim(-1, 5)
    ax.set_aspect('equal')
    ax.grid(True)
    ax.legend()
    ax.set_title('Vector Projection Illustration')
    plt.show()

visualize_projection()

Exercises

Basic Problems

Exercise 1: Verify whether two given vectors $\vec{u}$ and $\vec{v}$ are orthogonal. If orthogonal, normalize them to obtain an orthonormal basis.

Exercise 2: Compute the projection of a vector $\vec{b}$ onto a vector $\vec{a}$. Verify that the error vector $\vec{b} - \text{proj}_{\vec{a}}\vec{b}$ is orthogonal to $\vec{a}$.

Exercise 3: Perform Gram-Schmidt orthogonalization on a set of linearly independent vectors and verify the resulting vectors are orthogonal.

Exercise 4: Let $P = A(A^TA)^{-1}A^T$ for a matrix $A$ with linearly independent columns. Verify that $P$ is a projection matrix (i.e., $P^2 = P$ and $P^T = P$).

Exercise 5: Is the rotation matrix $R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ an orthogonal matrix? Verify.

Advanced Problems

Exercise 6: Prove that if $\{\vec{u}_1, \dots, \vec{u}_k\}$ is an orthogonal set (not containing the zero vector), then the vectors are linearly independent.

Exercise 7: Prove that the projection matrix $P = A(A^TA)^{-1}A^T$ satisfies $P^2 = P$. Explain geometrically why "the projection of a projection is still itself."

Exercise 8: Perform QR decomposition on a matrix $A$ with linearly independent columns and verify $A = QR$.

Exercise 9: Use least squares to fit data points $(x_i, y_i)$ to the line $y = c_0 + c_1x$.

Exercise 10: Let $W$ be the subspace spanned by a given set of vectors. Find the projection of a vector $\vec{b}$ onto $W$.

Exercise 11: Prove that the product of two orthogonal matrices is still an orthogonal matrix.

Exercise 12: Let $A$ be an $m \times n$ matrix ($m \geq n$) with linearly independent columns, and let $A = QR$ be its QR decomposition. Prove that the column space of $A$ equals the column space of $Q$.

Application Problems

Exercise 13: An experiment measured 5 data pairs suspected to follow a nonlinear law. Through a variable substitution (e.g. taking logarithms), convert the problem to linear fitting, and use least squares to find the two parameters.

Exercise 14: In signal processing, given a noisy signal that depends linearly on two unknown parameters, design a least squares method to estimate both parameters.

Exercise 15: Explain why JPEG image compression uses Discrete Cosine Transform (DCT) instead of directly storing pixel values. What role does orthogonal transformation play?

Programming Problems

Exercise 16: Implement the Modified Gram-Schmidt algorithm and compare numerical stability with the classical version. Hint: Test with nearly linearly dependent vectors.

Exercise 17: Implement projection matrix computation in Python and visualize vector projection onto a plane.

Exercise 18: Implement a simple Principal Component Analysis (PCA) for dimensionality reduction and visualization of 2D data.

Exercise 19: Implement least squares using QR decomposition and compare results with NumPy's np.linalg.lstsq.

Exercise 20: Simulate a simple CDMA system:

  • Generate 3 orthogonal codes (length 4)
  • Have 3 users each send bit 0 or 1
  • Simulate receiving the mixed signal
  • Use orthogonality to separate each user's signal

Chapter Summary

Core Concepts

  1. Orthogonality: $\vec{u} \cdot \vec{v} = 0$, meaning vectors "don't interfere"

  2. Projection:

    • 1D: $\text{proj}_{\vec{a}}\vec{b} = \dfrac{\vec{a} \cdot \vec{b}}{\vec{a} \cdot \vec{a}}\,\vec{a}$
    • Subspace: $\hat{b} = A(A^TA)^{-1}A^T\vec{b}$

  3. Normal equations: $A^TA\hat{x} = A^T\vec{b}$, the core of least squares problems

  4. Gram-Schmidt: Transforms any linearly independent vector set into an orthogonal basis

  5. QR decomposition: $A = QR$, with $Q$ having orthonormal columns and $R$ upper triangular; numerically stable

  6. Orthogonal matrix: $Q^TQ = I$, preserves length and angle

Application Areas

Field               | Application                  | Concepts Used
Data Analysis       | Linear Regression            | Least Squares
Machine Learning    | PCA Dimensionality Reduction | Orthogonal Decomposition
Signal Processing   | FFT, Filtering               | Orthogonal Bases
Image Processing    | JPEG Compression             | DCT Transform
Communications      | CDMA                         | Orthogonal Codes
Numerical Computing | Solving Linear Systems       | QR Decomposition

Next Chapter Preview

"Symmetric Matrices and Quadratic Forms" will explore:

  • Spectral theorem for symmetric matrices
  • Real symmetric matrices have real eigenvalues
  • Determining and applying positive definite matrices
  • Geometric meaning of quadratic forms
  • Principal axis theorem
  • Hessian matrices in optimization problems

References

  1. Strang, G. (2019). Introduction to Linear Algebra. Chapters 4, 10.
  2. Trefethen, L. N. & Bau, D. (1997). Numerical Linear Algebra. Lectures 7-11.
  3. 3Blue1Brown. Essence of Linear Algebra, Chapters 9, 11.
  4. Golub, G. H. & Van Loan, C. F. (2013). Matrix Computations. Chapter 5.

Next Chapter: Symmetric Matrices and Quadratic Forms

Previous Chapter: ← Eigenvalues and Eigenvectors


This is Chapter 7 of the 18-part "Essence of Linear Algebra" series.

  • Post title:Essence of Linear Algebra (7): Orthogonality and Projections
  • Post author:Chen Kai
  • Create time:2019-02-06 15:30:00
  • Post link:https://www.chenk.top/chapter-07-orthogonality-and-projections/
  • Copyright Notice:All articles in this blog are licensed under BY-NC-SA unless stated otherwise.