We've completed a long journey through linear algebra. From vectors and matrices to eigenvalue decomposition, SVD, tensor analysis, and applications in machine learning and deep learning — each chapter has revealed the remarkable universality of this discipline. Now, let's turn our gaze to the frontiers: quantum computing, graph neural networks, and large language models, technologies that are changing the world. These fields may seem mysterious, but their core remains the familiar linear algebra we've studied.
Linear Algebra in Quantum Computing
From Classical Bits to Quantum Bits
A classical computer's basic unit is the bit, which can only be in one of two states: 0 or 1.

Quantum bits (qubits) are entirely different. They can exist in a superposition of the basis states $|0\rangle$ and $|1\rangle$: $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, where the complex amplitudes satisfy $|\alpha|^2 + |\beta|^2 = 1$.
Intuitive Understanding: Imagine a globe. A classical bit can only be at the north pole or the south pole, while a qubit can be at any point on the sphere's surface. This sphere is called the Bloch sphere, and it perfectly visualizes a qubit's state space: the north pole is $|0\rangle$ and the south pole is $|1\rangle$.
Why is superposition so powerful? With $n$ qubits, the state vector lives in a $2^n$-dimensional space, so a single quantum operation acts on all $2^n$ basis states at once.
Quantum Gates: Physical Realization of Unitary Matrices
In quantum computing, "operations" are implemented through quantum gates. Mathematically, quantum gates are unitary matrices: a matrix $U$ is unitary if $U^\dagger U = I$, which guarantees that it preserves the norm (total probability) of the state vector.
Most Important Single-Qubit Gates:
Hadamard Gate: Arguably the most frequently used gate in quantum computing, it creates superposition:

$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad H|0\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}$$
Pauli Gates: Three fundamental operations, corresponding to rotations of the Bloch sphere:

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$
Rotation Gates: Parameterized rotations about the Bloch-sphere axes, such as $R_z(\theta) = e^{-i\theta Z/2}$; these are the trainable building blocks of variational circuits.
Two-Qubit Gate: CNOT
The Controlled-NOT (CNOT) gate is the archetypal two-qubit gate: it flips the target qubit exactly when the control qubit is $|1\rangle$:

$$\text{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

The famous Bell state (a maximally entangled state) can be prepared by a Hadamard followed by a CNOT:

$$|\Phi^+\rangle = \text{CNOT}\,(H \otimes I)|00\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}}$$
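As a concrete check, this preparation can be simulated with a few lines of NumPy; the gate matrices below are the standard ones, and the circuit is just matrix multiplication:

```python
import numpy as np

# Single-qubit gates as 2x2 unitary matrices.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard
I2 = np.eye(2)

# CNOT on two qubits (control = qubit 0, target = qubit 1).
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

# Start in |00>, apply H to the first qubit, then CNOT.
ket00 = np.array([1.0, 0.0, 0.0, 0.0])
bell = CNOT @ np.kron(H, I2) @ ket00

print(bell)  # amplitudes of |00>, |01>, |10>, |11>
```

The output has amplitude $1/\sqrt{2}$ on $|00\rangle$ and $|11\rangle$ and zero elsewhere, exactly the Bell state.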
Linear Algebra Perspective on Quantum Algorithms
Deutsch-Jozsa Algorithm: Demonstrating Quantum Parallelism
Suppose there's a function $f: \{0,1\}^n \to \{0,1\}$ promised to be either constant or balanced (outputting 0 and 1 equally often). Classically, deciding which can require $2^{n-1}+1$ queries in the worst case; the Deutsch-Jozsa algorithm needs just one.
The core idea uses Hadamard transforms to create superposition of all inputs, then lets quantum interference amplify the correct answer and cancel wrong ones.
Grover's Search Algorithm
To search for a specific target among $N$ unsorted items, a classical algorithm needs $O(N)$ queries; Grover's algorithm needs only $O(\sqrt{N})$.
The core of Grover's algorithm iterates two operations:
1. Oracle Flip: mark the target state by flipping its phase
2. Diffusion Transform: reflect all amplitudes about their mean

In matrix language, the diffusion transform is $D = 2|s\rangle\langle s| - I$, where $|s\rangle$ is the uniform superposition state.
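The two reflections can be simulated directly as matrices on a small search space. A minimal NumPy sketch (the 3-qubit size and target index are arbitrary choices for illustration):

```python
import numpy as np

n = 3                      # qubits
N = 2 ** n                 # search space size
target = 5                 # marked item (arbitrary)

# Uniform superposition |s>.
s = np.full(N, 1 / np.sqrt(N))

# Oracle: flip the phase of the target state.
oracle = np.eye(N)
oracle[target, target] = -1

# Diffusion: reflect about the mean, D = 2|s><s| - I.
diffusion = 2 * np.outer(s, s) - np.eye(N)

state = s.copy()
iters = int(round(np.pi / 4 * np.sqrt(N)))   # ~O(sqrt(N)) iterations
for _ in range(iters):
    state = diffusion @ (oracle @ state)

probs = state ** 2
print(probs.argmax(), probs[target])  # target index, high probability
```

After only two iterations (for $N = 8$) the target's probability exceeds 0.9, while a classical search would need about $N/2$ queries on average.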
Shor's Algorithm and RSA's Crisis
Shor's algorithm can factor large integers in polynomial time, directly threatening RSA encryption (whose security relies on the difficulty of factoring large numbers). The key step is the Quantum Fourier Transform (QFT):

$$\text{QFT}\,|j\rangle = \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} e^{2\pi i jk / N}\,|k\rangle$$

which is exactly the DFT matrix, realized as a unitary operation on quantum amplitudes.
Quantum Machine Learning
The intersection of quantum computing and machine learning is an active research area. Key directions:
Variational Quantum Eigensolver (VQE): For finding molecular ground state energies, a core problem in chemistry and drug design. VQE uses parameterized quantum circuits, with classical optimizers adjusting parameters to minimize energy expectation.
Quantum Neural Networks: Building neural networks with parameterized quantum gates. Theoretically may achieve quantum speedup for certain tasks, but currently limited by quantum hardware noise.
Quantum Kernel Methods: Using quantum state space as feature space, potentially accessing kernel functions that are classically hard to compute.
Linear Algebra in Graph Neural Networks
Matrix Representations of Graphs
Graphs are fundamental data structures for describing relationships and connections. Social networks, molecular structures, transportation systems, the internet — all can be modeled as graphs.
A graph $G = (V, E)$ consists of a node set $V$ (with $|V| = n$) and an edge set $E$. Several matrices capture its structure:

Adjacency Matrix: $A \in \mathbb{R}^{n \times n}$ with $A_{ij} = 1$ if $(i, j) \in E$ and $0$ otherwise.

Degree Matrix: $D = \mathrm{diag}(d_1, \dots, d_n)$, where $d_i = \sum_j A_{ij}$ is node $i$'s degree.

Graph Laplacian Matrix: $L = D - A$.

Normalized Laplacian: $L_{\mathrm{sym}} = I - D^{-1/2} A D^{-1/2}$.
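These definitions translate directly into code. A minimal NumPy sketch on a small example graph (the 4-node cycle is an arbitrary choice):

```python
import numpy as np

# 4-node cycle graph (0-indexed): 0-1, 1-3, 3-2, 2-0.
edges = [(0, 1), (1, 3), (3, 2), (2, 0)]
n = 4

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1            # adjacency matrix (symmetric)

D = np.diag(A.sum(axis=1))            # degree matrix
L = D - A                             # graph Laplacian

# Normalized Laplacian: L_sym = I - D^{-1/2} A D^{-1/2}
D_inv_sqrt = np.diag(1 / np.sqrt(np.diag(D)))
L_sym = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt

print(np.linalg.eigvalsh(L))  # smallest eigenvalue is 0 (connected graph)
```

For this cycle the Laplacian spectrum is $\{0, 2, 2, 4\}$; the zero eigenvalue reflects connectivity, a fact used in the exercises below.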
Spectral Graph Theory: Fourier Analysis on Graphs
Fourier transform decomposes time signals into sine waves of different frequencies. How do we define "frequency" on graphs?
The answer comes from the Laplacian matrix's eigendecomposition.
Let $L = U \Lambda U^\top$ be the eigendecomposition of the Laplacian, with orthonormal eigenvectors $u_1, \dots, u_n$ and eigenvalues $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$.

Graph Fourier Transform is defined as $\hat{x} = U^\top x$, with inverse transform $x = U \hat{x}$.

Eigenvalues play the role of frequencies: eigenvectors for small $\lambda$ vary smoothly across edges (low frequency), while eigenvectors for large $\lambda$ oscillate rapidly (high frequency).
Spectral Clustering leverages this idea: use the Laplacian's smallest non-zero eigenvectors to embed graph nodes, then do K-means clustering in the embedding space. Intuitively, nodes in the same community have similar values in low-frequency eigenvectors.
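A minimal sketch of this idea: for a 2-way split, the sign pattern of the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue) already separates the communities, standing in for the K-means step. The toy two-triangle graph is an illustrative assumption:

```python
import numpy as np

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(1)) - A

# Fiedler vector: eigenvector of the second-smallest Laplacian eigenvalue.
vals, vecs = np.linalg.eigh(L)        # eigh returns ascending eigenvalues
fiedler = vecs[:, 1]

# The sign of the Fiedler vector gives a 2-way partition (a minimal
# stand-in for K-means on the low-frequency embedding).
labels = (fiedler > 0).astype(int)
print(labels)  # one triangle gets label 0, the other label 1
```

Nodes within the same triangle receive the same label, and the two triangles receive different labels, exactly the "similar values in low-frequency eigenvectors" intuition.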
Graph Convolutional Networks (GCN)
Traditional CNNs work well on regular grids (like images) but can't directly apply to graph-structured data. Graph Convolutional Networks solve this.
Spectral Convolution defines filtering in the graph Fourier domain: $g_\theta \star x = U\, g_\theta(\Lambda)\, U^\top x$, where the filter $g_\theta(\Lambda)$ acts on the eigenvalues.
ChebNet's Approximation: computing the full eigendecomposition costs $O(n^3)$, so ChebNet approximates the filter with Chebyshev polynomials of the rescaled Laplacian: $g_\theta(\Lambda) \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{\Lambda})$.

GCN Simplification: Take $K = 1$ with further simplifications, yielding the familiar layer $H' = \sigma(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H W)$, where $\tilde{A} = A + I$ adds self-loops and $\tilde{D}$ is its degree matrix.
Intuitive Understanding: Each GCN layer does three things:
1. Aggregate neighbor features (multiply by the normalized adjacency matrix)
2. Transform features (multiply by the weight matrix $W$)
3. Apply the nonlinearity $\sigma$

Stacking multiple GCN layers aggregates multi-hop neighbor information.
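The three steps can be sketched as a single NumPy function, using the standard GCN normalization with self-loops; the tiny path graph and identity weight matrix are arbitrary test inputs:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D~^{-1/2} A~ D~^{-1/2} H W)."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                       # add self-loops
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0)           # aggregate, transform, ReLU

# Tiny example: 3-node path graph, 2-d features, 2 output channels.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W = np.eye(2)
out = gcn_layer(A, H, W)
print(out)
```

Each output row is a degree-normalized mix of the node's own features and its neighbors' features, which is why stacking layers widens the receptive field hop by hop.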
Message Passing Neural Networks
GCN can be viewed as a special case of the message passing framework. In this framework, each layer's update has three steps:
Message Computation: $m_{ij} = \mathrm{MSG}(h_i, h_j, e_{ij})$ for each edge $(i, j)$

Message Aggregation: $m_i = \mathrm{AGG}(\{m_{ij} : j \in \mathcal{N}(i)\})$

Node Update: $h_i' = \mathrm{UPD}(h_i, m_i)$
Different GNN variants correspond to different MSG, AGG, UPD function choices:
- GraphSAGE: Uses sampling and different aggregators (mean, max, LSTM)
- GAT (Graph Attention Network): Attention-weighted aggregation
- GIN (Graph Isomorphism Network): Theoretically most powerful message passing GNN
Graph Neural Network Applications
Molecular Property Prediction: Molecules can be viewed as graphs (atoms are nodes, chemical bonds are edges). GNNs can predict various molecular properties like solubility, toxicity, drug activity. AlphaFold for protein structure prediction also uses similar techniques.
Recommender Systems: User-item interactions can be modeled as bipartite graphs. GNNs can learn user and item embeddings for recommendations.
Knowledge Graphs: Entities are nodes, relations are edges. GNNs can do link prediction (predicting missing relations) and node classification (entity classification).
Physical Simulation: Particle systems can be modeled as dynamic graphs. Graph Network Simulators can learn complex physical dynamics.
Linear Algebra in the Era of Large Models
Mathematical Structure of Transformers
Transformers are the foundational architecture for modern large language models (GPT, BERT, LLaMA). Their core is self-attention, which is entirely linear algebra operations.
Given an input sequence embedding matrix $X \in \mathbb{R}^{n \times d}$, self-attention computes $Q = XW_Q$, $K = XW_K$, $V = XW_V$ and

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

Breaking down this formula:

1. $QK^\top$ computes the inner product of every query with every key: a matrix of pairwise similarity scores.
2. Dividing by $\sqrt{d_k}$ prevents softmax saturation (vanishing gradients) when inner products grow large.
3. Softmax normalizes attention scores into a probability distribution, with each row summing to 1.
4. Multiplying by $V$ is a weighted summation: each position's output is a weighted combination of all positions' value vectors, with weights being the attention scores.
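The whole computation is a handful of matrix products. A minimal NumPy sketch (the sequence length, dimension, and random inputs are arbitrary):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of every query to every key
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted sum of value vectors

rng = np.random.default_rng(0)
n, d = 4, 8                             # sequence length, head dimension
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

A quick sanity check of the "soft retrieval" picture: if all queries are zero, every key matches equally, and each output row is simply the mean of the value vectors.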
Geometric Intuition: Attention performs soft retrieval. Query is "what I'm looking for," Key is "what I have," Value is "what I can provide." Queries are matched against keys via inner products, and the output is aggregated from the most relevant values.
Multi-Head Attention projects the input into multiple subspaces, runs attention in each, and concatenates the results: $\mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)W_O$. Different heads can learn different relationship patterns.
Mathematics of Positional Encoding
Transformers have no recurrence or convolution, so how do they perceive position? The answer is positional encoding.
Sinusoidal Positional Encoding (original Transformer):

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d}}\right)$$
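A direct NumPy implementation of this encoding (the sequence length and model dimension are arbitrary):

```python
import numpy as np

def sinusoidal_pe(max_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same)."""
    pos = np.arange(max_len)[:, None]            # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]        # even dimensions
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                  # even indices: sine
    pe[:, 1::2] = np.cos(angle)                  # odd indices: cosine
    return pe

pe = sinusoidal_pe(50, 16)
print(pe.shape)  # (50, 16)
```

At position 0 every sine component is 0 and every cosine component is 1, and each dimension oscillates at a different geometric frequency, which is what lets the model infer relative offsets.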
Rotary Position Embedding (RoPE) is a more modern approach: it encodes position $m$ by rotating each two-dimensional slice of the query and key vectors through an angle $m\theta_i$, so that query-key inner products depend only on relative position.
Parameter-Efficient Fine-Tuning for Large Models
Large language models have billions or even trillions of parameters, making full fine-tuning extremely expensive. Linear algebra provides elegant solutions.
LoRA (Low-Rank Adaptation): assumes weight changes during fine-tuning are low-rank. Instead of directly updating $W \in \mathbb{R}^{d \times k}$, it learns $W' = W + BA$ with $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and $r \ll \min(d, k)$. Trainable parameters drop from $dk$ to $r(d + k)$, a dramatic reduction when $r$ is small.
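A minimal NumPy sketch of a LoRA-adapted linear layer. The names `down`/`up` for the two low-rank factors are this sketch's own; the zero initialization of the up-projection follows the common convention that the adapter contributes nothing at the start of training:

```python
import numpy as np

class LoRALinear:
    """Sketch of a LoRA-adapted linear layer (row-vector convention):
    y = x W + x (down @ up) * scale, with W frozen and only the two
    low-rank factors trained."""

    def __init__(self, W, r, alpha=1.0, seed=0):
        d, k = W.shape
        rng = np.random.default_rng(seed)
        self.W = W                                       # frozen base weight
        self.down = rng.standard_normal((d, r)) * 0.01   # d x r, random init
        self.up = np.zeros((r, k))                       # r x k, zero init,
        self.scale = alpha / r                           # so the update is 0

    def forward(self, x):
        return x @ self.W + (x @ self.down @ self.up) * self.scale

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 32))
layer = LoRALinear(W, r=4)
x = rng.standard_normal((5, 64))
# With the up-projection at zero, LoRA output matches the frozen base layer.
print(np.allclose(layer.forward(x), x @ W))  # True
```

Here only $r(d + k) = 4 \times 96 = 384$ values are trainable instead of $64 \times 32 = 2048$, and the low-rank product never needs to be materialized as a full matrix.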
QLoRA combines quantization: base model stored in 4-bit quantization, only LoRA part is full precision. This makes fine-tuning 65B models on consumer GPUs possible.
KV Cache and Inference Optimization
During autoregressive generation, each new token needs attention over all previous tokens. A naive implementation reruns attention on the whole prefix at every step, costing $O(n^2)$ work per generated token.
KV Cache exploits the fact that previous tokens' Keys and Values don't change. We cache them, computing only the new token's $q$, $k$, $v$ at each step and concatenating the new $k$, $v$ with the cache. This reduces per-step attention computation to $O(n)$.
This is a classic space-time tradeoff. The KV cache holds $2 \times n_{\text{layers}} \times n \times d$ values (a key vector and a value vector per layer per token), so it grows linearly with sequence length and can dominate GPU memory at long context.
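A toy decode loop makes the bookkeeping concrete. This sketch caches keys and values across steps so each step only computes projections for the single new token; the dimensions and random weights are arbitrary:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
K_cache = np.empty((0, d))   # cached keys of all previous tokens
V_cache = np.empty((0, d))   # cached values of all previous tokens

def step(x, K_cache, V_cache):
    """One decode step: project only the NEW token, reuse the cache."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    K_cache = np.vstack([K_cache, k])          # append instead of recompute
    V_cache = np.vstack([V_cache, v])
    w = softmax(q @ K_cache.T / np.sqrt(d))    # O(n) per step, not O(n^2)
    return w @ V_cache, K_cache, V_cache

for t in range(5):
    x = rng.standard_normal((1, d))            # embedding of the new token
    out, K_cache, V_cache = step(x, K_cache, V_cache)

print(K_cache.shape)  # (5, 8): one cached key per generated token
```

The cache grows by one row per token, which is exactly the linear memory cost described above.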
Sparse Computation and Efficient Inference
Sparse Attention
Standard attention has $O(n^2)$ time and memory complexity in the sequence length $n$, which becomes prohibitive for long sequences.
Sparse Attention reduces complexity by computing only some position pairs' attention:
Local Attention: Each position only attends to a nearby window of size $w$. Complexity drops to $O(nw)$.
Dilated Attention: Attend to tokens at regular intervals (stride $k$), enlarging the receptive field without increasing cost.
Longformer/BigBird: Combines local attention, global attention (certain special tokens see all positions), and random attention.
Mathematically, sparse attention is equivalent to setting most elements of the attention score matrix to $-\infty$ before the softmax, so their weights become exactly zero.
Linear Attention and Kernel Approximation
Another approach uses kernel methods to approximate softmax attention: if $\mathrm{softmax}(QK^\top) \approx \phi(Q)\phi(K)^\top$ for some feature map $\phi$, then attention can be computed as $\phi(Q)\left(\phi(K)^\top V\right)$ in $O(n)$ time simply by changing the order of multiplication.
Performer uses random features to approximate the softmax kernel. Linear Transformer directly removes softmax, but may lose expressiveness.
Model Quantization
Quantization converts high-precision (FP32, FP16) weights and activations to low-precision (INT8, INT4) representations.
Linear Algebra Perspective: Quantization can be viewed as finding a discrete grid to approximate continuous values. Given original weights $w$, we seek a scale $s$ (and possibly a zero point $z$) such that $w \approx s \cdot q$ for integer $q$.

Symmetric Quantization: $q = \mathrm{round}(w/s)$ with $s = \max|w| / 127$ for INT8, so zero maps exactly to zero.

Asymmetric Quantization: $q = \mathrm{round}(w/s) + z$, where the zero point $z$ shifts the grid to fit value ranges not centered at zero.
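Symmetric INT8 quantization is a few lines of NumPy, and the round-trip reconstruction error is bounded by half a quantization step:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric INT8 quantization: q = round(w / s), s = max|w| / 127."""
    s = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / s), -127, 127).astype(np.int8)
    return q, s

def dequantize(q, s):
    return q.astype(np.float32) * s

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

err = np.abs(w - w_hat).max()
print(err <= s / 2 + 1e-6)  # error bounded by half a grid step
```

This is per-tensor quantization (one scale for the whole array); per-channel quantization, discussed next, simply computes one such scale per row or column.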
Per-Channel vs Per-Tensor Quantization: Different channels (or layers) may have very different value ranges. Per-channel quantization uses different scale factors for each channel, higher precision but more overhead.
GPTQ is a second-order quantization method: it uses the Hessian of the layer's reconstruction error (built from input statistics) to decide how to round each weight, minimizing the output error $\|WX - \hat{W}X\|_F^2$ rather than the plain weight-space error $\|W - \hat{W}\|$.
Model Pruning
Pruning removes unimportant weights (sets to zero), creating sparsity.
Unstructured Pruning: Any position's weights can be pruned. Can achieve high sparsity (90%+), but hardware acceleration is difficult.
Structured Pruning: Prune entire rows, columns, or convolution kernels. Easier to accelerate, but typically lower sparsity.
Importance Metrics: the simplest is magnitude, pruning weights with small $|w|$ first; more refined criteria use gradient or Hessian information to estimate each weight's effect on the loss.
Sparse Matrix Storage: CSR, CSC, COO formats efficiently store sparse matrices. Modern GPUs (like NVIDIA Ampere) have specialized sparse tensor cores supporting 2:4 sparsity (at most 2 non-zeros per 4 elements).
Frontier Research Directions in Linear Algebra
Tensor Networks and Quantum State Representation
Tensor Networks are compact representations of high-dimensional tensors. An order-$N$ tensor with dimension $d$ per index has $d^N$ entries, which quickly becomes intractable; tensor networks factor it into contractions of many small tensors.

Matrix Product States (MPS) are the simplest tensor networks: $T_{i_1 i_2 \cdots i_N} = A^{(1)}_{i_1} A^{(2)}_{i_2} \cdots A^{(N)}_{i_N}$, where each $A^{(k)}_{i_k}$ is a small matrix, reducing storage from exponential to linear in $N$.
MPS is used in quantum physics to represent 1D quantum system ground states. DMRG (Density Matrix Renormalization Group) is an MPS-based variational method.
More complex tensor networks include PEPS (2D), MERA (Multi-scale Entanglement Renormalization), etc. They have important applications in quantum many-body physics and quantum machine learning.
Randomized Numerical Linear Algebra
Randomized methods are transforming numerical linear algebra. Traditional algorithms are deterministic; randomized algorithms use probabilistic methods for faster speeds.
Randomized SVD: Instead of computing the full SVD, use random projections for an approximate rank-$k$ factorization:
1. Generate a random Gaussian matrix $\Omega \in \mathbb{R}^{n \times (k+p)}$, with small oversampling $p$
2. Form $Y = A\Omega$ to sample the range of $A$
3. Compute an orthonormal basis $Q$ of $Y$ (e.g. by QR)
4. Compute the SVD of the small matrix $B = Q^\top A$ and lift it back: $A \approx (Q U_B)\,\Sigma\, V^\top$
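The procedure translates almost line-for-line into NumPy; the oversampling parameter $p = 5$ below is a typical default, not a requirement:

```python
import numpy as np

def randomized_svd(A, k, p=5, seed=0):
    """Approximate rank-k SVD via random projection (a sketch of the
    Halko-Martinsson-Tropp scheme)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))    # random test matrix
    Y = A @ Omega                              # sample the range of A
    Q, _ = np.linalg.qr(Y)                     # orthonormal basis for range(Y)
    B = Q.T @ A                                # small (k+p) x n matrix
    U_b, S, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_b)[:, :k], S[:k], Vt[:k]

# Test on an exactly rank-10 matrix of size 200 x 100.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 100))
U, S, Vt = randomized_svd(A, k=10)
print(np.allclose(U * S @ Vt, A))  # exact on a truly rank-10 matrix
```

The expensive factorization happens only on the small $(k+p) \times n$ matrix $B$, which is why the method scales to matrices far too large for a full SVD.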
Johnson-Lindenstrauss Lemma is the theoretical foundation: high-dimensional points can be randomly projected to low dimensions while approximately preserving pairwise distances.
Implicit Neural Representations
Implicit Neural Representations (INR) use neural networks to represent continuous signals (images, 3D shapes, videos). Given coordinates, the network outputs the value at that point.
For example, NeRF (Neural Radiance Fields) uses an MLP to represent 3D scenes: given a 3D position and viewing direction $(x, y, z, \theta, \phi)$, the network outputs a color and a density $(\mathbf{c}, \sigma)$.

INR's core is learning a mapping from coordinates to values. Positional Encoding (Fourier features) helps networks learn high-frequency details: $\gamma(p) = \left(\sin(2^0 \pi p), \cos(2^0 \pi p), \dots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p)\right)$.
Neural PDE Solvers
Physics-Informed Neural Networks (PINN) use neural networks to solve partial differential equations. The network directly parameterizes the solution function $u_\theta(x, t)$, and training minimizes a loss combining the PDE residual with boundary and initial conditions.
Automatic differentiation lets us easily compute derivatives of any order. It relies on the chain rule, which in linear-algebra terms is just the multiplication of Jacobian matrices.
Neural ODE views a deep network as a continuous dynamical system: $\frac{dh}{dt} = f(h(t), t, \theta)$, with the forward pass computed by an ODE solver.
Complete Series Review and Knowledge Map
Three Perspectives on Linear Algebra
Three ways of viewing linear algebra run through the entire series:
Algebraic Perspective: Matrices are arrays of numbers, operations follow specific rules. This is the foundation for computation.
Geometric Perspective: Matrices are linear transformations, vectors are arrows in space. This is the source of intuition.
Abstract Perspective: Vector spaces are sets satisfying axioms, linear maps preserve structure. This is the key to generalization.
The three perspectives complement each other. Algebra tells us "how to compute," geometry tells us "what it means," abstraction tells us "why it works."
Core Concept Network
Key Takeaways from Each Chapter
| Chapter | Topic | Core Insight |
|---|---|---|
| 1 | Vectors | Vectors have magnitude and direction, also elements of function spaces |
| 2 | Vector Spaces | Eight axioms define spaces where linear combinations make sense |
| 3 | Linear Transformations | Matrices and linear transformations correspond one-to-one; basis choice is key |
| 4 | Determinants | Determinant is signed volume scaling factor, zero iff singular |
| 5 | Linear Systems | Solution space structure determined by four fundamental subspaces |
| 6 | Eigenvalues | Eigenvectors are invariant directions, eigenvalues are scaling factors |
| 7 | Orthogonality | Inner products provide length and angle, orthogonal bases are best |
| 8 | Symmetric Matrices | Real symmetric matrices are orthogonally diagonalizable, all real eigenvalues |
| 9 | SVD | Any matrix decomposes as rotation-scaling-rotation |
| 10 | Norms & Condition | Condition number measures problem sensitivity |
| 11 | Matrix Calculus | Gradient points in direction of fastest change, chain rule is fundamental |
| 12 | Sparsity | L1 regularization induces sparsity, compressed sensing breaks Shannon limit |
| 13 | Tensors | Tensors are multidimensional arrays, decomposition reveals hidden structure |
| 14 | Random Matrices | High-dimensional randomness has surprising regularity (Marchenko-Pastur, semicircle) |
| 15 | Machine Learning | PCA maximizes variance, kernel methods are implicit high-dim mappings |
| 16 | Deep Learning | Neural networks are layered matrix multiplication plus nonlinearity |
| 17 | Computer Vision | Camera is projection matrix, 3D reconstruction is inverse problem |
| 18 | Frontiers | Quantum gates are unitary matrices, graph convolution is Laplacian filtering |
Most Important Theorems
Dimension Theorem:
Rank-Nullity Theorem: $\operatorname{rank}(A) + \dim \ker(A) = n$ for any $A \in \mathbb{R}^{m \times n}$
Spectral Theorem: Real symmetric matrices are orthogonally diagonalizable
SVD Existence: Every matrix, square or not, has a singular value decomposition
Eckart-Young Theorem: Truncated SVD is optimal low-rank approximation
Johnson-Lindenstrauss Lemma: High-dim points embed in low-dim with low distortion
Learning Advice and Resource Recommendations
Building Intuition
Visualization: Use GeoGebra, Manim (the library 3Blue1Brown uses), or write your own code to visualize linear transformations. Seeing how matrices warp grids is more intuitive than any formula.
Small Examples First: When learning new concepts, manually compute a few 2x2 or 3x3 examples. Only through hands-on calculation does true understanding emerge.
Ask "Why": Don't settle for "this formula is this way." Ask: Why is the determinant defined this way? Why are eigenvalues related to trace and determinant? Why does SVD always exist?
Connect to Applications: For each concept, think about where it's useful. Eigendecomposition for PageRank, SVD for recommender systems, orthogonal matrices for computer graphics...
Advanced Learning Paths
For Deeper Mathematics: - Abstract algebra (groups, rings, fields, modules) - Functional analysis (infinite-dimensional vector spaces) - Algebraic geometry (geometry of polynomial equations)
For Applications Focus: - Numerical Linear Algebra (Trefethen & Bau) - Convex Optimization (Boyd & Vandenberghe) - Statistical Learning Theory (Hastie, Tibshirani, Friedman)
For Research: - Random Matrix Theory - Tensor Decomposition and Multilinear Algebra - Quantum Information and Quantum Computing
Recommended Resources
Classic Textbooks: - Gilbert Strang's Introduction to Linear Algebra: Intuition-first, great for beginners - Sheldon Axler's Linear Algebra Done Right: Elegant proofs, abstract perspective - Trefethen & Bau's Numerical Linear Algebra: Essential for numerical methods - Golub & Van Loan's Matrix Computations: Engineer's bible
Online Courses: - MIT 18.06 (Gilbert Strang): Free on YouTube, classic of classics - 3Blue1Brown's Essence of Linear Algebra: Visual introduction, highly recommended first - Stanford CS229 (Machine Learning): Linear algebra in ML applications
Software Tools: - NumPy/SciPy: Python scientific computing - MATLAB/Octave: Traditional numerical computing - Julia: Modern scientific computing language - PyTorch/JAX: Deep learning frameworks with automatic differentiation
Papers and Surveys: - "The Matrix Cookbook": Matrix formula reference - "Randomized Numerical Linear Algebra" survey - "Attention is All You Need" (Transformer original paper)
![](前沿应用与总结/fig1.png)
![](前沿应用与总结/fig2.png)
![](前沿应用与总结/fig3.png)
![](前沿应用与总结/fig4.png)
![](前沿应用与总结/fig5.png)
![](前沿应用与总结/gif1_matmul.gif)
Exercises
Quantum Computing Basics
Exercise 1: Prove the Hadamard gate
Exercise 2: Compute
Exercise 3: Prove Pauli matrices
Exercise 4: Can the Bell state
Exercise 5: Design a quantum circuit that
transforms
Graph Neural Networks
Exercise 6: For the graph below, write its adjacency matrix:

```
1 -- 2
|    |
3 -- 4
```
Exercise 7: Compute the Laplacian matrix eigenvalues for the above graph, verify the smallest is 0, and explain its meaning.
Exercise 8: Prove for any graph signal
Exercise 9: Explain why the normalized
Laplacian
Exercise 10: In the GCN layer
Large Models and Efficient Computing
Exercise 11: In self-attention
Exercise 12: In LoRA, if original weight matrix
Exercise 13: Explain why KV cache speeds up
autoregressive generation. For sequence length
Exercise 14: Design an INT8 symmetric quantization
scheme for a tensor with weights in
Exercise 15: What's the complexity of sparse
attention (attending to only
Comprehensive Applications
Exercise 16: Design a simple GNN for molecular polarity prediction. - Input: Molecular graph (atoms are nodes, chemical bonds are edges) - Output: Polarity (0 or 1) - Describe what node features, edge features, and network structure you'd use
Exercise 17: Suppose you need to run a 7B parameter language model for inference on an 8GB VRAM GPU. Calculate: - Model weight space (FP16)? - With INT4 quantization? - Is it feasible? What else needs consideration?
Exercise 18: Derive how the GCN layer can be viewed
as a 1st-order Chebyshev approximation of spectral convolution. Starting
from spectral graph convolution definition, show how to arrive at
Exercise 19: Compare these three approaches for handling long sequences: - Sparse attention (local window) - Linear attention (kernel approximation) - Sliding window + global tokens
Analyze their pros, cons, and suitable scenarios.
Exercise 20: Design an architecture combining GNN and Transformer for molecular property prediction. Explain how to leverage both molecular graph structure and atomic sequence information.
Programming Practice
Exercise 21: Implement with NumPy: - Hadamard gate
and CNOT gate - Simulate a simple quantum circuit:
Exercise 23: Implement a LoRA layer, verify that
when
Exercise 24: Implement INT8 symmetric quantization and dequantization functions, test quantization error on a pretrained model's weights.
Exercise 25: Compare standard attention and sparse
attention (window size
Conclusion
Linear algebra is both an ancient and young discipline. Ancient because its basic concepts — vectors, matrices, linear transformations — have over two hundred years of history; young because it finds new life in each generation of technology.
From 19th century equation solving, to 20th century quantum mechanics, to 21st century machine learning and artificial intelligence, linear algebra remains the universal language of science and technology. Quantum computers use unitary matrices to describe quantum gates, graph neural networks use Laplacian matrices to propagate information, large language models use attention matrices to capture semantic relationships — the underlying mathematical essence remains the same.
Learning linear algebra isn't just learning computational techniques, it's learning a way of thinking: - Represent states with vectors - Represent transformations with matrices - Reveal structure through decomposition - Solve problems through optimization
I hope this series helps you: - Build solid conceptual foundations and geometric intuition - See the deep connections between linear algebra and modern technology - Gain motivation and direction for further study
Mathematics is not about memorization, but understanding. The beauty of linear algebra lies in its elegant structure and powerful applications.
Thank you for reading the complete "Essence of Linear Algebra" series!
This is Chapter 18 — the final chapter — of the 18-part "Essence of Linear Algebra" series.
- Post title: Essence of Linear Algebra (18): Frontiers and Summary
- Post author: Chen Kai
- Create time: 2019-03-30 16:15:00
- Post link: https://www.chenk.top/chapter-18-frontiers-and-summary/
- Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stated otherwise.