PDE and Machine Learning (2) - Neural Operator Theory
Chen Kai

Traditional Physics-Informed Neural Networks (PINNs) have a fatal flaw: they can only solve one problem at a time. Given a specific initial condition, train a network, and obtain the solution to that particular problem. What if the initial condition changes? Sorry, retrain. Need to handle 1000 different initial conditions? Train 1000 times.

This is catastrophic in practical applications. Imagine designing an aircraft wing and needing to test airflow under different wind speeds and angles of attack. Or doing weather forecasting where initial conditions change daily. Do you really need to retrain the network every time?

Neural Operators completely change the game. They learn not "the solution to a specific problem," but the mapping itself from initial conditions to solutions—an operator. Once this operator is learned, given any new initial condition, you only need one forward pass to get the solution. Train once, use forever.

What is the mathematical foundation of this capability? How do we design network architectures to learn infinite-dimensional function mappings? Why can Fourier transforms help us? This article will explore these questions in depth, starting from rigorous theory in functional analysis, analyzing the two mainstream architectures Fourier Neural Operator (FNO) and DeepONet in detail, and verifying the theory through complete experiments.

From Single Solutions to Operator Families

The Dilemma of PINNs

Consider the 1D Burgers equation:

$$\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2}, \quad x \in [0, 1],\ t \in (0, T]$$

where $\nu$ is the viscosity coefficient. Traditional numerical methods (finite difference, finite element) or PINNs need to re-solve for each different initial condition $u_0(x)$:

  • Finite difference method: Re-assemble and solve the discretized system for the new $u_0$
  • PINN: Retrain the neural network, with a loss function that depends on $u_0$

If you need to handle 1000 different initial conditions, you must repeat 1000 times. This is completely infeasible in scenarios like parametric PDEs, uncertainty quantification, and optimization design.

Operator Learning: Learning the Mapping Itself

The core insight of neural operators: The solution operator of a PDE is a mapping from the initial condition space to the solution space. For the Burgers equation, the solution operator is defined as:

$$G: u_0(x) \mapsto u(x, T)$$

That is, given initial condition $u_0$, the operator $G$ directly outputs the solution $u(\cdot, T)$ at time $T$.

Once this operator is learned, for any new initial condition $u_0^{\text{new}}$, only one forward pass is needed:

$$u^{\text{new}}(\cdot, T) = G(u_0^{\text{new}})$$

No retraining, no re-solving the PDE.
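To make "train once, evaluate anywhere" concrete, here is a minimal numerical sketch. It uses the linear heat equation rather than Burgers, because its solution operator has a closed form in Fourier space (each mode is damped by $e^{-\nu (2\pi k)^2 T}$); the point is only that a single operator handles arbitrarily many initial conditions with one forward application each. All names and parameter values are illustrative.

```python
import numpy as np

def heat_solution_operator(u0_samples, T=0.1, nu=0.01):
    """Exact solution operator G: u0 -> u(., T) for the periodic heat
    equation u_t = nu * u_xx on [0, 1): damp each Fourier mode."""
    n = len(u0_samples)
    k = np.fft.fftfreq(n, d=1.0 / n)                  # integer wavenumbers
    u0_hat = np.fft.fft(u0_samples)
    uT_hat = u0_hat * np.exp(-nu * (2 * np.pi * k) ** 2 * T)
    return np.fft.ifft(uT_hat).real

x = np.linspace(0, 1, 128, endpoint=False)
# Two different initial conditions: one forward application each, no retraining
u1 = heat_solution_operator(np.sin(2 * np.pi * x))
u2 = heat_solution_operator(np.cos(6 * np.pi * x))
```

For a nonlinear equation like Burgers there is no such closed form, which is exactly why one learns the operator from data instead.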

Fourier Neural Operator (FNO) architecture diagram

Function Space Theory

Banach Spaces and Hilbert Spaces

Comparison of operator learning and traditional methods

Why Do We Need Function Spaces?

In machine learning, we often deal with functions. Neural networks themselves are functions, and probability distributions are also functions. But how large is the "set" of these functions? How do we measure the distance between two functions? This is the concept of function spaces.

A deeper question: If I want to learn a mapping where both inputs and outputs are functions (not numbers), what should I do?

  • Traditional neural networks: Input is a vector (finite-dimensional), output is a vector
  • Neural operators: Input is a function (infinite-dimensional), output is a function

Example:

  • Traditional problem: Input image (pixel array), output category
  • Operator learning problem: Input initial temperature field (function), output temperature field after one hour (function)

🎓 Intuitive Understanding: The "World" of Functions

Analogy: Function space is like a "function library".

  • Ordinary library: Each book is independent, you can compare the thickness of two books (distance)
  • Function library: Each function is a "book", we need to define how to measure the "distance" between two functions

Imagine a "world of functions": each point is not a number, but an entire function. Just as we use a meter stick to measure distance in 3D space, in function space we also need to define "distance between functions".

Concrete example (any simple pair will do):

  • Function 1: $f_1(x) = \sin x$
  • Function 2: $f_2(x) = \sin x + 0.1$

How "close" are they? We need a concept of "function distance".

📐 Semi-Rigorous Explanation: Banach Spaces

Step 1: Norm

A norm $\|\cdot\|$ is a generalization of "length". For a function $f$, the norm $\|f\|$ is a non-negative real number measuring the "size" of the function.

Common function norms:

  • $L^1$ norm: $\|f\|_{L^1} = \int_\Omega |f(x)|\,dx$ (the "total mass" of the function)
  • $L^2$ norm: $\|f\|_{L^2} = \left(\int_\Omega |f(x)|^2\,dx\right)^{1/2}$ (the "energy" of the function)
  • $L^\infty$ norm: $\|f\|_{L^\infty} = \sup_{x \in \Omega} |f(x)|$ (the "maximum value" of the function)
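These three norms are easy to check numerically. The sketch below approximates them for $f(x) = \sin(\pi x)$ on $[0, 1]$ with a plain Riemann sum on a uniform grid (an illustrative choice, not a high-accuracy quadrature):

```python
import numpy as np

# Sample f(x) = sin(pi * x) on a uniform grid over [0, 1) and approximate
# the three norms with a plain Riemann sum.
x = np.linspace(0, 1, 10000, endpoint=False)
f = np.sin(np.pi * x)

l1 = np.mean(np.abs(f))          # L1 "total mass":  2/pi ~ 0.6366
l2 = np.sqrt(np.mean(f ** 2))    # L2 "energy":      1/sqrt(2) ~ 0.7071
linf = np.max(np.abs(f))         # L-inf "peak":     1.0
```

Note that the three norms give genuinely different numbers for the same function; which one is "right" depends on what aspect of the function matters for the problem.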

Step 2: Metric

With a norm in hand, the distance between two functions is:

$$d(f, g) = \|f - g\|$$

Step 3: Completeness

Key property: Banach spaces guarantee "Cauchy sequences converge".

Intuitive understanding: If a sequence of functions $f_n$ get closer and closer to each other (a Cauchy sequence: $\|f_n - f_m\| \to 0$), then there must exist a "limit function" $f$ in the space such that $\|f_n - f\| \to 0$.

Analogy: Rational numbers are not complete (the rational approximation sequence of $\sqrt{2}$ has no rational limit), but real numbers are complete. Similarly, the space of continuous functions is not complete under the $L^2$ norm, but the space $L^2$ is.

Concrete example: the space $L^2([0,1])$:

  • Elements: all square-integrable functions on $[0, 1]$
  • Norm: $\|f\|_{L^2} = \left(\int_0^1 |f(x)|^2\,dx\right)^{1/2}$
  • Completeness: if $\|f_n - f_m\|_{L^2} \to 0$ (as $n, m \to \infty$), then there exists $f \in L^2([0,1])$ such that $\|f_n - f\|_{L^2} \to 0$
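The incompleteness of the continuous functions under the $L^2$ norm can be seen numerically: a sequence of continuous ramps is Cauchy in $L^2$, yet its limit is a discontinuous step. A small sketch (the grid resolution and the particular ramp family are illustrative):

```python
import numpy as np

x = np.linspace(0, 1, 100000, endpoint=False)
step = np.sign(x - 0.5)                     # discontinuous L2-limit function

def ramp(n):
    # Continuous piecewise-linear approximations of the step function
    return np.clip(n * (x - 0.5), -1.0, 1.0)

def l2_dist(f, g):
    return np.sqrt(np.mean((f - g) ** 2))   # Riemann-sum L2 distance on [0, 1)

# The ramps converge to the step in L2 (the sequence is Cauchy there),
# but the limit is not continuous: C([0,1]) with the L2 norm is incomplete.
d10, d100 = l2_dist(ramp(10), step), l2_dist(ramp(100), step)
```

Analytically the distance is $\sqrt{2/(3n)}$, so it shrinks like $n^{-1/2}$, while no continuous function can serve as the limit.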

📚 Rigorous Definition

The mathematical framework of operator learning is built on function space theory. We first review several key concepts.

Definition 1 (Banach Space): Let $X$ be a vector space over the real or complex field, and $\|\cdot\|$ be a norm on $X$. If $X$ is complete under the norm (i.e., every Cauchy sequence converges), then $(X, \|\cdot\|)$ is called a Banach space.

Common Banach spaces include:

  • Space of continuous functions: $C([a, b])$, with norm $\|f\|_\infty = \max_{x \in [a,b]} |f(x)|$
  • $L^p$ spaces: $L^p(\Omega) = \{f : \int_\Omega |f|^p\,dx < \infty\}$, $1 \le p < \infty$, with norm $\|f\|_{L^p} = \left(\int_\Omega |f|^p\,dx\right)^{1/p}$

Definition 2 (Hilbert Space): Let $H$ be a vector space. If there exists an inner product $\langle \cdot, \cdot \rangle$ such that $H$ is complete under the norm $\|f\| = \sqrt{\langle f, f \rangle}$ induced by the inner product, then $H$ is called a Hilbert space.

The most important Hilbert space is $L^2(\Omega)$, with inner product defined as:

$$\langle f, g \rangle = \int_\Omega f(x)\,\overline{g(x)}\,dx$$

where $\overline{g(x)}$ denotes the complex conjugate (for real functions, it is just $\int_\Omega f g\,dx$).
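A quick numerical check of this inner product in the real-valued case, on $[0, 1]$ with a uniform-grid Riemann sum (an illustrative sketch):

```python
import numpy as np

x = np.linspace(0, 1, 10000, endpoint=False)

def inner(f, g):
    # <f, g> = integral of f(x) g(x) over [0, 1], real case, Riemann sum
    return np.mean(f * g)

f = np.sin(2 * np.pi * x)
g = np.cos(2 * np.pi * x)

ip = inner(f, g)                   # sin and cos of equal frequency: ~ 0
norm_f = np.sqrt(inner(f, f))      # induced norm of sin(2*pi*x): 1/sqrt(2)
```

The vanishing inner product shows $\sin(2\pi x)$ and $\cos(2\pi x)$ are orthogonal in $L^2([0,1])$, the same relation that underlies Fourier series and, later, the Fourier Neural Operator.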

Sobolev Spaces

Sobolev spaces are core function spaces in PDE theory, characterizing the "smoothness" of functions.

Definition 3 (Sobolev Space): Let $\Omega \subset \mathbb{R}^n$ be a bounded open set, $k \in \mathbb{N}$, $1 \le p \le \infty$. The Sobolev space $W^{k,p}(\Omega)$ is defined as:

$$W^{k,p}(\Omega) = \left\{ u \in L^p(\Omega) : D^\alpha u \in L^p(\Omega) \text{ for all } |\alpha| \le k \right\}$$

where $D^\alpha u$ is the weak derivative (derivative in the distributional sense), and $\alpha = (\alpha_1, \dots, \alpha_n)$ is a multi-index. The norm is defined as:

$$\|u\|_{W^{k,p}} = \left( \sum_{|\alpha| \le k} \|D^\alpha u\|_{L^p}^p \right)^{1/p}$$

In particular, when $p = 2$, we write $H^k(\Omega) = W^{k,2}(\Omega)$, which is a Hilbert space with inner product:

$$\langle u, v \rangle_{H^k} = \sum_{|\alpha| \le k} \int_\Omega D^\alpha u \, D^\alpha v \, dx$$

The importance of Sobolev spaces lies in: weak solutions of PDEs typically belong to some Sobolev space. For example, solutions to elliptic PDEs usually belong to $H^2(\Omega)$.
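For a smooth periodic function the $H^1$ norm can be computed by differentiating spectrally. A sketch for $u(x) = \sin(2\pi x)$ on $[0, 1)$, where the exact derivative is $2\pi\cos(2\pi x)$ (grid size and the FFT-based derivative are illustrative choices):

```python
import numpy as np

n = 256
x = np.linspace(0, 1, n, endpoint=False)
u = np.sin(2 * np.pi * x)

# Spectral derivative on the periodic domain [0, 1)
k = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / n)
du = np.fft.ifft(1j * k * np.fft.fft(u)).real        # u'(x) = 2*pi*cos(2*pi*x)

# ||u||_{H^1}^2 = ||u||_{L^2}^2 + ||u'||_{L^2}^2   (the k = 1 Sobolev norm)
l2_u = np.sqrt(np.mean(u ** 2))          # 1/sqrt(2)
l2_du = np.sqrt(np.mean(du ** 2))        # 2*pi/sqrt(2)
h1 = np.sqrt(l2_u ** 2 + l2_du ** 2)     # sqrt(1/2 + 2*pi^2)
```

The $H^1$ norm is much larger than the $L^2$ norm here because it also charges for the slope of the function, which is exactly how Sobolev spaces encode smoothness.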

Compact Operators and Fredholm Theory

Definition 4 (Compact Operator): Let $X, Y$ be Banach spaces. A linear operator $T: X \to Y$ is called compact if $T$ maps bounded sets to relatively compact sets (i.e., sets with compact closure).

Compact operators are natural generalizations of finite-dimensional operators. In infinite-dimensional spaces, compact operators have many nice properties, similar to finite-dimensional matrices.

Theorem 1 (Fredholm Alternative): Let $K: X \to X$ be a compact operator, $\lambda \ne 0$. Then either the equation $(\lambda I - K)u = f$ has a unique solution for every $f \in X$, or the homogeneous equation $(\lambda I - K)u = 0$ has a non-zero solution.

Fredholm theory is crucial in PDE theory, guaranteeing the existence and uniqueness of solutions to elliptic PDEs.
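To see the Fredholm equation in action, one can discretize a compact integral operator with a smooth kernel and solve $(\lambda I - K)u = f$ as a linear system. The kernel, the value of $\lambda$, and the simple quadrature below are illustrative; since $\lambda$ is chosen larger than the operator norm of $K$, the first branch of the alternative applies and the solution is unique:

```python
import numpy as np

# Discretize the compact integral operator (K u)(x) = int_0^1 e^{-|x-y|} u(y) dy
# on a uniform grid with a simple quadrature rule.
n = 200
x = np.linspace(0, 1, n)
h = 1.0 / (n - 1)
K = np.exp(-np.abs(x[:, None] - x[None, :])) * h

lam = 5.0                              # well above ||K||, so lam*I - K is invertible
f = np.sin(np.pi * x)
u = np.linalg.solve(lam * np.eye(n) - K, f)

# First branch of the Fredholm alternative: a unique solution exists
res = np.max(np.abs((lam * np.eye(n) - K) @ u - f))
```

The discretized $K$ behaves like a finite-dimensional matrix with rapidly decaying spectrum, which is precisely the sense in which compact operators generalize finite-dimensional ones.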

Universal Approximation Theorem

Why Do We Need Universal Approximation Theorems?

We all know that neural networks can approximate any continuous function $f: \mathbb{R}^n \to \mathbb{R}^m$ (where inputs and outputs are vectors). But now the problem is upgraded: can neural operators approximate any operator $G: \mathcal{A} \to \mathcal{U}$ (where inputs and outputs are functions)?

This is not a theoretical game, but a practical question:

  • Physical simulation: given initial conditions (function), predict the future state (function)
  • Image processing: input low-resolution image (function), output high-resolution image (function)
  • Climate prediction: input current pressure field (function), output tomorrow's pressure field (function)

🎓 Intuitive Understanding: The Leap from Points to Functions

Classical neural network universal approximation theorem:

Given any continuous function $f: \mathbb{R}^n \to \mathbb{R}$, there exists a sufficiently wide neural network that can approximate it to arbitrary precision on a compact set.

Analogy: You can build any shape of building with Lego blocks (as long as you have enough blocks).

Neural operator universal approximation theorem:

Given any continuous operator $G: \mathcal{A} \to \mathcal{U}$ (where $\mathcal{A}, \mathcal{U}$ are function spaces), there exists a neural operator that can approximate it to arbitrary precision.

Upgraded analogy: With Lego blocks, you can not only build individual buildings, but also build "transformation rules for buildings"—input one building blueprint (function), output another building blueprint (function).

Core challenges:

  • The input space is infinite-dimensional (functions have infinitely many degrees of freedom)
  • The approximation must work for all possible input functions (generalize over function space)

📐 Semi-Rigorous Explanation: Core Idea of Chen-Chen Theorem

Setup:

  • Input space: $V \subset C(K)$ (continuous functions on a compact set $K$)
  • Output space: $\mathbb{R}$
  • Target operator: $G: V \to \mathbb{R}$, mapping functions to real numbers

Key insight: Although functions are infinite-dimensional, we can capture enough information through finite sampling points!

Geometric picture: Imagine an operator in function space that maps a curve to a number. The neural operator samples a finite number of points on the curve, then uses information from these points to reconstruct the answer.
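This finite-sampling idea can be checked directly: linear interpolation from $m$ sensor values already recovers a smooth function with error of order $O(1/m^2)$. The particular function and sensor counts below are illustrative:

```python
import numpy as np

# A smooth function has infinitely many degrees of freedom, yet m sensor
# values already pin it down: linear-interpolation error decays like 1/m^2.
u = lambda t: np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t)
x_fine = np.linspace(0, 1, 5001)

def reconstruction_error(m):
    sensors = np.linspace(0, 1, m)                     # m sampling points
    u_approx = np.interp(x_fine, sensors, u(sensors))  # rebuild from samples
    return np.max(np.abs(u_approx - u(x_fine)))

e10, e100 = reconstruction_error(10), reconstruction_error(100)
```

The error shrinks rapidly as $m$ grows, which is why a finite set of sensor values is a legitimate stand-in for the infinite-dimensional input.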

📚 Rigorous Theorem

Chen-Chen Theorem (1995)

The theoretical foundation of neural operators can be traced back to the universal approximation theorem proved by Chen and Chen in 1995.

Theorem 2 (Chen-Chen, 1995): Let $\sigma$ be a non-polynomial continuous activation function, $X$ a Banach space, $K_1 \subset X$ and $K_2 \subset \mathbb{R}^d$ compact sets, $V \subset C(K_1)$ compact, and $G: V \to C(K_2)$ a continuous operator. Then for any $\varepsilon > 0$, there exist positive integers $n, p, m$, real parameters $c_i^k, \xi_{ij}^k, \theta_i^k, \zeta_k$, vectors $w_k \in \mathbb{R}^d$, and points $x_j \in K_1$, such that:

$$\left| G(u)(y) - \sum_{k=1}^{p} \sum_{i=1}^{n} c_i^k\, \sigma\!\left( \sum_{j=1}^{m} \xi_{ij}^k\, u(x_j) + \theta_i^k \right) \sigma\!\left( w_k \cdot y + \zeta_k \right) \right| < \varepsilon$$

holds for all $u \in V$ and $y \in K_2$.

This theorem shows: Any continuous operator can be approximated to arbitrary precision by a shallow neural network with a non-polynomial activation function $\sigma$. This provides the theoretical foundation for DeepONet.

Theoretical Foundation of DeepONet

The architecture design of DeepONet is directly inspired by the Chen-Chen theorem. The double summation structure in the theorem corresponds to DeepONet's Branch-Trunk decomposition:

$$G(u)(y) \approx \sum_{k=1}^{p} b_k(u)\, t_k(y)$$

where:

  • $b_k(u)$ is the output of the Branch network (depends on the input function $u$)
  • $t_k(y)$ is the output of the Trunk network (depends on the query point $y$)

Theorem 3 (DeepONet Universal Approximation): Let $G$ be a continuous operator from Banach space $\mathcal{A}$ to Banach space $\mathcal{U}$, and let $V \subset \mathcal{A}$ and $K \subset \mathbb{R}^d$ be compact subsets of the input function space and the query domain, respectively. Then for any $\varepsilon > 0$, there exists a DeepONet $G_\theta$ such that:

$$\sup_{u \in V}\ \sup_{y \in K} \left| G(u)(y) - G_\theta(u)(y) \right| < \varepsilon$$

Proof idea: By the Chen-Chen theorem, the Branch network learns the coefficients $b_k(u)$, the Trunk network learns the basis functions $t_k(y)$, and the final output is obtained through their linear combination.
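The Branch-Trunk decomposition is easy to write down explicitly. Below is a minimal, untrained DeepONet forward pass in NumPy: the branch maps $m$ sensor values of $u$ to $p$ coefficients $b_k(u)$, the trunk maps a query point $y$ to $p$ basis values $t_k(y)$, and the output is their inner product. All widths, initializations, and names are illustrative assumptions; a real DeepONet would train these weights on input-output pairs of the target operator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes: m sensor values in, p-dimensional feature vectors out of each net.
m, p, hidden = 50, 32, 64

# Branch net parameters (2-layer MLP on the sampled input function)
W_b1 = rng.normal(size=(hidden, m)) / np.sqrt(m)
b_b1 = np.zeros(hidden)
W_b2 = rng.normal(size=(p, hidden)) / np.sqrt(hidden)

# Trunk net parameters (2-layer MLP on the scalar query point y)
W_t1 = rng.normal(size=(hidden, 1))
b_t1 = np.zeros(hidden)
W_t2 = rng.normal(size=(p, hidden)) / np.sqrt(hidden)

relu = lambda z: np.maximum(z, 0.0)

def deeponet(u_sensors, y):
    b = W_b2 @ relu(W_b1 @ u_sensors + b_b1)       # b_k(u), k = 1..p
    t = W_t2 @ relu(W_t1 @ np.array([y]) + b_t1)   # t_k(y), k = 1..p
    return b @ t                                   # sum_k b_k(u) t_k(y)

sensors = np.linspace(0, 1, m)
u0 = np.sin(2 * np.pi * sensors)   # input function sampled at the sensors
out = deeponet(u0, 0.3)            # scalar G(u0)(0.3), untrained
```

Note the design choice this structure encodes: the input function enters only through its sensor values, and the query point enters only through the trunk, exactly mirroring the double sum in the Chen-Chen theorem.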

  • Post author:Chen Kai
  • Create time:2022-01-18 14:30:00
  • Post link:https://www.chenk.top/PDE-and-Machine-Learning-2-Neural-Operator-Theory/
  • Copyright Notice:All articles in this blog are licensed under BY-NC-SA unless stating additionally.