PDE and Machine Learning (2) — Neural Operator Theory
Chen Kai

Traditional Physics-Informed Neural Networks (PINNs) have a fatal flaw: they can only solve one problem at a time. Given a specific initial condition, train a network, and get the solution to that particular problem. What if the initial condition changes? Sorry, retrain. Need to handle 1000 different initial conditions? Train 1000 times.

This is catastrophic in practical applications. Imagine designing an aircraft wing and needing to test airflow under different wind speeds and angles of attack. Or doing weather forecasting where initial conditions change daily. Do you really need to retrain the network every time?

Neural Operators completely change the game. They learn not "the solution to a specific problem," but the mapping from initial conditions to solutions itself — an operator. Once this operator is learned, given any new initial condition, you only need one forward pass to get the solution. Train once, use forever.

What is the mathematical foundation of this capability? How do we design network architectures that can learn infinite-dimensional function mappings? Why can Fourier transforms help us? This article explores these questions in depth, starting from rigorous theory in functional analysis, analyzing the two mainstream architectures, FNO and DeepONet, in detail, and verifying the theory through complete experiments.

From Single Solutions to Operator Families

PINN's Dilemma

Consider the 1D Burgers equation:

$$\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2}, \qquad u(x, 0) = u_0(x),$$

where $\nu$ is the viscosity coefficient. Traditional numerical methods (finite difference, finite element) or PINNs need to re-solve for each different initial condition $u_0$:

  • Finite Difference Method: rebuild and solve the linear system for the new $u_0$
  • PINN: retrain the neural network, since the loss function depends on $u_0$

If you need to handle 1000 different initial conditions, you must repeat this process 1000 times. That is completely infeasible in scenarios like parametric PDEs, uncertainty quantification, and design optimization.

Operator Learning: Learning the Mapping Itself

The core insight of neural operators: the solution operator of a PDE is a mapping from the initial condition space to the solution space. For the Burgers equation, the solution operator is defined as:

$$\mathcal{G}: u_0(\cdot) \;\mapsto\; u(\cdot, T)$$

That is, given initial condition $u_0$, the operator $\mathcal{G}$ directly outputs the solution $u(\cdot, T)$ at time $T$.

Once this operator is learned (in practice, a parameterized approximation $\mathcal{G}_\theta$), for any new initial condition $u_0^{\text{new}}$, only one forward pass is needed:

$$u^{\text{new}}(\cdot, T) = \mathcal{G}_\theta(u_0^{\text{new}})$$

No retraining, no re-solving the PDE.
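The amortized workflow looks like this in code (a numpy sketch; the linear map standing in for a trained $\mathcal{G}_\theta$ is purely a placeholder, not a real solver):

```python
import numpy as np

# Sketch of the amortized workflow: once G_theta is trained, each new
# initial condition costs one forward pass. The "operator" below is a
# stand-in linear map for illustration only, not a trained network.
rng = np.random.default_rng(0)
n_grid = 64
A = rng.standard_normal((n_grid, n_grid)) / n_grid  # placeholder for G_theta

def solve(u0):
    """One forward pass: sampled u0 -> sampled u(., T)."""
    return A @ u0

# 1000 different initial conditions: 1000 forward passes, zero retraining
u0_batch = rng.standard_normal((1000, n_grid))
solutions = np.stack([solve(u0) for u0 in u0_batch])
print(solutions.shape)  # (1000, 64)
```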

Function Space Theory

The mathematical framework of operator learning is built on function space theory. Let's first review several key concepts.

Operator Learning vs Traditional Methods

Function Space Foundations: Banach and Hilbert Spaces

Why Do We Need Function Spaces?

In machine learning, we often deal with functions. For example, neural networks themselves are functions, and probability distributions are also functions. But how large is the "set" of these functions? How do we measure the distance between two functions? These questions are made precise by the concept of a function space.

A deeper question: If I want to learn a mapping where both inputs and outputs are functions (not numbers), what should I do?

  • Traditional neural networks: Input is a vector (finite-dimensional), output is a vector
  • Neural operators: Input is a function (infinite-dimensional), output is a function

Example:

  • Traditional problem: Input image (pixel array), output category
  • Operator learning problem: Input initial temperature field (function), output temperature field after one hour (function)

Universal Approximation Theorem

Why Do We Need Universal Approximation Theorems?

We all know neural networks can approximate any function (where inputs and outputs are vectors). But now the problem escalates: Can neural operators approximate any operator (where inputs and outputs are functions)?

This isn't a theoretical game, but a practical question:

  • Physical simulation: given initial conditions (function), predict the future state (function)
  • Image processing: input a low-resolution image (function), output a high-resolution image (function)
  • Climate prediction: input the current pressure field (function), output tomorrow's pressure field (function)

Chen-Chen Theorem (1995)

The theoretical foundation of neural operators can be traced back to the universal approximation theorem proved by Chen and Chen in 1995.

Theorem 2 (Chen-Chen, 1995): Let $K_1 \subset X$ ($X$ a Banach space) and $K_2 \subset \mathbb{R}^d$ be compact sets, let $V \subset C(K_1)$ be compact, and let $\mathcal{G}: V \to C(K_2)$ be a continuous operator. Then for any $\varepsilon > 0$, there exist positive integers $n, p, m$, a real-valued activation function $\sigma$ (which can be any non-polynomial continuous function), along with parameters $c_i^k, \theta_i^k, \zeta_k, \xi_{ij}^k \in \mathbb{R}$, $w_k \in \mathbb{R}^d$, and sensor points $x_j \in K_1$, such that:

$$\left| \mathcal{G}(u)(y) - \sum_{k=1}^{p} \sum_{i=1}^{n} c_i^k\, \sigma\!\left( \sum_{j=1}^{m} \xi_{ij}^k\, u(x_j) + \theta_i^k \right) \sigma\!\left( w_k \cdot y + \zeta_k \right) \right| < \varepsilon$$

holds for all $u \in V$ and $y \in K_2$.

This theorem shows that any continuous operator can be approximated to arbitrary precision by a two-layer neural network built from a single activation function $\sigma$. This provides the theoretical foundation for DeepONet.
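The structure of the approximant is easy to make concrete. Below is a numpy sketch of its functional form with random, untrained parameters (purely illustrative; `G_approx` and all parameter names are hypothetical, and nothing is fitted):

```python
import numpy as np

# Shape of the Chen-Chen approximant (random parameters, illustration only):
# G(u)(y) ~ sum_k [sum_i c_ik * sigma(sum_j xi_ijk * u(x_j) + theta_ik)]
#                 * sigma(w_k . y + zeta_k)
rng = np.random.default_rng(1)
m, n, p = 16, 8, 4            # sensor points, hidden width, modes
sigma = np.tanh               # any non-polynomial continuous activation

xs = np.linspace(0, 1, m)     # fixed sensor locations x_j
xi = rng.standard_normal((p, n, m))
theta = rng.standard_normal((p, n))
c = rng.standard_normal((p, n))
w = rng.standard_normal(p)    # 1-D query points, so w_k is scalar here
zeta = rng.standard_normal(p)

def G_approx(u_vals, y):
    # term depending only on the sampled input u(x_j)
    branch = np.einsum("pn,pn->p", c, sigma(np.einsum("pnm,m->pn", xi, u_vals) + theta))
    # term depending only on the query point y
    trunk = sigma(w * y + zeta)
    return branch @ trunk

u_vals = np.sin(2 * np.pi * xs)   # input function sampled at the sensors
value = G_approx(u_vals, 0.5)
print(value)
```

Note the factorization into a term depending only on the sampled input values $u(x_j)$ and a term depending only on the query point $y$; this is exactly the branch-trunk split that DeepONet later adopts.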

Fourier Neural Operator (FNO)

Motivation for FNO

Insights from the Convolution Theorem

Consider a linear PDE:

$$\mathcal{L} u = f,$$

where $\mathcal{L}$ is a linear differential operator. If $\mathcal{L}$ is translation-invariant (like the Laplacian $\Delta$), the solution can be written in convolution form with the Green's function $G$:

$$u(x) = \int G(x - y)\, f(y)\, dy = (G * f)(x).$$

In the frequency domain, by the convolution theorem:

$$\hat{u}(k) = \hat{G}(k)\, \hat{f}(k).$$
For nonlinear PDEs, although they cannot be directly written as convolutions, a local nonlinearity + global linearity hybrid structure is common. FNO's design philosophy is:

  1. Handle the linear part in the frequency domain (using the convolution theorem)
  2. Handle the nonlinear part in the spatial domain (pointwise nonlinear transformations)

FNO Architecture Details

Mathematical Derivation of Fourier Layers

The overall FNO architecture is as follows:

$$\mathcal{G}_\theta = Q \circ \mathcal{L}_T \circ \cdots \circ \mathcal{L}_1 \circ P, \qquad \mathcal{L}_t(v)(x) = \sigma\Big( W v(x) + \mathcal{F}^{-1}\big( R \cdot \mathcal{F}(v) \big)(x) \Big),$$

where:

  • $P$: lifting operator, maps the input to a higher-dimensional channel space
  • $\mathcal{F}$: Fast Fourier Transform (FFT)
  • $R$: learnable frequency-domain multiplication operator (a complex matrix per mode)
  • $\mathcal{F}^{-1}$: inverse FFT
  • $\sigma$: activation function (e.g., GELU)
  • $W$: learnable pointwise linear transformation (handles aliasing from low-frequency truncation)
  • $Q$: projection operator, maps features back to the output space
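A single Fourier layer can be sketched in a few lines of numpy (random, untrained weights; real FNO implementations use PyTorch or JAX, and the channel layout here is only illustrative):

```python
import numpy as np

# One FNO Fourier layer on a 1-D grid: sigma(W v + F^{-1}(R . F(v))),
# with the frequency product truncated to the first k_max modes.
rng = np.random.default_rng(2)
n, width, k_max = 128, 8, 12   # grid points, channels, kept modes

# R: one complex (width x width) matrix per retained mode; W: pointwise linear
R = rng.standard_normal((k_max, width, width)) + 1j * rng.standard_normal((k_max, width, width))
W = rng.standard_normal((width, width)) / width

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def fourier_layer(v):                 # v: [n, width]
    v_hat = np.fft.rfft(v, axis=0)    # F: transform along the grid axis
    out_hat = np.zeros_like(v_hat)
    out_hat[:k_max] = np.einsum("kio,ki->ko", R, v_hat[:k_max])  # R . F(v), truncated
    spectral = np.fft.irfft(out_hat, n=n, axis=0)                # F^{-1}
    return gelu(v @ W + spectral)     # sigma(W v + K v)

v = rng.standard_normal((n, width))
out = fourier_layer(v)
print(out.shape)                      # (128, 8)
```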

Key Design 1: Frequency Truncation

In practice, we only keep low-frequency components:

$$\big( R \cdot \mathcal{F}(v) \big)(k) = \begin{cases} R(k)\, \hat{v}(k), & |k| \le k_{\max}, \\ 0, & |k| > k_{\max}, \end{cases}$$

where $k_{\max}$ is the maximum number of retained frequency modes. This is based on the spectral decay of smooth functions: for sufficiently smooth functions, high-frequency Fourier components are small and can be safely truncated.
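The truncation rationale is easy to verify numerically: for a smooth periodic function, discarding all but the lowest modes changes it only negligibly. A small numpy check (the test function and $k_{\max}$ are arbitrary choices):

```python
import numpy as np

# Spectral decay: for a smooth periodic function, Fourier coefficients
# fall off fast, so keeping only the low modes loses very little.
n = 256
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
f = np.exp(np.sin(x))                 # smooth periodic test function

f_hat = np.fft.rfft(f)
k_max = 12
f_hat_trunc = f_hat.copy()
f_hat_trunc[k_max:] = 0               # frequency truncation: drop |k| >= k_max
f_rec = np.fft.irfft(f_hat_trunc, n=n)

rel_err = np.linalg.norm(f - f_rec) / np.linalg.norm(f)
print(rel_err)                        # tiny for a smooth f
```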

Key Design 2: Aliasing Handling

FFT requires the input to be a periodic function. For non-periodic functions, high-frequency components will "alias" into low frequencies. The pointwise linear term $W v$ learns and corrects this aliasing effect.

DeepONet

Architecture Design

Branch Network and Trunk Network

The core idea of DeepONet is operator decomposition:

$$\mathcal{G}_\theta(u)(y) = \sum_{k=1}^{p} b_k(u)\, t_k(y),$$

where:

  • Branch network: $b: \mathbb{R}^m \to \mathbb{R}^p$, encodes the input function (sampled at $m$ sensor points) into a $p$-dimensional vector
  • Trunk network: $t: \mathbb{R}^d \to \mathbb{R}^p$, encodes the query point $y$ into a $p$-dimensional vector
  • Inner product: $b_k(u)\, t_k(y)$ represents the contribution of the $k$-th "mode" at point $y$

The physical meaning of this decomposition:

  • The branch network learns "feature modes" of the input function
  • The trunk network learns the "spatial distribution" of these modes
  • Their inner product combination yields the final output
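A minimal numpy sketch of the DeepONet forward pass (random, untrained weights; `mlp_params` and the two-layer nets are illustrative stand-ins for the real branch and trunk networks):

```python
import numpy as np

# DeepONet forward pass: G(u)(y) = sum_k b_k(u) * t_k(y),
# branch net over sensor values, trunk net over query points.
rng = np.random.default_rng(3)
m, hidden, p = 32, 64, 16      # sensors, hidden width, latent modes

def mlp_params(d_in, d_out):
    # weights for a tiny two-layer tanh MLP (illustrative stand-in)
    return (rng.standard_normal((d_in, hidden)) / np.sqrt(d_in),
            rng.standard_normal((hidden, d_out)) / np.sqrt(hidden))

Wb1, Wb2 = mlp_params(m, p)    # branch net: R^m -> R^p
Wt1, Wt2 = mlp_params(1, p)    # trunk net:  R^1 -> R^p

def deeponet(u_sensors, ys):
    b = np.tanh(u_sensors @ Wb1) @ Wb2          # [p]
    t = np.tanh(ys[:, None] @ Wt1) @ Wt2        # [len(ys), p]
    return t @ b                                # inner product per query point

xs = np.linspace(0, 1, m)
u_sensors = np.sin(2 * np.pi * xs)              # input function at the sensors
ys = np.linspace(0, 1, 50)                      # query locations
out = deeponet(u_sensors, ys)
print(out.shape)                                # (50,)
```

Note that the branch is evaluated once per input function, while the trunk can be queried at arbitrarily many (and arbitrarily placed) points $y$.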


Summary and Outlook

Neural operator theory brings revolutionary changes to scientific computing: from "one PDE, one solution" to "train once, use forever". FNO and DeepONet, as the two mainstream architectures, each have advantages:

  • FNO: Based on spectral methods, suitable for periodic boundaries and translation-invariant problems
  • DeepONet: Based on branch-trunk structure, flexibly handles various operators

Both inherit theoretical guarantees from universal approximation theorems and demonstrate resolution invariance in practice — train on coarse grids, test on fine grids.
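Resolution invariance can be illustrated directly: a truncated spectral filter $R(k)$ acts on Fourier modes rather than grid points, so the same weights (random and untrained here, standing in for learned FNO parameters) produce samples of the same output function at any resolution. A numpy sketch:

```python
import numpy as np

# The same truncated spectral filter R(k) applied at two grid sizes:
# because it acts mode-by-mode, the outputs sample one and the same
# continuous function (for a band-limited input, up to float roundoff).
rng = np.random.default_rng(4)
k_max = 8
R = rng.standard_normal(k_max)        # one filter value per kept mode

def spectral_apply(f_vals):
    n = len(f_vals)
    f_hat = np.fft.rfft(f_vals)
    out_hat = np.zeros_like(f_hat)
    out_hat[:k_max] = R * f_hat[:k_max]
    return np.fft.irfft(out_hat, n=n)

f = lambda x: np.sin(2 * np.pi * x) + 0.5 * np.cos(4 * np.pi * x)
coarse = spectral_apply(f(np.linspace(0, 1, 64, endpoint=False)))
fine = spectral_apply(f(np.linspace(0, 1, 256, endpoint=False)))
# fine[::4] lands on the same x-locations as the coarse grid:
print(np.max(np.abs(coarse - fine[::4])))
```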

Core Takeaways

  1. Function space theory: Banach spaces, Hilbert spaces, Sobolev spaces provide rigorous mathematical frameworks
  2. Universal approximation theorems: Chen-Chen theorem and Kovachki-Stuart theorem guarantee the expressive power of neural operators
  3. DeepONet: Uses branch network to encode input functions, trunk network to encode query locations
  4. FNO: Learns convolution kernels in the frequency domain, achieving global receptive fields and resolution invariance
  5. Spectral analysis: Low-frequency dominated convergence behavior, consistent with Fourier series expansion of PDEs

Challenges and Future Directions

  1. Theoretical aspects:
    • More precise error bounds (how to quantify the relationship between approximation error and network capacity)
    • Generalization theory (how distribution differences between training and test sets affect performance)
    • Handling complex geometries (non-periodic boundaries, irregular domains)
  2. Algorithmic aspects:
    • Integration with physical constraints (how to incorporate conservation laws, symmetries)
    • Adaptive mesh refinement (increasing resolution in high-gradient regions)
    • Multi-scale modeling (operator learning across spatiotemporal scales)
  3. Application aspects:
    • Inverse problem solving (parameter identification, data assimilation)
    • Uncertainty quantification (Bayesian neural operators)
    • Multi-physics coupling (fluid-structure interaction, electromagnetic-thermal coupling)

Neural operators are becoming a new paradigm in scientific computing, showing great potential in weather forecasting, materials design, drug discovery, and other fields. With deepening theoretical research and increasing computational power, we have reason to believe that neural operators will play an even more important role in the future.

References

  1. Li, Z., et al. (2020). Fourier Neural Operator for Parametric Partial Differential Equations. arXiv:2010.08895.
  2. Lu, L., et al. (2021). Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3), 218-229.
  3. Kovachki, N., et al. (2023). Neural Operator: Learning Maps Between Function Spaces with Applications to PDEs. Journal of Machine Learning Research, 24(89), 1-97.
  4. Chen, T., & Chen, H. (1995). Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks, 6(4), 911-917.
  • Post title: PDE and Machine Learning (2) — Neural Operator Theory
  • Post author: Chen Kai
  • Create time: 2022-01-18 14:30:00
  • Post link: https://www.chenk.top/pde-ml-2-neural-operator-theory/
  • Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stated additionally.