Traditional Physics-Informed Neural Networks (PINNs) have a fatal flaw: they can only solve one problem at a time. Given a specific initial condition, train a network, and get the solution to that particular problem. What if the initial condition changes? Sorry, retrain. Need to handle 1000 different initial conditions? Train 1000 times.
This is catastrophic in practical applications. Imagine designing an aircraft wing and needing to test airflow under different wind speeds and angles of attack. Or doing weather forecasting where initial conditions change daily. Do you really need to retrain the network every time?
Neural Operators completely change the game. They learn not "the solution to a specific problem," but the mapping from initial conditions to solutions itself — an operator. Once this operator is learned, given any new initial condition, you only need one forward pass to get the solution. Train once, use forever.
What is the mathematical foundation of this capability? How do we design network architectures that learn mappings between infinite-dimensional function spaces? Why do Fourier transforms help? This article explores these questions in depth, starting from rigorous theory in functional analysis, analyzing the two mainstream architectures, FNO and DeepONet, in detail, and verifying the theory through complete experiments.
From Single Solutions to Operator Families
PINN's Dilemma
Consider the 1D Burgers equation:

$$\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2}, \quad x \in [0, 1], \ t \in (0, T],$$

where $\nu > 0$ is the viscosity and $u(x, 0) = u_0(x)$ is the initial condition. Every time $u_0$ changes, a traditional method must start over:

- Finite Difference Method: rebuild and solve the linear system for the new $u_0$
- PINN: retrain the neural network, because the loss function depends on the specific $u_0$

If you need to handle 1000 different initial conditions, you must repeat this 1000 times. This is completely infeasible in scenarios like parametric PDEs, uncertainty quantification, and design optimization.
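To make the cost concrete, here is a minimal sketch in plain NumPy (explicit finite differences on a periodic domain; the scheme, grid size, and viscosity are illustrative choices, not from the article). Every new initial condition triggers a full time integration from scratch:

```python
import numpy as np

def solve_burgers(u0, nu=0.05, T=0.5, n_steps=2000):
    """Explicit finite-difference solve of 1D periodic Burgers.

    Every call re-runs the entire time integration: a new initial
    condition u0 means paying the full solve cost again.
    """
    u = u0.copy()
    n = len(u)
    dx = 1.0 / n
    dt = T / n_steps
    for _ in range(n_steps):
        u_x = (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)        # central difference
        u_xx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2  # discrete Laplacian
        u = u + dt * (-u * u_x + nu * u_xx)
    return u

x = np.linspace(0, 1, 128, endpoint=False)
# Two different initial conditions -> two full solves from scratch.
u_a = solve_burgers(np.sin(2 * np.pi * x))
u_b = solve_burgers(np.cos(2 * np.pi * x) + 0.5)
```

Multiply this loop by 1000 initial conditions and the cost of classical re-solving (or PINN retraining, which is far slower still) becomes clear.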
Operator Learning: Learning the Mapping Itself
The core insight of neural operators: the solution operator of a PDE is itself a mapping between function spaces,

$$\mathcal{G}: \mathcal{A} \to \mathcal{U}, \quad u_0 \mapsto u(\cdot, T),$$

from the initial condition space $\mathcal{A}$ to the solution space $\mathcal{U}$. For the Burgers equation, the solution operator is

$$\mathcal{G}(u_0)(x) = u(x, T).$$

That is, given any initial condition $u_0$, the operator $\mathcal{G}$ returns the solution at time $T$. Once this operator $\mathcal{G}$ is learned, a new initial condition requires only a single forward pass. No retraining, no re-solving the PDE.
Function Space Theory
The mathematical framework of operator learning is built on function space theory. Let's first review several key concepts.
Function Space Foundations: Banach and Hilbert Spaces
Why Do We Need Function Spaces?
In machine learning, we often deal with functions. For example, neural networks themselves are functions, and probability distributions are also functions. But how large is the "set" of all such functions? How do we measure the distance between two functions? Answering these questions requires the concept of a function space.
A deeper question: If I want to learn a mapping where both inputs and outputs are functions (not numbers), what should I do?
- Traditional neural networks: Input is a vector (finite-dimensional), output is a vector
- Neural operators: Input is a function (infinite-dimensional), output is a function
Example:
- Traditional problem: Input image (pixel array), output category
- Operator learning problem: Input initial temperature field (function), output temperature field after one hour (function)
Universal Approximation Theorem
Why Do We Need Universal Approximation Theorems?
We all know neural networks can approximate any function (where inputs and outputs are vectors). But now the problem escalates: Can neural operators approximate any operator (where inputs and outputs are functions)?
This isn't a theoretical game, but a practical question:

- Physical simulation: given initial conditions (function), predict the future state (function)
- Image processing: input low-resolution image (function), output high-resolution image (function)
- Climate prediction: input current pressure field (function), output tomorrow's pressure field (function)
Chen-Chen Theorem (1995)
The theoretical foundation of neural operators can be traced back to the universal approximation theorem proved by Chen and Chen in 1995.
Theorem 2 (Chen-Chen, 1995): Let $\sigma$ be a continuous non-polynomial activation function, $K_1 \subset X$ a compact set in a Banach space $X$, $K_2 \subset \mathbb{R}^d$ compact, $V \subset C(K_1)$ compact, and $G: V \to C(K_2)$ a continuous operator. Then for any $\varepsilon > 0$, there exist positive integers $n, p, m$, constants $c_i^k, \xi_{ij}^k, \theta_i^k, \zeta_k \in \mathbb{R}$, vectors $w_k \in \mathbb{R}^d$, and points $x_j \in K_1$, such that

$$\left| G(u)(y) - \sum_{k=1}^{p} \sum_{i=1}^{n} c_i^k \,\sigma\!\left( \sum_{j=1}^{m} \xi_{ij}^k\, u(x_j) + \theta_i^k \right) \sigma\!\left( w_k \cdot y + \zeta_k \right) \right| < \varepsilon$$

holds for all $u \in V$ and $y \in K_2$.
This theorem shows: any continuous operator can be approximated to arbitrary precision by a two-layer network of this form, in which one factor depends only on the input function sampled at fixed sensor points $x_j$ and the other depends only on the query location $y$. This separation is precisely the theoretical foundation of DeepONet.
Fourier Neural Operator (FNO)
Motivation for FNO
Insights from the Convolution Theorem
Consider a linear PDE with constant coefficients on a periodic domain, $\partial_t u = \mathcal{L} u$. Its solution operator is a convolution with a Green's function:

$$u(x, t) = (G_t * u_0)(x) = \int G_t(x - y)\, u_0(y)\, dy,$$

where $G_t$ is the Green's function of $\mathcal{L}$ at time $t$. In the frequency domain, the convolution theorem turns this into pointwise multiplication:

$$\hat{u}(k, t) = \hat{G}_t(k)\, \hat{u}_0(k).$$
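The convolution theorem is easy to verify numerically. A small NumPy sketch (the random signal and kernel are arbitrary illustrations): circular convolution in the spatial domain equals pointwise multiplication in the frequency domain.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
u = rng.standard_normal(n)   # samples of an "initial condition"
g = rng.standard_normal(n)   # samples of a convolution kernel

# Direct circular convolution: (g * u)[i] = sum_j g[j] u[(i - j) mod n]
direct = np.array([np.sum(g * u[(i - np.arange(n)) % n]) for i in range(n)])

# Frequency domain: transform, multiply pointwise, transform back
via_fft = np.fft.ifft(np.fft.fft(g) * np.fft.fft(u)).real
```

The two results agree to machine precision, which is exactly why FNO can apply a global convolution at the cost of two FFTs and a pointwise product.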
For nonlinear PDEs, although they cannot be directly written as convolutions, a local nonlinearity + global linearity hybrid structure is common. FNO's design philosophy is:
- Handle the linear part in the frequency domain (using the convolution theorem)
- Handle the nonlinear part in the spatial domain (pointwise nonlinear transformations)
FNO Architecture Details
Mathematical Derivation of Fourier Layers
The overall FNO architecture is as follows:

$$u_0 \xrightarrow{\,P\,} v_0 \to v_1 \to \cdots \to v_T \xrightarrow{\,Q\,} u,$$

with each Fourier layer given by

$$v_{t+1}(x) = \sigma\big( W v_t(x) + (\mathcal{K} v_t)(x) \big), \quad (\mathcal{K} v_t)(x) = \mathcal{F}^{-1}\big( R \cdot \mathcal{F} v_t \big)(x),$$

where:

- $P$ lifts the input to a higher-dimensional channel space; $Q$ projects back to the output
- $W$ is a pointwise linear transform applied in the spatial domain
- $\mathcal{F}, \mathcal{F}^{-1}$ are the Fourier transform and its inverse
- $R$ is a learnable complex weight tensor applied mode-by-mode in the frequency domain
- $\sigma$ is a pointwise nonlinearity

Key Design 1: Frequency Truncation

In practice, we only keep the low-frequency components:

$$(\mathcal{K} v_t)(x) = \mathcal{F}^{-1}\big( R \cdot \chi_{|k| \le k_{\max}} \,\mathcal{F} v_t \big)(x),$$

where $R \in \mathbb{C}^{k_{\max} \times d_v \times d_v}$. This bounds the parameter count independently of the grid resolution and acts as a spectral regularizer.
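A minimal single-channel sketch of this truncated spectral multiplication in plain NumPy (random weights stand in for trained ones; `spectral_conv`, `R`, and `k_max` are illustrative names, not the reference implementation). Note that the same $k_{\max}$ weights apply unchanged at any grid resolution:

```python
import numpy as np

def spectral_conv(v, R, k_max):
    """Core of a single-channel Fourier layer: transform, keep and
    reweight the lowest k_max modes, transform back."""
    v_hat = np.fft.rfft(v)                 # frequency representation
    out_hat = np.zeros_like(v_hat)
    out_hat[:k_max] = R * v_hat[:k_max]    # truncate + learned weights
    return np.fft.irfft(out_hat, n=len(v))

rng = np.random.default_rng(1)
k_max = 12
R = rng.standard_normal(k_max) + 1j * rng.standard_normal(k_max)

# The SAME k_max complex weights are applied at two different resolutions.
x_coarse = np.linspace(0, 1, 64, endpoint=False)
x_fine = np.linspace(0, 1, 256, endpoint=False)
y_coarse = spectral_conv(np.sin(2 * np.pi * x_coarse), R, k_max)
y_fine = spectral_conv(np.sin(2 * np.pi * x_fine), R, k_max)
```

Because the weights act on continuous Fourier modes rather than grid points, the fine-grid output agrees with the coarse-grid output at shared locations, which is the mechanism behind FNO's resolution invariance.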
Key Design 2: Aliasing Handling
The FFT assumes the input is a periodic function. For non-periodic functions, high-frequency components will "alias" into low frequencies. The pointwise linear transform $W$ retains information that the truncated spectral path discards, and in practice padding the domain before the transform further reduces boundary artifacts.
DeepONet
Architecture Design
Branch Network and Trunk Network
The core idea of DeepONet is operator decomposition:

$$G(u)(y) \approx \sum_{k=1}^{p} b_k\big(u(x_1), u(x_2), \ldots, u(x_m)\big)\, t_k(y),$$

where:

- Branch network: takes the input function $u$ sampled at $m$ fixed sensor points $\{x_j\}$ and outputs coefficients $b_1, \ldots, b_p$
- Trunk network: takes a query location $y$ and outputs basis values $t_1(y), \ldots, t_p(y)$

The physical meaning of this decomposition:

- The branch network learns "feature modes" of the input function
- The trunk network learns the "spatial distribution" of these modes
- Their inner product yields the final output $G(u)(y)$
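A forward-pass sketch of this decomposition in plain NumPy (random, untrained weights; all names and sizes here are illustrative, not the reference implementation):

```python
import numpy as np

rng = np.random.default_rng(42)
m, p, hidden = 32, 16, 64   # sensor count, basis size, hidden width

def mlp(x, W1, b1, W2, b2):
    """Tiny two-layer network used for both branch and trunk."""
    return np.tanh(x @ W1 + b1) @ W2 + b2

# Branch: sensor values u(x_1), ..., u(x_m) -> p coefficients
Wb1, bb1 = rng.standard_normal((m, hidden)) * 0.1, np.zeros(hidden)
Wb2, bb2 = rng.standard_normal((hidden, p)) * 0.1, np.zeros(p)
# Trunk: query location y -> p basis values
Wt1, bt1 = rng.standard_normal((1, hidden)) * 0.1, np.zeros(hidden)
Wt2, bt2 = rng.standard_normal((hidden, p)) * 0.1, np.zeros(p)

def deeponet(u_sensors, ys):
    b = mlp(u_sensors, Wb1, bb1, Wb2, bb2)      # shape (p,)
    t = mlp(ys[:, None], Wt1, bt1, Wt2, bt2)    # shape (len(ys), p)
    return t @ b                                # inner product per query

x_sensors = np.linspace(0, 1, m)
u = np.sin(2 * np.pi * x_sensors)               # input function, sampled
ys = np.linspace(0, 1, 100)                     # arbitrary query points
out = deeponet(u, ys)                           # G(u)(y) at each y
```

Note that the query points `ys` need not lie on any grid: the trunk network evaluates the learned basis functions at arbitrary locations, which is where DeepONet's flexibility comes from.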
This article provides a comprehensive theoretical foundation for neural operators, covering function spaces, universal approximation theorems, FNO and DeepONet architectures, and their mathematical properties. The key insight is that neural operators learn mappings between function spaces, enabling resolution invariance and parameter generalization — train once, use for any resolution or parameter configuration.
Summary and Outlook
Neural operator theory brings revolutionary changes to scientific computing: from "one PDE, one solution" to "train once, use forever". FNO and DeepONet, as the two mainstream architectures, each have advantages:
- FNO: Based on spectral methods, suitable for periodic boundaries and translation-invariant problems
- DeepONet: Based on branch-trunk structure, flexibly handles various operators
Both inherit theoretical guarantees from universal approximation theorems and demonstrate resolution invariance in practice — train on coarse grids, test on fine grids.
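The mechanism behind coarse-to-fine generalization can be checked in isolation: a band-limited spectrum determines the function at any resolution. A small NumPy sketch using zero-padded FFT interpolation (the helper `fourier_upsample` is an illustrative name):

```python
import numpy as np

def fourier_upsample(u_coarse, n_fine):
    """Evaluate a band-limited signal on a finer grid by zero-padding
    its spectrum (assumes the signal is resolved on the coarse grid)."""
    n = len(u_coarse)
    spec = np.fft.rfft(u_coarse)
    spec_fine = np.zeros(n_fine // 2 + 1, dtype=complex)
    spec_fine[: len(spec)] = spec
    # rescale for NumPy's 1/n irfft normalization
    return np.fft.irfft(spec_fine, n=n_fine) * (n_fine / n)

f = lambda x: np.sin(2 * np.pi * x) + 0.5 * np.cos(4 * np.pi * x)
x16 = np.arange(16) / 16
x64 = np.arange(64) / 64
u64 = fourier_upsample(f(x16), 64)   # 16 samples -> 64-point evaluation
```

Since a neural operator's spectral representation is band-limited by construction, evaluating on a finer grid is exact in the same sense: no information beyond the retained modes is needed.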
Core Takeaways
- Function space theory: Banach spaces, Hilbert spaces, Sobolev spaces provide rigorous mathematical frameworks
- Universal approximation theorems: Chen-Chen theorem and Kovachki-Stuart theorem guarantee the expressive power of neural operators
- DeepONet: Uses branch network to encode input functions, trunk network to encode query locations
- FNO: Learns convolution kernels in the frequency domain, achieving global receptive fields and resolution invariance
- Spectral analysis: Low-frequency dominated convergence behavior, consistent with Fourier series expansion of PDEs
Challenges and Future Directions
- Theoretical aspects:
- More precise error bounds (how to quantify the relationship between approximation error and network capacity)
- Generalization theory (how distribution differences between training and test sets affect performance)
- Handling complex geometries (non-periodic boundaries, irregular domains)
- Algorithmic aspects:
- Integration with physical constraints (how to incorporate conservation laws, symmetries)
- Adaptive mesh refinement (increasing resolution in high-gradient regions)
- Multi-scale modeling (operator learning across spatiotemporal scales)
- Application aspects:
- Inverse problem solving (parameter identification, data assimilation)
- Uncertainty quantification (Bayesian neural operators)
- Multi-physics coupling (fluid-structure interaction, electromagnetic-thermal coupling)
Neural operators are becoming a new paradigm in scientific computing, showing great potential in weather forecasting, materials design, drug discovery, and other fields. With deepening theoretical research and increasing computational power, we have reason to believe that neural operators will play an even more important role in the future.
References
- Li, Z., et al. (2020). Fourier Neural Operator for Parametric Partial Differential Equations. arXiv:2010.08895.
- Lu, L., et al. (2021). Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3), 218-229.
- Kovachki, N., et al. (2023). Neural Operator: Learning Maps Between Function Spaces with Applications to PDEs. Journal of Machine Learning Research, 24(89), 1-97.
- Chen, T., & Chen, H. (1995). Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks, 6(4), 911-917.
- Post title: PDE and Machine Learning (2) — Neural Operator Theory
- Post author: Chen Kai
- Create time: 2022-01-18 14:30:00
- Post link: https://www.chenk.top/pde-ml-2-neural-operator-theory/
- Copyright notice: All articles in this blog are licensed under BY-NC-SA unless stated otherwise.