The core task of generative models is to sample from data distributions. Traditional approaches like Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) achieve this through explicit encoder-decoder structures or adversarial training. However, since 2020, diffusion models have rapidly emerged as the dominant paradigm in generative AI, celebrated for their exceptional generation quality and training stability. From DALL·E 2 to Stable Diffusion, from image generation to text-to-image synthesis, diffusion models are reshaping our understanding of generative AI.
Yet beneath the success of diffusion models lies a profound mathematical structure: they are essentially numerical solvers for partial differential equations (PDEs). When we add Gaussian noise to data, we are actually solving a forward diffusion process whose probability density evolution is governed by the Fokker-Planck equation; when we learn denoising models, we are implicitly learning score functions, the gradients of log-densities that guide the reverse diffusion process; when we use DDPM or DDIM sampling, we are numerically integrating stochastic or ordinary differential equations. This PDE perspective not only reveals the mathematical essence of diffusion models but also provides a unified framework for understanding their convergence, designing new sampling algorithms, and extending to conditional generation tasks.
This article systematically establishes this theoretical framework. We begin with classical heat equations, introducing fundamental tools such as Fick's law, Gaussian kernels, and Fourier transforms. We then introduce stochastic differential equations (SDEs) and the Fokker-Planck equation, demonstrating how diffusion processes can be formalized as probability density evolution. Next, we focus on Score-Based generative models, deriving Score Matching objective functions and establishing connections between Langevin dynamics and sampling processes. Finally, we delve into DDPM and DDIM, showing how they serve as discretization schemes for SDEs/ODEs, and validate theoretical predictions through four complete experiments.
Heat Equation and Diffusion Processes: From Fick's Law to Gaussian Kernels
Fick's Law and the Diffusion Equation
Diffusion phenomena are ubiquitous in nature: a drop of ink gradually disperses in clear water, heat transfers from high-temperature regions to low-temperature regions, molecules move driven by concentration gradients. The mathematical descriptions of these processes all reduce to the diffusion equation, also known as the heat equation.
Fick's First Law (1855): The diffusion flux is proportional to the negative concentration gradient:

$$J = -D \frac{\partial u}{\partial x},$$

where $u(x,t)$ is the concentration, $J$ is the flux, and $D > 0$ is the diffusion coefficient. Matter flows from regions of high concentration to regions of low concentration.
Mass Conservation: Consider a spatial region $[a, b]$. The rate of change of the total mass inside equals the net flux across the boundary, which in differential form is the continuity equation

$$\frac{\partial u}{\partial t} + \frac{\partial J}{\partial x} = 0.$$

Substituting Fick's law yields the diffusion (heat) equation

$$\frac{\partial u}{\partial t} = D \frac{\partial^2 u}{\partial x^2}.$$
Higher-dimensional form: In $\mathbb{R}^d$,

$$\frac{\partial u}{\partial t} = D\, \Delta u, \qquad \Delta u = \sum_{i=1}^d \frac{\partial^2 u}{\partial x_i^2}.$$
Gaussian Kernels: Fundamental Solutions of the Diffusion Equation
The diffusion equation has analytical solutions, and its fundamental solution is the Gaussian kernel.
One-dimensional case: Consider the initial condition $u(x, 0) = \delta(x)$. The solution is the Gaussian (heat) kernel

$$G(x, t) = \frac{1}{\sqrt{4\pi D t}} \exp\!\left(-\frac{x^2}{4Dt}\right),$$

a Gaussian density with mean $0$ and variance $2Dt$.
Higher-dimensional case: In $\mathbb{R}^d$, the fundamental solution is

$$G(x, t) = (4\pi D t)^{-d/2} \exp\!\left(-\frac{\|x\|^2}{4Dt}\right).$$
Physical interpretation:
- Sharp distributions at the initial time (such as a Dirac delta) gradually "diffuse" over time, with variance growing linearly as $2Dt$
- Any initial distribution can be viewed as a linear combination of Dirac deltas, so the solution is the convolution of the initial distribution with the Gaussian kernel: $u(\cdot, t) = u(\cdot, 0) * G(\cdot, t)$
- As $t \to \infty$, the solution flattens out and local features of the initial distribution are smoothed away
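These facts are easy to check numerically. The following NumPy sketch (with an arbitrary choice of $D$ and $t$) verifies that the one-dimensional Gaussian kernel satisfies the heat equation $\partial_t G = D\, \partial_x^2 G$ up to finite-difference error, and that it integrates to one:

```python
import numpy as np

D = 0.5          # diffusion coefficient (arbitrary choice for this check)
t = 1.0          # evaluation time
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]
dt = 1e-4

def heat_kernel(x, t, D):
    """Fundamental solution G(x, t) of u_t = D u_xx."""
    return np.exp(-x**2 / (4 * D * t)) / np.sqrt(4 * np.pi * D * t)

# Time derivative via central difference in t
dG_dt = (heat_kernel(x, t + dt, D) - heat_kernel(x, t - dt, D)) / (2 * dt)

# Second spatial derivative via central difference in x
G = heat_kernel(x, t, D)
d2G_dx2 = (np.roll(G, -1) - 2 * G + np.roll(G, 1)) / dx**2

# Compare on the interior (np.roll wraps around at the boundary)
residual = np.max(np.abs(dG_dt - D * d2G_dx2)[5:-5])
print(residual)  # small: the kernel solves the heat equation
```

The same check works for any smooth initial condition convolved with the kernel, since the heat equation is linear.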
Fourier Transform and Spectral Methods
The diffusion equation has a concise form in the Fourier domain, providing powerful tools for theoretical analysis and numerical solution.
Fourier transform: Define the Fourier transform of a function $u$ as

$$\hat{u}(k) = \int_{-\infty}^{\infty} u(x)\, e^{-ikx}\, dx.$$

Key properties:
- Derivative property: $\widehat{\partial_x u}(k) = ik\, \hat{u}(k)$, so the diffusion equation becomes the decoupled ODE $\partial_t \hat{u}(k, t) = -D k^2\, \hat{u}(k, t)$
- Explicit solution: $\hat{u}(k, t) = e^{-D k^2 t}\, \hat{u}(k, 0)$, i.e., each Fourier mode decays exponentially at rate $D k^2$
Physical interpretation:
- High-frequency components (large $|k|$) decay fastest, which is why diffusion smooths out fine detail first
- Low-frequency components (small $|k|$) persist, so large-scale structure survives longest
Numerical solution: In the Fourier domain, solutions to the diffusion equation can be written explicitly, providing a foundation for efficient numerical methods. For periodic boundary conditions, the Fast Fourier Transform (FFT) evaluates the solution at $N$ grid points in $O(N \log N)$ operations.
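A minimal spectral solver illustrates this (a sketch assuming periodic boundary conditions on $[0, 2\pi)$ and NumPy's FFT conventions): one multiplies each Fourier mode by $e^{-Dk^2 t}$ and transforms back, which is exact for band-limited initial data:

```python
import numpy as np

N, D, t = 256, 0.1, 0.5
x = np.linspace(0, 2 * np.pi, N, endpoint=False)
u0 = np.sin(x) + 0.5 * np.sin(3 * x)   # initial condition with two modes

# Exact spectral propagator: mode k decays as exp(-D k^2 t)
k = np.fft.fftfreq(N, d=2 * np.pi / N) * 2 * np.pi   # integer wavenumbers
u_hat = np.fft.fft(u0) * np.exp(-D * k**2 * t)
u = np.real(np.fft.ifft(u_hat))

# Analytic solution for this initial condition
u_exact = np.exp(-D * t) * np.sin(x) + 0.5 * np.exp(-D * 9 * t) * np.sin(3 * x)
print(np.max(np.abs(u - u_exact)))  # ~ machine precision
```

For initial data with many modes the propagator is applied to the full spectrum at once; the cost is dominated by the two FFTs.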
Stochastic Differential Equations and the Fokker-Planck Equation
Itô Integral and Stochastic Differential Equations
Diffusion processes can be naturally described using Stochastic Differential Equations (SDEs). This provides a rigorous framework for understanding the randomness in diffusion models.
Brownian Motion: Standard Brownian motion $W_t$ satisfies: $W_0 = 0$; independent increments; Gaussian increments $W_t - W_s \sim \mathcal{N}(0, t - s)$ for $s < t$; and continuous sample paths.
Itô Integral: For an adapted process $f_t$, the Itô integral $\int_0^t f_s\, dW_s$ is defined as the mean-square limit of sums $\sum_i f_{t_i}(W_{t_{i+1}} - W_{t_i})$, with the integrand evaluated at the left endpoint of each interval.

Key properties:
- Zero mean: $\mathbb{E}\left[\int_0^t f_s\, dW_s\right] = 0$
- Itô isometry: $\mathbb{E}\left[\left(\int_0^t f_s\, dW_s\right)^2\right] = \mathbb{E}\left[\int_0^t f_s^2\, ds\right]$
Stochastic Differential Equation: The general form of an SDE is:

$$dX_t = f(X_t, t)\, dt + g(X_t, t)\, dW_t,$$

where $f$ is the drift coefficient and $g$ the diffusion coefficient.
Forward diffusion SDE: In diffusion models, the forward process is typically written as:

$$dX_t = f(X_t, t)\, dt + g(t)\, dW_t,$$

for example the variance-preserving (VP) SDE $dX_t = -\tfrac{1}{2}\beta(t) X_t\, dt + \sqrt{\beta(t)}\, dW_t$, which gradually transforms data into Gaussian noise.
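To make this concrete, here is a minimal Euler-Maruyama simulation of the VP forward SDE with a constant $\beta$ (a simplifying assumption; practical schedules vary $\beta(t)$ over time), checking that samples started far from the origin approach a standard Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 1.0            # constant noise rate (simplification)
dt, T = 1e-3, 5.0
n_steps = int(T / dt)

# Start from a distribution that is far from Gaussian: a point mass at x = 3
x = np.full(10_000, 3.0)

# Euler-Maruyama: x <- x + f(x) dt + g * sqrt(dt) * z
for _ in range(n_steps):
    drift = -0.5 * beta * x
    x = x + drift * dt + np.sqrt(beta * dt) * rng.standard_normal(x.size)

# VP-SDE marginals: mean decays as exp(-beta*T/2), variance tends to 1
print(x.mean(), x.std())
```

The empirical mean decays like $3 e^{-\beta T / 2}$ and the standard deviation approaches $1$, as the marginal law predicts.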
Fokker-Planck Equation: Evolution of Probability Density
The Fokker-Planck equation describes how the probability density of an SDE solution evolves over time, serving as a bridge between stochastic processes and PDEs.
Theorem (Fokker-Planck Equation): Let $X_t$ solve the SDE $dX_t = f(X_t, t)\, dt + g(X_t, t)\, dW_t$. Then its probability density $p(x, t)$ satisfies

$$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}\left[f(x, t)\, p\right] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[g^2(x, t)\, p\right].$$

In particular, for pure Brownian motion ($f = 0$, $g = \sqrt{2D}$) this reduces to the heat equation.
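The theorem can be checked numerically. The sketch below (an assumption-laden example: drift $f = -x$, $g = \sqrt{2}$, i.e. an Ornstein-Uhlenbeck process started from a point mass, whose density $\mathcal{N}(x_0 e^{-t}, 1 - e^{-2t})$ is known in closed form) confirms that this density satisfies its Fokker-Planck equation $\partial_t p = \partial_x(x\, p) + \partial_x^2 p$ up to finite-difference error:

```python
import numpy as np

def ou_density(x, t, x0=2.0):
    """Density of dX = -X dt + sqrt(2) dW started at the point x0."""
    m, v = x0 * np.exp(-t), 1 - np.exp(-2 * t)
    return np.exp(-(x - m)**2 / (2 * v)) / np.sqrt(2 * np.pi * v)

x = np.linspace(-8, 8, 4001)
dx, dt, t = x[1] - x[0], 1e-5, 1.0

p = ou_density(x, t)
dp_dt = (ou_density(x, t + dt) - ou_density(x, t - dt)) / (2 * dt)

# Fokker-Planck right-hand side: d/dx[x p] + d^2/dx^2 p
xp = x * p
div = (np.roll(xp, -1) - np.roll(xp, 1)) / (2 * dx)
lap = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / dx**2

residual = np.max(np.abs(dp_dt - (div + lap))[5:-5])
print(residual)  # small: the OU density solves its Fokker-Planck equation
```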
Kolmogorov Backward Equation
The Kolmogorov backward equation describes the evolution of conditional expectations and plays a key role in the sampling process of diffusion models.
Theorem (Kolmogorov Backward Equation): Let $u(x, t) = \mathbb{E}[\varphi(X_T) \mid X_t = x]$ for the SDE above. Then $u$ satisfies

$$\frac{\partial u}{\partial t} + f(x, t)\frac{\partial u}{\partial x} + \frac{1}{2} g^2(x, t)\frac{\partial^2 u}{\partial x^2} = 0,$$

with terminal condition $u(x, T) = \varphi(x)$.
Physical interpretation:
- The forward equation describes probability density evolution forward from the initial time
- The backward equation describes conditional expectation evolution backward from the terminal time
- They are connected through the Feynman-Kac formula
Application in diffusion models: The backward equation is used to derive the reverse diffusion SDE, which is the theoretical foundation of Score-Based generative models.
Score-Based Generative Models: From Score Functions to Langevin Dynamics
Score Function: Logarithmic Gradient of Probability Density
Definition (Score Function): Let $p(x)$ be a probability density. The score function is the gradient of its log-density:

$$s(x) = \nabla_x \log p(x).$$

Crucially, the score does not depend on the normalization constant of $p$, which is what makes it learnable for unnormalized models.
Score Matching: Learning Score Functions
Objective function: Given a data distribution $p_{\text{data}}(x)$, we want to train a network $s_\theta(x)$ to approximate $\nabla_x \log p_{\text{data}}(x)$.

Explicit Score Matching (ESM): Minimize:

$$J_{\text{ESM}}(\theta) = \frac{1}{2}\, \mathbb{E}_{p_{\text{data}}}\left[\left\| s_\theta(x) - \nabla_x \log p_{\text{data}}(x) \right\|^2\right].$$

This is not directly computable because the true score is unknown.

Implicit Score Matching (ISM): Through integration by parts, it can be shown that (Hyvärinen, 2005):

$$J_{\text{ISM}}(\theta) = \mathbb{E}_{p_{\text{data}}}\left[\frac{1}{2}\left\| s_\theta(x) \right\|^2 + \nabla_x \cdot s_\theta(x)\right] = J_{\text{ESM}}(\theta) + \text{const}.$$
Denoising Score Matching (DSM): Add noise to data, $\tilde{x} = x + \sigma \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$, and minimize (Vincent, 2011):

$$J_{\text{DSM}}(\theta) = \mathbb{E}\left[\left\| s_\theta(\tilde{x}) + \frac{\tilde{x} - x}{\sigma^2} \right\|^2\right].$$

Its minimizer is the score of the noise-perturbed distribution $p_\sigma = p_{\text{data}} * \mathcal{N}(0, \sigma^2 I)$.
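This last fact can be verified in a toy setting (a hedged sketch: one-dimensional standard Gaussian data and a linear score model, so everything is available in closed form). The DSM minimizer should be the score of $\mathcal{N}(0, 1 + \sigma^2)$, i.e. $s(\tilde{x}) = -\tilde{x}/(1 + \sigma^2)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 200_000, 0.5

x = rng.standard_normal(n)                       # data ~ N(0, 1)
x_tilde = x + sigma * rng.standard_normal(n)     # noised data

# DSM regression target: grad log q(x~ | x) = -(x~ - x) / sigma^2
target = -(x_tilde - x) / sigma**2

# Fit a linear score s(x~) = a * x~ + b by least squares
A = np.stack([x_tilde, np.ones(n)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, target, rcond=None)

print(a, -1 / (1 + sigma**2))  # slope matches the score of N(0, 1 + sigma^2)
```

The fitted slope approaches $-1/(1 + \sigma^2)$ and the intercept approaches $0$, confirming that DSM targets the score of the smoothed density rather than the data density itself.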
Langevin dynamics: Given a score function $s(x) = \nabla_x \log p(x)$, the discretized Langevin update is

$$x_{k+1} = x_k + \epsilon\, s(x_k) + \sqrt{2\epsilon}\, z_k, \qquad z_k \sim \mathcal{N}(0, I).$$

Theoretical guarantee: Under mild conditions, as $\epsilon \to 0$ and the number of steps $K \to \infty$, the distribution of $x_K$ converges to $p(x)$.
Geometric intuition:
- The score function points toward regions of higher probability density, so the drift term pulls samples toward the modes of $p$
- The noise term $\sqrt{2\epsilon}\, z_k$ prevents collapse onto a single mode and ensures the full distribution is explored
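A minimal Langevin sampler makes the guarantee tangible (sketch with an analytic score standing in for a learned one; the Gaussian target parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, s2 = 1.5, 0.49            # target N(1.5, 0.7^2), chosen arbitrarily

def score(x):
    """Analytic score of the target Gaussian: grad log N(mu, s2)."""
    return -(x - mu) / s2

eps, K = 1e-2, 2_000
x = rng.standard_normal(5_000) * 3.0   # poor initialization, far from target

# Discretized Langevin dynamics
for _ in range(K):
    x = x + eps * score(x) + np.sqrt(2 * eps) * rng.standard_normal(x.size)

print(x.mean(), x.var())  # approaches (1.5, 0.49)
```

With a learned $s_\theta$ in place of the analytic score, this loop is exactly the sampling procedure of score-based generative models.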
Forward Diffusion and Reverse Sampling
Forward diffusion SDE: Starting from the data distribution $p_0 = p_{\text{data}}$, run $dX_t = f(X_t, t)\, dt + g(t)\, dW_t$ forward until time $T$, so that $p_T$ is approximately Gaussian. By Anderson's theorem (1982), the time reversal satisfies the reverse SDE

$$dX_t = \left[f(X_t, t) - g^2(t)\, \nabla_x \log p_t(X_t)\right] dt + g(t)\, d\bar{W}_t,$$

where $\bar{W}_t$ is a Brownian motion running backward in time.

Key insight:
- The drift term of the reverse SDE contains the score function $\nabla_x \log p_t(x)$
- If we can learn the score function with a network $s_\theta(x, t)$, we can sample by integrating the reverse SDE from Gaussian noise back to data
DDPM and DDIM: A Discretization Perspective
DDPM: Discretization of Forward and Reverse Processes
Denoising Diffusion Probabilistic Models (DDPM) (Ho et al., 2020) is one of the earliest successful diffusion models, discretizing the continuous diffusion process into finite steps.
Forward process: Define discrete time steps $t = 1, \dots, T$ with a noise schedule $\beta_1, \dots, \beta_T$:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(\sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right).$$

Key properties:
- One can analytically compute $q(x_t \mid x_0) = \mathcal{N}\!\left(\sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t) I\right)$, where $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$
- When $T$ is large and $\bar{\alpha}_T \to 0$, the distribution of $x_T$ tends to the standard Gaussian $\mathcal{N}(0, I)$
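The closed-form marginal can be checked against the step-by-step chain (NumPy sketch; the linear $\beta$ schedule mirrors the choice in Ho et al., and the toy data distribution is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

x0 = rng.standard_normal(50_000) * 0.5 + 1.0   # toy data ~ N(1, 0.25)

# Iterative forward chain: x_t = sqrt(1 - beta_t) x_{t-1} + sqrt(beta_t) eps
x = x0.copy()
for t in range(T):
    x = np.sqrt(1 - betas[t]) * x + np.sqrt(betas[t]) * rng.standard_normal(x.size)

# Closed form: x_T = sqrt(abar_T) x0 + sqrt(1 - abar_T) eps
x_direct = np.sqrt(alpha_bar[-1]) * x0 \
    + np.sqrt(1 - alpha_bar[-1]) * rng.standard_normal(x0.size)

print(alpha_bar[-1], x.mean(), x.std(), x_direct.std())
```

Both routes produce (statistically) the same nearly standard Gaussian, and $\bar{\alpha}_T$ is tiny, confirming that the chain forgets $x_0$.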
Reverse process: Learn the reverse distribution:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(\mu_\theta(x_t, t),\ \sigma_t^2 I\right),$$

in practice parameterized by a noise-prediction network $\epsilon_\theta(x_t, t)$ trained with the simple loss $\mathbb{E}\left[\|\epsilon - \epsilon_\theta(x_t, t)\|^2\right]$.

Connection to Score Matching: It can be shown that the DDPM loss function is equivalent to weighted Score Matching: the optimal noise predictor satisfies $\epsilon_\theta(x_t, t) = -\sqrt{1 - \bar{\alpha}_t}\; s_\theta(x_t, t)$, so predicting the noise is predicting a rescaled score.
DDIM: Deterministic Sampling
Denoising Diffusion Implicit Models (DDIM) (Song et al., 2021) converts DDPM's stochastic sampling process into a deterministic process, enabling fast sampling through ODE solving.
Key observation: The DDPM forward process can be viewed as a discretization of the VP-SDE $dX_t = -\tfrac{1}{2}\beta(t) X_t\, dt + \sqrt{\beta(t)}\, dW_t$. DDIM replaces the stochastic reverse update with the deterministic recursion

$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, \underbrace{\frac{x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}}_{\hat{x}_0} + \sqrt{1 - \bar{\alpha}_{t-1}}\, \epsilon_\theta(x_t, t),$$

which corresponds to integrating the probability flow ODE and allows large step sizes.
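As a sanity check of the deterministic recursion (a sketch under strong simplifying assumptions: one-dimensional data $x_0 \sim \mathcal{N}(0, 1)$, for which every marginal is $\mathcal{N}(0,1)$ and the optimal noise predictor has the closed form $\epsilon^*(x_t, t) = \sqrt{1 - \bar{\alpha}_t}\, x_t$), DDIM should map Gaussian noise back to unit-variance samples:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def eps_opt(x, abar):
    """Optimal noise predictor for x0 ~ N(0,1): E[eps | x_t] = sqrt(1-abar) x_t."""
    return np.sqrt(1 - abar) * x

x = rng.standard_normal(20_000)       # start from the Gaussian prior
for t in range(T - 1, 0, -1):
    a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
    eps = eps_opt(x, a_t)
    x0_hat = (x - np.sqrt(1 - a_t) * eps) / np.sqrt(a_t)   # predicted clean sample
    x = np.sqrt(a_prev) * x0_hat + np.sqrt(1 - a_prev) * eps   # DDIM, eta = 0

print(x.mean(), x.std())  # close to the data distribution N(0, 1)
```

No noise is injected at any step, yet the samples end up distributed (approximately) like the data, because the deterministic map transports the prior along the marginals $p_t$.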
Unified Continuous-Time Perspective
SDE form: Forward diffusion SDE $dX_t = f(X_t, t)\, dt + g(t)\, dW_t$. Its marginals $p_t$ are shared by the probability flow ODE

$$\frac{dx}{dt} = f(x, t) - \frac{1}{2} g^2(t)\, \nabla_x \log p_t(x),$$

so reverse-SDE sampling (DDPM-like) and ODE sampling (DDIM-like) target the same sequence of distributions.
Trade-off between sampling quality and speed:
- SDE sampling: fresh noise at every step improves diversity and corrects accumulated errors, but requires many small steps
- ODE sampling: deterministic, can use large step sizes, but may lose fine details
- Hybrid methods: use SDE exploration in early stages, ODE refinement in later stages
Experiments: From Theory to Practice
Experiment 1: One-Dimensional Diffusion Process Visualization
We first visualize the diffusion process in one dimension to validate theoretical predictions.
Setup: Initial distribution is a bimodal mixture of two Gaussians, evolved under the pure-diffusion SDE $dX_t = \sqrt{2D}\, dW_t$.
Theoretical predictions:
- Probability density evolution is governed by the Fokker-Planck equation (here, the heat equation)
- Analytical solutions can be computed by convolving the initial mixture with the Gaussian kernel
- As $t$ increases, the variance of each mode grows by $2Dt$ and the two modes progressively merge
Implementation: We use numerical methods to solve
the Fokker-Planck equation and visualize the evolution of probability
density. The code is provided in the accompanying Python file
diffusion_pde_experiments.py.
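The accompanying code is not reproduced here, but a minimal stand-in conveys the idea (a sketch assuming pure diffusion, so the Fokker-Planck equation reduces to the heat equation, with illustrative mixture parameters): an explicit finite-difference scheme evolved for time $T$ should match the analytic solution, which simply adds $2DT$ to each component's variance:

```python
import numpy as np

def mixture(x, mu, var):
    """Symmetric two-Gaussian mixture density (illustrative parameters)."""
    g = lambda m, v: np.exp(-(x - m)**2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    return 0.5 * g(-mu, var) + 0.5 * g(mu, var)

x = np.linspace(-10, 10, 1001)
dx = x[1] - x[0]
D, dt, T = 0.5, 1e-4, 1.0    # dt satisfies the explicit-scheme stability bound

# Explicit Euler time stepping of u_t = D u_xx
u = mixture(x, mu=2.0, var=0.25)
for _ in range(int(T / dt)):
    lap = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
    u = u + D * dt * lap

# Analytic solution: diffusion for time T adds 2*D*T to each variance
u_exact = mixture(x, mu=2.0, var=0.25 + 2 * D * T)
print(np.max(np.abs(u - u_exact)))
```

The stability condition $D\, \Delta t / \Delta x^2 \le 1/2$ is what makes the explicit scheme usable here; implicit or spectral schemes remove that restriction.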
Results analysis:
- The initial bimodal distribution gradually "diffuses" over time, with decreasing peak heights
- At later times the two modes merge into a single, nearly Gaussian profile, matching the analytical convolution solution
Experiment 2: Score Function Learning and Visualization
We learn the Score function of a simple two-dimensional distribution and visualize its gradient field.
Setup: Target distribution has a "double moon" shape: two interleaving crescents in the plane, a standard benchmark for multimodal, curved densities.
Network architecture: Use a simple MLP to learn the score function $s_\theta: \mathbb{R}^2 \to \mathbb{R}^2$, mapping each point to the estimated gradient of the log-density.
Training: Use denoising Score Matching loss. See the code implementation for details.
Results analysis:
- The learned score function is visually highly consistent with the true score function
- Score vectors point in the direction of increasing probability density
- In low-probability regions, score vectors have larger magnitudes, pushing samples toward high-probability regions
Experiment 3: Comparison of Different SDE/ODE Samplers
We compare the effectiveness of different numerical methods for solving reverse diffusion SDEs/ODEs.
Setup: Use the Score network trained in Experiment 2, sampling from a standard Gaussian prior.
Methods compared:
1. Euler-Maruyama (SDE, first-order)
2. Heun's method (ODE, second-order)
3. Runge-Kutta 4 (ODE, fourth-order)
Results analysis:
- Euler-Maruyama: best sample diversity but requires more steps
- Heun/RK4: deterministic sampling, similar quality, RK4 slightly better
- Sampling quality: all methods generate reasonable samples, validating the effectiveness of the learned score function
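The comparison can be reproduced in a fully analytic toy case (sketch with assumed parameters: one-dimensional data $\mathcal{N}(2, 0.25)$ under a constant-$\beta$ VP-SDE, whose time-dependent score is known in closed form and stands in for the trained network):

```python
import numpy as np

rng = np.random.default_rng(6)
beta, T, n_steps, n = 1.0, 5.0, 500, 20_000
dt = T / n_steps

def m(t): return 2.0 * np.exp(-beta * t / 2)                      # marginal mean
def v(t): return 0.25 * np.exp(-beta * t) + 1 - np.exp(-beta * t) # marginal var

def score(x, t):
    """Analytic score of p_t for data N(2, 0.25) under the VP-SDE."""
    return -(x - m(t)) / v(t)

def f(x, t): return -0.5 * beta * x      # VP drift
g2 = beta                                # squared diffusion coefficient

def ode_rhs(x, t):
    """Probability flow ODE: f - (1/2) g^2 * score."""
    return f(x, t) - 0.5 * g2 * score(x, t)

# Initialize both samplers from the true marginal at time T
x_sde = m(T) + np.sqrt(v(T)) * rng.standard_normal(n)
x_ode = x_sde.copy()

for i in range(n_steps):
    t = T - i * dt
    # Reverse SDE, Euler-Maruyama (integrating from T down to 0)
    drift = f(x_sde, t) - g2 * score(x_sde, t)
    x_sde = x_sde - drift * dt + np.sqrt(g2 * dt) * rng.standard_normal(n)
    # Probability flow ODE, Heun (second order), same direction
    k1 = ode_rhs(x_ode, t)
    k2 = ode_rhs(x_ode - dt * k1, t - dt)
    x_ode = x_ode - 0.5 * dt * (k1 + k2)

print(x_sde.mean(), x_sde.var())   # both near the data distribution N(2, 0.25)
print(x_ode.mean(), x_ode.var())
```

Both integrators recover the data distribution; the ODE path is deterministic (identical samples for identical seeds), while the SDE path re-randomizes at every step.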
Experiment 4: PDE-Constrained Conditional Generation
We implement a simple PDE-constrained conditional generation task: given boundary conditions, generate samples satisfying the PDE.
Setup: Consider the Poisson equation $-\Delta u = f$ on a domain with prescribed boundary values; the task is to generate solution fields consistent with given boundary conditions.
Method: Use diffusion models to generate samples satisfying the PDE.
Results analysis:
- Conditional generation models can generate samples based on given conditions (such as boundary values)
- Generated samples statistically satisfy the PDE constraints
- This demonstrates the potential of diffusion models in scientific computing applications
Summary and Outlook
This article systematically establishes the PDE theoretical framework for diffusion models. Starting from classical heat equations, we demonstrated the mathematical essence of diffusion processes; introduced stochastic differential equations and the Fokker-Planck equation, revealing the laws of probability density evolution; focused on Score-Based generative models, establishing connections between Score function learning and Langevin dynamics sampling; delved into DDPM and DDIM, showing their essence as discretization schemes for SDEs/ODEs; and finally validated theoretical predictions through four complete experiments.
Key insights:
1. Diffusion models are PDE solvers: forward diffusion corresponds to the Fokker-Planck equation, reverse sampling corresponds to reverse SDEs or probability flow ODEs
2. The score function is central: learning score functions is equivalent to learning gradients of log-densities, avoiding computation of normalization constants
3. Choice of discretization scheme: SDE sampling has better diversity but is slower, ODE sampling is deterministic and faster, hybrid methods balance both
4. Conditional generation extension: PDE constraints can be naturally incorporated into the diffusion model framework, opening new paths for scientific computing applications
Future directions:
- More efficient sampling algorithms: fast sampling methods based on PDE theory
- Conditional generation theory: Score Matching theory under PDE constraints
- Multi-scale diffusion: combining multi-resolution PDE solving techniques
- Application expansion: physical simulation, inverse problem solving, scientific discovery
The PDE nature of diffusion models not only provides profound theoretical insights but also points the way for future algorithm design and application expansion. As PDE theory and deep learning further integrate, we can expect to see more breakthrough progress.
References
Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2020). Score-based generative modeling through stochastic differential equations. arXiv:2011.13456.
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840-6851. arXiv:2006.11239
Song, J., Meng, C., & Ermon, S. (2021). Denoising diffusion implicit models. arXiv:2010.02502.
Anderson, B. D. (1982). Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3), 313-326. DOI:10.1016/0304-4149(82)90051-5
Hyvärinen, A. (2005). Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6, 695-709.
Vincent, P. (2011). A connection between score matching and denoising autoencoders. Neural Computation, 23(7), 1661-1674.
Song, Y., & Ermon, S. (2019). Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32. arXiv:1907.05600
Song, Y., & Ermon, S. (2020). Improved techniques for training score-based generative models. Advances in Neural Information Processing Systems, 33, 12438-12448. arXiv:2006.09011
Karras, T., Aittala, M., Aila, T., & Laine, S. (2022). Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35, 26565-26577. arXiv:2206.00364
Lu, C., Zhou, Y., Bao, F., Chen, J., Li, C., & Zhu, J. (2022). DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 35, 5775-5787. arXiv:2206.00927
Dockhorn, T., Vahdat, A., & Kreis, K. (2022). Score-based generative modeling with score-matching objectives. Advances in Neural Information Processing Systems, 35, 35289-35304.
Chung, H., Kim, J., Mccann, M. T., Klasky, M. L., & Ye, J. C. (2023). Diffusion posterior sampling for general noisy inverse problems. arXiv:2209.14687.
Song, Y., Durkan, C., Murray, I., & Ermon, S. (2021). Maximum likelihood training of score-based diffusion models. Advances in Neural Information Processing Systems, 34, 1415-1428. arXiv:2101.09258
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10684-10695. arXiv:2112.10752
Saharia, C., Chan, W., Chang, H., Lee, C., Ho, J., Salimans, T., ... & Norouzi, M. (2022). Palette: Image-to-image diffusion models. ACM SIGGRAPH 2022 Conference Proceedings. arXiv:2111.05826
- Post title: PDE and Machine Learning (7): Diffusion Models and Score Matching
- Post author: Chen Kai
- Create time: 2022-03-05 09:30:00
- Post link: https://www.chenk.top/pde-ml-7-diffusion-models/
- Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stated otherwise.