Variational inference turns Bayesian inference into an optimization problem: when the posterior distribution is difficult to compute exactly, we optimize over a tractable family of distributions to approximate the true posterior, converting an intractable integration problem into optimization. From variational EM to variational autoencoders, from topic models to deep generative models, variational inference has become a core technique in modern machine learning. This chapter systematically derives the mathematics of variational inference: the ELBO, mean-field approximation, coordinate ascent, variational EM, and black-box variational inference.
Bayesian Inference and the Posterior Challenge
Bayesian Inference Framework
Observed data: $X = \{x_1, \dots, x_N\}$
Latent variables: $Z = \{z_1, \dots, z_N\}$
Parameters: $\theta$ (treated as latent variables in the fully Bayesian setting)
Objective: Compute the posterior distribution $p(Z \mid X) = \dfrac{p(X, Z)}{p(X)}$
Difficulty: The marginal likelihood (evidence) $p(X) = \int p(X, Z)\, dZ$ involves a high-dimensional integral (or an exponentially large sum for discrete $Z$) and is intractable for most interesting models.
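The intractability of the evidence shows up already in a tiny toy model. The sketch below (the model and all numbers are illustrative assumptions, not from the text) computes $p(X)$ for $N$ binary latent variables by brute force, a sum over $2^N$ configurations. This particular model factorizes over $n$, which is exactly what lets us check the answer cheaply; in general models no such shortcut exists.

```python
import itertools
import numpy as np

# Toy model (illustrative): N binary latents z_n with p(z_n = 1) = 0.3,
# and x_n | z_n ~ N(mu[z_n], 1) with mu = [-1, +1].
rng = np.random.default_rng(0)
N = 10
mu = np.array([-1.0, 1.0])
z_true = (rng.random(N) < 0.3).astype(int)
x = rng.normal(mu[z_true], 1.0)

def log_joint(z):
    """log p(x, z) for one latent configuration z (array of 0/1)."""
    z = np.asarray(z)
    log_prior = np.sum(np.where(z == 1, np.log(0.3), np.log(0.7)))
    log_lik = np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - mu[z]) ** 2)
    return log_prior + log_lik

# Brute-force evidence: a sum over 2^N terms, hopeless beyond ~25 variables.
terms = np.array([log_joint(z) for z in itertools.product([0, 1], repeat=N)])
m = terms.max()
log_evidence = m + np.log(np.exp(terms - m).sum())  # log-sum-exp for stability

# Because this toy likelihood factorizes over n, the evidence also has a
# cheap closed form, which the exponential-cost sum must match:
per_n = 0.7 * np.exp(-0.5 * (x + 1) ** 2) + 0.3 * np.exp(-0.5 * (x - 1) ** 2)
log_evidence_factored = np.sum(np.log(per_n / np.sqrt(2 * np.pi)))
assert np.isclose(log_evidence, log_evidence_factored)
```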
Exact Inference vs Approximate Inference
Exact inference:
- Conjugate priors: some models have closed-form posteriors
- Graphical models: variable elimination, belief propagation (tree structures)

Approximate inference (needed in most cases):
1. Sampling methods: MCMC (Markov chain Monte Carlo)
   - Advantage: asymptotically exact
   - Disadvantage: slow convergence, difficult to diagnose
2. Variational methods: convert inference to optimization
   - Advantage: fast, deterministic
   - Disadvantage: biased approximation
Basic Principles of Variational Inference
ELBO Derivation

Idea: Use a simple distribution $q(Z)$ from a tractable family $\mathcal{Q}$ to approximate the true posterior $p(Z \mid X)$.

Optimization objective: Minimize the KL divergence
$$\mathrm{KL}\big(q(Z) \,\|\, p(Z \mid X)\big) = \mathbb{E}_q[\log q(Z) - \log p(Z \mid X)]$$

Problem: This contains the unknown $p(Z \mid X)$. Substituting $p(Z \mid X) = p(X, Z)/p(X)$,
$$\mathrm{KL}(q \,\|\, p) = \mathbb{E}_q[\log q(Z)] - \mathbb{E}_q[\log p(X, Z)] + \log p(X) = -\mathcal{L}(q) + \log p(X)$$

where the Evidence Lower Bound (ELBO) is
$$\mathcal{L}(q) = \mathbb{E}_q[\log p(X, Z)] - \mathbb{E}_q[\log q(Z)]$$

Key relationship:
$$\log p(X) = \mathcal{L}(q) + \mathrm{KL}\big(q(Z) \,\|\, p(Z \mid X)\big) \ge \mathcal{L}(q)$$

Variational inference objective: Since $\log p(X)$ does not depend on $q$, maximizing the ELBO is equivalent to minimizing the KL divergence:
$$q^* = \arg\max_{q \in \mathcal{Q}} \mathcal{L}(q)$$
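The key relationship can be verified numerically in a fully conjugate toy model. The sketch below (the model, a Gaussian mean $z \sim \mathcal{N}(0,1)$ with $x_i \mid z \sim \mathcal{N}(z, 1)$, is an illustrative assumption) checks that for any Gaussian $q(z) = \mathcal{N}(m, s^2)$, even a badly chosen one, ELBO plus KL equals the log evidence exactly.

```python
import numpy as np

# Conjugate toy model (illustrative):  z ~ N(0, 1),  x_i | z ~ N(z, 1).
# Evidence, posterior, ELBO, and KL are all closed form, so we can check
# log p(X) = ELBO(q) + KL(q || posterior) for ANY Gaussian q(z) = N(m, s2).
rng = np.random.default_rng(1)
N = 5
x = rng.normal(0.5, 1.0, size=N)
S, Q = x.sum(), (x ** 2).sum()

# Exact log evidence and exact posterior N(S/(N+1), 1/(N+1))
log_evidence = (-0.5 * N * np.log(2 * np.pi) - 0.5 * np.log(N + 1)
                - 0.5 * Q + S ** 2 / (2 * (N + 1)))
post_mean, post_var = S / (N + 1), 1.0 / (N + 1)

def elbo(m, s2):
    # E_q[log p(X, z)] under q = N(m, s2), using E_q[z] = m, E_q[z^2] = m^2 + s2
    e_logjoint = (-0.5 * (N + 1) * np.log(2 * np.pi)
                  - 0.5 * ((N + 1) * (m ** 2 + s2) - 2 * m * S + Q))
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)  # -E_q[log q]
    return e_logjoint + entropy

def kl_gauss(m, s2, mp, vp):
    # KL( N(m, s2) || N(mp, vp) ) in closed form
    return 0.5 * (np.log(vp / s2) + (s2 + (m - mp) ** 2) / vp - 1.0)

# The identity holds even for an arbitrary (poor) variational choice:
m, s2 = -0.3, 2.0
assert np.isclose(elbo(m, s2) + kl_gauss(m, s2, post_mean, post_var), log_evidence)
```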
Mean-Field Approximation

Assumption: The variational distribution fully factorizes:
$$q(Z) = \prod_{j=1}^{M} q_j(z_j)$$
Or more concisely, assuming latent variables and parameters are partitioned into $M$ mutually independent groups $z_1, \dots, z_M$.

Optimization: For each factor $q_j$, optimize the ELBO while holding the other factors $\{q_i\}_{i \ne j}$ fixed.
Coordinate Ascent Variational Inference
ELBO expansion under the mean-field factorization:
$$\mathcal{L}(q) = \int \prod_i q_i(z_i) \Big[\log p(X, Z) - \sum_i \log q_i(z_i)\Big]\, dZ$$

Isolating the terms that depend on a single factor $q_j$:
$$\mathcal{L}(q_j) = \mathbb{E}_{q_j}\big[\mathbb{E}_{-j}[\log p(X, Z)]\big] - \mathbb{E}_{q_j}[\log q_j(z_j)] + \text{const}$$
where $\mathbb{E}_{-j}[\cdot]$ denotes the expectation over all factors except $q_j$.

Optimize for $q_j$: the expression above is, up to a constant, a negative KL divergence between $q_j$ and the distribution proportional to $\exp\big(\mathbb{E}_{-j}[\log p(X, Z)]\big)$, so it is maximized by matching that distribution.

Optimal factor:
$$\log q_j^*(z_j) = \mathbb{E}_{-j}[\log p(X, Z)] + \text{const}, \qquad q_j^*(z_j) \propto \exp\big(\mathbb{E}_{-j}[\log p(X, Z)]\big)$$

Algorithm: Cyclically update each factor until the ELBO converges; each update can only increase the ELBO, so convergence to a local optimum is guaranteed.
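As a concrete instance of coordinate ascent, here is a minimal sketch of the classic mean-field updates for a univariate Gaussian with unknown mean and precision under a Normal-Gamma prior, $q(\mu, \tau) = q(\mu)\,q(\tau)$. The priors, data, and initialization are illustrative assumptions.

```python
import numpy as np

# Model (illustrative):  mu | tau ~ N(mu0, (lam0 tau)^-1),  tau ~ Gamma(a0, b0),
#                        x_i | mu, tau ~ N(mu, tau^-1)
# Mean-field CAVI: alternate q(mu) = N(mu_N, lam_N^-1) and q(tau) = Gamma(a_N, b_N).
rng = np.random.default_rng(2)
x = rng.normal(2.0, 1.5, size=500)
N, xbar, xsq = len(x), x.mean(), (x ** 2).sum()
mu0, lam0, a0, b0 = 0.0, 1.0, 1e-3, 1e-3

E_tau = 1.0  # initialization of E_q[tau]
for _ in range(100):
    # Update q(mu) given the current E[tau]
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    E_mu, E_mu2 = mu_N, mu_N ** 2 + 1.0 / lam_N
    # Update q(tau) given the current q(mu) (expected sufficient statistics)
    a_N = a0 + 0.5 * (N + 1)
    b_N = b0 + 0.5 * (xsq - 2 * E_mu * N * xbar + N * E_mu2
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0 ** 2))
    E_tau = a_N / b_N

print(mu_N, 1 / np.sqrt(E_tau))  # mean estimate ~ 2.0, noise std estimate ~ 1.5
```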
Variational EM Algorithm
Connection between EM and Variational Inference
Standard EM:
- E-step: set $q(Z) = p(Z \mid X, \theta^{(t)})$ (the exact posterior), which makes the bound tight
- M-step: $\theta^{(t+1)} = \arg\max_\theta \mathbb{E}_q[\log p(X, Z \mid \theta)]$

Variational EM (when the exact posterior is intractable):
- E-step: $q^{(t+1)} = \arg\max_{q \in \mathcal{Q}} \mathcal{L}(q, \theta^{(t)})$ over a tractable family $\mathcal{Q}$ (e.g. mean-field)
- M-step: $\theta^{(t+1)} = \arg\max_\theta \mathcal{L}(q^{(t+1)}, \theta)$

Both steps monotonically increase the same objective, the ELBO.
Variational Bayes GMM
Model:
- Likelihood: $p(x_n \mid z_n = k) = \mathcal{N}(x_n \mid \mu_k, \Lambda_k^{-1})$, with mixing weights $\pi$
- Prior (conjugate): $\pi \sim \mathrm{Dir}(\alpha_0)$, and a Gaussian-Wishart prior on each $(\mu_k, \Lambda_k)$

Variational distribution (mean-field): $q(Z, \pi, \mu, \Lambda) = q(Z)\, q(\pi) \prod_k q(\mu_k, \Lambda_k)$

Update formulas (conjugacy properties): each factor keeps the functional form of its prior, so $q(\pi)$ stays Dirichlet and each $q(\mu_k, \Lambda_k)$ stays Gaussian-Wishart, with parameters updated from the expected sufficient statistics (the responsibilities $\mathbb{E}[z_{nk}]$).
Black-Box Variational Inference (BBVI)
Gradient Estimation Problem
ELBO as a function of the variational parameters $\lambda$:
$$\mathcal{L}(\lambda) = \mathbb{E}_{q_\lambda(z)}[\log p(x, z) - \log q_\lambda(z)]$$
Gradient:
$$\nabla_\lambda \mathcal{L}(\lambda) = \nabla_\lambda\, \mathbb{E}_{q_\lambda(z)}[\log p(x, z) - \log q_\lambda(z)]$$
Difficulty: Gradient and expectation cannot be directly exchanged (the sampling distribution $q_\lambda$ itself depends on $\lambda$).
REINFORCE Gradient Estimator
Log-derivative trick:
$$\nabla_\lambda q_\lambda(z) = q_\lambda(z)\, \nabla_\lambda \log q_\lambda(z)$$
ELBO gradient:
$$\nabla_\lambda \mathcal{L}(\lambda) = \mathbb{E}_{q_\lambda}\big[\nabla_\lambda \log q_\lambda(z)\, \big(\log p(x, z) - \log q_\lambda(z)\big)\big]$$
Monte Carlo estimate:
$$\hat{\nabla}_\lambda \mathcal{L} = \frac{1}{S} \sum_{s=1}^{S} \nabla_\lambda \log q_\lambda(z^{(s)})\, \big(\log p(x, z^{(s)}) - \log q_\lambda(z^{(s)})\big)$$
where $z^{(s)} \sim q_\lambda(z)$. The estimator is unbiased but typically high-variance; control variates (baselines) and Rao-Blackwellization are used to reduce the variance.
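The estimator can be sketched in a few lines of NumPy. The 1-D unnormalized target, variational family, and step size below are illustrative assumptions; the point is that only samples from $q_\lambda$ and evaluations of the log densities are needed, never gradients of the model.

```python
import numpy as np

# REINFORCE / score-function gradient for a toy target (illustrative):
# unnormalized target log p~(z) = -0.5 (z - 3)^2, family q_lam(z) = N(lam, 1).
rng = np.random.default_rng(3)
lam, lr, S = 0.0, 0.05, 200

log_p = lambda z: -0.5 * (z - 3.0) ** 2
log_q = lambda z, lam: -0.5 * np.log(2 * np.pi) - 0.5 * (z - lam) ** 2

for _ in range(500):
    z = rng.normal(lam, 1.0, size=S)      # z^(s) ~ q_lam
    score = z - lam                       # grad_lam log q_lam(z^(s))
    grad = np.mean(score * (log_p(z) - log_q(z, lam)))
    lam += lr * grad                      # stochastic gradient ascent on the ELBO

print(lam)  # drifts toward 3 (the target mean), with Monte Carlo noise
```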
Reparameterization Trick
Idea: Separate randomness from the parameters by writing $z = g_\lambda(\epsilon)$ with $\epsilon \sim p(\epsilon)$ independent of $\lambda$, so that
$$\nabla_\lambda\, \mathbb{E}_{q_\lambda}[f(z)] = \mathbb{E}_{p(\epsilon)}\big[\nabla_\lambda f(g_\lambda(\epsilon))\big]$$
Example (Gaussian): $z = \mu + \sigma \epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$ gives $z \sim \mathcal{N}(\mu, \sigma^2)$.
Monte Carlo estimate:
$$\hat{\nabla}_\lambda \mathcal{L} = \frac{1}{S} \sum_{s=1}^{S} \nabla_\lambda \big[\log p(x, g_\lambda(\epsilon^{(s)})) - \log q_\lambda(g_\lambda(\epsilon^{(s)}))\big], \qquad \epsilon^{(s)} \sim p(\epsilon)$$
Advantages: Low variance, amenable to automatic differentiation
Implementation Example
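A minimal NumPy implementation of reparameterization-based VI, with the gradients derived by hand for a 1-D toy target (the target and hyperparameters are illustrative assumptions; the entropy of $q$ is used in closed form, so only $\log \tilde p$ needs sampling):

```python
import numpy as np

# Target (illustrative): log p~(z) = -(z - 3)^2 / (2 * 0.5^2),
# i.e. the "posterior" is N(3, 0.5^2).  Family: q(z) = N(m, s^2), s = exp(ls).
# Reparameterization: z = m + exp(ls) * eps,  eps ~ N(0, 1).
rng = np.random.default_rng(4)
m, ls, lr, S = 0.0, 0.0, 0.05, 100
dlogp = lambda z: -(z - 3.0) / 0.25        # d/dz log p~(z)

for _ in range(2000):
    eps = rng.normal(size=S)
    z = m + np.exp(ls) * eps               # reparameterized samples
    g = dlogp(z)
    grad_m = np.mean(g)                    # dELBO/dm  = E[d log p~/dz]
    # chain rule through z = m + e^ls * eps, plus dH(q)/dls = 1 (closed form)
    grad_ls = np.exp(ls) * np.mean(g * eps) + 1.0
    m += lr * grad_m
    ls += lr * grad_ls

print(m, np.exp(ls))  # approaches the true posterior parameters (3, 0.5)
```

In a framework with automatic differentiation the hand-derived chain rule disappears: one simply builds $z = m + e^{ls}\epsilon$ and backpropagates through the Monte Carlo ELBO, which is exactly how VAEs are trained.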
Q&A
Q1: Variational Inference vs MCMC?
A:
- Variational: fast, deterministic, biased (non-zero KL divergence at the optimum)
- MCMC: slow, stochastic, asymptotically unbiased

Variational inference is suitable for large-scale data and online learning; MCMC is suitable when (asymptotically) exact inference matters more than speed.
Q2: Why use KL(q||p) instead of KL(p||q)?
A: $\mathrm{KL}(q\|p)$ is the "reverse KL". It only requires expectations under $q$, so we never need to sample from or normalize the unknown $p(Z \mid X)$; this is what makes the ELBO tractable. It is also mode-seeking: $q$ avoids regions where $p$ is small, so it tends to lock onto a single mode and underestimate the posterior variance. The forward $\mathrm{KL}(p\|q)$ would require expectations under the intractable $p$ and is mass-covering instead.
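The mode-seeking vs mass-covering contrast can be made concrete with a small grid computation (the bimodal target and the grids are illustrative assumptions): minimizing each KL over a single-Gaussian family gives visibly different optima.

```python
import numpy as np

# Bimodal target (illustrative): p(z) = 0.5 N(-2, 0.5^2) + 0.5 N(2, 0.5^2).
z = np.linspace(-6, 6, 2001)
dz = z[1] - z[0]
norm = lambda m, s: np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
p = 0.5 * norm(-2, 0.5) + 0.5 * norm(2, 0.5)

def kl(a, b):
    """KL(a || b) by numerical integration on the grid."""
    mask = a > 1e-12
    return np.sum(a[mask] * (np.log(a[mask]) - np.log(b[mask] + 1e-300))) * dz

ms, ss = np.linspace(-3, 3, 61), np.linspace(0.2, 3.0, 29)
_, m_rev, s_rev = min((kl(norm(m, s), p), m, s) for m in ms for s in ss)
_, m_fwd, s_fwd = min((kl(p, norm(m, s)), m, s) for m in ms for s in ss)
print(m_rev, s_rev)  # reverse KL: locks onto one mode (m near +/-2, small s)
print(m_fwd, s_fwd)  # forward KL: covers both modes (m near 0, larger s)
```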
Q3: When does the mean-field assumption fail?
A: When variables are strongly correlated. Solutions: - Structured variational (preserve some dependencies) - Richer variational families (normalizing flows)
Q4: Variational Bayes vs point estimates (MAP/MLE)?
A: Variational Bayes preserves uncertainty and prevents overfitting. Cost: Higher computational complexity. Use variational Bayes for small data/high regularization needs; use point estimates for large data/speed requirements.
Q5: When is the reparameterization trick applicable?
A: Requires continuous differentiable distributions. Applicable: Gaussian, Logistic, Laplace. Not applicable: Discrete distributions (need REINFORCE or Gumbel-Softmax).
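The Gumbel-Softmax mentioned above can be sketched in a few lines of NumPy (the probabilities and temperature are illustrative): adding Gumbel noise to the logits and taking the argmax gives exact categorical samples, while the tempered softmax is the differentiable relaxation that sharpens toward one-hot as the temperature shrinks.

```python
import numpy as np

rng = np.random.default_rng(5)
logits = np.log(np.array([0.2, 0.5, 0.3]))  # illustrative class probabilities

def gumbel_softmax(logits, tau, size):
    """Relaxed one-hot samples; argmax of (logits + Gumbel noise) is exact."""
    g = -np.log(-np.log(rng.random((size, len(logits)))))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = np.exp(y - y.max(axis=1, keepdims=True))            # stable softmax
    return y / y.sum(axis=1, keepdims=True)

samples = gumbel_softmax(logits, tau=0.1, size=100_000)
freq = np.bincount(samples.argmax(axis=1), minlength=3) / len(samples)
print(freq)  # close to [0.2, 0.5, 0.3]
```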
✏️ Exercises and Solutions
Exercise 1: ELBO Derivation
Problem: Prove $\log p(X) = \mathcal{L}(q) + \mathrm{KL}\big(q(Z) \,\|\, p(Z \mid X)\big)$.
Solution: Expand the KL term using $p(Z \mid X) = p(X, Z)/p(X)$:
$$\mathrm{KL}(q \| p) = \mathbb{E}_q[\log q(Z)] - \mathbb{E}_q[\log p(X, Z)] + \log p(X) = -\mathcal{L}(q) + \log p(X),$$
since $\log p(X)$ is constant under $\mathbb{E}_q$. Rearranging gives the identity.
Exercise 2: Mean Field
Problem: Under the factorization $q(Z) = \prod_j q_j(z_j)$, derive the optimal factor $q_j^*$.
Solution: Isolating the terms of the ELBO that involve $q_j$ gives, up to a constant, a negative KL divergence, which is maximized by $q_j^*(z_j) \propto \exp\big(\mathbb{E}_{-j}[\log p(X, Z)]\big)$.
Exercise 3: Variational EM
Problem: What do E-step and M-step optimize?
Solution: E-step: fix $\theta$ and maximize $\mathcal{L}(q, \theta)$ over the variational distribution $q$; M-step: fix $q$ and maximize $\mathcal{L}(q, \theta)$ over $\theta$. Both steps increase the same ELBO objective.
Exercise 4: VAE Reparameterization
Problem: Why write $z = \mu + \sigma \epsilon$ instead of sampling $z \sim \mathcal{N}(\mu, \sigma^2)$ directly?
Solution: Direct sampling is not differentiable with respect to $\mu$ and $\sigma$. Reparameterization moves the randomness into $\epsilon \sim \mathcal{N}(0, 1)$, so gradients flow through $\mu$ and $\sigma$ by ordinary backpropagation, with lower variance than the REINFORCE estimator.
Exercise 5: VI vs MCMC
Problem: When to use VI vs MCMC? Solution: VI: fast but biased, good for large data. MCMC: asymptotically unbiased but slow, good when accuracy matters more than speed.
References
- Jordan, M. I., et al. (1999). An introduction to variational methods for graphical models. Machine Learning, 37(2), 183-233.
- Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational inference: A review for statisticians. JASA, 112(518), 859-877.
- Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes. ICLR.
- Ranganath, R., Gerrish, S., & Blei, D. (2014). Black box variational inference. AISTATS.
Variational inference transforms the integration challenge of Bayesian inference into an optimization problem, trading some exactness for the speed of deterministic algorithms. From classical mean-field approximation to modern black-box variational inference, from VAEs to deep generative models, variational methods have become foundational tools in machine learning. Understanding variational inference is a necessary step toward probabilistic programming and Bayesian deep learning.
- Post title: Machine Learning Mathematical Derivations (14): Variational Inference and Variational EM
- Post author: Chen Kai
- Create time: 2021-11-11 14:30:00
- Post link: https://www.chenk.top/Machine-Learning-Mathematical-Derivations-14-Variational-Inference-and-Variational-EM/
- Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stated otherwise.