• Machine Learning Mathematical Derivations (10): Semi-Naive Bayes and Bayesian Networks

    Naive Bayes's conditional independence assumption is too strict; real-world features often have complex dependencies. How can we relax the independence assumption while maintaining computational efficiency, and learn more accurate probabilistic models? Semi-Naive Bayes provides an elegant answer — by introducing limited attribute dependencies, it achieves a balance between model complexity and expressive power. This chapter delves into SPODE, TAN, AODE, and other semi-naive Bayes models, and introduces Bayesian network structure learning and parameter estimation.
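    The "one-dependent" idea behind SPODE can be sketched in a few lines: every attribute depends on the class and on one shared super-parent attribute. The function name `spode_scores` and the super-parent index `sp` are illustrative choices, not names from the chapter; this is a minimal counting sketch for discrete attributes, assuming Laplace smoothing.

```python
def spode_scores(X, y, x_new, sp=0, alpha=1.0):
    """Score each class for x_new under SPODE with super-parent index sp:
    P(c, x_sp) * prod_{i != sp} P(x_i | c, x_sp), Laplace-smoothed.
    Discrete attributes only; a sketch, not a full implementation."""
    n = len(y)
    classes = sorted(set(y))
    n_sp_vals = len(set(r[sp] for r in X))
    scores = {}
    for c in classes:
        idx = [j for j in range(n) if y[j] == c]
        # joint count of class c and the super-parent value of x_new
        n_c_sp = sum(1 for j in idx if X[j][sp] == x_new[sp])
        score = (n_c_sp + alpha) / (n + alpha * len(classes) * n_sp_vals)
        for i in range(len(x_new)):
            if i == sp:
                continue
            vals_i = len(set(r[i] for r in X))
            match = sum(1 for j in idx
                        if X[j][sp] == x_new[sp] and X[j][i] == x_new[i])
            score *= (match + alpha) / (n_c_sp + alpha * vals_i)
        scores[c] = score
    return scores
```

    AODE, by comparison, averages such one-dependent estimators over every attribute acting as the super-parent.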

  • Machine Learning Mathematical Derivations (9): Naive Bayes

    Naive Bayes is the simplest yet most elegant probabilistic classifier — based on Bayes' theorem and the conditional independence assumption, it decomposes a complex joint probability into a product of simple conditional probabilities, enabling efficient classification. Although the "naive" assumption often fails in practice, Naive Bayes is remarkably effective in text classification, spam filtering, and sentiment analysis. This chapter systematically derives the theoretical foundations, parameter estimation methods, smoothing techniques, and performance analysis of Naive Bayes.
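    The product-of-conditionals decomposition and Laplace smoothing mentioned above fit in a short sketch. The name `train_nb` and the token-list input format are assumptions made for illustration; this is a minimal multinomial Naive Bayes, assuming counting-based maximum likelihood estimates.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes over token lists with Laplace smoothing.
    Classifies by argmax_c log P(c) + sum_w log P(w|c)."""
    vocab = {w for d in docs for w in d}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)            # class -> word -> count
    for d, c in zip(docs, labels):
        word_counts[c].update(d)
    totals = {c: sum(wc.values()) for c, wc in word_counts.items()}

    def predict(doc):
        best, best_lp = None, float("-inf")
        for c in class_counts:
            lp = math.log(class_counts[c] / len(docs))     # log prior
            for w in doc:
                # Laplace smoothing: add alpha to every vocabulary count
                lp += math.log((word_counts[c][w] + alpha) /
                               (totals[c] + alpha * len(vocab)))
            if lp > best_lp:
                best, best_lp = c, lp
        return best
    return predict
```

    Working in log space avoids the numerical underflow that a literal product of many small probabilities would cause.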

  • Machine Learning Mathematical Derivations (8): Support Vector Machines

    Support Vector Machine (SVM) is one of the most elegant algorithms in modern machine learning — it perfectly combines geometric intuition, convex optimization theory, and kernel methods, achieving efficient classification by finding the maximum margin hyperplane. From linear separability to nonlinear mapping, from hard margin to soft margin, from primal form to dual problem, SVM's mathematical derivations showcase the depth and beauty of machine learning theory. This chapter systematically derives the complete theoretical framework of SVM, including Lagrangian duality, KKT conditions, SMO algorithm, kernel function construction, and theoretical guarantees.
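    The soft-margin objective mentioned above can be minimized directly in the primal by subgradient descent on the hinge loss. This is a deliberately simplified alternative to the SMO dual solver the chapter derives; the function name `train_linear_svm` and the hyperparameter defaults are assumptions for illustration.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Linear soft-margin SVM via full-batch subgradient descent on
    lam/2 ||w||^2 + (1/n) sum_i max(0, 1 - y_i (w.x_i + b)).
    Labels must be +1/-1. A sketch; the chapter solves the dual via SMO."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wk for wk in w], 0.0      # gradient of lam/2 ||w||^2
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1:                         # hinge loss is active
                for k in range(d):
                    gw[k] -= yi * xi[k] / n
                gb -= yi / n
        w = [wk - lr * gk for wk, gk in zip(w, gw)]
        b -= lr * gb
    return w, b
```

    Only points with margin below 1 contribute a subgradient, which mirrors the dual picture where only support vectors get nonzero multipliers.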

  • Machine Learning Mathematical Derivations (7): Decision Trees

    Decision trees are among the most intuitive machine learning models — like human decision-making, they narrow down the answer through a series of "yes/no" questions. But beneath them lie deep foundations in information theory and probability: How do we choose the optimal split point? How do we avoid overfitting? How do we handle continuous features and missing values? This chapter systematically derives the mathematical principles of decision trees, from the definition of entropy to the details of the ID3, C4.5, and CART algorithms, from pruning theory to the ensemble thinking behind random forests, comprehensively revealing the inner logic of tree models.
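    The split-selection question above has a concrete answer in ID3: pick the attribute with the largest information gain. A minimal sketch of entropy and gain (function names are illustrative, assuming discrete attributes):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(D) = -sum_k p_k log2 p_k of a label multiset."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """ID3 criterion: Gain(D, a) = H(D) - sum_v |D_v|/|D| * H(D_v),
    where D_v is the subset of rows with value v for attribute attr."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(r[attr] for r in rows):
        subset = [labels[i] for i, r in enumerate(rows) if r[attr] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain
```

    A perfectly predictive binary attribute on a balanced binary dataset yields a gain of exactly 1 bit, while an uninformative one yields 0.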

  • Machine Learning Mathematical Derivations (6): Logistic Regression and Classification

    The leap from linear regression to logistic regression marks an important transition in machine learning from regression to classification tasks. Although named "regression," logistic regression is fundamentally a classification algorithm, establishing a bridge between linear models and probability predictions through the Sigmoid function. This chapter delves into the mathematical essence of logistic regression: from likelihood function construction to gradient computation details, from binary to multi-class extension, from optimization algorithms to regularization techniques, comprehensively revealing the probabilistic modeling approach to classification.
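    The Sigmoid bridge and the likelihood gradient described above take only a few lines. This is a bare-bones batch gradient descent sketch (the function name `fit_logistic` and the learning-rate defaults are assumptions); note that the gradient of the negative log-likelihood has the simple "prediction minus label" form.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the negative log-likelihood.
    Gradient for weight k: (1/n) sum_i (sigmoid(w.x_i) - y_i) * x_ik."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, xi))) - yi
            for k in range(d):
                grad[k] += err * xi[k]
        w = [wk - lr * gk / len(X) for wk, gk in zip(w, grad)]
    return w
```

    Prepending a constant 1 to each sample folds the intercept into the weight vector, a convention the sketch assumes.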

  • Machine Learning Mathematical Derivations (5): Linear Regression

    In 1886, Francis Galton discovered a peculiar phenomenon while studying the relationship between parents' and children's heights: parents with extreme heights tended to have children whose heights were closer to the average. He coined the term "regression toward the mean," which is where "regression" comes from. However, the true power of linear regression lies not in statistical description, but in its role as the mathematical foundation for almost all machine learning algorithms — from neural networks to support vector machines, all can be viewed as generalizations of linear regression.

    The essence of linear regression is finding the optimal hyperplane in data space. This seemingly simple problem conceals deep connections between linear algebra, probability theory, and optimization. This chapter provides complete mathematical derivations of linear regression from multiple perspectives.
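    For a single feature with an intercept, the "optimal hyperplane" reduces to the closed-form solution of the normal equations: slope equals the covariance of x and y over the variance of x. A minimal sketch (the function name `fit_line` is an illustrative choice):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x via the normal equations:
    b = cov(x, y) / var(x),  a = mean(y) - b * mean(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b
```

    On exactly linear data the recovered coefficients are exact; with noise they are the unique least-squares minimizers.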

  • Machine Learning Mathematical Derivations (4): Convex Optimization Theory

    In 1947, George Dantzig developed the simplex method for linear programming while working for the U.S. Air Force. This breakthrough marked the birth of modern optimization theory. Seven decades later, optimization has become the theoretical pillar of machine learning — nearly all learning algorithms can be formulated as optimization problems. Among all optimization problems, convex optimization holds a unique position: local optima are global optima, and efficient algorithms guarantee convergence.

    Why can neural network training find good solutions even when the loss function is non-convex? Why does gradient descent converge rapidly in certain cases? The answers lie deeply embedded in the mathematical structure of convex optimization theory. This chapter rigorously derives the core theory and algorithms of convex optimization, starting from the definitions of convex sets and convex functions.
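    The convergence claim above can be seen on the simplest convex example. For a strongly convex quadratic, plain gradient descent contracts the error by a constant factor each step; the function name and the choice of f(x) = (x - 3)^2 are illustrative assumptions.

```python
def gradient_descent(grad, x0, lr, steps):
    """Plain gradient descent: x_{t+1} = x_t - lr * grad(x_t)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# f(x) = (x - 3)^2 is strongly convex: its only local minimum, x = 3,
# is global. With lr = 0.25 the update is x <- 0.5*x + 1.5, so the
# distance to the optimum halves every step (linear convergence).
```

    This geometric error decay is exactly the "rapid convergence in certain cases" the chapter makes precise via strong convexity and smoothness constants.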

  • Machine Learning Mathematical Derivations (3): Probability Theory and Statistical Inference

    In 1912, Fisher proposed the idea of Maximum Likelihood Estimation (MLE), fundamentally transforming statistics. His core insight: the best estimate of the parameters is the one that maximizes the probability of the observed data. Behind this seemingly simple idea lies profound mathematical structure — from the axiomatic definition of probability spaces, to the asymptotic properties of statistical inference, to the philosophical disputes between the Bayesian and frequentist schools.

    The core of machine learning is modeling uncertainty. Linear regression assumes errors follow a Gaussian distribution; logistic regression assumes labels follow a Bernoulli distribution; Hidden Markov Models assume state transitions follow a Markov chain. All of these models are built on the solid foundation of probability theory. This chapter derives the mathematical theory of statistical inference starting from the Kolmogorov axioms.
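    Fisher's maximization principle is easy to check numerically for the Bernoulli case: with k heads in n flips, setting the derivative of the log-likelihood to zero gives p = k/n. The grid search below (function names are illustrative) just confirms the closed form rather than replacing it.

```python
import math

def bernoulli_log_likelihood(p, k, n):
    """log L(p) = k log p + (n - k) log(1 - p) for k heads in n flips."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

def mle_grid(k, n, grid=1000):
    """Maximize log L(p) over an interior grid of p values. Setting
    d(log L)/dp = k/p - (n-k)/(1-p) = 0 gives the closed form p = k/n."""
    return max((i / grid for i in range(1, grid)),
               key=lambda p: bernoulli_log_likelihood(p, k, n))
```

    The same recipe (write the log-likelihood, differentiate, solve) yields the sample mean for the Gaussian case the chapter works through.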

  • Machine Learning Mathematical Derivations (2): Linear Algebra and Matrix Theory

    In Google's PageRank algorithm, the web ranking problem was transformed into a massive eigenvalue problem: finding the principal eigenvector of a transition matrix. Behind this elegant mathematical formulation lies the profound structure of linear algebra. Linear algebra is not merely the language of machine learning — it is the key to understanding the geometric structure of data.

    Machine learning is fundamentally about finding optimal linear or nonlinear transformations in high-dimensional spaces. From the simplest linear regression (solving the normal equations XᵀXw = Xᵀy), to complex deep neural networks (chains of matrix multiplications), to principal component analysis (eigenvalue decomposition), linear algebra is everywhere. This chapter derives all the linear algebra tools needed for machine learning from first principles.
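    The PageRank eigenvalue problem mentioned above is classically solved by power iteration: repeatedly apply the damped transition matrix until the rank vector stops changing. This sketch (the function name and adjacency-list format are illustrative assumptions) spreads dangling-node mass uniformly.

```python
def pagerank(links, d=0.85, iters=100):
    """Power iteration for the principal eigenvector of the Google matrix.
    links[i] lists the pages that page i links to; d is the damping factor."""
    n = len(links)
    r = [1.0 / n] * n                       # start from the uniform vector
    for _ in range(iters):
        nxt = [(1 - d) / n] * n             # teleportation term
        for i, outs in enumerate(links):
            if outs:
                share = d * r[i] / len(outs)
                for j in outs:
                    nxt[j] += share
            else:                           # dangling node: spread uniformly
                for j in range(n):
                    nxt[j] += d * r[i] / n
        r = nxt
    return r
```

    Because the damped matrix is column-stochastic and positive, Perron-Frobenius guarantees a unique principal eigenvector, and the iteration converges to it from any starting distribution.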

  • Machine Learning Mathematical Derivations (1): Introduction and Mathematical Foundations

    In 2005, Google Research published a paper claiming that their simple statistical models outperformed carefully engineered expert systems in machine translation tasks. This raised a profound question: Why can simple models learn effective patterns from data? The answer lies in the mathematical theory of machine learning.

    The central problem in machine learning is: given finite training samples, how can we guarantee that the learned model will perform well on unseen data? This is not an engineering problem, but a mathematical one — involving deep structures from probability theory, functional analysis, and optimization theory. This series derives the theoretical foundations of machine learning from mathematical first principles.