  • Linux User Management: Users, Groups, UID/GID, sudo, and Password Policies

    In Linux's multi-user, multi-tasking environment, user and group management is not "just admin work" — it directly determines who can log in, which processes run under which identities, how the permission model executes, and how sudo privileges are allocated. This post starts from the conceptual model of users and groups (users vs groups, UID/GID meaning and boundaries), systematically organizes commands you'll actually use (useradd/usermod/userdel, groupadd/groupmod/groupdel, passwd/chage), fills in security mechanisms (account locking, password policies, sudo configuration, correct /etc/sudoers syntax), and provides detailed analysis of core system files (/etc/passwd, /etc/shadow, /etc/group, /etc/gshadow, /etc/skel field meanings and practical uses). Finally, it uses practical cases (shared project directories, service account configuration, sudo permission stratification, batch user management) to ground "how to design reasonable user permission schemes" in practice, enabling you to independently complete the full workflow from creating users to permission allocation to security hardening.
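As a taste of the /etc/passwd analysis mentioned above, the seven colon-separated fields of a passwd record can be sketched in a few lines of Python; the sample entry below (user `appsvc`, UID/GID 998) is hypothetical.

```python
# Minimal sketch of the seven colon-separated fields of an /etc/passwd
# record. The sample line below is a hypothetical service account.

PASSWD_FIELDS = ["name", "password", "uid", "gid", "gecos", "home", "shell"]

def parse_passwd_line(line: str) -> dict:
    """Split one /etc/passwd record into its named fields."""
    values = line.strip().split(":")
    if len(values) != 7:
        raise ValueError("expected 7 colon-separated fields")
    record = dict(zip(PASSWD_FIELDS, values))
    record["uid"] = int(record["uid"])   # numeric user ID
    record["gid"] = int(record["gid"])   # numeric primary group ID
    return record

# Hypothetical entry: system-range UID/GID, no interactive login shell.
entry = parse_passwd_line("appsvc:x:998:998:App Service:/srv/app:/usr/sbin/nologin")
print(entry["uid"], entry["shell"])
```

The `x` in the password field signals that the real password hash lives in /etc/shadow, which is readable only by root.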

  • Linux File Permissions: rwx, chmod/chown, umask, SUID/SGID/Sticky, and Troubleshooting

    File permissions are "basic" in Linux, but they are also one of the most common causes of production incidents: a service won't start, a deploy script can't execute, a web app returns 403, or a shared directory becomes a security hole because permissions were made too broad. To use permissions correctly, you need more than memorizing chmod 755 — you need to understand how permission bits have completely different semantics on files vs directories (r/w/x mean different things for directories), the boundaries between owner/group/others, and why mechanisms like umask, SUID/SGID, and the sticky bit exist and when they should be used. This post starts from the minimal concept set, systematically explains rwx semantics, numeric/symbolic notation, typical usage and troubleshooting approaches for chmod/chown, uses common scenarios (shared directories, executable scripts, temp directories, security hardening) to explain "how to grant permissions and to what extent," then adds extended mechanisms like ACL and chattr plus a practical troubleshooting checklist, enabling you to locate and correctly fix permission issues in one shot.
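The numeric notation and the umask interaction described above boil down to bit arithmetic: the mode a process requests at creation time is masked by `mode & ~umask`. A minimal sketch:

```python
# Sketch of the numeric permission model: a requested creation mode is
# masked by the process umask (mode & ~umask), which is why files created
# under the common 022 umask default to 644 and directories to 755.

def effective_mode(requested: int, umask: int) -> int:
    """Apply a umask to a requested creation mode (permission bits only)."""
    return requested & ~umask & 0o777

# open() requests 0o666 for new files, mkdir() 0o777 for new directories.
print(oct(effective_mode(0o666, 0o022)))  # files -> 0o644 (rw-r--r--)
print(oct(effective_mode(0o777, 0o022)))  # dirs  -> 0o755 (rwxr-xr-x)
print(oct(effective_mode(0o777, 0o077)))  # hardened umask -> 0o700
```

This also explains a classic troubleshooting pattern: a file "mysteriously" created without group write permission usually traces back to the creating process's umask, not to chmod.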

  • Linux Basics: Core Concepts and Essential Commands

    The "difficulty" of Linux often lies not in the commands themselves but in whether you have a clear system map: why it's suited for servers, what its multi-user/multi-task and permission models mean in daily operations, what commonalities and differences exist across distributions in package management and directory layout, and what to do after your first login. This post serves as the entry guide for the entire Linux series. I'll first establish core concepts, then walk you through the most commonly used commands covering "file navigation — viewing and editing — remote connections — basic permissions and users." The goal is not to pile up a command reference but to take you from "able to log in" to "having a basic sense of direction" — each topic is introduced briefly, then you're guided to corresponding deep-dive articles (Disk Management, File Permissions, User Management, Service Management, Process Management, Package Management, Advanced File Operations). Afterward, learning any specialized topic will be much smoother.

  • PDE and Machine Learning (8): Reaction-Diffusion Systems and GNN

    Graph Neural Networks (GNNs) demonstrate remarkable capabilities in node classification, link prediction, and graph generation tasks. However, deep GNNs face a fundamental issue: over-smoothing — as the number of layers increases, node features gradually converge to identical values, losing local structural information. This phenomenon has deep mathematical connections with diffusion processes in partial differential equations: the diffusion term causes information to "flow" across the graph, while the reaction term maintains local differences. Reaction-diffusion equations (RDEs) are classical models describing this "balance between diffusion and reaction."
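The over-smoothing/diffusion analogy can be made concrete on a toy graph: iterating a discrete diffusion step x ← x − αLx (with L = D − A the graph Laplacian, one step per "layer") collapses node features toward the graph-wide mean. The graph and step size below are illustrative choices.

```python
import numpy as np

# Sketch of over-smoothing as pure graph diffusion: each step applies
# x <- x - alpha * L x, where L = D - A is the combinatorial Laplacian.
# Without a reaction term, node features converge to a constant vector.

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # 4-node toy graph
L = np.diag(A.sum(axis=1)) - A              # graph Laplacian D - A

x = np.array([1.0, -1.0, 0.5, 2.0])         # initial node features
spread0 = x.var()
for _ in range(60):                         # 60 "layers" of diffusion
    x = x - 0.1 * L @ x

print(x.var() < 1e-3 * spread0)             # feature spread has collapsed
print(np.allclose(x, x.mean(), atol=1e-2))  # all nodes near the mean
```

Note that diffusion preserves the mean (L annihilates constant vectors), so what is lost is exactly the local differences — the part a reaction term is designed to maintain.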

    Reaction-diffusion equations have a rich history in biology, chemistry, and physics. From Turing's morphogenesis theory to Gray-Scott's chemical oscillations and FitzHugh-Nagumo's neural pulse models, these equations reveal how patterns spontaneously emerge from uniform states. Recently, researchers have discovered that embedding reaction-diffusion dynamics into graph neural networks can not only alleviate over-smoothing but also enable networks to learn richer graph structural patterns.

    This article systematically establishes the mathematical bridge between reaction-diffusion systems and graph neural networks. We begin with classical reaction-diffusion equations, introducing Gray-Scott and FitzHugh-Nagumo models and Turing instability theory; then establish the framework of graph Laplacian operators and discrete diffusion; delve into the mathematical mechanisms of pattern formation, including linear stability analysis and bifurcation theory; and finally focus on graph neural networks, demonstrating diffusion interpretations like GRAND and PDE-GCN, and detailing the architecture and experiments of Graph Neural Reaction Diffusion Models (RDGNN).

  • PDE and Machine Learning (7): Diffusion Models and Score Matching

    The core task of generative models is to sample from data distributions. Traditional approaches like Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) achieve this through explicit encoder-decoder structures or adversarial training. However, since 2020, diffusion models have rapidly emerged as the dominant paradigm in generative AI, celebrated for their exceptional generation quality and training stability. From DALL·E 2 to Stable Diffusion, from image generation to text-to-image synthesis, diffusion models are reshaping our understanding of generative AI.

    Yet beneath the success of diffusion models lies a profound mathematical structure: they are essentially numerical solvers for partial differential equations (PDEs). When we add Gaussian noise to data, we are actually solving a forward diffusion process whose probability density evolution is governed by the Fokker-Planck equation; when we learn denoising models, we are actually learning score functions (gradients of log-densities) that steer the reverse diffusion process; when we use DDPM or DDIM sampling, we are actually numerically solving stochastic differential equations or their deterministic ODE counterparts. This PDE perspective not only reveals the mathematical essence of diffusion models but also provides a unified framework for understanding their convergence, designing new sampling algorithms, and extending to conditional generation tasks.

    This article systematically establishes this theoretical framework. We begin with classical heat equations, introducing fundamental tools such as Fick's law, Gaussian kernels, and Fourier transforms. We then introduce stochastic differential equations (SDEs) and the Fokker-Planck equation, demonstrating how diffusion processes can be formalized as probability density evolution. Next, we focus on Score-Based generative models, deriving Score Matching objective functions and establishing connections between Langevin dynamics and sampling processes. Finally, we delve into DDPM and DDIM, showing how they serve as discretization schemes for SDEs/ODEs, and validate theoretical predictions through four complete experiments.
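The forward (noising) half of this story has a closed form worth sketching: under a variance-preserving schedule, the marginal at step t is x_t = √ᾱ_t·x₀ + √(1−ᾱ_t)·ε with ε ~ N(0, I), so as ᾱ_t → 0 any data distribution is driven toward a standard Gaussian. The linear beta schedule below is an illustrative choice, not a tuned one.

```python
import numpy as np

# Sketch of the closed-form forward diffusion marginal:
#   x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps,  eps ~ N(0, 1),
# where abar_t is the cumulative product of (1 - beta_t). As abar_T -> 0,
# the data distribution is flattened into N(0, 1).

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)        # illustrative linear schedule
abar = np.cumprod(1.0 - betas)            # \bar{alpha}_t

x0 = rng.uniform(-2.0, 2.0, size=100_000)  # toy 1-D "data" distribution
eps = rng.standard_normal(x0.shape)
xT = np.sqrt(abar[-1]) * x0 + np.sqrt(1.0 - abar[-1]) * eps

print(abar[-1] < 1e-4)                              # nearly all signal destroyed
print(abs(xT.mean()) < 0.02, abs(xT.std() - 1.0) < 0.02)  # xT is ~ N(0, 1)
```

Learning to invert this noising map, one small step at a time, is exactly where the score function enters.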

  • PDE and Machine Learning (6): Continuous Normalizing Flows and Neural ODE

    What is the core problem of generative modeling? How can we transform a simple distribution (such as a standard Gaussian) into a complex data distribution (such as images or text)? Traditional normalizing flows achieve this goal through a series of invertible transformations, but the stacking of discrete layers limits expressiveness, and the cost of computing Jacobian determinants grows with dimensionality. In 2018, Chen et al. proposed Neural ODEs, viewing discrete residual networks as discretizations of continuous-time dynamics, opening the continuous perspective for generative models. Subsequently, Grathwohl et al. applied this idea to normalizing flows, proposing Continuous Normalizing Flows (CNF), which directly compute density evolution through the instantaneous rate of change of ODEs, avoiding explicit computation of Jacobian determinants.

    The mathematical foundations of continuous normalizing flows are deeply rooted in ordinary differential equation theory. Liouville's theorem tells us how ODEs change the volume of phase space; the change of variables formula establishes the relationship between density evolution and the divergence of velocity fields; the Picard-Lindelöf theorem guarantees the existence and uniqueness of ODE solutions. These classical theories have found new applications in deep learning: the adjoint method of neural ODEs reduces the memory complexity of backpropagation from O(L) to O(1), where L is the number of discrete layers; the instantaneous rate of change formula of continuous normalizing flows reduces density computation from O(d³) to O(d), where d is the dimensionality.
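The instantaneous change of variables is easy to verify in one dimension: for the velocity field f(x) = a·x, the log-density of a transported particle obeys d(log p)/dt = −div f = −a, so after time t it has dropped by exactly a·t — matching the analytic Jacobian log-determinant of the flow x_t = x₀e^{at}. A minimal numerical check (field and constants are illustrative):

```python
import numpy as np

# 1-D check of the CNF instantaneous change of variables for f(x) = a*x:
# integrating div f along the trajectory recovers the Jacobian
# log-determinant log|dx_t/dx_0| = a*t of the exact flow x_t = x_0 * e^{a t}.

a, t_final, steps = 0.7, 1.0, 10_000
dt = t_final / steps

x, logdet = 1.5, 0.0
for _ in range(steps):
    x += dt * a * x      # Euler step of dx/dt = f(x)
    logdet += dt * a     # accumulate div f = a along the trajectory

print(abs(logdet - a * t_final) < 1e-10)          # exact: div f is constant
print(abs(x - 1.5 * np.exp(a * t_final)) < 1e-3)  # Euler endpoint error O(dt)
```

The point of the O(d³) → O(d) claim is visible here in miniature: no Jacobian determinant is ever formed — only the divergence (a trace) is integrated along the path.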

    However, traditional continuous normalizing flows face a fundamental question: How to design velocity fields such that the transformation path from simple distributions to data distributions is shortest? Optimal transport theory provides the answer. OT-Flow combines continuous normalizing flows with optimal transport theory, learning optimal transformation paths by minimizing transport costs. More recently, the Flow Matching method further simplifies this framework by directly matching target velocity fields rather than optimizing transport costs, achieving more efficient training and better generation quality.

    This article systematically establishes this theoretical framework. We begin with the theoretical foundations of ODEs, introducing the Picard-Lindelöf theorem, Liouville's theorem, and the change of variables formula. We then delve into the adjoint method of neural ODEs and density evolution of continuous normalizing flows. Next, we introduce optimal transport theory, demonstrating how OT-Flow and Flow Matching unify the continuous perspective of generative models. Finally, we validate theoretical predictions through four numerical experiments: simple ODE system fitting, two-dimensional distribution transformation visualization, adjoint method efficiency comparison, and Flow Matching vs CNF generation quality comparison.

  • PDE and Machine Learning (5): Symplectic Geometry and Structure-Preserving Networks

    Traditional neural networks often fail to preserve the intrinsic structure of physical systems when predicting their evolution — energy conservation, angular momentum conservation, symplectic structure, and more. A simple example: using a standard neural network to predict the motion of a harmonic oscillator, even with small training error, the energy gradually drifts after long-term evolution, and the trajectory deviates from the true orbit. This is because standard neural networks do not encode the geometric structure of physical systems.
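The harmonic-oscillator drift described above is easy to reproduce with plain integrators, which is exactly the contrast structure-preserving methods exploit: for H = (p² + q²)/2, explicit Euler inflates the energy every step, while symplectic (semi-implicit) Euler keeps it bounded. Step size and horizon below are illustrative.

```python
# Energy drift on the harmonic oscillator H = (p^2 + q^2)/2:
# explicit Euler multiplies the energy by (1 + dt^2) each step,
# while symplectic Euler conserves a nearby "shadow" energy.

def energy(q, p):
    return 0.5 * (q * q + p * p)

dt, steps = 0.05, 2000
qe, pe = 1.0, 0.0   # explicit Euler state
qs, ps = 1.0, 0.0   # symplectic Euler state
for _ in range(steps):
    # explicit Euler: both updates use the old state
    qe, pe = qe + dt * pe, pe - dt * qe
    # symplectic Euler: update momentum first, then position with new momentum
    ps = ps - dt * qs
    qs = qs + dt * ps

e0 = energy(1.0, 0.0)
print(energy(qe, pe) > 2 * e0)               # explicit Euler: energy blew up
print(abs(energy(qs, ps) - e0) < 0.05 * e0)  # symplectic: bounded oscillation
```

The symplectic scheme does not conserve H exactly; it exactly conserves a perturbed quadratic within O(dt) of H, which is why its energy error oscillates instead of drifting — the same mechanism HNN-style architectures aim to inherit.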

    Structure-preserving learning addresses this by enabling neural networks to learn the geometric structure of physical systems, not just fit the data. For Hamiltonian systems, this means learning dynamics on symplectic manifolds; for Lagrangian systems, this means learning extremal paths of action functionals. These geometric constraints not only improve long-term prediction accuracy but also endow models with interpretability and physical meaning.

    This article systematically introduces the mathematical foundations and practical methods of structure-preserving learning. Starting from Hamiltonian mechanics and symplectic geometry, we introduce core concepts such as phase space, Poisson brackets, and symplectic manifolds; then we analyze in depth the energy-preserving properties of symplectic integrators (Verlet, symplectic Runge-Kutta); finally, we focus on three main structure-preserving neural network architectures: Hamiltonian Neural Networks (HNN), Lagrangian Neural Networks (LNN), and Symplectic Neural Networks (SympNet), validated through four classical experiments.

  • PDE and Machine Learning (4): Variational Inference and Fokker-Planck Equation

    Probabilistic inference is one of the core problems in machine learning. Given observed data, we wish to infer the posterior distribution of latent variables or sample from complex high-dimensional distributions. Traditional methods fall into two main categories: Variational Inference (VI) approximates the posterior by optimizing a variational lower bound, while Markov Chain Monte Carlo (MCMC) samples by constructing Markov chains. These seemingly different approaches reveal profound unity when viewed through the lens of partial differential equations.

    When we use Langevin dynamics for MCMC sampling, particle motion in a potential field is described by stochastic differential equations, with probability density evolution governed by the Fokker-Planck equation. When we optimize the variational lower bound using gradient descent, the evolution of parameter distributions in Wasserstein space can similarly be viewed as gradient flows of energy functionals. More remarkably, the Wasserstein gradient flow that minimizes KL divergence is precisely the Fokker-Planck equation — variational inference and Langevin MCMC are completely equivalent in the continuous-time limit. This PDE perspective not only reveals the mathematical essence of probabilistic inference but also provides a unified theoretical framework for designing new inference algorithms such as Stein Variational Gradient Descent.
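The first claim above can be checked directly: overdamped Langevin dynamics dx = −V′(x)dt + √2 dW has a Fokker-Planck stationary density proportional to exp(−V); for V(x) = x²/2 that is the standard Gaussian, which a particle cloud approaches regardless of initialization. Particle count, step size, and horizon below are illustrative.

```python
import numpy as np

# Euler-Maruyama simulation of overdamped Langevin dynamics
#   dx = -V'(x) dt + sqrt(2) dW,  with V(x) = x^2 / 2,
# whose Fokker-Planck stationary density is proportional to exp(-V),
# i.e. the standard Gaussian N(0, 1).

rng = np.random.default_rng(1)
n, dt, steps = 50_000, 0.01, 2000

x = np.full(n, 5.0)   # deliberately bad initialization, far from the mode
for _ in range(steps):
    x += -x * dt + np.sqrt(2 * dt) * rng.standard_normal(n)

print(abs(x.mean()) < 0.05, abs(x.std() - 1.0) < 0.05)  # cloud ~ N(0, 1)
```

The empirical histogram of these particles is a Monte Carlo approximation of the Fokker-Planck solution at time steps·dt — the particle/PDE duality the article builds on.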

    This article systematically establishes this theoretical framework. We begin with the Fokker-Planck equation, showing how to formalize the probability density evolution of stochastic processes as partial differential equations. We then delve into Langevin dynamics, discussing overdamped and underdamped cases, and the distinction between It ô and Stratonovich integrals. Next, we establish the gradient flow interpretation of KL divergence, proving the equivalence between variational inference and Langevin MCMC. Finally, we focus on advanced methods like Stein Variational Gradient Descent, demonstrating how to solve variational inference problems using particle systems, and validate theoretical predictions through four complete experiments.

  • PDE and Machine Learning (3): Variational Principles and Optimization

    What is the essence of neural network training? When we perform gradient descent in high-dimensional parameter space, does there exist a deeper continuous-time dynamics? As network width tends to infinity, does discrete parameter updating converge to some elegant partial differential equation? The answers to these questions lie at the intersection of calculus of variations, optimal transport theory, and partial differential equations.

    Over the past decade, the success of deep learning has been built primarily on empirical insights and engineering practices. In recent years, however, mathematicians have discovered that viewing neural networks as particle systems on the space of probability measures, and studying their evolution under Wasserstein geometry, can reveal global properties of training dynamics, convergence guarantees, and the essence of phenomena like initialization and over-parameterization. The core tool of this perspective is variational principles — from the principle of least action in physics, to the JKO scheme in modern optimal transport theory, to the mean-field limit of neural networks.

    This article systematically establishes this theoretical framework. We begin with classical calculus of variations, introducing fundamental tools such as functional derivatives and Euler-Lagrange equations. We then introduce Wasserstein metrics and gradient flow theory, demonstrating how the heat equation and Fokker-Planck equation can be unified as gradient flows of energy functionals. Finally, we focus on neural network training, deriving mean-field equations, proving global convergence, and validating theoretical predictions through numerical experiments.
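The gradient-flow unification mentioned above has a simple numerically checkable instance: the heat equation u_t = u_xx is the L² gradient flow of the Dirichlet energy E[u] = ½∫|u_x|² dx, so a stable explicit finite-difference solver should decrease E monotonically. Grid, step (obeying the CFL bound dt ≤ dx²/2), and initial data below are illustrative.

```python
import numpy as np

# The heat equation as a gradient flow: an explicit finite-difference
# solver for u_t = u_xx (Dirichlet boundary conditions u = 0) should
# decrease the Dirichlet energy E[u] = 1/2 * integral of |u_x|^2 at every step.

n = 100
dx = 1.0 / n
dt = 0.4 * dx * dx                     # stable explicit step (CFL: dt <= dx^2/2)
x = np.linspace(0.0, 1.0, n + 1)
u = np.sin(np.pi * x) + 0.3 * np.sin(5 * np.pi * x)   # illustrative initial data

def dirichlet_energy(u):
    return 0.5 * np.sum((np.diff(u) / dx) ** 2) * dx

energies = [dirichlet_energy(u)]
for _ in range(500):
    u[1:-1] += dt * (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2  # interior update only
    energies.append(dirichlet_energy(u))

print(all(e1 >= e2 for e1, e2 in zip(energies, energies[1:])))  # monotone decay
```

The same experiment with an entropy functional in place of the Dirichlet energy is the discrete shadow of the JKO scheme: each step moves the state "downhill" for the functional, in the appropriate geometry.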

  • PDE and Machine Learning (1): Physics-Informed Neural Networks

    Imagine you need to predict the temperature distribution in a metal rod. The traditional approach would be to divide the rod into countless small segments and solve equations at each point — this is the idea behind finite difference methods (FDM) and finite element methods (FEM). These methods have been refined over half a century and are quite mature, but they share a common pain point: you must first create a mesh. For a simple one-dimensional rod, this is manageable, but for complex shapes like aircraft wings or ten-dimensional spaces, mesh generation becomes a nightmare.

    In 2019, Raissi et al. proposed a revolutionary idea: Can we let a neural network directly learn the temperature distribution function instead of solving on mesh points? This is the core concept of Physics-Informed Neural Networks (PINN). It doesn't need a mesh — you just tell the network "you must satisfy the heat equation," and then let the network adjust its parameters until it finds a function that satisfies both the equation and boundary conditions.

    This idea isn't entirely new. As early as the early 20th century, the mathematician Ritz proposed a similar approach: transform PDE solving into "finding a function that minimizes some energy." The finite element method is based on this idea, using piecewise polynomials to approximate solutions. PINN's breakthrough lies in replacing piecewise polynomials with neural networks, and manual derivation with automatic differentiation. This makes computing high-order derivatives effortless and completely eliminates the need for meshes.
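The Ritz idea above fits in a few lines for a model problem: solve −u″ = f on (0,1) with u(0) = u(1) = 0 by minimizing the energy J[u] = ∫(u′²/2 − f·u) dx over the trial space u = Σ cₖ sin(kπx). In this basis the minimization decouples into cₖ = ∫f·sin(kπx) dx / ((kπ)²/2). The right-hand side f = π²sin(πx), with exact solution u = sin(πx), is an illustrative choice.

```python
import numpy as np

# Ritz method for -u'' = f on (0, 1), u(0) = u(1) = 0, with sine trial
# functions phi_k = sin(k*pi*x). Minimizing J[u] = int(u'^2/2 - f*u) dx
# decouples per mode: c_k = <f, phi_k> / ((k*pi)^2 / 2).

x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]
f = np.pi**2 * np.sin(np.pi * x)   # chosen so the exact solution is sin(pi x)

K = 5
u = np.zeros_like(x)
for k in range(1, K + 1):
    phi = np.sin(k * np.pi * x)
    load = np.sum(f * phi) * dx        # quadrature for int f * phi_k dx
    stiff = (k * np.pi) ** 2 / 2.0     # int (phi_k')^2 dx
    u += (load / stiff) * phi          # optimal coefficient c_k = load / stiff

print(np.max(np.abs(u - np.sin(np.pi * x))) < 1e-4)  # matches exact solution
```

A PINN plays the same game with a neural network in place of the sine basis and automatic differentiation in place of the hand-computed stiffness terms.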

    Of course, PINN isn't a silver bullet. Training encounters various challenges: How to balance the weights of PDE residual, boundary conditions, and initial conditions? Why do high-frequency components always learn slowly? What about discontinuous solutions like shock waves? These problems have spawned numerous improvement methods — adaptive weighting, domain decomposition, causal training, importance sampling, and more.

    This article will guide you through understanding PINN from scratch. First, we'll review traditional numerical methods and their pros and cons; then dive into PINN's mathematical principles, including convergence theory and automatic differentiation mechanisms; next, introduce various improvement techniques and analyze what problems they solve; finally, validate theory through four complete experiments (heat equation, Poisson equation, Burgers equation, activation function comparison) and explore new directions like PIKAN.