Graph Neural Networks (GNNs) demonstrate remarkable capabilities in
node classification, link prediction, and graph generation tasks.
However, deep GNNs face a fundamental issue: over-smoothing. As the number of layers increases, node features gradually converge to identical values, losing local structural information. This phenomenon has deep mathematical connections with
diffusion processes in partial differential equations: the diffusion
term causes information to "flow" across the graph, while the reaction
term maintains local differences. Reaction-diffusion equations (RDEs)
are classical models describing this "balance between diffusion and
reaction."
Reaction-diffusion equations have a rich history in biology,
chemistry, and physics. From Turing's morphogenesis theory to
Gray-Scott's chemical oscillations and FitzHugh-Nagumo's neural pulse
models, these equations reveal how patterns spontaneously emerge
from uniform states. Recently, researchers have discovered that
embedding reaction-diffusion dynamics into graph neural networks can not
only alleviate over-smoothing but also enable networks to learn richer
graph structural patterns.
This article systematically establishes the mathematical bridge
between reaction-diffusion systems and graph neural networks. We begin
with classical reaction-diffusion equations, introducing Gray-Scott and
FitzHugh-Nagumo models and Turing instability theory; then establish the
framework of graph Laplacian operators and discrete diffusion; delve
into the mathematical mechanisms of pattern formation, including linear
stability analysis and bifurcation theory; and finally focus on graph
neural networks, demonstrating diffusion interpretations like GRAND and
PDE-GCN, and detailing the architecture and experiments of Graph Neural
Reaction Diffusion Models (RDGNN).
## Introduction: From Continuous Space to Discrete Graphs

### Reaction-Diffusion in Continuous Space
The general form of reaction-diffusion equations is:

$$\frac{\partial u}{\partial t} = D \nabla^2 u + R(u)$$

where $u(x, t)$ is a function of spatial position $x$ and time $t$, $D$ is the diffusion coefficient matrix, $\nabla^2$ is the Laplacian operator, and $R(u)$ is the reaction term.
**Physical Intuition:**

- **Diffusion term** $D \nabla^2 u$: describes the diffusion of matter/information in space, making the distribution more uniform
- **Reaction term** $R(u)$: describes local chemical reactions, biological growth, and other nonlinear processes that may produce non-uniform patterns

When diffusion and reaction reach equilibrium, the system may spontaneously form spatial patterns from uniform states: this is the core of Turing instability theory.
### Correspondence on Discrete Graphs
A graph $G = (V, E)$ consists of a node set $V$ and an edge set $E$. The evolution of node features $x_i(t)$ can be analogized as:

$$\frac{dx_i}{dt} = \sum_{j \in \mathcal{N}(i)} w_{ij}(x_j - x_i) + f(x_i)$$

where $\mathcal{N}(i)$ is the neighbor set of node $i$ and $w_{ij}$ are edge weights; the first term is **graph diffusion**, and the second term is **node reaction**.

**Key Insight:** The graph Laplacian operator $L = D - A$ (degree matrix minus adjacency matrix) is precisely the discrete version of the continuous Laplacian operator. The reaction-diffusion system on graphs can be written as:

$$\frac{dX}{dt} = -LX + F(X)$$

where $X$ is the feature matrix of all nodes, and $F$ is the node-wise reaction function.
## Fundamentals of Reaction-Diffusion Equations

### Gray-Scott Model
The Gray-Scott model is a two-component reaction-diffusion system describing autocatalytic chemical reactions:

$$\frac{\partial u}{\partial t} = D_u \nabla^2 u - uv^2 + F(1 - u)$$

$$\frac{\partial v}{\partial t} = D_v \nabla^2 v + uv^2 - (F + k)v$$

where $u$ and $v$ are concentrations of two chemical substances, $D_u, D_v$ are diffusion coefficients, $F$ is the feed rate, and $k$ is the kill rate.

**Parameter Meanings:**

- $F(1 - u)$: feed term for substance $u$
- $-uv^2$: consumption of $u$ by $v$ (autocatalysis)
- $+uv^2$: generation term for $v$
- $-(F + k)v$: decay of $v$

When $D_u > D_v$ ($u$ diffuses fast, $v$ diffuses slow), the system may produce complex patterns such as **spots**, **stripes**, and **spiral waves**.
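The two equations above can be integrated directly. Below is a minimal 1D sketch using forward Euler and a periodic finite-difference Laplacian; the parameter values match the experiment setup later in this article, while the seeding of the initial perturbation is an illustrative choice:

```python
import numpy as np

def gray_scott_1d(n=256, L=2.5, Du=2e-5, Dv=1e-5, F=0.04, k=0.06,
                  dt=1.0, steps=5000, seed=0):
    """Integrate the 1D Gray-Scott system with forward Euler."""
    rng = np.random.default_rng(seed)
    dx = L / n
    u = np.ones(n)
    v = np.zeros(n)
    # Seed a small perturbed region so patterns can nucleate
    mid = slice(n // 2 - 10, n // 2 + 10)
    u[mid], v[mid] = 0.5, 0.25
    v += 0.01 * rng.random(n)

    def lap(f):  # periodic finite-difference Laplacian
        return (np.roll(f, 1) + np.roll(f, -1) - 2 * f) / dx**2

    for _ in range(steps):
        uvv = u * v * v
        u += dt * (Du * lap(u) - uvv + F * (1 - u))      # feed + consumption
        v += dt * (Dv * lap(v) + uvv - (F + k) * v)      # autocatalysis + decay
    return u, v
```

Note the step size must respect the diffusion stability limit $\Delta t \le \Delta x^2 / (2 D_u)$; with the values above, $\Delta t = 1$ is safely inside it.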
### FitzHugh-Nagumo Model

The FitzHugh-Nagumo (FHN) model was originally used to describe electrical pulse propagation in neurons:

$$\frac{\partial v}{\partial t} = D_v \nabla^2 v + v - \frac{v^3}{3} - w + I$$

$$\frac{\partial w}{\partial t} = D_w \nabla^2 w + \epsilon (v + a - b w)$$

where $v$ is the membrane potential (fast variable), $w$ is the recovery variable (slow variable), $I$ is the external stimulus, and $\epsilon \ll 1$ ensures $w$ changes slowly.

**Dynamical Features:**

- **Excitation:** when $v$ exceeds a threshold, it rises rapidly
- **Recovery:** $w$ slowly increases, causing $v$ to decrease
- **Refractory period:** when $w$ is large, the system is difficult to excite again

The FHN model can produce patterns such as **traveling waves**, **spiral waves**, and **target patterns**.
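The excitation/recovery cycle is easiest to see without the spatial terms. The sketch below integrates the space-free FHN ODEs with forward Euler; the parameter values ($a = 0.7$, $b = 0.8$, $\epsilon = 0.08$, $I = 0.5$) are standard illustrative choices, not taken from this article:

```python
import numpy as np

def fhn_ode(v0=-1.2, w0=-0.6, I=0.5, a=0.7, b=0.8, eps=0.08,
            dt=0.05, steps=4000):
    """Integrate the space-free FHN ODEs with forward Euler and
    return the membrane-potential trace."""
    v, w = v0, w0
    trace = []
    for _ in range(steps):
        dv = v - v**3 / 3 - w + I          # fast excitation variable
        dw = eps * (v + a - b * w)         # slow recovery variable
        v += dt * dv
        w += dt * dw
        trace.append(v)
    return np.array(trace)
```

With this stimulus the fixed point sits on the unstable middle branch of the cubic nullcline, so the trace shows sustained relaxation oscillations: sharp upstrokes (excitation), slow decays (recovery), and a refractory interval before the next spike.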
### Turing Instability

Turing instability theory explains how uniform steady states become unstable and form spatial patterns.

Consider a general two-component reaction-diffusion system:

$$\frac{\partial u}{\partial t} = D_u \nabla^2 u + f(u, v), \qquad \frac{\partial v}{\partial t} = D_v \nabla^2 v + g(u, v)$$

Let $(u^*, v^*)$ be a spatially uniform steady state: $f(u^*, v^*) = g(u^*, v^*) = 0$.

**Linear Stability Analysis:** Under small perturbations $(\delta u, \delta v)$, the linearized system is:

$$\frac{\partial}{\partial t}\begin{pmatrix} \delta u \\ \delta v \end{pmatrix} = \begin{pmatrix} f_u & f_v \\ g_u & g_v \end{pmatrix}\begin{pmatrix} \delta u \\ \delta v \end{pmatrix} + \begin{pmatrix} D_u \nabla^2 \delta u \\ D_v \nabla^2 \delta v \end{pmatrix}$$

where $f_u = \partial f / \partial u$ etc. are partial derivatives evaluated at $(u^*, v^*)$.

Applying a Fourier transform in space, $\delta u, \delta v \propto e^{ikx + \lambda t}$, we obtain the eigenvalue problem $\lambda \xi = J_k \xi$, where

$$J_k = \begin{pmatrix} f_u - D_u k^2 & f_v \\ g_u & g_v - D_v k^2 \end{pmatrix}$$

**Turing Conditions:** The uniform steady state is stable without diffusion (eigenvalues of $J_0$ have negative real parts) but becomes unstable at certain wavenumbers $k$ (eigenvalues of $J_k$ have positive real parts), requiring:

$$f_u + g_v < 0, \qquad f_u g_v - f_v g_u > 0, \qquad D_v f_u + D_u g_v > 2\sqrt{D_u D_v (f_u g_v - f_v g_u)}$$

When these conditions are satisfied, the system spontaneously forms spatial periodic patterns, with wavelength determined by the most unstable mode.
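The conditions above can be checked mechanically for any given Jacobian. Below is a small sketch; the Jacobian entries and diffusion coefficients in the usage note are illustrative values chosen to satisfy the Turing conditions, not parameters from a specific model in this article:

```python
import numpy as np

def turing_unstable(J, Du, Dv, k_values):
    """Check the Turing conditions for a 2x2 Jacobian J = ((fu, fv), (gu, gv))
    and diffusion coefficients Du, Dv, scanning the given wavenumbers."""
    (fu, fv), (gu, gv) = J
    # Stable without diffusion: negative trace, positive determinant
    stable_without_diffusion = (fu + gv < 0) and (fu * gv - fv * gu > 0)
    # At wavenumber k the Jacobian becomes J - k^2 * diag(Du, Dv);
    # since the trace only decreases with k, instability appears
    # exactly when the determinant turns negative.
    unstable_k = [k for k in k_values
                  if (fu - Du * k**2) * (gv - Dv * k**2) - fv * gu < 0]
    return stable_without_diffusion and len(unstable_k) > 0, unstable_k
```

For example, `turing_unstable(((1.0, -1.0), (2.0, -1.5)), 0.01, 1.0, ...)` reports a band of unstable wavenumbers, while setting $D_u = D_v$ removes the instability, illustrating that differential diffusion is essential.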
## Graph Laplacian Operator and Discrete Diffusion

### Definition of Graph Laplacian
For an undirected graph, the graph Laplacian operator is defined as:

$$L = D - A$$

where:

- $D$ is the degree matrix, $D_{ii} = \sum_j A_{ij}$ is the degree of node $i$
- $A$ is the adjacency matrix, $A_{ij} = 1$ if $(i, j) \in E$, otherwise $A_{ij} = 0$

**Normalized Laplacian operator:** $\tilde{L} = I - D^{-1/2} A D^{-1/2}$

**Random walk Laplacian:** $L_{\mathrm{rw}} = I - D^{-1} A$

### Heat Equation on Graphs
The heat equation (diffusion equation) on graphs is:

$$\frac{dX}{dt} = -LX$$

Its solution is:

$$X(t) = e^{-Lt} X(0)$$

**Spectral Decomposition:** Let the eigenvalue decomposition of $L$ be $L = U \Lambda U^\top$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ and $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$. Then:

$$X(t) = U e^{-\Lambda t} U^\top X(0)$$

**Physical Meaning:**

- $\lambda_1 = 0$ corresponds to the constant mode (uniform distribution)
- Small $\lambda_i$ correspond to long-range patterns (slow decay)
- Large $\lambda_i$ correspond to short-range patterns (rapid decay)

As $t \to \infty$, $X(t) \to \bar{X}$ (converges to uniform), which explains the over-smoothing phenomenon in GNNs.
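The experiment code later in this article calls a helper `graph_heat_diffusion` whose definition did not survive; a dense NumPy sketch of what it might look like is below, assuming it Euler-integrates $dX/dt = -LX$ and takes an adjacency matrix and initial features rather than an `edge_index`:

```python
import numpy as np

def graph_heat_diffusion(adj, x0, T=5.0, num_steps=50):
    """Euler-integrate dX/dt = -LX on a graph given as a dense
    adjacency matrix; returns the feature snapshots at each step."""
    L = np.diag(adj.sum(axis=1)) - adj   # L = D - A
    dt = T / num_steps
    x = x0.copy()
    history = [x.copy()]
    for _ in range(num_steps):
        x = x - dt * (L @ x)             # one explicit Euler step
        history.append(x.copy())
    return history
```

Running it on any connected graph shows the node-wise standard deviation of the features shrinking toward zero: exactly the over-smoothing behavior predicted by the spectral decomposition.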
### Reaction-Diffusion on Graphs

The reaction-diffusion system on graphs is:

$$\frac{dX}{dt} = -LX + F(X)$$

where $F$ is the node-wise reaction function.

**Discrete-time Form (Euler method):**

$$X^{(k+1)} = X^{(k)} + \tau \left( -L X^{(k)} + F(X^{(k)}) \right)$$

where $\tau$ is the time step size and $k$ is the layer index.

This can be rewritten as:

$$X^{(k+1)} = (I - \tau L) X^{(k)} + \tau F(X^{(k)})$$

The first term is the **diffusion update**, and the second term is the **reaction update**.
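The discrete update is one line of code. In the sketch below, the linear reaction $F(X) = X$ is an illustrative choice: on a 3-node path graph, the initial pattern $(1, 0, -1)$ is a Laplacian eigenvector with $\lambda = 1$, so this reaction exactly cancels the diffusion decay, while pure diffusion smooths the pattern away:

```python
import numpy as np

def rd_step(x, L, tau=0.1, reaction=lambda v: v):
    """One Euler step of the graph reaction-diffusion update
    X <- (I - tau * L) X + tau * F(X)."""
    return x - tau * (L @ x) + tau * reaction(x)
```

Iterating `rd_step` with and without the reaction term makes the competition between the two updates concrete: the reaction preserves node contrast, diffusion alone erases it.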
## Mathematical Theory of Pattern Formation

### Linear Stability Analysis
Consider a two-component reaction-diffusion system on graphs:

$$\frac{dU}{dt} = -D_u L U + f(U, V), \qquad \frac{dV}{dt} = -D_v L V + g(U, V)$$

Let $(u^*, v^*)$ be a spatially uniform steady state: $u_i = u^*$, $v_i = v^*$ for all $i$.

Under small perturbations, the linearization decouples along the eigenvectors of $L$. **Stability Condition:** Let the eigenvalues of $L$ be $\lambda_1 \le \dots \le \lambda_n$; then the eigenvalues of the linearized system are those of the mode-wise Jacobians

$$J_k = \begin{pmatrix} f_u - D_u \lambda_k & f_v \\ g_u & g_v - D_v \lambda_k \end{pmatrix}$$

- If all $J_k$ have eigenvalues with negative real parts, the steady state is **stable**
- If some $J_k$ has an eigenvalue with positive real part, the steady state becomes **unstable, forming patterns**

**Key Insight:** Even if $J_0$ is stable (locally stable), for certain small $\lambda_k$ (long-range patterns) the matrix $J_k$ may have an eigenvalue with positive real part (globally unstable). This corresponds to the Turing mechanism on graphs.
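The mode-wise analysis replaces the Fourier wavenumber $k^2$ of the continuous case with the graph Laplacian eigenvalues $\lambda_k$. A sketch of the resulting spectrum scan (the Jacobian values in the usage note are the same illustrative Turing-unstable example as before):

```python
import numpy as np

def graph_turing_modes(L_mat, J, Du, Dv):
    """For each Laplacian eigenvalue lam, check whether
    J_k = J - lam * diag(Du, Dv) has an eigenvalue with positive
    real part (an unstable pattern mode)."""
    (fu, fv), (gu, gv) = J
    lams = np.linalg.eigvalsh(L_mat)
    unstable = []
    for lam in lams:
        Jk = np.array([[fu - Du * lam, fv],
                       [gu, gv - Dv * lam]])
        if np.linalg.eigvals(Jk).real.max() > 0:
            unstable.append(lam)
    return lams, unstable
```

On a 20-node cycle with $D_u = 0.01$, $D_v = 1$ and $J = \begin{pmatrix}1 & -1\\ 2 & -1.5\end{pmatrix}$, the uniform mode ($\lambda = 0$) stays stable while a band of intermediate eigenvalues destabilizes, mirroring the continuous Turing picture.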
### Bifurcation Theory

As parameters change, the system may undergo **bifurcations**: qualitative changes in solution structure.

- **Saddle-node bifurcation:** two steady states collide and disappear
- **Turing bifurcation:** the uniform steady state becomes unstable, producing spatial patterns

For reaction-diffusion systems on graphs, bifurcation points are determined by the spectral properties of the graph. For example, on regular graphs, Turing bifurcation occurs when the first non-uniform mode loses stability:

$$\det\left( J - \lambda_2 \, \mathrm{diag}(D_u, D_v) \right) = 0$$

where $\lambda_2$ is the second smallest eigenvalue (algebraic connectivity).
## Diffusion Interpretation of Graph Neural Networks

### Traditional GNN Architectures
The update rule of **Graph Convolutional Networks (GCN)** is:

$$H^{(k+1)} = \sigma\left( \hat{A} H^{(k)} W^{(k)} \right)$$

where $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ is the normalized adjacency matrix (with self-loops, $\tilde{A} = A + I$).

**Diffusion Perspective:** GCN can be viewed as a discretized graph diffusion process. Setting $W^{(k)} = I$ and dropping the nonlinearity, we have:

$$H^{(k+1)} = \hat{A} H^{(k)} = (I - \hat{L}) H^{(k)}$$

where $\hat{L} = I - \hat{A}$. This is the Euler discretization of the graph heat equation with unit step size.
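The linear part of this update is easy to reproduce and makes the diffusion interpretation tangible; a minimal sketch with a dense adjacency matrix:

```python
import numpy as np

def gcn_propagate(A, H):
    """One linear GCN propagation step with renormalized adjacency
    (self-loops added), i.e. H <- D^{-1/2} (A + I) D^{-1/2} H."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    A_hat = A_tilde / np.sqrt(np.outer(d, d))
    return A_hat @ H
```

Applying `gcn_propagate` repeatedly drives any zero-mean feature pattern toward zero, which is the over-smoothing behavior the diffusion view predicts: stacking many linear GCN layers is just running the heat equation for a long time.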
### GRAND: Graph Neural Diffusion

**GRAND** (Graph Neural Diffusion) explicitly models GNNs as diffusion processes:

$$\frac{\partial H(t)}{\partial t} = \left( A(H(t)) - I \right) H(t)$$

where $A(H(t))$ is a neural-network-parameterized (attention-based) diffusion matrix, and $H(t)$ is the node feature matrix.

**Numerical Solution:** Use ODE solvers (e.g., Runge-Kutta methods) to integrate this equation.

**Advantages:**

- **Continuous depth:** the number of layers is determined by ODE solver steps, and can be adaptive
- **Stable training:** ODE solvers guarantee numerical stability
- **Theoretical guarantees:** analysis tools based on PDE theory
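A toy sketch of this idea, not the official GRAND implementation: the attention here is dense all-pairs (the real model restricts attention to graph edges and uses an adaptive ODE solver rather than explicit Euler), and the weight matrices `Wq`, `Wk` stand in for learned parameters:

```python
import numpy as np

def attention_matrix(H, Wq, Wk):
    """Row-stochastic all-pairs attention over node features."""
    scores = (H @ Wq) @ (H @ Wk).T / np.sqrt(H.shape[1])
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def grand_forward(H0, Wq, Wk, T=1.0, steps=10):
    """Explicit-Euler integration of dH/dt = (A(H) - I) H."""
    H, dt = H0.copy(), T / steps
    for _ in range(steps):
        A = attention_matrix(H, Wq, Wk)
        H = H + dt * (A @ H - H)   # diffusion toward attention-weighted means
    return H
```

Because each row of $A(H)$ is a convex weighting, every step mixes each node's features toward a weighted mean of the others, so the feature range can never grow: a discrete analogue of the maximum principle for diffusion.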
### PDE-GCN

PDE-GCN explicitly interprets GCN layers as the numerical solution of a PDE:

$$\frac{\partial H}{\partial t} = -\tilde{L} H + R(H)$$

where the reaction term $R(H)$ can be:

- Identity mapping $R(H) = H$ (corresponds to ResNet residual connections)
- Nonlinear activation $R(H) = \sigma(H W)$ (corresponds to traditional GCN)
- Attention mechanism $R(H) = \mathrm{Att}(H)$ (corresponds to GAT)

**Theoretical Analysis:** The convergence and stability of PDE-GCN can be analyzed through PDE theory, for example:

- **Maximum principle:** guarantees bounded features
- **Energy methods:** analyze long-term behavior
- **Spectral methods:** understand frequency response
## Graph Neural Reaction Diffusion Models (RDGNN)

### Architecture Design

The **Graph Neural Reaction Diffusion Model (RDGNN)** embeds reaction-diffusion dynamics into GNNs:

$$X^{(k+1)} = X^{(k)} - \tau_d \tilde{L} X^{(k)} + \tau_r R_\theta(X^{(k)})$$

where:

- $\tau_d$ is the diffusion step size
- $\tau_r$ is the reaction step size
- $R_\theta$ is a neural-network-parameterized reaction term

**Reaction Term Design:**

$$R_\theta(X) = \mathrm{MLP}\left( [X \,\|\, \tilde{L} X] \right) - \beta X$$

where $\|$ denotes concatenation, and $\beta$ is a decay coefficient.

The first term is the **activation term** (similar to $v - v^3/3$ in FHN), and the second term is the **decay term** (prevents explosion).
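A minimal PyTorch sketch of one such layer, under the assumptions above (the exact MLP shape and hyperparameters are illustrative; a real implementation would use sparse message passing instead of a dense Laplacian):

```python
import torch
import torch.nn as nn

class RDGNNLayer(nn.Module):
    """One reaction-diffusion update
    X <- X - tau_d * L X + tau_r * (MLP([X || L X]) - beta * X),
    with L the (normalized) graph Laplacian as a dense tensor."""
    def __init__(self, dim, tau_d=0.2, tau_r=0.2, beta=0.5):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())
        self.tau_d, self.tau_r, self.beta = tau_d, tau_r, beta

    def forward(self, X, L):
        diffusion = L @ X
        # Activation term from [X || LX], minus a linear decay term
        reaction = self.mlp(torch.cat([X, diffusion], dim=-1)) - self.beta * X
        return X - self.tau_d * diffusion + self.tau_r * reaction
```

Stacking several such layers (optionally with shared weights, in the spirit of a discretized ODE) gives the full model; the bounded `Tanh` activation plus the $-\beta X$ decay keeps the reaction from blowing features up.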
### Theoretical Analysis

**Stability Condition:** The stability of the explicit diffusion update in RDGNN requires:

$$\tau_d < \frac{2}{\lambda_{\max}(\tilde{L})}$$

where $\lambda_{\max}(\tilde{L})$ is the maximum eigenvalue of $\tilde{L}$.

**Pattern Formation Condition:** When the reaction term is sufficiently strong and the diffusion coefficient is moderate, the system may form non-uniform patterns, thereby alleviating over-smoothing.
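The stability bound can be checked numerically before choosing $\tau_d$; a one-line sketch:

```python
import numpy as np

def max_stable_step(L_mat):
    """Largest stable explicit-Euler diffusion step: 2 / lambda_max(L)."""
    return 2.0 / np.linalg.eigvalsh(L_mat).max()
```

For the normalized Laplacian the eigenvalues lie in $[0, 2]$, so any $\tau_d < 1$ is always safe regardless of the graph, which is one practical reason to prefer $\tilde{L}$ over $L$.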
**Expressive Power:** RDGNN can learn:

- **Local patterns:** the reaction term maintains node differences
- **Global structure:** the diffusion term propagates information
- **Dynamic balance:** competition between the two produces rich representations
### Implementation Details

**Numerical Stability:**

- Use the normalized Laplacian $\tilde{L}$ instead of $L$
- Adaptive step size: adjust $\tau_d$ according to the eigenvalues
- Residual connections: $X^{(k+1)} = X^{(k)} + \Delta X^{(k)}$

**Computational Efficiency:**

- Sparse matrix multiplication: exploit graph sparsity
- Batch processing: process multiple graphs simultaneously
- GPU acceleration: execute matrix operations on GPU
```python
import numpy as np

# Parameter settings
Du, Dv = 0.00002, 0.00001  # u diffuses fast, v diffuses slow
F, k = 0.04, 0.06
L = 2.5   # Domain length
n = 256   # Number of spatial discretization points
dx = L / n
x = np.linspace(0, L, n)
```
```python
for idx, (name, G) in enumerate(graphs.items()):
    edge_index = from_networkx(G).edge_index
    num_nodes = G.number_of_nodes()
    history = graph_heat_diffusion(edge_index, num_nodes, T=5, num_steps=50)
    # Compute mean feature value (measure of smoothness)
    mean_features = [h.mean() for h in history]
    std_features = [h.std() for h in history]
    axes[idx].plot(mean_features, label='Mean')
    axes[idx].plot(std_features, label='Std')
    axes[idx].set_xlabel('Time step')
    axes[idx].set_ylabel('Feature value')
    axes[idx].set_title(f'{name} Graph')
    axes[idx].legend()
    axes[idx].grid(True)

plt.tight_layout()
plt.show()
```
## Experiment 3: Over-smoothing in GNNs and Reaction Terms

We demonstrate the over-smoothing problem in traditional GCN and how reaction terms alleviate it.
```python
def generate_homophilic_graph(num_nodes, num_classes, p_in=0.3, p_out=0.05):
    """Generate a homophilic graph"""
    # Node labels
    labels = torch.randint(0, num_classes, (num_nodes,))
    # Build adjacency matrix
    adj = torch.zeros(num_nodes, num_nodes)
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            if labels[i] == labels[j]:
                if torch.rand(1) < p_in:
                    adj[i, j] = adj[j, i] = 1
            else:
                if torch.rand(1) < p_out:
                    adj[i, j] = adj[j, i] = 1
    # Convert to edge_index
    edge_index = adj.nonzero().t().contiguous()
    return edge_index, labels
```
```python
def generate_heterophilic_graph(num_nodes, num_classes, p_in=0.05, p_out=0.3):
    """Generate a heterophilic graph"""
    labels = torch.randint(0, num_classes, (num_nodes,))
    adj = torch.zeros(num_nodes, num_nodes)
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            if labels[i] == labels[j]:
                if torch.rand(1) < p_in:
                    adj[i, j] = adj[j, i] = 1
            else:
                if torch.rand(1) < p_out:
                    adj[i, j] = adj[j, i] = 1
    edge_index = adj.nonzero().t().contiguous()
    return edge_index, labels
```
```python
# Generate features
def generate_features(num_nodes, num_features, labels):
    """Generate features based on labels"""
    features = torch.randn(num_nodes, num_features)
    # Add label-related signal
    for c in range(len(torch.unique(labels))):
        mask = labels == c
        features[mask] += 0.5 * torch.randn(1, num_features)
    return features
```
```python
# Visualization
graph_types = list(results.keys())
gcn_accs = [results[gt]['GCN'] for gt in graph_types]
rdgnn_accs = [results[gt]['RDGNN'] for gt in graph_types]
```
This article establishes the mathematical bridge between reaction-diffusion systems and graph neural networks. We have demonstrated:

1. **Theoretical Connections:** graph Laplacian operators are discrete versions of continuous Laplacian operators, and reaction-diffusion systems on graphs can produce rich spatial patterns
2. **Pattern Formation Mechanisms:** Turing instability theory explains how uniform steady states become unstable and form patterns, providing new design principles for GNNs
3. **Architectural Innovation:** RDGNN explicitly models reaction-diffusion dynamics, alleviating over-smoothing and enhancing expressive power
4. **Experimental Validation:** on multiple datasets and graph structures, RDGNN demonstrates better performance compared to traditional GCN
**Future Directions:**

- **Adaptive Reaction Terms:** dynamically adjust reaction term forms according to graph structure
- **Multi-scale Patterns:** learn spatial patterns at different scales
- **Temporal Evolution:** extend RDGNN to dynamic graphs
- **Theoretical Analysis:** deeper bifurcation theory and stability analysis
The reaction-diffusion perspective provides new theoretical tools and
practical guidance for graph neural networks, promising to advance GNN
development in complex graph structure learning.
Post title: PDE and Machine Learning (8): Reaction-Diffusion Systems and GNN
Post author: Chen Kai
Create time: 2022-03-12 15:00:00
Post link: https://www.chenk.top/pde-ml-8-reaction-diffusion-gnn/
Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stated otherwise.