Cloud Computing (5): Network Architecture and SDN
Chen Kai BOSS

Modern cloud applications don't exist in isolation — they're interconnected systems spanning multiple regions, services, and users worldwide. The network infrastructure that enables this connectivity is arguably the most critical component of cloud computing. Without robust networking, even the most powerful compute instances are isolated islands, unable to communicate, scale, or serve users effectively.

Cloud networking has evolved far beyond simple IP connectivity. Today's cloud networks are software-defined, programmable, and intelligent. They automatically route traffic, balance loads, cache content globally, encrypt data in transit, and adapt to changing conditions — all while maintaining low latency and availability targets of 99.99% or better.

In this comprehensive guide, we'll explore cloud networking from the ground up: Virtual Private Clouds (VPCs) that provide isolated network environments, load balancers that distribute traffic intelligently, Content Delivery Networks (CDNs) that bring content closer to users, Software-Defined Networking (SDN) that revolutionizes network control, Network Functions Virtualization (NFV) that transforms network appliances into software, and the security, monitoring, and troubleshooting tools that keep everything running smoothly.

Virtual Private Cloud (VPC) Fundamentals

What is a VPC?

A Virtual Private Cloud (VPC) is a logically isolated section of a cloud provider's infrastructure where you can launch resources in a virtual network that you define. Think of it as your own private data center within the cloud, but with the flexibility and scalability that cloud computing provides.

Key Characteristics:

  • Isolation: Resources in your VPC are isolated from other customers' resources
  • Customizable: You control IP address ranges, subnets, route tables, and gateways
  • Secure: Multiple layers of security including network ACLs and security groups
  • Scalable: Automatically scales with your needs without hardware changes
  • Hybrid-ready: Can connect to on-premises data centers via VPN or dedicated connections

VPC Architecture Components

A typical VPC consists of several interconnected components:

┌─────────────────────────────────────────────────────────┐
│ Internet Gateway │
│ (Public Internet Access) │
└───────────────────────┬───────────────────────────────────┘

┌───────────────────────▼───────────────────────────────────┐
│ VPC (10.0.0.0/16) │
│ │
│ ┌────────────────────┐ ┌────────────────────┐ │
│ │ Public Subnet │ │ Private Subnet │ │
│ │ (10.0.1.0/24) │ │ (10.0.2.0/24) │ │
│ │ │ │ │ │
│ │ ┌──────────────┐ │ │ ┌──────────────┐ │ │
│ │ │ Web Server │ │ │ │ Database │ │ │
│ │ │ (Public IP) │ │ │ │ (Private IP) │ │ │
│ │ └──────────────┘ │ │ └──────────────┘ │ │
│ └────────────────────┘ └────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Route Tables │ │
│ │ Public: 0.0.0.0/0 → Internet Gateway │ │
│ │ Private: 10.0.0.0/16 → Local │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Security Groups & Network ACLs │ │
│ └──────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────┘

┌───────────────────────▼───────────────────────────────────┐
│ VPN Gateway / Direct Connect │
│ (On-Premises Connectivity) │
└───────────────────────────────────────────────────────────┘

Core Components:

  1. Subnets: Subdivisions of your VPC IP address range. Typically organized as:

    • Public Subnets: Resources with direct internet access via Internet Gateway
    • Private Subnets: Resources without direct internet access (more secure)
    • Isolated Subnets: No internet access at all (highest security)
  2. Route Tables: Control traffic routing within and outside the VPC

    • Define how traffic flows between subnets
    • Specify routes to internet gateways, NAT gateways, VPN gateways
  3. Internet Gateway: Provides public internet access for resources in public subnets

    • One per VPC
    • Enables bidirectional internet connectivity
  4. NAT Gateway: Allows private subnet resources to access the internet for outbound traffic

    • Prevents inbound internet connections (more secure)
    • Managed service with high availability
  5. Security Groups: Stateful virtual firewalls at the instance level

    • Act as allow lists (default deny)
    • Rules are evaluated for both inbound and outbound traffic
  6. Network ACLs: Stateless subnet-level firewalls

    • Additional layer of security
    • Rules are evaluated separately for inbound and outbound traffic

VPC Configuration Examples

AWS VPC Configuration:

{
  "VpcId": "vpc-12345678",
  "CidrBlock": "10.0.0.0/16",
  "State": "available",
  "Tags": [
    {
      "Key": "Name",
      "Value": "production-vpc"
    }
  ]
}

Terraform VPC Configuration:

Problem Background: Infrastructure as Code (IaC) tools like Terraform enable consistent, repeatable VPC deployments across environments. Manual VPC configuration is error-prone and difficult to maintain, especially when managing multiple environments (dev, staging, production) or multiple regions. Terraform provides declarative configuration that can be version-controlled and automated.

Solution Approach:

  • Declarative configuration: Define desired state rather than manual steps
  • Resource dependencies: Terraform automatically handles resource creation order
  • State management: Track infrastructure state to enable updates and deletions
  • Modular design: Reuse VPC modules across projects and environments

Design Considerations:

  • CIDR planning: Ensure non-overlapping CIDR blocks, reserve space for future growth
  • Multi-AZ deployment: Create subnets in multiple availability zones for high availability
  • DNS configuration: Enable DNS hostnames and support for private DNS resolution
  • Tagging strategy: Use consistent tags for resource identification and cost allocation
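The CIDR planning consideration above can be sanity-checked with Python's standard `ipaddress` module before writing any Terraform (a small sketch using this guide's example ranges):

```python
import ipaddress

# The VPC CIDR from the examples in this guide: a /16 gives 65,536 addresses
vpc = ipaddress.ip_network("10.0.0.0/16")
print(vpc.num_addresses)  # 65536

# Carve the VPC into /24 subnets (256 addresses each; AWS reserves 5 per subnet)
subnets = list(vpc.subnets(new_prefix=24))
print(len(subnets))   # 256 possible /24 subnets, plenty of room for growth
print(subnets[1])     # 10.0.1.0/24 (the public subnet)
print(subnets[2])     # 10.0.2.0/24 (the private subnet)

# Verify the two example subnets do not overlap
public = ipaddress.ip_network("10.0.1.0/24")
private = ipaddress.ip_network("10.0.2.0/24")
print(public.overlaps(private))  # False
```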

# VPC Definition
# Purpose: Create a production VPC with DNS support
# Security: VPC provides network isolation, but additional security groups and NACLs are required
resource "aws_vpc" "main" {
  # CIDR block: /16 provides 65,536 IP addresses (AWS reserves 5 addresses per subnet)
  # Design: Choose a non-overlapping CIDR to avoid conflicts with on-premises networks
  cidr_block = "10.0.0.0/16"

  # DNS hostnames: Enable DNS hostname assignment for EC2 instances
  # Required for: ALB, ECS service discovery, Route 53 private hosted zones
  enable_dns_hostnames = true

  # DNS support: Enable DNS resolution for instances in the VPC
  # Required for: DNS queries to work within the VPC
  enable_dns_support = true

  tags = {
    Name        = "production-vpc"
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

# Public Subnet
# Purpose: Deploy resources that need direct internet access (load balancers, NAT gateways)
# Security: Resources in the public subnet are exposed to the internet; use security groups carefully
resource "aws_subnet" "public" {
  vpc_id = aws_vpc.main.id

  # CIDR: /24 subnet provides 256 IPs (251 usable, AWS reserves 5)
  # Note: Ensure the CIDR doesn't overlap with other subnets
  cidr_block = "10.0.1.0/24"

  # Availability Zone: Deploy across multiple AZs for high availability
  availability_zone = "us-east-1a"

  # Map public IP: Automatically assign a public IP to instances launched in this subnet
  # Use case: Load balancers, NAT instances, bastion hosts
  map_public_ip_on_launch = true

  tags = {
    Name        = "public-subnet-1a"
    Type        = "public"
    Environment = "production"
  }
}

# Private Subnet
# Purpose: Deploy internal resources (application servers, databases)
# Security: No direct internet access, more secure than the public subnet
resource "aws_subnet" "private" {
  vpc_id = aws_vpc.main.id

  # CIDR: Non-overlapping with the public subnet
  cidr_block = "10.0.2.0/24"

  # Availability Zone: Same AZ as the public subnet for low latency
  # Best practice: Create matching subnets in multiple AZs
  availability_zone = "us-east-1a"

  tags = {
    Name = "private-subnet-1a"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "main-igw"
  }
}

# Route Table for Public Subnet
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "public-route-table"
  }
}

# Associate Public Subnet with Route Table
resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

# NAT Gateway (for private subnet internet access)
resource "aws_eip" "nat" {
  domain = "vpc"

  tags = {
    Name = "nat-gateway-eip"
  }
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public.id

  tags = {
    Name = "main-nat-gateway"
  }
}

# Route Table for Private Subnet
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }

  tags = {
    Name = "private-route-table"
  }
}

# Associate Private Subnet with Route Table
resource "aws_route_table_association" "private" {
  subnet_id      = aws_subnet.private.id
  route_table_id = aws_route_table.private.id
}

Google Cloud VPC Configuration:

# gcloud command to create a VPC
gcloud compute networks create production-vpc \
    --subnet-mode=custom \
    --bgp-routing-mode=regional

# Create subnets
gcloud compute networks subnets create public-subnet \
    --network=production-vpc \
    --range=10.0.1.0/24 \
    --region=us-east1 \
    --enable-flow-logs

gcloud compute networks subnets create private-subnet \
    --network=production-vpc \
    --range=10.0.2.0/24 \
    --region=us-east1 \
    --enable-private-ip-google-access

VPC Peering and Connectivity

VPC Peering: Connect two VPCs to enable resources to communicate using private IP addresses.

VPC A (10.0.0.0/16)          VPC B (172.16.0.0/16)
┌──────────────┐ ┌──────────────┐
│ │ │ │
│ Instance 1 │◄───────────►│ Instance 2 │
│ 10.0.1.10 │ Peering │ 172.16.1.20 │
│ │ Connection │ │
└──────────────┘ └──────────────┘

Peering Configuration:

# VPC Peering Connection
resource "aws_vpc_peering_connection" "main" {
  vpc_id      = aws_vpc.vpc_a.id
  peer_vpc_id = aws_vpc.vpc_b.id
  auto_accept = true

  tags = {
    Name = "vpc-a-to-vpc-b"
  }
}

# Route in VPC A to reach VPC B
resource "aws_route" "vpc_a_to_vpc_b" {
  route_table_id            = aws_route_table.vpc_a.id
  destination_cidr_block    = "172.16.0.0/16"
  vpc_peering_connection_id = aws_vpc_peering_connection.main.id
}

# Route in VPC B to reach VPC A
resource "aws_route" "vpc_b_to_vpc_a" {
  route_table_id            = aws_route_table.vpc_b.id
  destination_cidr_block    = "10.0.0.0/16"
  vpc_peering_connection_id = aws_vpc_peering_connection.main.id
}

Load Balancing: SLB, ELB, and ALB

Load balancing is the process of distributing incoming network traffic across multiple backend servers to ensure no single server becomes overwhelmed, improving application availability and responsiveness.

Load Balancer Types

1. Network Load Balancer (Layer 4 - TCP/UDP)

Operates at the transport layer, routing traffic based on IP addresses and ports.

Characteristics:

  • Ultra-low latency (sub-millisecond in typical deployments)
  • Handles millions of requests per second
  • Preserves source IP address
  • Best for TCP/UDP traffic
  • Connection-based routing

Use Cases:

  • High-performance applications requiring low latency
  • TCP/UDP-based protocols
  • Gaming applications
  • IoT device communication

AWS Network Load Balancer Configuration:

resource "aws_lb" "network" {
  name               = "network-lb"
  internal           = false
  load_balancer_type = "network"
  # Note: reference subnets in at least two AZs in production
  subnets            = [aws_subnet.public.id]

  enable_deletion_protection = false

  tags = {
    Environment = "production"
  }
}

resource "aws_lb_target_group" "network" {
  name     = "network-tg"
  port     = 80
  protocol = "TCP"
  vpc_id   = aws_vpc.main.id

  health_check {
    protocol            = "TCP"
    port                = 80
    healthy_threshold   = 2
    unhealthy_threshold = 2
    interval            = 30
  }
}

resource "aws_lb_listener" "network" {
  load_balancer_arn = aws_lb.network.arn
  port              = "80"
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.network.arn
  }
}

2. Application Load Balancer (Layer 7 - HTTP/HTTPS)

Operates at the application layer, making routing decisions based on content.

Characteristics:

  • Content-based routing (path, host, headers)
  • SSL/TLS termination
  • Advanced request routing
  • WebSocket and HTTP/2 support
  • Best for HTTP/HTTPS traffic

Use Cases:

  • Web applications
  • Microservices architectures
  • Container-based applications
  • API gateways

AWS Application Load Balancer Configuration:

Problem Background: Modern web applications often consist of multiple services (API services, admin panels, static websites) that need to route traffic based on request path or hostname. Application Load Balancers (ALB) provide content-based routing capabilities, enabling flexible traffic distribution and intelligent request routing.

Solution Approach:

  • Path-based routing: Route /api/* requests to the API server group
  • Host-based routing: Route requests from specific domains to the corresponding server groups
  • Default routing: Forward unmatched requests to the default server group
  • Priority mechanism: Match rules in priority order, ensuring precise matches take precedence over wildcards

Design Considerations:

  • Rule priority: Lower numbers indicate higher priority; start from a low number and leave gaps for later rules
  • Path matching: Use the wildcard * to match sub-paths, e.g., /api/* matches all paths starting with /api/
  • Health checks: Each target group requires its own health check configuration
  • SSL termination: The ALB handles SSL/TLS termination, reducing backend server load

# Application Load Balancer
# Purpose: Distribute HTTP/HTTPS traffic across multiple backend servers
# Security: ALB should be in a public subnet with a security group allowing 80/443 from the internet
resource "aws_lb" "application" {
  name = "application-lb"
  # Internal: false means internet-facing, true means internal-only
  internal           = false
  load_balancer_type = "application"
  # Security groups: Control which traffic can reach the ALB
  security_groups = [aws_security_group.lb.id]
  # Subnets: Must span at least 2 availability zones for high availability
  # (add a second public subnet in another AZ and reference it here)
  subnets = [aws_subnet.public.id]

  # Deletion protection: Prevent accidental deletion in production
  # Set to true in production, false in dev/test
  enable_deletion_protection = false
  # HTTP/2: Enable HTTP/2 support for better performance
  enable_http2 = true

  tags = {
    Environment = "production"
    Name        = "application-lb"
  }
}

# Target Group for Web Servers
# Purpose: Group web servers together for load balancing
# Health check: The ALB uses health checks to determine which targets are healthy
resource "aws_lb_target_group" "web" {
  name     = "web-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  # Health check configuration
  # Critical: Health checks determine which instances receive traffic
  health_check {
    enabled = true
    # Healthy threshold: Consecutive successful checks to mark healthy
    healthy_threshold = 2
    # Unhealthy threshold: Consecutive failed checks to mark unhealthy
    unhealthy_threshold = 2
    # Timeout: Maximum time to wait for a health check response
    timeout = 5
    # Interval: Time between health checks (seconds)
    interval = 30
    # Path: Health check endpoint (should be lightweight and fast)
    path = "/health"
    # Matcher: HTTP status codes that indicate healthy
    matcher = "200"
  }

  # Session stickiness: Route the same client to the same backend
  # Use case: Applications that maintain server-side session state
  # Note: Consider using a distributed session store (Redis) instead
  stickiness {
    type            = "lb_cookie"
    cookie_duration = 86400 # 24 hours
    enabled         = true
  }
}

# Target Group for API Servers
# Purpose: Separate API servers from web servers for independent scaling
resource "aws_lb_target_group" "api" {
  name     = "api-tg"
  port     = 8080 # API servers typically use non-standard ports
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    enabled             = true
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 5
    interval            = 30
    # API-specific health check endpoint
    path    = "/api/health"
    matcher = "200"
  }
}

# HTTPS Listener
# Purpose: Terminate SSL/TLS at the ALB, forward HTTP to the backend
# Security: Use strong SSL policies and valid certificates
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.application.arn
  port              = "443"
  protocol          = "HTTPS"
  # SSL policy: Restrict to secure TLS versions and ciphers
  # ELBSecurityPolicy-TLS-1-2-2017-01: TLS 1.2+ only
  ssl_policy = "ELBSecurityPolicy-TLS-1-2-2017-01"
  # Certificate: Must be a valid ACM certificate or imported certificate
  certificate_arn = aws_acm_certificate.main.arn

  # Default action: Forward to the web target group if no rules match
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn
  }
}

# Path-based routing rule
# Purpose: Route API requests to the API server group
# Priority: Lower number = higher priority, evaluated first
resource "aws_lb_listener_rule" "api" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 100 # Evaluated before the admin rule below

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api.arn
  }

  # Condition: Match requests with an /api/* path
  # Note: /api/* matches /api/users but not /api (a separate rule is needed for the exact match)
  condition {
    path_pattern {
      values = ["/api/*"]
    }
  }
}

# Host-based routing rule
# Purpose: Route admin.example.com requests to the admin server group
# (assumes an aws_lb_target_group.admin defined elsewhere)
resource "aws_lb_listener_rule" "admin" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 200 # Lower priority than the API rule

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.admin.arn
  }

  condition {
    host_header {
      values = ["admin.example.com"]
    }
  }
}

3. Classic Load Balancer (Legacy)

Older generation load balancer, being phased out in favor of ALB and NLB.

Load Balancing Algorithms

1. Round Robin: Distributes requests sequentially across servers

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)

2. Least Connections: Routes to server with fewest active connections

Server A: 5 connections
Server B: 3 connections ← Selected
Server C: 7 connections

3. Weighted Round Robin: Round robin with server capacity weights

Server A (weight: 3) → 3 requests
Server B (weight: 1) → 1 request
Server C (weight: 2) → 2 requests

4. IP Hash: Routes based on client IP hash (session persistence)

Client IP: 192.168.1.100 → Hash → Server B (always)

5. Least Response Time: Routes to server with lowest response time

Server A: 50ms response time ← Selected
Server B: 120ms response time
Server C: 80ms response time
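Four of the five selection strategies above fit in a few lines each. The sketch below uses a hypothetical in-memory `servers` table rather than any provider API:

```python
import hashlib
import itertools

# Hypothetical backend state (connection counts and measured latencies)
servers = {"A": {"connections": 5, "response_ms": 50},
           "B": {"connections": 3, "response_ms": 120},
           "C": {"connections": 7, "response_ms": 80}}

# 1. Round robin: cycle through servers in order
rr = itertools.cycle(servers)
first_four = [next(rr) for _ in range(4)]
print(first_four)  # ['A', 'B', 'C', 'A']

# 2. Least connections: the server with the fewest active connections wins
least_conn = min(servers, key=lambda s: servers[s]["connections"])
print(least_conn)  # B

# 4. IP hash: a given client IP always lands on the same server
def ip_hash(client_ip: str) -> str:
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    names = sorted(servers)
    return names[int(digest, 16) % len(names)]

print(ip_hash("192.168.1.100") == ip_hash("192.168.1.100"))  # True

# 5. Least response time: the server with the lowest measured latency wins
fastest = min(servers, key=lambda s: servers[s]["response_ms"])
print(fastest)  # A
```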

Load Balancer Health Checks

Health checks ensure traffic only goes to healthy backend servers.

resource "aws_lb_target_group" "app" {
  name     = "app-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    enabled             = true
    healthy_threshold   = 2  # Consecutive successes needed
    unhealthy_threshold = 3  # Consecutive failures to mark unhealthy
    timeout             = 5  # Timeout in seconds
    interval            = 30 # Check interval in seconds
    path                = "/health"
    protocol            = "HTTP"
    matcher             = "200" # HTTP status codes considered healthy
    port                = "traffic-port"
  }
}

Health Check Best Practices:

  • Use dedicated health check endpoints (/health, /ready)
  • Keep health checks lightweight (avoid database queries)
  • Set appropriate thresholds (2-3 healthy, 2-3 unhealthy)
  • Use different endpoints for liveness vs readiness (Kubernetes)
  • Monitor health check metrics to detect issues early
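As an illustration of how lightweight such an endpoint can be, here is a minimal `/health` handler using only Python's standard library (hypothetical port and path; a production service would normally wire this into its web framework):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Liveness check: the process is up; deliberately no database queries
        if self.path == "/health":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"OK")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep frequent health checks out of the access log

# To run: HTTPServer(("", 8080), HealthHandler).serve_forever()
```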

Load Balancer Performance Benchmarks

Throughput Comparison:

Load Balancer Type | Max Throughput | Latency | Connections/sec
-------------------|----------------|---------|-----------------------
Network LB         | 100+ Gbps      | <100 ms | Millions
Application LB     | 10+ Gbps       | <400 ms | Hundreds of thousands
Classic LB         | 5 Gbps         | <500 ms | Tens of thousands

Traffic Analysis Example:

# Simulating load balancer traffic distribution
# Note: this uses weighted *random* selection, which converges to the same
# long-run distribution as weighted round robin
import random

servers = ['Server-A', 'Server-B', 'Server-C']
server_weights = {'Server-A': 3, 'Server-B': 1, 'Server-C': 2}
request_count = {'Server-A': 0, 'Server-B': 0, 'Server-C': 0}

total_weight = sum(server_weights.values())
for i in range(1000):
    rand = random.uniform(0, total_weight)
    cumulative = 0
    for server, weight in server_weights.items():
        cumulative += weight
        if rand <= cumulative:
            request_count[server] += 1
            break

print("Request Distribution:")
for server, count in request_count.items():
    percentage = (count / 1000) * 100
    print(f"{server}: {count} requests ({percentage:.1f}%)")

Expected Output (approximate — selection is randomized, so counts vary around these values):

Request Distribution:
Server-A: 500 requests (50.0%)
Server-B: 167 requests (16.7%)
Server-C: 333 requests (33.3%)

Content Delivery Networks (CDN)

A Content Delivery Network (CDN) is a geographically distributed network of servers that cache content closer to end users, reducing latency and improving performance.

How CDNs Work

User Request Flow:
┌─────────┐
│ User │
│ (Tokyo) │
└────┬────┘
│ 1. Request for example.com/image.jpg

┌─────────────────┐
│ DNS Resolver │
└────┬────────────┘
│ 2. Query CDN DNS

┌─────────────────┐
│ CDN Edge │ ← Closest to user (Tokyo)
│ Server (Cache) │
└────┬────────────┘
│ 3. Cache HIT → Return cached content
│ Cache MISS → Forward to origin

┌─────────────────┐
│ Origin Server │
│ (US-East) │
└─────────────────┘

CDN Architecture

Edge Locations: Servers distributed globally, typically in major cities

  • Cache frequently accessed content
  • Serve content with lowest latency
  • Reduce load on origin servers

Origin Server: Original source of content

  • Serves content when cache misses occur
  • Can be cloud storage (S3, GCS) or web servers

CDN Features:

  1. Caching: Stores content at edge locations
  2. Compression: Gzip/Brotli compression to reduce bandwidth
  3. SSL/TLS: HTTPS termination at edge
  4. DDoS Protection: Absorbs attack traffic
  5. Geographic Routing: Routes to nearest edge location
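Feature 1 (caching) is the core of a CDN's value. A minimal in-memory TTL cache sketch showing the hit/miss flow from the diagram above (hypothetical class; real edge caches are far more sophisticated):

```python
import time

class EdgeCache:
    """Minimal TTL cache sketch: serve from cache on a hit, fetch the origin on a miss."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}  # path -> (content, expiry time)
        self.hits = self.misses = 0

    def get(self, path: str, fetch_origin) -> str:
        entry = self.store.get(path)
        if entry and entry[1] > time.monotonic():
            self.hits += 1
            return entry[0]                       # cache HIT: serve from the edge
        self.misses += 1
        content = fetch_origin(path)              # cache MISS: forward to the origin
        self.store[path] = (content, time.monotonic() + self.ttl)
        return content

cache = EdgeCache(ttl_seconds=3600)
origin_calls = []
fetch = lambda p: origin_calls.append(p) or f"<data for {p}>"

cache.get("/image.jpg", fetch)   # miss: fetched from the origin
cache.get("/image.jpg", fetch)   # hit: served from the edge, origin not contacted
print(len(origin_calls))         # 1
print(cache.hits, cache.misses)  # 1 1
```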

CDN Configuration Examples

AWS CloudFront Configuration:

resource "aws_cloudfront_distribution" "main" {
  origin {
    domain_name = aws_s3_bucket.website.bucket_regional_domain_name
    origin_id   = "S3-${aws_s3_bucket.website.bucket}"

    s3_origin_config {
      origin_access_identity = aws_cloudfront_origin_access_identity.main.cloudfront_access_identity_path
    }
  }

  enabled             = true
  is_ipv6_enabled     = true
  comment             = "Production CDN distribution"
  default_root_object = "index.html"

  default_cache_behavior {
    allowed_methods  = ["DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "S3-${aws_s3_bucket.website.bucket}"

    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }

    viewer_protocol_policy = "redirect-to-https"
    min_ttl                = 0
    default_ttl            = 3600
    max_ttl                = 86400
    compress               = true
  }

  # Cache behavior for images
  ordered_cache_behavior {
    path_pattern     = "/images/*"
    allowed_methods  = ["GET", "HEAD"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "S3-${aws_s3_bucket.website.bucket}"

    forwarded_values {
      query_string = false
      headers      = ["Origin"]
      cookies {
        forward = "none"
      }
    }

    min_ttl                = 0
    default_ttl            = 86400    # 24 hours
    max_ttl                = 31536000 # 1 year
    compress               = true
    viewer_protocol_policy = "redirect-to-https"
  }

  restrictions {
    geo_restriction {
      restriction_type = "whitelist"
      locations        = ["US", "CA", "GB", "DE"]
    }
  }

  viewer_certificate {
    cloudfront_default_certificate = true
  }

  custom_error_response {
    error_code         = 404
    response_code      = 200
    response_page_path = "/index.html"
  }
}

Cache Headers Configuration:

# Origin server configuration for optimal CDN caching
location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
    expires 1y;
    add_header Cache-Control "public, immutable";
    add_header Vary "Accept-Encoding";
}

location /api/ {
    add_header Cache-Control "no-cache, no-store, must-revalidate";
    add_header Pragma "no-cache";
    add_header Expires "0";
}

CDN Performance Metrics

Key Metrics:

  • Cache Hit Ratio: Percentage of requests served from cache
    • Target: >90% for static content
    • Formula: (Cache Hits / Total Requests) × 100
  • Latency: Time from request to first byte
    • Edge cache: <50ms
    • Origin fetch: 100-500ms (depending on distance)
  • Bandwidth Savings: Data not transferred from origin
    • Formula: (Origin Bandwidth - CDN Bandwidth) / Origin Bandwidth × 100
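The two formulas above can be checked directly (illustrative numbers only; the latency figures match the edge-hit and origin-miss times used in the comparison below):

```python
def cache_hit_ratio(hits: int, total: int) -> float:
    """(Cache Hits / Total Requests) x 100"""
    return hits / total * 100

def bandwidth_savings(origin_gbps: float, cdn_origin_gbps: float) -> float:
    """(Origin Bandwidth - CDN Bandwidth) / Origin Bandwidth x 100"""
    return (origin_gbps - cdn_origin_gbps) / origin_gbps * 100

print(cache_hit_ratio(900, 1000))    # 90.0 (% of requests served from the edge)
print(bandwidth_savings(1.0, 0.1))   # 90.0 (% less data fetched from the origin)

# Average latency with a 90% hit ratio: 20 ms edge hit, 220 ms miss via the origin
avg_latency = 0.9 * 20 + 0.1 * 220
print(avg_latency)  # 40.0 ms
```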

Performance Comparison:

Without CDN:
User (Tokyo) → Origin (US-East): 200ms latency, 1.0 Gbps bandwidth

With CDN:
User (Tokyo) → Edge (Tokyo): 20ms latency, 0.1 Gbps bandwidth (90% cache hit)
User (Tokyo) → Edge (Tokyo) → Origin (US-East): 220ms latency, 0.1 Gbps bandwidth (10% cache miss)

Overall Improvement:

- Average Latency: 200ms → 40ms (80% reduction)
- Bandwidth: 1.0 Gbps → 0.1 Gbps (90% reduction)

Software-Defined Networking (SDN)

Software-Defined Networking (SDN) is an architecture that separates the network control plane from the data plane, enabling centralized network management and programmability.

Traditional Networking vs SDN

Traditional Networking:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Switch 1 │────▶│ Switch 2 │────▶│ Switch 3 │
│ │ │ │ │ │
│ Control + │ │ Control + │ │ Control + │
│ Data Plane │ │ Data Plane │ │ Data Plane │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
└────────────────────┴────────────────────┘
(Distributed Control)

SDN Architecture:

┌─────────────────────────────────────────┐
│ SDN Controller │
│ (Centralized Control Plane) │
│ │
│ - Network Topology Management │
│ - Flow Rule Programming │
│ - Policy Enforcement │
└───────────────┬─────────────────────────┘

┌───────────┼───────────┐
│ │ │
┌───▼───┐ ┌───▼───┐ ┌───▼───┐
│ Switch │ │ Switch │ │ Switch │
│ 1 │ │ 2 │ │ 3 │
│ │ │ │ │ │
│ Data │ │ Data │ │ Data │
│ Plane │ │ Plane │ │ Plane │
└───────┘ └───────┘ └───────┘

SDN Architecture Components

1. Control Plane: Centralized controller that manages network behavior

  • Network topology discovery
  • Flow rule computation
  • Policy enforcement
  • Network state management

2. Data Plane: Network devices (switches, routers) that forward packets

  • Forward packets based on flow tables
  • Report statistics to controller
  • Execute forwarding rules

3. Southbound API: Communication protocol between controller and switches

  • OpenFlow (most common)
  • NETCONF
  • P4 Runtime

4. Northbound API: Interface for applications to interact with controller

  • REST APIs
  • Python SDKs
  • Network management applications

OpenFlow Protocol

OpenFlow is the most widely adopted SDN protocol, defining the communication between controllers and switches.

OpenFlow Flow Table Structure:

┌─────────────────────────────────────────────────────┐
│ Flow Table │
├──────────┬──────────┬──────────┬──────────┬────────┤
│ Match │ Priority │ Counters │ Actions │ Timeout │
├──────────┼──────────┼──────────┼──────────┼────────┤
│ Ingress │ 10 │ 1.2M │ Forward │ 0 │
│ Port: 1 │ │ packets │ Port: 2 │ │
├──────────┼──────────┼──────────┼──────────┼────────┤
│ Src IP: │ 20 │ 500K │ Drop │ 0 │
│ 10.0.1.5 │ │ packets │ │ │
├──────────┼──────────┼──────────┼──────────┼────────┤
│ * │ 0 │ 10M │ Send to │ 0 │
│ (default)│ │ packets │ Controller │ │
└──────────┴──────────┴──────────┴──────────┴────────┘
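Lookup against a flow table is a highest-priority-first match. A plain-Python sketch of the table above (not a real switch pipeline; match fields are simplified to a dict):

```python
# Flow entries from the table above: (priority, match fields, action)
flow_table = [
    (10, {"in_port": 1}, "forward:2"),
    (20, {"src_ip": "10.0.1.5"}, "drop"),
    (0,  {}, "send_to_controller"),  # wildcard default entry
]

def lookup(pkt: dict) -> str:
    """Return the action of the highest-priority matching entry."""
    for priority, match, action in sorted(flow_table, key=lambda e: -e[0]):
        if all(pkt.get(field) == value for field, value in match.items()):
            return action
    return "drop"  # unreachable here because of the wildcard default

print(lookup({"src_ip": "10.0.1.5", "in_port": 1}))  # drop      (priority 20 wins)
print(lookup({"src_ip": "10.0.2.9", "in_port": 1}))  # forward:2 (priority 10)
print(lookup({"src_ip": "10.0.2.9", "in_port": 3}))  # send_to_controller (default)
```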

OpenFlow Message Types:

  1. Controller-to-Switch: Commands from controller

    • OFPT_FLOW_MOD: Add/modify/delete flow entries
    • OFPT_PACKET_OUT: Send packet through switch
    • OFPT_PORT_MOD: Modify port configuration
  2. Asynchronous: Events from switch to controller

    • OFPT_PACKET_IN: Packet doesn't match any flow
    • OFPT_FLOW_REMOVED: Flow entry removed
    • OFPT_PORT_STATUS: Port status changed
  3. Symmetric: Bidirectional messages

    • OFPT_HELLO: Initial handshake
    • OFPT_ECHO_REQUEST/REPLY: Keepalive

OpenFlow Flow Entry Example:

# Python example using the Ryu SDN framework
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.lib.packet import ethernet, packet
from ryu.ofproto import ofproto_v1_3

class SimpleSwitch(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    def __init__(self, *args, **kwargs):
        super(SimpleSwitch, self).__init__(*args, **kwargs)
        self.mac_to_port = {}

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in_handler(self, ev):
        msg = ev.msg
        datapath = msg.datapath
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser

        # Extract packet information
        in_port = msg.match['in_port']
        pkt = packet.Packet(msg.data)
        eth = pkt.get_protocols(ethernet.ethernet)[0]

        # Learn the source MAC address
        self.mac_to_port.setdefault(datapath.id, {})
        self.mac_to_port[datapath.id][eth.src] = in_port

        # Check if the destination MAC is known
        if eth.dst in self.mac_to_port[datapath.id]:
            out_port = self.mac_to_port[datapath.id][eth.dst]
        else:
            out_port = ofproto.OFPP_FLOOD

        # Install a flow rule so subsequent packets are forwarded by the switch
        actions = [parser.OFPActionOutput(out_port)]
        match = parser.OFPMatch(in_port=in_port, eth_dst=eth.dst)
        self.add_flow(datapath, match, actions)

        # Send the current packet out
        out = parser.OFPPacketOut(
            datapath=datapath,
            buffer_id=msg.buffer_id,
            in_port=in_port,
            actions=actions,
            data=msg.data
        )
        datapath.send_msg(out)

    def add_flow(self, datapath, match, actions):
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser

        inst = [parser.OFPInstructionActions(
            ofproto.OFPIT_APPLY_ACTIONS, actions)]

        mod = parser.OFPFlowMod(
            datapath=datapath,
            priority=1,
            match=match,
            instructions=inst
        )
        datapath.send_msg(mod)

SDN Controllers

Popular SDN Controllers:

  1. OpenDaylight: Enterprise-grade, Java-based

    • REST APIs
    • Model-driven architecture
    • Plugin ecosystem
  2. ONOS: Carrier-grade SDN controller

    • High availability
    • Distributed architecture
    • Network applications
  3. Ryu: Python-based, lightweight

    • Easy to learn and extend
    • Good for research and development
    • REST API support
  4. Floodlight: Java-based, open source

    • REST API
    • Modular architecture
    • Good documentation

ONOS Controller Example:

// ONOS Application Example
@Component(immediate = true)
public class LoadBalancerApp implements NetworkConfigListener {

    @Reference(cardinality = ReferenceCardinality.MANDATORY)
    protected FlowRuleService flowRuleService;

    @Reference(cardinality = ReferenceCardinality.MANDATORY)
    protected PacketService packetService;

    @Activate
    public void activate() {
        log.info("Load Balancer Application Started");
    }

    @Deactivate
    public void deactivate() {
        log.info("Load Balancer Application Stopped");
    }

    private void installLoadBalancingRule(DeviceId deviceId,
                                          PortNumber inPort,
                                          IpAddress serverIp) {
        TrafficSelector selector = DefaultTrafficSelector.builder()
                .matchInPort(inPort)
                .matchEthType(Ethernet.TYPE_IPV4)
                .build();

        TrafficTreatment treatment = DefaultTrafficTreatment.builder()
                .setIpDst(serverIp)
                .setOutput(PortNumber.portNumber(1))
                .build();

        FlowRule rule = DefaultFlowRule.builder()
                .forDevice(deviceId)
                .withSelector(selector)
                .withTreatment(treatment)
                .withPriority(10)
                .makePermanent()
                .build();

        flowRuleService.applyFlowRules(rule);
    }
}

SDN Use Cases

1. Traffic Engineering: Optimize network paths based on current conditions

# Dynamic path selection based on link utilization
def select_path(source, destination, topology):
    paths = find_all_paths(source, destination, topology)

    # Calculate path cost based on link utilization
    path_costs = []
    for path in paths:
        cost = 0
        for i in range(len(path) - 1):
            link = (path[i], path[i + 1])
            utilization = get_link_utilization(link)
            cost += utilization * 100  # Weight by utilization
        path_costs.append((path, cost))

    # Select path with lowest cost
    best_path = min(path_costs, key=lambda x: x[1])[0]
    return best_path

2. Network Virtualization: Create multiple logical networks on shared infrastructure

3. Security Policies: Centralized firewall and access control

4. Quality of Service (QoS): Guarantee bandwidth and latency for specific flows

5. Network Monitoring: Real-time visibility into network state
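
Several of these use cases (traffic engineering, monitoring, QoS) depend on link-utilization estimates, which a controller typically derives from periodic port-counter samples. A minimal sketch of that calculation; the 1 Gbps capacity and 10-second polling interval are assumed values for illustration:

```python
def link_utilization(bytes_prev, bytes_now, interval_s, capacity_bps):
    """Fraction of link capacity used, from two byte-counter samples."""
    bits = (bytes_now - bytes_prev) * 8
    return bits / (interval_s * capacity_bps)

# Two samples taken 10 s apart on an assumed 1 Gbps link
u = link_utilization(bytes_prev=0, bytes_now=125_000_000,
                     interval_s=10, capacity_bps=1_000_000_000)
print(f"{u:.0%}")  # 10%
```

A path-selection routine like the one above would feed these per-link values into its cost function.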

Network Functions Virtualization (NFV)

Network Functions Virtualization (NFV) decouples network functions from dedicated hardware appliances, running them as software on standard servers.

NFV Architecture

Traditional Approach:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Router │ │ Firewall │ │ Load │
│ Hardware │ │ Hardware │ │ Balancer │
│ Appliance │ │ Appliance │ │ Hardware │
└─────────────┘ └─────────────┘ └─────────────┘

NFV Approach:
┌─────────────────────────────────────────────┐
│ Standard x86 Servers │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Router │ │ Firewall │ │ Load │ │
│ │ VNF │ │ VNF │ │ Balancer │ │
│ │ │ │ │ │ VNF │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────┘

NFV Components:

  1. Virtualized Network Functions (VNFs): Software implementations of network functions

    • Router VNF
    • Firewall VNF
    • Load Balancer VNF
    • NAT VNF
  2. NFV Infrastructure (NFVI): Hardware and software resources

    • Compute resources
    • Storage resources
    • Network resources
    • Virtualization layer
  3. NFV Management and Orchestration (MANO):

    • VNF Manager: Lifecycle management of VNFs
    • Virtualized Infrastructure Manager (VIM): Manages NFVI resources
    • NFV Orchestrator: Coordinates VNFs and resources

NFV Benefits

Cost Reduction:

  • Eliminate proprietary hardware
  • Use commodity servers
  • Reduce power consumption
  • Lower capital expenditure

Flexibility:

  • Rapid deployment of new services
  • Easy scaling up/down
  • Dynamic resource allocation
  • Service chaining

Innovation:

  • Faster time to market
  • Easier testing and validation
  • Software-based updates
  • DevOps practices

NFV Implementation Example

Firewall VNF using iptables:

#!/bin/bash
# Firewall VNF Configuration

# Allow established connections
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow SSH
iptables -A INPUT -p tcp --dport 22 -j ACCEPT

# Allow HTTP/HTTPS
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

# Block specific IP ranges
iptables -A INPUT -s 192.168.100.0/24 -j DROP

# Rate limiting
iptables -A INPUT -p tcp --dport 80 -m limit --limit 25/minute --limit-burst 100 -j ACCEPT

# Default deny
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

Router VNF using Linux:

#!/bin/bash
# Router VNF Configuration

# Enable IP forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward

# Configure interfaces
ip addr add 10.0.1.1/24 dev eth0
ip addr add 10.0.2.1/24 dev eth1

# Add routes
ip route add 10.0.3.0/24 via 10.0.2.2

# NAT configuration
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -m state --state RELATED,ESTABLISHED -j ACCEPT

Load Balancer VNF using HAProxy:

# HAProxy Load Balancer VNF Configuration
global
    log /dev/log local0
    maxconn 4096
    daemon

defaults
    log global
    mode http
    option httplog
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http_front
    bind *:80
    default_backend http_back

backend http_back
    balance roundrobin
    option httpchk GET /health
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
    server web3 10.0.1.12:80 check

NFV Service Chaining

Service chaining connects multiple VNFs in sequence to process traffic.

Traffic Flow:
Internet → Firewall VNF → Load Balancer VNF → Router VNF → Backend Servers

Service Chaining Configuration:

# NFV Service Chain Definition
service_chain:
  name: web-traffic-chain
  vnfs:
    - name: firewall-vnf
      type: firewall
      image: firewall-vnf:latest
      resources:
        cpu: 2
        memory: 4GB
    - name: loadbalancer-vnf
      type: loadbalancer
      image: haproxy-vnf:latest
      resources:
        cpu: 1
        memory: 2GB
    - name: router-vnf
      type: router
      image: router-vnf:latest
      resources:
        cpu: 1
        memory: 1GB
  chain_order:
    - firewall-vnf
    - loadbalancer-vnf
    - router-vnf
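
Conceptually, the orchestrator resolves chain_order into an ordered deployment plan. The sketch below is illustrative only; it mirrors the chain definition above and is not a real MANO API:

```python
def resolve_chain(vnfs, chain_order):
    """Return (name, image) pairs in chain order; fail on unknown VNFs."""
    by_name = {v["name"]: v for v in vnfs}
    missing = [n for n in chain_order if n not in by_name]
    if missing:
        raise ValueError(f"chain references undefined VNFs: {missing}")
    return [(n, by_name[n]["image"]) for n in chain_order]

vnfs = [
    {"name": "firewall-vnf", "image": "firewall-vnf:latest"},
    {"name": "loadbalancer-vnf", "image": "haproxy-vnf:latest"},
    {"name": "router-vnf", "image": "router-vnf:latest"},
]
order = ["firewall-vnf", "loadbalancer-vnf", "router-vnf"]
print(resolve_chain(vnfs, order)[0])  # ('firewall-vnf', 'firewall-vnf:latest')
```

Real orchestrators (OSM, ONAP, Tacker) perform the same resolution before instantiating VNFs and steering traffic through them.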

VPN and Direct Connect

Virtual Private Network (VPN)

VPNs create encrypted tunnels over public networks to connect remote sites or users securely.

VPN Types:

  1. Site-to-Site VPN: Connects entire networks

    • Connects on-premises data center to VPC
    • Always-on connection
    • Uses IPsec protocol
  2. Client VPN: Connects individual users

    • Remote access for employees
    • SSL/TLS or IPsec
    • Per-user authentication

AWS VPN Configuration:

# Customer Gateway (on-premises)
resource "aws_customer_gateway" "main" {
  bgp_asn    = 65000
  ip_address = "203.0.113.12" # On-premises public IP
  type       = "ipsec.1"

  tags = {
    Name = "on-premises-gateway"
  }
}

# Virtual Private Gateway
resource "aws_vpn_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "vpn-gateway"
  }
}

# VPN Connection
resource "aws_vpn_connection" "main" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.main.id
  type                = "ipsec.1"
  static_routes_only  = false

  tags = {
    Name = "site-to-site-vpn"
  }
}

# Route to VPN
resource "aws_route" "vpn" {
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "192.168.0.0/16" # On-premises network
  gateway_id             = aws_vpn_gateway.main.id
}

VPN Tunnel Configuration (StrongSwan):

# /etc/ipsec.conf
config setup
    charondebug="ike 2, knl 2, cfg 2"
    uniqueids=yes

conn aws-vpc
    type=tunnel
    auto=start
    keyexchange=ikev2
    authby=secret
    left=203.0.113.12         # On-premises public IP
    leftsubnet=192.168.0.0/16 # On-premises network
    right=54.123.45.67        # AWS VPN endpoint
    rightsubnet=10.0.0.0/16   # VPC network
    ike=aes256-sha256-modp2048
    esp=aes256-sha256
    dpdaction=restart
    dpddelay=30s
    dpdtimeout=120s

Direct Connect

Direct Connect provides dedicated network connections from on-premises to cloud providers, bypassing the internet.

Benefits:

  • Lower latency
  • More consistent network performance
  • Reduced bandwidth costs
  • Private connectivity

Direct Connect Architecture:

On-Premises Data Center

│ Dedicated Fiber Connection
│ (1 Gbps, 10 Gbps, 100 Gbps)

┌────────────────────┐
│ Direct Connect │
│ Location (Colo) │
└─────────┬──────────┘

│ AWS/Azure/GCP Backbone

┌────────────────────┐
│ Cloud VPC │
└────────────────────┘

AWS Direct Connect Configuration:

# Direct Connect Connection
resource "aws_dx_connection" "main" {
  name      = "direct-connect-1gbps"
  bandwidth = "1Gbps"
  location  = "EqDC2" # Equinix Data Center

  tags = {
    Name = "production-dx"
  }
}

# Virtual Interface (Private)
resource "aws_dx_private_virtual_interface" "main" {
  connection_id  = aws_dx_connection.main.id
  name           = "private-vif"
  vlan           = 100
  address_family = "ipv4"
  bgp_asn        = 65000

  # VPC Gateway
  vpn_gateway_id = aws_vpn_gateway.main.id
}

Cross-Region Networking

Cross-region networking enables resources in different geographic regions to communicate efficiently.

Challenges

  1. Latency: Physical distance increases latency
  2. Bandwidth Costs: Inter-region data transfer costs
  3. Consistency: Maintaining data consistency across regions
  4. Failover: Handling region failures

Solutions

1. Global Load Balancing: Route traffic to nearest healthy region

# AWS Route 53 Health Checks and Failover
resource "aws_route53_health_check" "us_east" {
  fqdn              = "us-east.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "main" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "example.com"
  type    = "A"
  ttl     = 60 # required for non-alias records

  failover_routing_policy {
    type = "PRIMARY"
  }

  set_identifier  = "us-east"
  health_check_id = aws_route53_health_check.us_east.id
  records         = ["54.123.45.67"]
}

2. VPC Peering Across Regions: Connect VPCs in different regions

# Cross-region VPC peering
resource "aws_vpc_peering_connection" "cross_region" {
  vpc_id      = aws_vpc.us_east.id
  peer_vpc_id = aws_vpc.eu_west.id
  peer_region = "eu-west-1"

  tags = {
    Name = "us-east-to-eu-west"
  }
}

3. Transit Gateway: Centralized hub for VPC connectivity

resource "aws_ec2_transit_gateway" "main" {
  description = "Global transit gateway"

  tags = {
    Name = "global-tgw"
  }
}

# Attach VPCs from different regions
resource "aws_ec2_transit_gateway_vpc_attachment" "us_east" {
  subnet_ids         = aws_subnet.us_east[*].id
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  vpc_id             = aws_vpc.us_east.id
}

resource "aws_ec2_transit_gateway_vpc_attachment" "eu_west" {
  subnet_ids         = aws_subnet.eu_west[*].id
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  vpc_id             = aws_vpc.eu_west.id
}

Security Groups and Network ACLs

Security Groups

Security groups are stateful virtual firewalls at the instance level.

Characteristics:

  • Stateful: Return traffic automatically allowed
  • Default deny: All traffic blocked unless explicitly allowed
  • Instance-level: Applied to individual instances
  • Can have multiple security groups per instance
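
These semantics (no rule ordering, no deny rules, any matching allow rule admits the traffic) can be modeled in a few lines. The sketch below is a simplified illustration; real security groups also support protocol "-1", IPv6 ranges, and security-group references as sources:

```python
import ipaddress

def sg_allows(rules, src_ip, port, protocol="tcp"):
    """True if ANY rule permits the flow.

    Security groups have no rule order and no deny rules; anything
    not explicitly allowed is implicitly dropped.
    """
    for r in rules:
        if (r["protocol"] in (protocol, "-1")
                and r["from_port"] <= port <= r["to_port"]
                and ipaddress.ip_address(src_ip) in ipaddress.ip_network(r["cidr"])):
            return True
    return False

web_sg = [
    {"protocol": "tcp", "from_port": 80, "to_port": 80, "cidr": "0.0.0.0/0"},
    {"protocol": "tcp", "from_port": 22, "to_port": 22, "cidr": "203.0.113.0/24"},
]
print(sg_allows(web_sg, "198.51.100.7", 80))  # True
print(sg_allows(web_sg, "198.51.100.7", 22))  # False (SSH only from office range)
```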

Security Group Rules:

Problem Background: Security groups are the primary network security mechanism in AWS, acting as virtual firewalls at the instance level. By default, security groups deny all inbound traffic and allow all outbound traffic. Properly configured security groups protect applications from unauthorized access while ensuring legitimate traffic flows smoothly.

Solution Approach:

  • Least privilege principle: Only open necessary ports and IP ranges
  • Stateful filtering: Leverage security groups' stateful nature to automatically allow return traffic
  • Security group references: Reference other security groups for dynamic security policies
  • Layered defense: Combine security groups with network ACLs for multiple layers of protection

Design Considerations:

  • Inbound rules: Explicitly allow required source IPs and ports
  • Outbound rules: Control external resource access (databases, APIs)
  • Rule limits: By default, 60 inbound and 60 outbound rules per security group (an adjustable service quota)
  • Multiple security groups: Instances can have multiple security groups; their rules are combined (OR logic)

# Web Server Security Group
# Purpose: Control network access for web servers
# Security: Follow least privilege, avoid opening 0.0.0.0/0 to sensitive ports
resource "aws_security_group" "web" {
  name        = "web-sg"
  description = "Security group for web servers"
  vpc_id      = aws_vpc.main.id

  # Allow HTTP from anywhere
  # Security consideration: In production, consider restricting to load balancer IPs or CDN IPs
  # Best practice: Use WAF or CDN to filter traffic before it reaches instances
  ingress {
    description = "HTTP"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    # Warning: 0.0.0.0/0 allows access from any IP address
    # Production: Restrict to specific IP ranges or use security group references
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow HTTPS from anywhere
  # Security consideration: HTTPS port typically needs to be open, but use WAF and DDoS protection
  ingress {
    description = "HTTPS"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow SSH from specific IP
  # Security consideration: Strongly recommend restricting SSH access
  # Best practice: Use bastion host or VPN, avoid direct SSH access from internet
  ingress {
    description = "SSH"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    # Restricted to office IP range for management access
    cidr_blocks = ["203.0.113.0/24"] # Office IP range
  }

  # Allow all outbound traffic
  # Note: Security groups are stateful, return traffic for inbound connections is automatically allowed
  # This egress rule allows outbound connections initiated by the instance
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1" # All protocols
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name        = "web-security-group"
    Environment = "production"
  }
}

# Database Security Group
# Purpose: Control access to database servers
# Security: Only allow access from application servers, no direct internet access
resource "aws_security_group" "database" {
  name        = "db-sg"
  description = "Security group for database"
  vpc_id      = aws_vpc.main.id

  # Allow MySQL from web servers only
  # Security consideration: Using security group reference instead of IP addresses
  # Benefits: Automatically includes all instances with web security group, dynamic updates
  ingress {
    description = "MySQL"
    from_port   = 3306
    to_port     = 3306
    protocol    = "tcp"
    # Reference web security group: Only instances with web security group can access
    # This is more secure and flexible than using IP addresses
    security_groups = [aws_security_group.web.id]
  }

  # No outbound rules (default deny)
  # Database servers typically don't need outbound internet access
  # If needed, explicitly allow specific destinations (e.g., for backups to S3)

  tags = {
    Name        = "database-security-group"
    Environment = "production"
  }
}

Key Points Interpretation:

  • Stateful nature: Security groups automatically track connection state; if an inbound connection is allowed, the corresponding return traffic is automatically allowed
  • Rule evaluation: All rules are evaluated together with no ordering; any matching allow rule permits the traffic, and anything unmatched is denied (security groups have no deny rules)
  • Security group references: Using security group IDs as source/destination enables dynamic security policies (e.g., allow instances in the same security group to communicate)

Design Trade-offs:

  • Rule count vs manageability: More rules provide finer control but are harder to audit; keep rule sets small and well documented
  • Open range vs security: Opening 0.0.0.0/0 simplifies configuration but reduces security; follow the least privilege principle
  • Security group count vs management: Separate security groups per service provide better isolation but increase management complexity

Common Questions:

  • Q: Is there a limit on security group rules? A: By default, 60 inbound and 60 outbound rules per security group (an adjustable quota), and an instance can have multiple security groups
  • Q: How to enable communication between security groups? A: Reference the other security group's ID in a rule to allow access from that group
  • Q: Do security groups affect performance? A: The impact is negligible in practice; evaluation is unordered, so rule "order" does not matter

Production Practices:

  • Use Terraform or CloudFormation to manage security groups, enabling version control and auditing
  • Regularly audit security group rules and identify ports open to 0.0.0.0/0 (especially sensitive ports like 22, 3306, 5432)
  • Use different security groups for different environments to prevent production config leaks
  • Analyze actual traffic using VPC Flow Logs to optimize security group rules and remove unused rules
  • Use AWS Config or similar tools to continuously monitor security group configuration changes and detect security risks

Network ACLs

Network ACLs are stateless subnet-level firewalls.

Characteristics:

  • Stateless: Must explicitly allow return traffic
  • Subnet-level: Applied to entire subnets
  • Rule evaluation: Processed in order (lowest to highest)
  • Default behavior: The default NACL allows all traffic; custom NACLs deny all traffic until rules are added
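
The ordered, first-match evaluation is what distinguishes NACLs from security groups. A simplified model (IPv4 and single port ranges only, for illustration):

```python
import ipaddress

def nacl_decision(rules, src_ip, port):
    """Evaluate rules in ascending rule number; the first match wins.

    Mirrors AWS's implicit final rule (*): anything unmatched is denied.
    """
    for r in sorted(rules, key=lambda r: r["rule_no"]):
        if (r["from_port"] <= port <= r["to_port"]
                and ipaddress.ip_address(src_ip) in ipaddress.ip_network(r["cidr"])):
            return r["action"]
    return "deny"

rules = [
    {"rule_no": 90, "from_port": 80, "to_port": 80,
     "cidr": "192.0.2.0/24", "action": "deny"},  # blocked range, checked first
    {"rule_no": 100, "from_port": 80, "to_port": 80,
     "cidr": "0.0.0.0/0", "action": "allow"},
]
print(nacl_decision(rules, "192.0.2.5", 80))     # deny (rule 90 matches first)
print(nacl_decision(rules, "198.51.100.7", 80))  # allow
```

Note how the deny rule only works because its rule number is lower; swapping the numbers would let the broad allow rule match first.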

Network ACL Configuration:

resource "aws_network_acl" "main" {
  vpc_id = aws_vpc.main.id

  # Allow HTTP inbound
  ingress {
    rule_no    = 100
    protocol   = "tcp"
    from_port  = 80
    to_port    = 80
    cidr_block = "0.0.0.0/0"
    action     = "allow"
  }

  # Allow HTTPS inbound
  ingress {
    rule_no    = 110
    protocol   = "tcp"
    from_port  = 443
    to_port    = 443
    cidr_block = "0.0.0.0/0"
    action     = "allow"
  }

  # Allow ephemeral ports for return traffic
  ingress {
    rule_no    = 120
    protocol   = "tcp"
    from_port  = 1024
    to_port    = 65535
    cidr_block = "0.0.0.0/0"
    action     = "allow"
  }

  # Allow all outbound
  egress {
    rule_no    = 100
    protocol   = "-1"
    from_port  = 0
    to_port    = 0
    cidr_block = "0.0.0.0/0"
    action     = "allow"
  }

  tags = {
    Name = "main-nacl"
  }
}

Security Best Practices

  1. Principle of Least Privilege: Only allow necessary traffic
  2. Defense in Depth: Use both security groups and NACLs
  3. Regular Audits: Review and update rules regularly
  4. IP Whitelisting: Restrict access to known IP ranges
  5. Logging: Enable VPC Flow Logs for monitoring
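
The audit in practice 3 is easy to automate. Below is a minimal sketch: the check itself is a pure function over the rule structure returned by EC2's describe_security_groups, so it can be tested without AWS credentials; the boto3 call that would feed it live data is shown commented out:

```python
SENSITIVE_PORTS = {22, 3306, 5432, 3389}

def risky_rules(ip_permissions):
    """Flag ingress rules that open a sensitive port to 0.0.0.0/0.

    `ip_permissions` follows the shape returned by EC2 describe_security_groups.
    """
    findings = []
    for perm in ip_permissions:
        lo = perm.get("FromPort", 0)
        hi = perm.get("ToPort", 65535)
        open_world = any(r.get("CidrIp") == "0.0.0.0/0"
                         for r in perm.get("IpRanges", []))
        if open_world and any(lo <= p <= hi for p in SENSITIVE_PORTS):
            findings.append((lo, hi))
    return findings

# import boto3
# perms = boto3.client("ec2").describe_security_groups()["SecurityGroups"][0]["IpPermissions"]
perms = [{"FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
         {"FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]
print(risky_rules(perms))  # [(22, 22)]
```

Run on a schedule (or as an AWS Config rule), this kind of check catches accidental internet exposure before an attacker does.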

Network Monitoring and Troubleshooting

VPC Flow Logs

VPC Flow Logs capture information about IP traffic flowing through your VPC.

Flow Log Configuration:

resource "aws_flow_log" "main" {
  iam_role_arn    = aws_iam_role.flow_log.arn
  log_destination = aws_cloudwatch_log_group.flow_log.arn
  traffic_type    = "ALL"
  vpc_id          = aws_vpc.main.id
}

resource "aws_cloudwatch_log_group" "flow_log" {
  name              = "vpc-flow-logs"
  retention_in_days = 30
}

Flow Log Analysis:

# Analyze VPC Flow Logs
import boto3
from collections import defaultdict

logs_client = boto3.client('logs')

# Query flow logs
response = logs_client.filter_log_events(
    logGroupName='vpc-flow-logs',
    startTime=1234567890000,
    endTime=1234567900000
)

# Parse and analyze
traffic_stats = defaultdict(int)
for event in response['events']:
    log_line = event['message']
    # Default flow log format (14 fields): version account-id interface-id srcaddr
    # dstaddr srcport dstport protocol packets bytes start end action log-status
    parts = log_line.split()
    if len(parts) >= 14:
        src_addr = parts[3]
        dst_addr = parts[4]
        protocol = parts[7]                 # field 8: protocol number
        bytes_transferred = int(parts[9])   # field 10: bytes

        traffic_stats[(src_addr, dst_addr, protocol)] += bytes_transferred

# Print top traffic flows
for (src, dst, proto), bytes_count in sorted(
        traffic_stats.items(),
        key=lambda x: x[1],
        reverse=True
)[:10]:
    print(f"{src} -> {dst} ({proto}): {bytes_count / 1024 / 1024:.2f} MB")

Network Troubleshooting Tools

1. ping: Test connectivity

# Basic ping
ping 8.8.8.8

# Ping with specific interface
ping -I eth0 10.0.1.10

# Ping with packet size
ping -s 1472 10.0.1.10

2. traceroute: Trace network path

# IPv4 traceroute
traceroute example.com

# IPv6 traceroute
traceroute6 example.com

# TCP traceroute
traceroute -T -p 443 example.com

3. tcpdump: Packet capture

# Capture all traffic on interface
tcpdump -i eth0

# Capture specific port
tcpdump -i eth0 port 80

# Capture and save to file
tcpdump -i eth0 -w capture.pcap

# Read from file
tcpdump -r capture.pcap

4. netstat: Network connections

# Show all listening ports
netstat -tuln

# Show all connections
netstat -tun

# Show process information
netstat -tulnp

5. ss: Modern netstat replacement

# Show listening sockets
ss -tuln

# Show connections
ss -tun

# Show process information
ss -tulnp

6. iptables: Firewall rules

# List all rules
iptables -L -n -v

# List NAT rules
iptables -t nat -L -n -v

# Check specific rule
iptables -C INPUT -p tcp --dport 80 -j ACCEPT

Common Network Issues and Solutions

Issue 1: Cannot reach instance from internet

Diagnosis:

# Check security group rules
aws ec2 describe-security-groups --group-ids sg-12345678

# Check route table
aws ec2 describe-route-tables --route-table-ids rtb-12345678

# Check network ACLs
aws ec2 describe-network-acls --network-acl-ids acl-12345678

Solutions:

  • Verify security group allows inbound traffic
  • Check route table has route to internet gateway
  • Ensure instance has public IP
  • Verify network ACL allows traffic

Issue 2: High latency between regions

Diagnosis:

# Measure latency
ping -c 10 us-east-instance.example.com
ping -c 10 eu-west-instance.example.com

# Trace route
traceroute us-east-instance.example.com

Solutions:

  • Use Direct Connect for dedicated connections
  • Implement CDN for static content
  • Deploy resources closer to users
  • Optimize application architecture

Issue 3: Intermittent connectivity

Diagnosis:

# Monitor connectivity
while true; do
    ping -c 1 10.0.1.10 && echo "OK" || echo "FAIL"
    sleep 1
done

# Check flow logs for dropped packets
aws logs filter-log-events \
    --log-group-name vpc-flow-logs \
    --filter-pattern "REJECT"

Solutions:

  • Check for rate limiting
  • Review security group rules
  • Verify network ACL rules
  • Check for DDoS attacks

Case Studies

Case Study 1: E-commerce Platform Migration

Scenario: A large e-commerce platform migrating from on-premises to AWS, requiring high availability and global reach.

Requirements:

  • Multi-region deployment (US, EU, Asia)
  • 99.99% uptime SLA
  • Sub-100ms latency for API calls
  • Handle 10M+ requests per day
  • PCI-DSS compliance

Architecture:

Global Users

├─► US-East (Primary)
│ ├─► Application Load Balancer
│ ├─► Auto Scaling Group (Web Servers)
│ ├─► RDS Multi-AZ (Database)
│ └─► ElastiCache (Redis)

├─► EU-West (Secondary)
│ └─► (Same architecture)

└─► Asia-Pacific (Tertiary)
└─► (Same architecture)

CDN (CloudFront)

└─► S3 (Static Assets)

Implementation:

# Multi-region VPC setup
module "vpc_us_east" {
  source = "./modules/vpc"
  region = "us-east-1"
  cidr   = "10.0.0.0/16"
}

module "vpc_eu_west" {
  source = "./modules/vpc"
  region = "eu-west-1"
  cidr   = "10.1.0.0/16"
}

module "vpc_ap_southeast" {
  source = "./modules/vpc"
  region = "ap-southeast-1"
  cidr   = "10.2.0.0/16"
}

# Global Load Balancer (Route 53, latency-based routing)
resource "aws_route53_record" "api" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"

  latency_routing_policy {
    region = "us-east-1"
  }

  set_identifier = "us-east"

  # An ALB's DNS name must be referenced through an alias block, not as a
  # literal A-record value (assumes the VPC module also exports alb_zone_id)
  alias {
    name                   = module.vpc_us_east.alb_dns
    zone_id                = module.vpc_us_east.alb_zone_id
    evaluate_target_health = true
  }
}

# Cross-region replication for database
resource "aws_db_instance" "primary" {
  identifier     = "db-primary"
  engine         = "mysql"
  instance_class = "db.r5.xlarge"
  multi_az       = true
}

resource "aws_db_instance" "replica_eu" {
  provider = aws.eu_west # cross-region replicas are created via the target region's provider

  identifier          = "db-replica-eu"
  replicate_source_db = aws_db_instance.primary.arn # cross-region replication references the source ARN
  instance_class      = "db.r5.xlarge"
  availability_zone   = "eu-west-1a"
}

Results:

  • Latency: Reduced from 250ms to 45ms (82% improvement)
  • Availability: Achieved 99.99% uptime
  • Cost: 40% reduction compared to on-premises
  • Scalability: Handled 15M requests/day during peak

Case Study 2: Financial Services SDN Implementation

Scenario: A financial services company implementing SDN to improve network agility and reduce operational costs.

Requirements:

  • Centralized network management
  • Dynamic traffic engineering
  • Security policy enforcement
  • Compliance with financial regulations

Architecture:

SDN Controller (ONOS)

├─► Data Center 1 (Trading)
│ ├─► OpenFlow Switches
│ └─► VNFs (Firewall, Load Balancer)

├─► Data Center 2 (Backend)
│ └─► OpenFlow Switches

└─► Cloud VPC (Disaster Recovery)
└─► Virtual Switches

Implementation:

# SDN Application for Traffic Engineering
class TrafficEngineeringApp(app_manager.RyuApp):
    def __init__(self, *args, **kwargs):
        super(TrafficEngineeringApp, self).__init__(*args, **kwargs)
        self.link_utilization = {}
        self.path_cache = {}

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in_handler(self, ev):
        # Extract source/destination from the packet, then select the optimal path
        pkt = packet.Packet(ev.msg.data)
        eth = pkt.get_protocols(ethernet.ethernet)[0]
        src, dst = eth.src, eth.dst

        path = self.select_optimal_path(src, dst)
        self.install_flow_rules(path)

    def select_optimal_path(self, src, dst):
        # Get all possible paths
        paths = self.get_all_paths(src, dst)

        # Choose the path with the lowest utilization-based cost
        best_path = min(paths, key=lambda p: self.calculate_path_cost(p))
        return best_path

    def calculate_path_cost(self, path):
        total_cost = 0
        for i in range(len(path) - 1):
            link = (path[i], path[i + 1])
            utilization = self.get_link_utilization(link)
            # Prefer paths with lower utilization
            total_cost += utilization * 100
        return total_cost

Results:

  • Network Utilization: Improved from 60% to 85%
  • Latency: Reduced by 30% through optimal routing
  • Operational Costs: Reduced by 35%
  • Policy Deployment: Reduced from days to minutes

Case Study 3: Media Streaming Platform with Global CDN

Scenario: A video streaming platform serving millions of users worldwide, requiring low latency and high bandwidth.

Requirements:

  • Sub-second video start time
  • Support 4K streaming
  • Handle 50M+ concurrent users
  • Global content distribution
  • Cost-effective bandwidth usage

Architecture:

Users Worldwide

├─► CDN Edge (North America)
│ ├─► Cache Hit: 95%
│ └─► Latency: 20ms

├─► CDN Edge (Europe)
│ ├─► Cache Hit: 92%
│ └─► Latency: 25ms

├─► CDN Edge (Asia)
│ ├─► Cache Hit: 90%
│ └─► Latency: 30ms

└─► Origin Servers (Multi-region)
├─► S3 (Video Storage)
└─► EC2 (Transcoding)

Implementation:

# CloudFront Distribution for Video Streaming
resource "aws_cloudfront_distribution" "video" {
  origin {
    domain_name = aws_s3_bucket.videos.bucket_regional_domain_name
    origin_id   = "S3-videos"

    s3_origin_config {
      origin_access_identity = aws_cloudfront_origin_access_identity.video.cloudfront_access_identity_path
    }
  }

  enabled         = true
  is_ipv6_enabled = true
  comment         = "Video streaming CDN"

  # A default cache behavior is required; it handles any path not matched below
  default_cache_behavior {
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    target_origin_id       = "S3-videos"
    viewer_protocol_policy = "redirect-to-https"

    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }
  }

  # Video streaming cache behavior
  ordered_cache_behavior {
    path_pattern     = "/videos/*"
    allowed_methods  = ["GET", "HEAD", "OPTIONS"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "S3-videos"

    forwarded_values {
      query_string = false
      headers      = ["Origin", "Access-Control-Request-Headers", "Access-Control-Request-Method"]
      cookies {
        forward = "none"
      }
    }

    min_ttl                = 0
    default_ttl            = 86400    # 24 hours
    max_ttl                = 31536000 # 1 year
    compress               = true
    viewer_protocol_policy = "redirect-to-https"
  }

  # Adaptive bitrate streaming support
  ordered_cache_behavior {
    path_pattern     = "/streaming/*"
    allowed_methods  = ["GET", "HEAD", "OPTIONS"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "S3-videos"

    forwarded_values {
      query_string = true
      headers      = ["Origin"]
      cookies {
        forward = "none"
      }
    }

    min_ttl                = 0
    default_ttl            = 3600  # 1 hour
    max_ttl                = 86400 # 24 hours
    compress               = true
    viewer_protocol_policy = "redirect-to-https"
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn      = aws_acm_certificate.video.arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }

  # Price class for cost optimization
  price_class = "PriceClass_All"
}

CDN Cache Strategy:

# Cache warming script
import boto3
import requests
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client('s3')

# The distribution's domain name (e.g. dxxxxxxxxxxxx.cloudfront.net) comes from
# the CloudFront console or API; it is not derived from the distribution ID
CDN_DOMAIN = 'd1234567890abc.cloudfront.net'

def warm_cache(video_key):
    """Warm CDN cache for a video"""
    video_url = f"https://{CDN_DOMAIN}/videos/{video_key}"

    # Request the video's headers to populate the edge cache
    response = requests.head(video_url)
    return response.status_code == 200

# Get popular videos from S3
paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='video-bucket', Prefix='videos/')

popular_videos = []
for page in pages:
    for obj in page.get('Contents', []):
        # Filter by popularity (simplified)
        if obj['Size'] > 0:
            popular_videos.append(obj['Key'])

# Warm cache for top 1000 videos
with ThreadPoolExecutor(max_workers=50) as executor:
    executor.map(warm_cache, popular_videos[:1000])

Results:

  • Video Start Time: Reduced from 3s to 0.8s (73% improvement)
  • Cache Hit Ratio: 94% globally
  • Bandwidth Costs: Reduced by 88% compared to direct origin serving
  • User Experience: 4.8/5.0 rating (up from 3.2/5.0)
  • Concurrent Users: Successfully handled 60M+ users

Q&A: Cloud Networking and SDN

Q1: What's the difference between Security Groups and Network ACLs?

A: Security Groups and Network ACLs provide different layers of network security:

Feature      Security Groups                          Network ACLs
Level        Instance level                           Subnet level
State        Stateful (return traffic auto-allowed)   Stateless (must allow return traffic)
Rules        Allow rules only                         Allow and deny rules
Evaluation   All rules evaluated                      Rules evaluated in order
Default      Deny all inbound                         Allow all (default NACL)
Scope        Applied to specific instances            Applied to entire subnet

Best Practice: Use Security Groups as primary defense (easier to manage), and Network ACLs for additional subnet-level protection when needed.

Q2: How do I choose between Network Load Balancer and Application Load Balancer?

A: Choose based on your requirements:

Use Network Load Balancer (NLB) when:

  • You need ultra-low latency (on the order of microseconds, not milliseconds)
  • Handling millions of requests per second
  • Working with TCP/UDP protocols
  • Preserving source IP address is important
  • High-performance requirements

Use Application Load Balancer (ALB) when:

  • You need content-based routing (path, host, headers)
  • SSL/TLS termination at load balancer
  • HTTP/HTTPS traffic
  • WebSocket or HTTP/2 support needed
  • Advanced request routing required

Example: For a gaming application with TCP traffic requiring low latency, use NLB. For a web application with microservices requiring path-based routing, use ALB.

Q3: What is the difference between VPC Peering and Transit Gateway?

A:

VPC Peering:

  • Point-to-point connection between two VPCs
  • Simple and cost-effective for few VPCs
  • No bandwidth charges (within same region)
  • Limited scalability (full mesh becomes complex)

Transit Gateway:

  • Hub-and-spoke model connecting multiple VPCs
  • Centralized management
  • Better for many VPCs (scales better)
  • Supports VPN and Direct Connect attachments
  • Per-GB data processing charges

When to use:

  • VPC Peering: 2-5 VPCs, simple connectivity needs
  • Transit Gateway: 5+ VPCs, complex network topology, need centralized management
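The scalability gap is easy to quantify: a full mesh of peered VPCs needs n(n-1)/2 connections, while a Transit Gateway needs one attachment per VPC:

```python
def peering_connections(n):
    """Full-mesh VPC peering: every pair of VPCs needs its own connection."""
    return n * (n - 1) // 2

def tgw_attachments(n):
    """Transit Gateway hub-and-spoke: one attachment per VPC."""
    return n

for n in (3, 5, 10, 50):
    print(n, peering_connections(n), tgw_attachments(n))
# At 10 VPCs a full mesh already needs 45 peering connections vs 10 attachments.
```

This is why the crossover point sits around 5 VPCs: below it the mesh is manageable, above it the quadratic growth dominates.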

Q4: How does SDN improve network management compared to traditional networking?

A: SDN provides several key advantages:

  1. Centralized Control: Single point of management instead of configuring each device individually
  2. Programmability: Networks can be controlled via software APIs
  3. Dynamic Configuration: Changes can be made instantly without touching hardware
  4. Traffic Engineering: Optimize paths based on real-time conditions
  5. Network Virtualization: Create multiple logical networks on shared infrastructure
  6. Automation: Integrate with DevOps tools and CI/CD pipelines

Example: In traditional networking, changing a firewall rule requires logging into each firewall device. With SDN, you update a policy in the controller, and it's automatically applied to all relevant devices.
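That firewall example can be sketched as a toy controller: one policy update fans out to every registered device (an in-memory illustration, not a real SDN protocol such as OpenFlow):

```python
class Controller:
    """Toy SDN controller: a single policy change point for the whole network."""
    def __init__(self):
        self.devices = []

    def register(self, device):
        self.devices.append(device)

    def update_policy(self, rule):
        # The controller, not the operator, touches each device.
        for device in self.devices:
            device.rules.append(rule)

class Firewall:
    def __init__(self, name):
        self.name, self.rules = name, []

ctrl = Controller()
fws = [Firewall(f"fw-{i}") for i in range(3)]
for fw in fws:
    ctrl.register(fw)

ctrl.update_policy("deny tcp any 23")  # one call instead of three logins
print(all("deny tcp any 23" in fw.rules for fw in fws))  # True
```

In traditional networking the loop body is a human logging into each box; SDN moves that loop into software.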

Q5: What are the main components of NFV architecture?

A: NFV architecture consists of three main components:

  1. Virtualized Network Functions (VNFs): Software implementations of network functions (routers, firewalls, load balancers) running on standard servers

  2. NFV Infrastructure (NFVI): The hardware and software resources that provide compute, storage, and networking capabilities:

    • Compute: Servers (CPU, memory)
    • Storage: Storage systems
    • Network: Switches, routers
    • Virtualization Layer: Hypervisor or container runtime
  3. NFV Management and Orchestration (MANO):

    • VNF Manager: Manages lifecycle of VNFs (create, update, delete)
    • Virtualized Infrastructure Manager (VIM): Manages NFVI resources (OpenStack, Kubernetes)
    • NFV Orchestrator: Coordinates VNFs and resources to create network services

Q6: How do CDNs reduce latency and improve performance?

A: CDNs improve performance through several mechanisms:

  1. Geographic Distribution: Content cached at edge locations closer to users

    • Example: User in Tokyo accesses content from Tokyo edge (20ms) instead of US origin (200ms)
  2. Caching: Frequently accessed content stored at edge, reducing origin load

    • Cache hit ratio typically 90%+ for static content
  3. Compression: Gzip/Brotli compression reduces bandwidth usage

    • Can reduce file sizes by 70-90%
  4. Optimized Routing: CDNs use intelligent routing to select best edge server

    • Based on latency, server load, network conditions
  5. HTTP/2 and HTTP/3: Modern protocols with multiplexing and header compression

Performance Impact:

  • Latency: 70-90% reduction for cached content
  • Bandwidth: 80-95% reduction in origin bandwidth
  • Availability: Improved through distributed architecture
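The latency figures above follow directly from the cache hit ratio: expected latency is hit_ratio × edge latency + (1 − hit_ratio) × origin latency. A quick sanity check with the Tokyo example:

```python
def expected_latency_ms(hit_ratio, edge_ms, origin_ms):
    """Weighted average of edge (cache hit) and origin (cache miss) latency."""
    return hit_ratio * edge_ms + (1 - hit_ratio) * origin_ms

# Tokyo example from above: 20 ms edge, 200 ms origin, 90% cache hit ratio.
lat = expected_latency_ms(0.9, 20, 200)
print(lat)            # 38.0 ms on average
print(1 - lat / 200)  # ~0.81 -> roughly 81% latency reduction vs origin-only
```

That 81% figure lands squarely in the 70-90% range quoted above, and rises further as the hit ratio improves.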

Q7: What are the security considerations for VPN connections?

A: Key security considerations:

  1. Encryption: Use strong encryption algorithms

    • IKEv2 with AES-256-GCM
    • Avoid weak ciphers (DES, MD5)
  2. Authentication: Strong authentication methods

    • Pre-shared keys (PSK) for site-to-site
    • Certificates for better security
    • Multi-factor authentication for client VPN
  3. Key Management: Secure key storage and rotation

    • Regular key rotation (every 90 days)
    • Use key management services (AWS KMS, Azure Key Vault)
  4. Monitoring: Monitor VPN connections

    • Connection status
    • Traffic patterns
    • Failed authentication attempts
  5. Network Segmentation: Isolate VPN traffic

    • Use separate VPC/subnet for VPN endpoints
    • Restrict access to necessary resources only
  6. Compliance: Ensure compliance with regulations

    • Encrypt data in transit
    • Log access and activities
    • Regular security audits
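The 90-day rotation policy in point 3 can be enforced with a simple age check (a sketch only; a real deployment would query the key management service for the rotation timestamp):

```python
from datetime import date, timedelta

ROTATION_PERIOD = timedelta(days=90)  # policy from the text above

def needs_rotation(last_rotated, today=None):
    """True if a VPN pre-shared key or certificate is past its rotation window."""
    today = today or date.today()
    return today - last_rotated >= ROTATION_PERIOD

print(needs_rotation(date(2023, 1, 1), today=date(2023, 2, 15)))   # False: 45 days old
print(needs_rotation(date(2022, 10, 1), today=date(2023, 2, 15)))  # True: 137 days old
```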

Q8: How does Direct Connect differ from VPN, and when should I use each?

A:

| Feature | Direct Connect | VPN |
|---|---|---|
| Connection Type | Dedicated physical connection | Encrypted tunnel over the internet |
| Latency | Lower, more consistent | Higher, variable |
| Bandwidth | 1 Gbps - 100 Gbps | Limited by internet connection |
| Cost | Higher (monthly fee + data transfer) | Lower (pay per hour/data) |
| Setup Time | Weeks (physical installation) | Minutes (software configuration) |
| Reliability | Higher (dedicated circuit) | Depends on internet quality |
| Use Case | High-volume, consistent traffic | Low-volume, occasional access |

Use Direct Connect when:

  • High bandwidth requirements (100+ Mbps consistently)
  • Low latency critical (financial trading, real-time applications)
  • Large data transfers (data migration, backups)
  • Compliance requires private connectivity

Use VPN when:

  • Low to moderate bandwidth needs
  • Occasional connectivity (backup connection, remote access)
  • Cost-sensitive scenarios
  • Quick setup required

Q9: What are the best practices for cross-region networking?

A: Best practices:

  1. Minimize Cross-Region Traffic:

    • Replicate data to regions where it's accessed
    • Use regional endpoints for services
    • Cache content at edge locations
  2. Optimize Data Transfer:

    • Use compression for data transfers
    • Batch operations to reduce round trips
    • Use Direct Connect for high-volume transfers
  3. Implement Failover:

    • Health checks for each region
    • Automatic failover to healthy regions
    • Test failover procedures regularly
  4. Monitor and Alert:

    • Monitor latency between regions
    • Track data transfer costs
    • Set up alerts for connectivity issues
  5. Design for Regional Independence:

    • Each region should be self-contained
    • Minimize dependencies between regions
    • Design for eventual consistency
  6. Cost Optimization:

    • Use compression and caching
    • Route traffic efficiently
    • Consider data transfer costs in architecture decisions
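The failover pattern in point 3 reduces to: probe regions in priority order and route to the first healthy one. A sketch with a hypothetical health map (real setups drive this from DNS health checks, e.g. Route 53):

```python
def pick_region(priority, health):
    """Return the first healthy region in priority order, else None."""
    for region in priority:
        if health.get(region, False):
            return region
    return None

priority = ["us-east-1", "eu-west-1", "ap-northeast-1"]
health = {"us-east-1": False, "eu-west-1": True, "ap-northeast-1": True}
print(pick_region(priority, health))  # eu-west-1: primary failed its health check
```

The None case is worth handling explicitly: if every region is unhealthy, you want an alert, not a silent retry loop.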

Q10: How do I troubleshoot network connectivity issues in the cloud?

A: Systematic troubleshooting approach:

Step 1: Verify Instance-Level Configuration

# Check network interface
ip addr show

# Check routing table
ip route show

# Test connectivity
ping 8.8.8.8

Step 2: Check Security Groups

# List security group rules
aws ec2 describe-security-groups --group-ids sg-12345678

# Verify rules allow necessary traffic

Step 3: Check Network ACLs

# List NACL rules
aws ec2 describe-network-acls --network-acl-ids acl-12345678

# Verify rules are not blocking traffic

Step 4: Check Route Tables

# List route tables
aws ec2 describe-route-tables --route-table-ids rtb-12345678

# Verify routes are correct

Step 5: Review VPC Flow Logs

# Query flow logs for rejected traffic
aws logs filter-log-events \
  --log-group-name vpc-flow-logs \
  --filter-pattern "REJECT"
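The REJECT records this query returns can then be parsed to see exactly which flows are being dropped. A sketch assuming the default VPC flow log format (14 space-separated fields, with action in field 13):

```python
def rejected_flows(lines):
    """Extract (src, dst, dstport) from VPC flow log records with action REJECT.

    Assumes the default flow log format:
    version account-id interface-id srcaddr dstaddr srcport dstport
    protocol packets bytes start end action log-status
    """
    out = []
    for line in lines:
        f = line.split()
        if len(f) >= 14 and f[12] == "REJECT":
            out.append((f[3], f[4], int(f[6])))
    return out

# Two sample records (hypothetical ENI and account IDs).
logs = [
    "2 123456789012 eni-abc123 10.0.1.5 10.0.2.9 49152 22 6 3 180 1675900000 1675900060 REJECT OK",
    "2 123456789012 eni-abc123 10.0.1.5 10.0.2.9 49153 443 6 10 8000 1675900000 1675900060 ACCEPT OK",
]
print(rejected_flows(logs))  # [('10.0.1.5', '10.0.2.9', 22)]: SSH is being blocked
```

A rejected destination port points you straight at the security group or NACL rule to check in steps 2 and 3.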

Step 6: Test from Different Sources

  • Test from same subnet
  • Test from different subnet
  • Test from internet (if applicable)

Step 7: Use Network Diagnostic Tools

# Packet capture
tcpdump -i eth0 -w capture.pcap

# Trace route
traceroute destination-ip

# Check DNS resolution
nslookup example.com

Common Issues and Solutions:

  • No internet access: Check route table, internet gateway, NAT gateway
  • Cannot reach other instances: Check security groups, NACLs, route tables
  • High latency: Check instance type, network performance, geographic distance
  • Intermittent connectivity: Check for rate limiting, DDoS protection, health checks

Summary

Cloud networking and Software-Defined Networking represent fundamental shifts in how we design, deploy, and manage network infrastructure. From the isolated environments provided by Virtual Private Clouds to the intelligent traffic distribution of load balancers, from the global reach of Content Delivery Networks to the programmability of SDN controllers, modern cloud networking enables applications that are scalable, secure, and performant.

Key Takeaways:

  1. VPC Fundamentals: Virtual Private Clouds provide isolated, customizable network environments with multiple layers of security through security groups and network ACLs.

  2. Load Balancing: Modern load balancers (ALB, NLB) distribute traffic intelligently, improving availability and performance through health checks and advanced routing.

  3. Content Delivery: CDNs bring content closer to users, dramatically reducing latency and bandwidth costs through geographic distribution and intelligent caching.

  4. SDN Revolution: Software-Defined Networking separates control from data planes, enabling centralized management, programmability, and dynamic network configuration.

  5. NFV Transformation: Network Functions Virtualization moves network appliances to software, reducing costs and increasing flexibility.

  6. Hybrid Connectivity: VPN and Direct Connect enable secure, reliable connections between on-premises and cloud environments.

  7. Cross-Region Networking: Global applications require careful design to minimize latency, optimize costs, and ensure high availability across regions.

  8. Security: Multiple layers of security (security groups, NACLs, encryption) protect network traffic and resources.

  9. Monitoring and Troubleshooting: Comprehensive monitoring and systematic troubleshooting ensure network reliability and performance.

  10. Best Practices: Following best practices for network design, security, and operations ensures scalable, maintainable cloud networks.

As cloud computing continues to evolve, networking remains at its core. Understanding these concepts and technologies is essential for building modern, scalable applications that can serve users globally while maintaining security, performance, and cost efficiency. Whether you're designing a simple web application or a complex multi-region system, the principles and technologies covered in this guide provide the foundation for successful cloud networking implementations.

  • Post title: Cloud Computing (5): Network Architecture and SDN
  • Post author: Chen Kai
  • Create time: 2023-02-15 00:00:00
  • Post link: https://www.chenk.top/en/cloud-computing-networking-sdn/
  • Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stating additionally.