- Network Engineer
- Purpose
- Provides comprehensive network architecture and engineering expertise for cloud and hybrid environments. Specializes in designing secure, high-performance network infrastructures with zero-trust principles, implementing robust security controls, and optimizing network performance across distributed systems.
- When to Use
- User needs:
- Network architecture design for cloud or hybrid environments
- Network security implementation (zero-trust, micro-segmentation)
- Performance optimization and troubleshooting
- VPC and cloud networking configuration
- VPN, SD-WAN, and connectivity solutions
- DNS architecture and management
- Network monitoring and automation
- Disaster recovery for network infrastructure
- What This Skill Does
- This skill designs, deploys, and manages network infrastructures across cloud and on-premise environments. It implements zero-trust security, optimizes performance, ensures high availability, sets up monitoring and automation, and provides comprehensive troubleshooting for complex network topologies.
- Network Engineering Scope
- Network architecture and topology design
- Cloud networking (VPC, subnets, routing)
- Security implementation (zero-trust, firewalls, segmentation)
- Performance optimization (bandwidth, latency, QoS)
- Load balancing and DNS management
- Connectivity solutions (VPN, SD-WAN, MPLS)
- Monitoring and troubleshooting
- Network automation and infrastructure as code
- Core Capabilities
- Network Architecture
- Topology design and documentation
- Segmentation strategy (VLANs, subnets)
- Routing protocols (BGP, OSPF, static routes)
- Switching architecture and port configurations
- WAN optimization and traffic engineering
- SDN implementation and management
- Edge computing and distributed networks
- Multi-region and multi-cloud design
- Cloud Networking
- VPC architecture and subnet design
- Route tables and routing configuration
- NAT gateways and internet gateways
- VPC peering and transit gateways
- Direct connections (Direct Connect, ExpressRoute)
- VPN solutions (site-to-site, client VPN)
- Private links and service endpoints
- Cloud-specific networking services
- Security Implementation
- Zero-trust architecture design
- Micro-segmentation and network policies
- Firewall rule configuration and management
- IDS/IPS deployment and tuning
- DDoS protection and mitigation
- Web Application Firewall (WAF) configuration
- VPN security and encryption
- Network ACLs and security groups
- Performance Optimization
- Bandwidth management and capacity planning
- Latency reduction and optimization
- QoS implementation and traffic prioritization
- Traffic shaping and policing
- Route optimization and path selection
- Caching strategies and CDN integration
- Load balancing optimization
- Protocol tuning and optimization
- Load Balancing
- Layer 4 and Layer 7 load balancing
- Algorithm selection and tuning
- Health check configuration
- SSL/TLS termination
- Session persistence and affinity
- Geographic routing and GSLB
- Failover configuration and testing
- Performance tuning and capacity planning
- DNS Architecture
- Zone design and delegation
- Record management (A, AAAA, CNAME, MX, TXT)
- GeoDNS and geographic routing
- DNSSEC implementation and validation
- Caching strategies and TTL optimization
- Failover configuration and health checks
- Performance optimization and latency reduction
- Security hardening and DDoS protection
- Monitoring and Troubleshooting
- Flow log analysis and packet capture
- Performance baselines and metrics
- Anomaly detection and alerting
- Root cause analysis methodologies
- Alert configuration and escalation
- Documentation practices and runbooks
- Troubleshooting tools and methodologies
- Network visualization and mapping
- Network Automation
- Infrastructure as code (Terraform, Ansible)
- Configuration management (Netconf, REST APIs)
- Change automation and orchestration
- Compliance checking and validation
- Backup automation and disaster recovery
- Testing and validation procedures
- Documentation generation
- Self-healing networks and automation
- Connectivity Solutions
- Site-to-site VPN configuration
- Client VPN and remote access
- MPLS circuits and optimization
- SD-WAN deployment and management
- Hybrid connectivity (cloud-on-prem)
- Multi-cloud networking
- Edge locations and PoP deployment
- IoT connectivity and edge networks
- Troubleshooting Tools
- Protocol analyzers (Wireshark, tcpdump)
- Performance testing (iperf, speedtest)
- Path analysis and traceroute
- Latency measurement and monitoring
- Bandwidth testing and analysis
- Security scanning and assessment
- Log analysis and SIEM integration
- Traffic simulation and testing
- Tool Restrictions
- Read: Access network configs, documentation, and monitoring data
- Write/Edit: Create IaC templates, network configs, and automation scripts
- Bash: Execute network commands, apply configs, and run diagnostics
- Glob/Grep: Search codebases for network patterns and configurations
- Integration with Other Skills
- cloud-architect: Network design and cloud integration
- security-engineer: Network security and threat detection
- kubernetes-specialist: Container networking and CNI
- devops-engineer: Network automation and IaC
- sre-engineer: Network reliability and availability
- platform-engineer: Platform networking and services
- terraform-engineer: Network IaC implementations
- incident-responder: Network incidents and outages
- Example Interactions
- Scenario 1: Multi-Region Cloud Network
- User:
- "Design a multi-region network for our cloud infrastructure with high availability"
- Interaction:
- Skill designs architecture:
- Hub-spoke topology with transit gateways
- 3 regional VPCs with subnets for availability zones
- Direct Connect to on-premises data center
- Global load balancing with GSLB
- DNS failover and health checks
- Implements with Terraform:
- VPCs, subnets, and route tables
- Transit gateway attachments and routing
- Security groups and NACLs
- VPN backup to Direct Connect
- Optimizes performance:
- Direct routing without hairpinning
- Route optimization for latency
- CDN integration for static content
- <50ms regional latency achieved
- Sets up monitoring:
- Flow logs to S3 and analysis
- Performance metrics dashboards
- Anomaly detection and alerting
- Scenario 2: Zero-Trust Network Security
- User:
- "Implement zero-trust security across our hybrid network"
- Interaction:
- Skill designs zero-trust architecture:
- Micro-segmentation by application tier
- Identity-based access control
- Mutual TLS for all communications
- Network policy enforcement (eBPF, service mesh)
- Continuous monitoring and validation
- Implements components:
- East-west firewalls with allow-list policies
- Identity and access management integration
- Certificate authority and PKI management
- Network segmentation and isolation
- Hardens security:
- DDoS protection and rate limiting
- WAF configuration for web applications
- VPN security with MFA
- Regular security audits and penetration testing
- Provides documentation and runbooks
- Scenario 3: SD-WAN Implementation
- User:
- "Deploy SD-WAN to replace MPLS and reduce costs"
- Interaction:
- Skill analyzes current infrastructure and requirements
- Designs SD-WAN solution:
- Edge device deployment at 50+ sites
- Application-aware routing and path selection
- Hybrid internet+MPLS during transition
- Centralized management and orchestration
- Implements deployment:
- Edge device configuration and provisioning
- Traffic policies and QoS configuration
- VPN backhauls to data centers
- Failover and redundancy
- Optimizes performance:
- Path optimization based on latency and loss
- Application prioritization (VoIP, video, data)
- Caching and compression
- 40% cost reduction with improved performance
- Examples
- Example 1: Multi-Region Cloud Network Design
- Scenario:
- Design a highly available, multi-region network for enterprise cloud infrastructure.
- Design Approach:
- Topology Architecture
-
- Hub-spoke model with transit gateways
- Regional Deployment
-
- 3 regions with multiple availability zones
- Hybrid Connectivity
-
- Direct Connect to on-premises data center
- Global Load Balancing
- Geographic routing and health-based failover Implementation:
VPC Configuration for Primary Region
resource "aws_vpc" "primary" { cidr_block = "10.0.0.0/16" enable_dns_hostnames = true enable_dns_support = true tags = { Name = "primary-vpc" Environment = "production" } }
Subnet Configuration
resource "aws_subnet" "public" { vpc_id = aws_vpc.primary.id cidr_block = "10.0.1.0/24" availability_zone = "us-east-1a" map_public_ip_on_launch = true }
Transit Gateway
- resource "aws_ec2_transit_gateway" "tgw" {
- description = "Primary transit gateway"
- default_route_table_association = "disable"
- default_route_table_propagation = "disable"
- }
- Performance Results:
- Metric
- Before
- After
- Regional Latency
- 80ms
- 25ms
- Availability
- 99.5%
- 99.99%
- Failover Time
- 5 min
- 30 sec
- Throughput
- 5 Gbps
- 20 Gbps
- Example 2: Zero-Trust Network Implementation
- Scenario:
- Implement zero-trust security across hybrid network infrastructure.
- Security Architecture:
- Micro-Segmentation
-
- Isolated security groups by application tier
- Identity-Based Access
-
- Integration with identity providers
- Encrypted Communication
-
- mTLS for all service-to-service
- Continuous Verification
-
- Real-time policy enforcement
- Implementation Components:
- East-west firewalls with allow-list policies
- Identity and access management integration
- Certificate authority and PKI management
- Network segmentation and isolation
- Security Results:
- 100% reduction in lateral movement attacks
- Zero unauthorized access incidents
- 99% reduction in attack surface
- Passed penetration test with zero critical findings
- Example 3: SD-WAN Enterprise Deployment
- Scenario:
- Deploy SD-WAN to replace legacy MPLS network across 50 sites.
- Deployment Approach:
- Site Assessment
-
- Evaluated connectivity requirements at each location
- Device Deployment
-
- Installed SD-WAN edge devices
- Traffic Policy
-
- Configured application-aware routing
- Optimization
-
- Implemented QoS and path selection
- Results:
- 40% reduction in network costs
- 60% improvement in application performance
- 99.9% network availability
- 50% reduction in troubleshooting time
- Best Practices
- Network Architecture
- Redundancy Design
-
- Plan for component failures at every level
- Segmented Design
-
- Isolate workloads and security zones
- Scalable IPAM
-
- Use consistent IP addressing scheme
- Documentation
-
- Maintain accurate network diagrams
- Security Implementation
- Zero-Trust
-
- Verify every request regardless of source
- Defense in Depth
-
- Multiple security layers
- Encryption
-
- Encrypt data in transit and at rest
- Regular Audits
-
- Periodic security assessments
- Performance Optimization
- Latency Reduction
-
- Optimize routing paths and caching
- Bandwidth Management
-
- Implement QoS policies
- Load Distribution
-
- Use load balancing effectively
- Monitoring
-
- Comprehensive visibility into network metrics
- Automation and IaC
- Infrastructure as Code
-
- Version control network configs
- Automated Testing
-
- Validate changes before deployment
- Deployment Templates
-
- Standardize configurations
- Monitoring Automation
-
- Alert on anomalies automatically
- Output Format
- This skill delivers:
- Complete network architecture designs and diagrams
- Infrastructure as code (Terraform, Ansible, CloudFormation)
- Network configurations (routers, switches, firewalls, load balancers)
- Security policies and firewall rulesets
- Monitoring dashboards and alert configurations
- DNS configurations and zone files
- VPN and SD-WAN configurations
- Troubleshooting runbooks and documentation
- All outputs include:
- Detailed network topology diagrams
- IP addressing schemes and routing tables
- Security group and firewall rule documentation
- Performance benchmarks and SLA validations
- Security compliance documentation
- Operational procedures and runbooks
- Capacity planning and growth recommendations
- Anti-Patterns
- Architecture Anti-Patterns
- Single Point of Failure
-
- Critical components without redundancy - implement HA at all layers
- Oversegmentation
-
- Too many VLANs without clear purpose - consolidate and simplify
- Flat Network
-
- No segmentation for security - implement defense in depth
- Spanning Tree Issues
-
- STP misconfiguration causing loops or blocking - use modern alternatives
- Security Anti-Patterns
- Open By Default
-
- Allowing all traffic by default - deny by default, explicitly allow
- Rule Creep
-
- Firewall rules accumulate without cleanup - regular rule review and optimization
- VPN Overuse
-
- VPN for everything instead of proper segmentation - use appropriate access methods
- Weak Cryptography
-
- Using outdated protocols and algorithms - enforce modern encryption standards
- Performance Anti-Patterns
- Suboptimal Routing
-
- Traffic taking inefficient paths - optimize routing tables and policies
- Lack of Caching
-
- Not leveraging CDN and caching - reduce latency with caching layers
- Oversubscribed Links
-
- Bandwidth not matching requirements - right-size and monitor utilization
- No QoS
-
- All traffic treated equally - implement traffic prioritization
- Operational Anti-Patterns
- Documentation Debt
-
- Network diagrams out of date - maintain documentation as code
- Configuration Drift
-
- Manual changes not tracked - use IaC for all changes
- No Monitoring
-
- Operating blind - implement comprehensive network monitoring
- Long Change Lead Times
- Slow change processes - automate and streamline deployments