scalability-playbook

Installs: 42
Rank: #17409

Install

npx skills add https://github.com/patricio0312rev/skills --skill scalability-playbook

Scalability Playbook

Systematic approach to identifying and resolving scalability bottlenecks.

Bottleneck Analysis

Current System Profile

- Traffic: 1,000 req/min
- Users: 10,000 active
- Data: 100GB database
- Response time: p95 = 500ms

Identified Bottlenecks

1. Database Queries

- Symptom: Slow page loads (2-3s)
- Measurement: Query time p95 = 800ms
- Impact: HIGH - affects all reads
- Trigger: When p95 >500ms

2. Single Server

- Symptom: High CPU (>80%)
- Measurement: Load average >4
- Impact: MEDIUM - intermittent slowdowns
- Trigger: When CPU >70%

3. No Caching

- Symptom: Repeated DB queries
- Measurement: Cache hit rate = 0%
- Impact: MEDIUM - unnecessary load
- Trigger: When query volume >10k/min
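The p95 measurements and triggers above assume some way of turning raw timings into a percentile. A minimal sketch using the nearest-rank method follows; the `percentile` helper and the sample timings are hypothetical and only illustrate the calculation, and monitoring tools may interpolate differently.

```javascript
// Nearest-rank percentile over latency samples in milliseconds.
// Hypothetical helper: name and signature are illustrative only.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b); // copy, then numeric sort
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Made-up query timings, not measurements from a real system.
const queryTimesMs = [120, 95, 800, 210, 640, 130, 90, 1500, 300, 180];

const p95 = percentile(queryTimesMs, 95);
const triggered = p95 > 500; // bottleneck 1 trigger: p95 > 500ms
console.log({ p95, triggered });
```

With only ten samples the p95 is simply the slowest request, so a real measurement would need a much larger window before the trigger is meaningful.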

Scaling Strategies (Ordered)

Level 1: Quick Wins (Days)

1.1 Add Database Indexes

Problem: Slow queries

Solution:

    CREATE INDEX idx_users_email ON users(email);
    CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

- Expected Impact: 80% faster queries
- Cost: $0
- Effort: 1 day

1.2 Enable Query Caching

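Problem: Repeated queries

Solution: Redis cache layer

    // Cache-aside read: try Redis first, fall back to the database on a miss
    const cached = await redis.get(`user:${userId}`);
    if (cached) return JSON.parse(cached);

    // Miss: load from the database and cache the result for 1 hour (3600 s)
    const user = await db.users.findById(userId);
    await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));
    return user;

- Expected Impact: 60% reduction in DB load
- Cost: $50/month
- Effort: 2 days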
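To check whether the cache is actually reducing DB load, it helps to track the hit rate alongside the cache-aside read. The sketch below is one way to do that, under loud assumptions: an in-memory `Map` stands in for Redis, `loadUser` stands in for the database query, and `createCachedLoader` is a hypothetical name.

```javascript
// Cache-aside wrapper that counts hits and misses.
// ASSUMPTIONS: a Map stands in for Redis; `loadUser` stands in for the DB call.
function createCachedLoader(loadUser, ttlMs, now = () => Date.now()) {
  const cache = new Map(); // key -> { value, expiresAt }
  const stats = { hits: 0, misses: 0 };

  function get(userId) {
    const key = `user:${userId}`;
    const entry = cache.get(key);
    if (entry && entry.expiresAt > now()) {
      stats.hits += 1;
      return entry.value;
    }
    stats.misses += 1;
    const value = loadUser(userId); // miss or expired: go to the "database"
    cache.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  }

  function hitRate() {
    const total = stats.hits + stats.misses;
    return total === 0 ? 0 : stats.hits / total;
  }

  return { get, hitRate, stats };
}

// Usage with a fake database that counts how often it is queried.
let dbCalls = 0;
const fakeDb = (id) => {
  dbCalls += 1;
  return { id, name: `user-${id}` };
};
const loader = createCachedLoader(fakeDb, 3600 * 1000); // 1 hour TTL
loader.get(1);
loader.get(1);
loader.get(2);
loader.get(1);
console.log(loader.hitRate(), dbCalls); // 0.5 2
```

Four reads produce two database calls here, a 50% hit rate. The figure on a real workload depends on how often the same keys repeat within the TTL.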

Level 2: Horizontal Scaling (Weeks)

2.1 Add Read Replicas

Problem: Read-heavy workload

Solution: Route reads to replicas

    Write Load: Primary DB
    Read Load:  3x Read Replicas

- Expected Impact: 3x read capacity
- Cost: $300/month
- Effort: 1 week
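Routing reads to replicas can be sketched as a small router that sends writes to the primary and rotates reads across the replicas. This is an illustration only: `createRouter` is a hypothetical name, and the strings are placeholders for real database clients.

```javascript
// Sends writes to the primary and spreads reads across replicas round-robin.
// ASSUMPTION: strings stand in for real connection objects.
function createRouter(primary, replicas) {
  let next = 0;
  return {
    forWrite() {
      return primary;
    },
    forRead() {
      if (replicas.length === 0) return primary; // no replicas: fall back
      const target = replicas[next];
      next = (next + 1) % replicas.length;
      return target;
    },
  };
}

const router = createRouter("primary", ["replica-1", "replica-2", "replica-3"]);
const reads = [router.forRead(), router.forRead(), router.forRead(), router.forRead()];
console.log(reads, router.forWrite());
```

Replicas usually lag the primary slightly, so a read that must see a row written moments earlier may still need to go to the primary.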

2.2 Load Balancer + Multiple Servers

Problem: Single point of failure

Solution:

    ALB
    ├── Server 1
    ├── Server 2
    └── Server 3

- Expected Impact: 3x throughput
- Cost: $400/month
- Effort: 1 week

Level 3: Architecture Changes (Months)

3.1 CDN for Static Assets

Problem: Slow asset delivery

Solution: CloudFront CDN

- Expected Impact: 90% faster asset loads
- Cost: $100/month
- Effort: 1 week

3.2 Async Processing

Problem: Slow sync operations

Solution: Background job queues

    // Before: Sync
    await sendEmail(user);
    await processPayment(order);
    await updateAnalytics(event);
    return response; // Waits 5+ seconds

    // After: Async
    await queue.add("send-email", { userId });
    await queue.add("process-payment", { orderId });
    await queue.add("update-analytics", { event });
    return response; // Returns immediately

- Expected Impact: 80% faster responses
- Cost: $50/month (SQS)
- Effort: 2 weeks

Level 4: Data Layer Optimization (Months)

4.1 Database Sharding

Problem: Single DB too large

Solution: Shard by user_id

    Shard 1: user_id 0-24999
    Shard 2: user_id 25000-49999
    Shard 3: user_id 50000-74999
    Shard 4: user_id 75000-99999

- Expected Impact: 4x capacity
- Cost: $1,200/month
- Effort: 2 months
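The range-based layout above maps directly to a lookup function. A minimal sketch, assuming integer user ids and the four fixed ranges listed (`shardFor` is a hypothetical helper name):

```javascript
// Range-based shard lookup matching the four ranges listed above.
const SHARD_SIZE = 25000; // ids per shard
const SHARD_COUNT = 4;

function shardFor(userId) {
  const maxId = SHARD_SIZE * SHARD_COUNT - 1; // 99999
  if (!Number.isInteger(userId) || userId < 0 || userId > maxId) {
    throw new RangeError(`user_id ${userId} is outside 0-${maxId}`);
  }
  return Math.floor(userId / SHARD_SIZE) + 1; // shards are numbered from 1
}

console.log(shardFor(0), shardFor(24999), shardFor(25000), shardFor(99999)); // 1 1 2 4
```

Fixed ranges keep the mapping easy to reason about, but ids above 99999 need a new shard or a remapping, which is part of why sharding sits late in the ordering.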

4.2 Event-Driven Architecture

Problem: Tight coupling, cascading failures

Solution: Message broker (Kafka)

    Service A → Kafka → Service B
              ↘       ↗
               Service C

- Expected Impact: Better isolation, resilience
- Cost: $500/month
- Effort: 3 months

Scaling Triggers

| Metric           | Current | Warning | Critical | Action                  |
| ---------------- | ------- | ------- | -------- | ----------------------- |
| CPU              | 40%     | 70%     | 85%      | Add servers             |
| Memory           | 50%     | 75%     | 90%      | Upgrade instances       |
| DB Connections   | 20      | 40      | 50       | Add read replicas       |
| Query Time (p95) | 200ms   | 500ms   | 1000ms   | Add indexes             |
| Queue Depth      | 100     | 1000    | 5000     | Add workers             |
| Error Rate       | 0.1%    | 1%      | 5%       | Investigate immediately |
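The trigger table can be encoded as data and checked against live readings. The sketch below copies the thresholds from the table; the `evaluate` function and the choice to treat a reading equal to a threshold as crossing it are assumptions, not part of the playbook.

```javascript
// Thresholds copied from the Scaling Triggers table.
// Units follow the table: percent, connection count, milliseconds, or jobs.
const triggers = [
  { metric: "CPU", warning: 70, critical: 85, action: "Add servers" },
  { metric: "Memory", warning: 75, critical: 90, action: "Upgrade instances" },
  { metric: "DB Connections", warning: 40, critical: 50, action: "Add read replicas" },
  { metric: "Query Time (p95)", warning: 500, critical: 1000, action: "Add indexes" },
  { metric: "Queue Depth", warning: 1000, critical: 5000, action: "Add workers" },
  { metric: "Error Rate", warning: 1, critical: 5, action: "Investigate immediately" },
];

// ASSUMPTION: a reading equal to a threshold counts as crossing it.
function evaluate(metric, value) {
  const trigger = triggers.find((t) => t.metric === metric);
  if (!trigger) throw new Error(`unknown metric: ${metric}`);
  let level = "ok";
  if (value >= trigger.critical) level = "critical";
  else if (value >= trigger.warning) level = "warning";
  return { metric, value, level, action: level === "ok" ? null : trigger.action };
}

console.log(evaluate("CPU", 40)); // current value from the table: level "ok"
console.log(evaluate("Query Time (p95)", 800)); // level "warning", action "Add indexes"
```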

Phased Scaling Plan

Phase 1: Current → 10x (0-3 months)

Target: 10,000 req/min, 100K users

Actions:

1. Add database indexes (Week 1)
2. Implement Redis caching (Week 2)
3. Add 3x read replicas (Week 4)
4. Horizontal scale app servers (Week 6)
5. CDN for static assets (Week 8)

Cost: $500 → $1,000/month

Phase 2: 10x → 100x (3-12 months)

Target: 100,000 req/min, 1M users

Actions:

1. Database sharding (Month 4-6)
2. Multi-region deployment (Month 6-8)
3. Microservices extraction (Month 8-12)
4. Event-driven architecture (Month 10-12)

Cost: $1,000 → $10,000/month

Phase 3: 100x → 1000x (12-24 months)

Target: 1M req/min, 10M users

Actions:

1. Global CDN (Month 13)
2. Advanced caching (L1/L2) (Month 14-15)
3. Custom DB solutions (Month 16-18)
4. Edge computing (Month 18-20)

Cost: $10,000 → $100,000/month

Load Testing Plan

    # Current baseline
    hey -n 10000 -c 100 https://api.example.com/users

    # Target 10x
    hey -n 100000 -c 1000 https://api.example.com/users

Measure:

- Requests/sec

- p50, p95, p99 latency

- Error rate

- Resource utilization
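When choosing concurrency for a load test, Little's Law (requests in flight = arrival rate × time in system) gives a rough sanity check. The sketch below applies it to the traffic figures from the system profile. It uses the 500ms p95 as a stand-in for average latency, which is an assumption that overstates the result, and `requiredConcurrency` is a hypothetical helper.

```javascript
// Little's Law: requests in flight = arrival rate (req/s) * latency (s).
function requiredConcurrency(reqPerMin, avgLatencyMs) {
  const reqPerSec = reqPerMin / 60;
  return reqPerSec * (avgLatencyMs / 1000);
}

// ASSUMPTION: the 500ms p95 is used in place of the (unknown) average latency.
const current = Math.ceil(requiredConcurrency(1000, 500)); // 1,000 req/min
const target10x = Math.ceil(requiredConcurrency(10000, 500)); // 10,000 req/min
console.log({ current, target10x }); // { current: 9, target10x: 84 }
```

By this estimate, both the current and the 10x traffic levels correspond to fewer than 100 requests in flight. The concurrency settings of 100 and 1000 in the commands above would therefore stress the system beyond those rates rather than reproduce them.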

Cost-Benefit Analysis

| Strategy      | Cost/Month | Expected Impact    | ROI | Priority |
| ------------- | ---------- | ------------------ | --- | -------- |
| DB Indexes    | $0         | 80% faster queries | ∞   | HIGH     |
| Redis Cache   | $50        | 60% less DB load   | 12x | HIGH     |
| Read Replicas | $300       | 3x capacity        | 10x | MEDIUM   |
| Load Balancer | $400       | 3x throughput      | 7x  | MEDIUM   |
| DB Sharding   | $1,200     | 4x capacity        | 3x  | LOW      |

Best Practices

- Measure first: Don't optimize blindly
- Low-hanging fruit: Start with easy wins
- Load test: Validate before production
- Monitor continuously: Set up alerts
- Plan ahead: Scale before hitting limits
- Cost-conscious: ROI-driven decisions
- Incremental: Small, safe changes

Output Checklist

- Current system profile
- Bottlenecks identified and measured
- Scaling strategies ordered by effort
- Triggers defined for each action
- Phased plan (1x → 10x → 100x)
- Cost estimates per phase
- Load testing plan
- Monitoring dashboard
- Rollback procedures
