production-troubleshooting

安装量: 40
排名: #17863

安装

npx skills add https://github.com/blogic-cz/blogic-marketplace --skill production-troubleshooting

Production Troubleshooting Overview Diagnose performance issues and errors in production/test environments using systematic investigation workflows with Sentry, kubectl, and Helm configuration analysis. When to Use This Skill Use this skill when: User reports performance issues on test/production (not localhost) Need to investigate slow queries or high latency Debugging pod crashes or resource throttling Analyzing Sentry traces for errors Checking Kubernetes resource limits and configurations Investigation Workflow Follow these steps in order when troubleshooting production issues: Step 1: Check Sentry Traces Start with Sentry to identify slow queries and external API latency patterns. Using Sentry MCP: Search for traces related to the reported issue Look for slow database queries (>500ms) Check external API call latency Identify error patterns and stack traces What to look for: Database query times exceeding 500ms External API calls with high latency Repeated error patterns Performance degradation trends Step 2: Review Application Logs Examine kubectl logs for timing information and error patterns. Using agent-tools-k8s: agent-tools-k8s logs --pod < pod-name

--env < env

--tail 200 Key log patterns to search for: [Server] - Server startup and initialization timing [SSR] - Server-side rendering timing [tRPC] - TRPC query execution timing [DB Pool] - Database connection pool status ERROR or WARN - Application errors and warnings Common issues: Sequential API calls instead of parallel (Promise.all) Long DB connection acquisition times Slow SSR rendering Step 3: Check Pod Resource Usage Verify CPU and memory usage to detect throttling. Using agent-tools-k8s: agent-tools-k8s top --env < env

Warning signs: CPU usage >70% indicates potential throttling Memory usage >80% indicates potential OOM issues Consistent high utilization suggests under-provisioning Step 4: Review Pod Configuration Check resource limits and Helm values to identify misconfigurations. Using kubectl: kubectl get pod < pod-name

-n < namespace

-o yaml Key sections to check: resources.limits.cpu and resources.limits.memory resources.requests.cpu and resources.requests.memory Environment variables configuration Image version and tags Helm values locations: web-app: /kubernetes/helm/web-app/values.{test,prod}.yaml Reference references/helm-values-locations.md for detailed Helm configuration structure. Common Causes & Solutions CPU/Memory Throttling Symptom: High CPU/memory usage (>70-80%) Solution: Increase resource limits in Helm values Network Latency Symptom: Slow external API calls, DNS resolution delays Solution: Check network policies, verify DNS configuration, consider retry logic Database Connection Pool Issues Symptom: [DB Pool] errors, slow connection acquisition Solution: Review idleTimeoutMillis and pool size configuration Sequential API Calls Symptom: Multiple API calls taking cumulative time Solution: Refactor to use Promise.all() for parallel execution Resources kubectl commands Common kubectl operations (use via agent-tools-k8s ): agent-tools-k8s logs --pod --env --tail 200 - Extract and filter pod logs agent-tools-k8s top --env - Show CPU/memory usage for pods agent-tools-k8s describe --resource pod --name --env - Check resource limits and pod configuration agent-tools-k8s kubectl --env --cmd "get pods" - Raw kubectl for anything else references/ helm-values-locations.md - Detailed guide to Helm values file structure and locations common-issues.md - Catalog of common production issues and solutions

返回排行榜