Kubernetes Operations

Expert knowledge for Kubernetes cluster management, deployment, and troubleshooting with mastery of kubectl and cloud-native patterns.

Core Expertise

Kubernetes Operations

Workload Management

Deployments, StatefulSets, DaemonSets, Jobs, and CronJobs

Networking

Services, Ingress, NetworkPolicies, and DNS configuration

Configuration & Storage

ConfigMaps, Secrets, PersistentVolumes, and PersistentVolumeClaims

Troubleshooting

Debugging pods, analyzing logs, and inspecting cluster events

Cluster Operations Process

Manifest First

Always prefer declarative YAML manifests for resource management

Validate & Dry-Run

Use kubectl apply --dry-run=client to validate changes

Inspect & Verify

After applying changes, verify with kubectl get, kubectl describe, and kubectl logs

Monitor Health

Continuously check status of nodes, pods, and services

Clean Up

Ensure old or unused resources are properly garbage collected

Essential Commands

Resource management

kubectl apply -f manifest.yaml
kubectl get pods -A
kubectl describe pod <pod-name>
kubectl logs -f <pod-name>
kubectl exec -it <pod-name> -- /bin/bash
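The process described earlier (manifest first, dry-run, apply, verify) can be chained into one helper. The following is a minimal sketch, not a definitive workflow: the function name safe_apply and its argument order are hypothetical, and the context is passed explicitly rather than taken from the current one. It relies only on kubectl apply with --dry-run=client and --dry-run=server, and on kubectl get -f.

```shell
# Hypothetical helper (name and argument order are assumptions for this sketch).
# Usage: safe_apply <context> <manifest.yaml>
safe_apply() {
  if [ "$#" -ne 2 ]; then
    echo "usage: safe_apply <context> <manifest>" >&2
    return 2
  fi
  sa_context="$1"
  sa_manifest="$2"

  # 1. Client-side dry run: builds the objects locally and prints them without sending.
  kubectl --context="$sa_context" apply --dry-run=client -f "$sa_manifest" || return 1

  # 2. Server-side dry run: the API server processes the request but persists nothing.
  kubectl --context="$sa_context" apply --dry-run=server -f "$sa_manifest" || return 1

  # 3. Apply for real, then list the resources named in the manifest to verify.
  kubectl --context="$sa_context" apply -f "$sa_manifest" || return 1
  kubectl --context="$sa_context" get -f "$sa_manifest" -o wide
}
```

Example call: safe_apply staging-cluster deployment.yaml. Each step stops the sequence on failure, so a manifest that fails validation is never applied.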

Debugging

kubectl get events --sort-by='.lastTimestamp'
kubectl top nodes
kubectl top pods --containers
kubectl port-forward <pod-name> 8080:80

Deployment management

kubectl rollout status deployment/<name>
kubectl rollout history deployment/<name>
kubectl rollout undo deployment/<name>
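The three rollout commands above combine naturally into a "wait, and roll back on failure" step. This is a sketch under stated assumptions: the helper name rollout_or_undo, its argument order, and the 120s default timeout are all hypothetical. It uses kubectl rollout status with --timeout, which exits non-zero when the rollout does not complete in time, and kubectl rollout undo.

```shell
# Hypothetical helper (name, argument order, and default timeout are assumptions).
# Usage: rollout_or_undo <context> <namespace> <deployment> [timeout]
rollout_or_undo() {
  if [ "$#" -lt 3 ]; then
    echo "usage: rollout_or_undo <context> <namespace> <deployment> [timeout]" >&2
    return 2
  fi
  ru_context="$1"
  ru_namespace="$2"
  ru_deployment="$3"
  ru_timeout="${4:-120s}"

  # rollout status blocks until the rollout finishes or the timeout expires,
  # and exits non-zero if it does not complete.
  if kubectl --context="$ru_context" -n "$ru_namespace" \
      rollout status "deployment/$ru_deployment" --timeout="$ru_timeout"; then
    echo "rollout of $ru_deployment completed"
  else
    echo "rollout of $ru_deployment did not complete; rolling back" >&2
    # rollout undo returns the Deployment to its previous revision.
    kubectl --context="$ru_context" -n "$ru_namespace" \
      rollout undo "deployment/$ru_deployment"
    return 1
  fi
}
```

The helper returns 1 after a rollback so that a calling script or pipeline still sees the deployment as failed.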

Cluster inspection

kubectl cluster-info
kubectl get nodes -o wide
kubectl api-resources

Key Debugging Patterns

Pod Debugging

Pod inspection

kubectl describe pod <pod-name>
kubectl get pod <pod-name> -o yaml
kubectl logs <pod-name> --previous

Interactive debugging

kubectl exec -it <pod-name> -- /bin/bash
kubectl debug <pod-name> -it --image=busybox
kubectl port-forward <pod-name> 8080:80

Networking Troubleshooting

Service debugging

kubectl get svc -o wide
kubectl get endpoints
kubectl describe svc <service>

Network connectivity

kubectl run test-pod --image=busybox -it --rm -- sh

Inside the pod, use nslookup, wget, and nc to test DNS resolution and connectivity
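The connectivity check above can also run non-interactively. The sketch below is illustrative only: the helper names service_fqdn and check_service, the pod name net-test, and the argument order are hypothetical. It assumes the cluster uses the default DNS domain cluster.local and that the Service speaks HTTP on the given port; for other protocols, swap the wget call for an nc probe.

```shell
# Build the in-cluster DNS name of a Service.
# Assumption: the cluster uses the default DNS domain "cluster.local".
# Usage: service_fqdn <service> <namespace>
service_fqdn() {
  echo "$1.$2.svc.cluster.local"
}

# Hypothetical helper (name and arguments are assumptions for this sketch).
# Usage: check_service <context> <namespace> <service> <port>
check_service() {
  if [ "$#" -ne 4 ]; then
    echo "usage: check_service <context> <namespace> <service> <port>" >&2
    return 2
  fi
  cs_context="$1"
  cs_namespace="$2"
  cs_fqdn="$(service_fqdn "$3" "$2")"
  cs_port="$4"

  # --restart=Never creates a bare pod; -i attaches stdin so --rm is allowed,
  # and --rm deletes the pod once the command exits.
  # Assumption: the Service answers HTTP; wget -T sets a 3 second timeout.
  kubectl --context="$cs_context" -n "$cs_namespace" run net-test \
    --image=busybox --rm -i --restart=Never -- \
    sh -c "nslookup $cs_fqdn && wget -q -O /dev/null -T 3 http://$cs_fqdn:$cs_port/"
}
```

A failing nslookup points at DNS or a missing Service, while a resolving name with a failing wget points at endpoints, NetworkPolicies, or the application itself.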

Common Issues

CrashLoopBackOff debugging

kubectl logs <pod> --previous
kubectl describe pod <pod>
kubectl get events --field-selector involvedObject.name=<pod>
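The CrashLoopBackOff commands above can be gathered into one report, with the last exit code read from the pod status. This is a sketch, not a complete diagnostic: the helper names crashloop_report and explain_exit_code are hypothetical, and the exit-code hints are common causes rather than certainties. The jsonpath expression reads the standard containerStatuses fields of the Pod status.

```shell
# Map a container exit code to a likely cause (hints only, not a diagnosis).
# Codes above 128 mean the process was killed by signal number (code - 128).
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; check whether the command is meant to keep running" ;;
    137) echo "SIGKILL (128+9); often the OOM killer, compare memory limits" ;;
    143) echo "SIGTERM (128+15); the container was asked to stop" ;;
    *)   echo "application error; read the previous container logs" ;;
  esac
}

# Hypothetical helper (name and argument order are assumptions).
# Usage: crashloop_report <context> <namespace> <pod>
crashloop_report() {
  if [ "$#" -ne 3 ]; then
    echo "usage: crashloop_report <context> <namespace> <pod>" >&2
    return 2
  fi
  echo "== container, restarts, last reason, last exit code =="
  kubectl --context="$1" -n "$2" get pod "$3" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
  echo "== logs of the previous container instance (last 50 lines) =="
  kubectl --context="$1" -n "$2" logs "$3" --previous --tail=50
  echo "== events for the pod =="
  kubectl --context="$1" -n "$2" get events \
    --field-selector "involvedObject.name=$3" --sort-by=.lastTimestamp
}
```

Example: crashloop_report staging-cluster web api-7d9f, then pass the reported exit code to explain_exit_code for a first hint.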

Resource constraints

kubectl top pod <pod>
kubectl describe pod <pod> | grep -A 5 Limits

State inspection

kubectl get all
kubectl get <resource> <name> -o yaml

Best Practices

Context Safety (CRITICAL)

Always specify --context explicitly in every kubectl command
Never rely on the current context; it may have been changed by another process
Use the kubectl --context=<context-name> get pods format for all operations
This prevents accidental operations on the wrong cluster (e.g., running production commands against staging)

CORRECT: Explicit context

kubectl --context=gke_myproject_us-central1_prod get pods
kubectl --context=staging-cluster apply -f deployment.yaml

WRONG: Relying on current context

kubectl get pods  # Which cluster is this targeting?
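One way to make the explicit-context rule hard to forget is a thin wrapper that refuses to run without a context argument. This is a minimal sketch: the function name kctx is hypothetical and not part of kubectl. It only prepends the --context flag and passes every other argument through unchanged.

```shell
# Hypothetical wrapper (the name kctx is an assumption, not a kubectl feature).
# Usage: kctx <context> <kubectl arguments...>
kctx() {
  # Require a context plus at least one kubectl argument.
  if [ "$#" -lt 2 ]; then
    echo "usage: kctx <context> <kubectl arguments...>" >&2
    return 2
  fi
  kctx_context="$1"
  shift
  # Always pass the context explicitly; never fall back to the current context.
  kubectl --context="$kctx_context" "$@"
}
```

Example: kctx staging-cluster apply -f deployment.yaml. A call that omits the context, such as kctx get pods, passes "get" as the context name, so it should fail on the unknown context instead of silently targeting whatever cluster is current.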

Resource Definitions

Use declarative YAML manifests
Implement proper labels and selectors
Define resource requests and limits
Configure health checks (liveness/readiness probes)

Security

Use NetworkPolicies to restrict traffic
Implement RBAC for access control
Store sensitive data in Secrets
Run containers as non-root users

Monitoring

Configure proper logging and metrics
Set up alerts for critical conditions
Use health checks and readiness probes
Monitor resource usage and quotas

Agentic Optimizations

Context: Command
Pod status (structured): kubectl get pods -n <namespace> -o json | jq '.items[] | {name:.metadata.name, status:.status.phase}'
Quick overview: kubectl get pods -n <namespace> -o wide
Events (compact): kubectl get events -n <namespace> --sort-by='.lastTimestamp' -o json
Resource details: kubectl get <resource> <name> -o json
Logs (bounded): kubectl logs <pod-name> -n <namespace> --tail=50

For detailed debugging commands, troubleshooting patterns, Helm workflows, and advanced K8s operations, see REFERENCE.md.
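The Resource Definitions and Security practices listed in this section (labels and selectors, requests and limits, probes, non-root containers) can be illustrated with a single manifest. The sketch below prints one from a shell function so it can be piped into a dry run. Every concrete value is a placeholder assumption: the name example-web, the image registry.example.com/example-web:1.0.0, port 8080, the /healthz path, user ID 10001, and the resource figures.

```shell
# Hypothetical helper: prints an illustrative Deployment manifest to stdout.
# All names, the image, the port, the probe path, and the sizes are placeholders.
print_example_deployment() {
  cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-web
  labels:
    app.kubernetes.io/name: example-web
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: example-web
  template:
    metadata:
      labels:
        app.kubernetes.io/name: example-web
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
      containers:
        - name: web
          image: registry.example.com/example-web:1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
EOF
}
```

To validate it without creating anything: print_example_deployment | kubectl --context=<context-name> apply --dry-run=client -f -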
