DevOps Automation
Automate DevOps workflows including CI/CD pipelines, monitoring, incident management, and infrastructure operations. Based on n8n's IT Ops workflow templates.
Overview
This skill covers:
CI/CD pipeline automation
Monitoring and alerting
Incident management
Infrastructure automation
Deployment workflows
CI/CD Automation
GitHub Actions Integration
workflow
:
"GitHub CI/CD Notifications"
triggers
:
-
github_push
-
github_pull_request
-
github_workflow_run
on_push
:
action
:
-
trigger_ci
:
if_main_branch
-
notify_slack
:
channel
:
"#deployments"
message
:
|
📦 New Push to {branch}
Commit
:
`
{
commit_sha_short
}
`
Author
:
{
author
}
Message
:
{
commit_message
}
[
View Diff
]
(
{
compare_url
}
)
on_pr_opened
:
action
:
-
notify_slack
:
channel
:
"#code-review"
message
:
|
🔀 New Pull Request
Title
:
{
pr_title
}
Author
:
{
author
}
Branch
:
{
head
}
→
{
base
}
[
Review PR
]
(
{
pr_url
}
)
-
assign_reviewers
:
based_on_codeowners
-
run_ci_checks
on_workflow_complete
:
action
:
-
notify_slack
:
message
:
|
{status_emoji} Build {status}
Workflow
:
{
workflow_name
}
Branch
:
{
branch
}
Duration
:
{
duration
}
{
if_failed
:
[
View Logs
]
(
{
logs_url
}
)
}
Deployment Pipeline
deployment_pipeline
:
stages
:
build
:
trigger
:
push_to_main
steps
:
-
checkout_code
-
install_dependencies
-
run_tests
-
build_artifact
-
push_to_registry
staging
:
trigger
:
build_success
steps
:
-
deploy_to_staging
-
run_integration_tests
-
notify_qa
production
:
trigger
:
manual_approval
steps
:
-
create_backup
-
deploy_to_production
-
run_smoke_tests
-
notify_team
rollback
:
trigger
:
deployment_failed OR manual
steps
:
-
revert_to_previous
-
notify_team
-
create_incident
Monitoring & Alerting
Alert Routing
alert_routing
:
sources
:
-
prometheus
-
datadog
-
cloudwatch
-
new_relic
severity_levels
:
critical
:
response_time
:
5_minutes
channels
:
[
pagerduty
,
slack_urgent
,
sms
]
escalation
:
immediate
high
:
response_time
:
15_minutes
channels
:
[
slack_alerts
,
email
]
escalation
:
after_15_minutes
medium
:
response_time
:
1_hour
channels
:
[
slack_alerts
]
low
:
response_time
:
24_hours
channels
:
[
slack_logging
]
routing_rules
:
-
if
:
service == "payments"
team
:
payments_oncall
severity_boost
:
+1
-
if
:
service == "auth"
team
:
security_oncall
-
default
:
team
:
platform_oncall
Alert Templates
alert_templates
:
infrastructure
:
cpu_high
:
title
:
"🔥 High CPU Usage"
body
:
|
Server:
CPU: {cpu_percent}%
Duration:
Threshold
:
{
threshold
}
%
[
View Dashboard
]
(
{
grafana_url
}
)
memory_critical
:
title
:
"💾 Critical Memory"
body
:
|
Server:
Memory: {memory_percent}%
Available: {available_mb}MB
[
SSH to Server
]
(
{
ssh_link
}
)
disk_full
:
title
:
"💿 Disk Space Critical"
body
:
|
Server:
Disk: {disk_percent}%
Available: {available_gb}GB
Suggestion
:
Clean logs or expand volume
application
:
error_spike
:
title
:
"📈 Error Rate Spike"
body
:
|
Service:
Error Rate: {error_rate}%
Normal: {baseline}%
Top Errors
:
{
top_errors
}
latency_high
:
title
:
"🐢 High Latency"
body
:
|
Service:
P99 Latency: {p99_ms}ms
Threshold: {threshold_ms}ms
Incident Management
Incident Workflow
incident_workflow
:
detection
:
sources
:
[
monitoring
,
user_report
,
automated_check
]
triage
:
auto_severity
:
-
if
:
affects_payments
severity
:
critical
-
if
:
affects_auth
severity
:
critical
-
if
:
affects_api AND error_rate
>
10%
severity
:
high
response
:
critical
:
-
create_incident_channel
:
"#inc-{timestamp}"
-
page_oncall
:
immediately
-
notify_stakeholders
:
[
engineering_lead
,
product
]
-
start_war_room
:
zoom_link
-
create_status_page
:
incident
high
:
-
create_incident_channel
-
notify_oncall
:
slack
-
create_ticket
:
jira
communication
:
internal
:
frequency
:
every_30_minutes
channel
:
incident_channel
template
:
|
📊 Incident Update
Status
:
{
status
}
Impact
:
{
impact
}
Next update
:
{
next_update_time
}
Current actions
:
{
action_items
}
external
:
channel
:
status_page
template
:
customer_facing_update
resolution
:
steps
:
-
confirm_resolution
-
update_status_page
:
resolved
-
notify_stakeholders
-
schedule_postmortem
-
close_incident_channel
:
after_24h
Postmortem Template
postmortem_template
:
sections
:
summary
:
-
incident_title
-
duration
-
severity
-
impact
timeline
:
format
:
|
| Time | Event |
|------|-------|
| {time} | {event} |
root_cause
:
-
what_happened
-
why_it_happened
-
contributing_factors
impact
:
-
users_affected
-
revenue_impact
-
sla_breach
resolution
:
-
how_it_was_fixed
-
time_to_detect
-
time_to_resolve
action_items
:
format
:
|
| Action | Owner | Due Date | Status |
|--------|-------|----------|--------|
lessons_learned
:
-
what_went_well
-
what_went_poorly
-
lucky_breaks
Infrastructure Automation
Server Provisioning
provisioning_workflow
:
trigger
:
jira_ticket OR slack_request
steps
:
1. validate_request
:
check
:
[
budget_approval
,
security_review
]
2. create_infrastructure
:
terraform
:
-
vpc
-
security_groups
-
ec2_instances
-
load_balancer
3. configure_server
:
ansible
:
-
base_configuration
-
security_hardening
-
monitoring_agent
-
application_setup
4. validate
:
-
health_check
-
security_scan
-
performance_baseline
5. notify
:
slack
:
"✅ Server {hostname} is ready"
include
:
[
ssh_access
,
dashboard_link
]
Scheduled Maintenance
maintenance_automation
:
tasks
:
certificate_renewal
:
schedule
:
"30 days before expiry"
action
:
-
request_new_cert
:
letsencrypt
-
deploy_cert
-
verify_ssl
-
notify
:
if_failure
security_patching
:
schedule
:
"weekly"
action
:
-
check_updates
-
if_critical
:
immediate_patch
-
else
:
schedule_maintenance_window
log_rotation
:
schedule
:
"daily"
action
:
-
rotate_logs
-
compress_old
-
upload_to_s3
-
delete_local
:
older_than_7_days
backup_verification
:
schedule
:
"weekly"
action
:
-
restore_to_test_env
-
run_integrity_checks
-
report_status
Kubernetes Automation
K8s Workflows
kubernetes_automation
:
deployment
:
trigger
:
docker_image_pushed
steps
:
-
update_manifest
:
with_new_image_tag
-
apply_to_staging
-
run_tests
-
if_success
:
apply_to_production
scaling
:
trigger
:
metric_threshold
rules
:
-
if
:
cpu
>
80%
action
:
scale_up
max_replicas
:
10
-
if
:
cpu < 20%
action
:
scale_down
min_replicas
:
2
rollback
:
trigger
:
health_check_failed
action
:
-
kubectl_rollout_undo
-
notify_team
-
create_incident
Output Example
Request: "Set up deployment notifications for GitHub Actions" Output :

GitHub Actions Deployment Notifications

n8n Workflow ```yaml trigger: GitHub Webhook events: [workflow_run] Notification Templates Build Started: 🚀 Deployment Started Branch: main Commit: abc1234 Author: @developer Triggered by: Push View Workflow Build Success: ✅ Deployment Successful Environment: Production Duration: 3m 42s Version: v1.2.3 Changes: • Feature X • Bug fix Y View Deployment Build Failed: ❌ Deployment Failed Stage: Test Error: npm test failed View Logs Retry Slack Integration channel : "#deployments" mention_on_failure : "@oncall" thread_replies : true

DevOps Automation Skill - Part of Claude Office Skills

devops-automation

安装