HolmesGPT Skill

AI-powered troubleshooting for Kubernetes and cloud-native environments.

Overview

HolmesGPT is a CNCF Sandbox project that connects AI models with live
observability data to investigate infrastructure problems, find root
causes, and suggest remediations. It operates with read-only access
and respects RBAC permissions, making it safe for production environments.

Quick Reference

| Topic | Reference |
|---|---|
| Installation | references/installation.md |
| Configuration | references/configuration.md |
| Data Sources | references/data-sources.md |
| Commands | references/commands.md |
| Troubleshooting | references/troubleshooting.md |
| HTTP API | references/http-api.md |
| Integrations | references/integrations.md |

Key Features

Root Cause Analysis: Investigates alerts and cluster issues
Multi-Source Integration: 30+ toolsets (K8s, Prometheus, Grafana)
Alert Integration: AlertManager, PagerDuty, OpsGenie, Jira, Slack
Interactive Mode: Troubleshooting with /run, /show, /clear
Custom Toolsets: Extend with proprietary tools via YAML configuration
CI/CD Integration: Automated deployment failure investigation

Installation Quick Start

CLI (Homebrew)

    brew tap robusta-dev/homebrew-holmesgpt
    brew install holmesgpt
    export ANTHROPIC_API_KEY="your-key"  # or OPENAI_API_KEY
    holmes ask "what pods are unhealthy?"

Kubernetes (Helm)

    helm repo add robusta https://robusta-charts.storage.googleapis.com
    helm repo update
    helm install holmesgpt robusta/holmes -f values.yaml

Docker

    docker run -it --net=host \
      -e OPENAI_API_KEY="your-key" \
      -v ~/.kube/config:/root/.kube/config \
      us-central1-docker.pkg.dev/genuine-flight-317411/devel/holmes \
      ask "what pods are crashing?"

Essential Commands

    # Basic investigation
    holmes ask "what pods are unhealthy and why?"
    holmes ask "why is my deployment failing?"

    # Interactive mode
    holmes ask "investigate issue" --interactive

    # Alert investigation
    holmes investigate alertmanager --alertmanager-url http://localhost:9093
    holmes investigate pagerduty --pagerduty-api-key <KEY> --update

    # With file context
    holmes ask "summarize the key points" -f ./logs.txt

    # CI/CD integration
    holmes ask "why did deployment fail?" --destination slack --slack-token <TOKEN>
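The commands above all take a free-form question. As a rough sketch, a small shell wrapper could assemble a question that names the namespace, workload, and symptom before handing it to `holmes ask`. The function names `build_query` and `investigate` and the wording of the question are hypothetical and not part of HolmesGPT; only the `holmes ask "<question>"` form is taken from the commands above.

```shell
# Hypothetical helpers; only `holmes ask "<question>"` comes from the commands above.

# build_query NAMESPACE DEPLOYMENT SYMPTOM
# Prints a question that names the namespace, the deployment, and the symptom.
build_query() {
  printf 'why is deployment %s in namespace %s showing %s?' "$2" "$1" "$3"
}

# investigate NAMESPACE DEPLOYMENT SYMPTOM
# Passes the assembled question to `holmes ask`.
investigate() {
  holmes ask "$(build_query "$1" "$2" "$3")"
}
```

With these definitions loaded, `investigate payments checkout CrashLoopBackOff` would run `holmes ask` with the question "why is deployment checkout in namespace payments showing CrashLoopBackOff?".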

Supported AI Providers

| Provider | Environment Variable | Models |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | Sonnet 4, Opus 4.5 |
| OpenAI | OPENAI_API_KEY | GPT-4.1, GPT-4o |
| Azure OpenAI | AZURE_API_KEY | GPT-4.1 |
| AWS Bedrock | AWS credentials | Claude 3.5 Sonnet |
| Google Gemini | GEMINI_API_KEY | Gemini 1.5 Pro |
| Vertex AI | VERTEXAI_PROJECT | Gemini 1.5 Pro |
| Ollama | Local install | Llama 3.1, Mistral |

Basic Helm Values Structure

    # values.yaml for Kubernetes deployment
    image:
      repository: robustadev/holmes
      tag: latest

    env:
      - name: ANTHROPIC_API_KEY
        valueFrom:
          secretKeyRef:
            name: holmesgpt-secrets
            key: anthropic-api-key

    # Model configuration
    modelList:
      sonnet:
        api_key: "{{ env.ANTHROPIC_API_KEY }}"
        model: anthropic/claude-sonnet-4-20250514
        temperature: 0

    # Toolsets to enable
    toolsets:
      kubernetes/core:
        enabled: true
      kubernetes/logs:
        enabled: true
      prometheus/metrics:
        enabled: true

    # Resources
    resources:
      requests:
        memory: "1024Mi"
        cpu: "100m"
      limits:
        memory: "1024Mi"

    # RBAC (read-only by default)
    createServiceAccount: true

Interactive Mode Commands

| Command | Description |
|---|---|
| /clear | Reset context when changing topics |
| /run | Execute custom commands and share output with AI |
| /show | Display complete tool outputs |
| /context | Review accumulated investigation information |

Custom Toolset Example

    # custom-toolset.yaml
    toolsets:
      my-custom-tool:
        description: "Custom diagnostic tool"
        tools:
          - name: check_service_health
            description: "Check health of a specific service"
            command: |
              curl -s http://{{ service_name }}.{{ namespace }}.svc.cluster.local/health
            parameters:
              - name: service_name
                description: "Name of the service"
              - name: namespace
                description: "Kubernetes namespace"

Use with:

    holmes ask "check health" -t custom-toolset.yaml

Kubernetes Annotations for Integration

    # Add to Services/Deployments for HolmesGPT context
    metadata:
      annotations:
        holmesgpt.dev/runbook: |
          This service handles payment processing.
          Common issues: database connectivity, API rate limits.
          Check: kubectl logs -l app=payment-service
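When editing the manifest is inconvenient, the same annotation could be applied to a live object with `kubectl annotate`. The following is a minimal sketch, not an official workflow: the helper names `add_runbook` and `show_runbook` are hypothetical, and it assumes the target is a Deployment and that `kubectl` already points at the intended cluster. Only the annotation key `holmesgpt.dev/runbook` is taken from the example above.

```shell
# Hypothetical helpers; only the annotation key comes from the example above.

# add_runbook NAMESPACE DEPLOYMENT TEXT
# Sets (or overwrites) the runbook annotation on a Deployment.
add_runbook() {
  kubectl annotate deployment "$2" \
    --namespace "$1" \
    --overwrite \
    "holmesgpt.dev/runbook=$3"
}

# show_runbook NAMESPACE DEPLOYMENT
# Prints the current runbook annotation; dots in the key are escaped for JSONPath.
show_runbook() {
  kubectl get deployment "$2" \
    --namespace "$1" \
    -o 'jsonpath={.metadata.annotations.holmesgpt\.dev/runbook}'
}
```

For example, `add_runbook payments payment-service "Check database connectivity first."` followed by `show_runbook payments payment-service` would set the note and then read it back.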

Environment Variables Reference

| Variable | Description | Default |
|---|---|---|
| HOLMES_CONFIG_PATH | Config file path | ~/.holmes/config.yaml |
| HOLMES_LOG_LEVEL | Log verbosity | INFO |
| PROMETHEUS_URL | Prometheus server URL | - |
| GITHUB_TOKEN | GitHub API token | - |
| DATADOG_API_KEY | DataDog API key | - |
| CONFLUENCE_BASE_URL | Confluence URL | - |
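As a rough illustration of how these variables might be prepared in a shell session, the sketch below makes the documented defaults explicit and lists which optional integration variables are still unset. The function name `missing_integrations` is hypothetical. The variable names and defaults come from the table above, with `~` written as `$HOME`.

```shell
# Make the documented defaults explicit when the variables are unset or empty.
: "${HOLMES_CONFIG_PATH:=$HOME/.holmes/config.yaml}"
: "${HOLMES_LOG_LEVEL:=INFO}"
export HOLMES_CONFIG_PATH HOLMES_LOG_LEVEL

# missing_integrations (hypothetical helper)
# Prints, one per line, the optional integration variables that are unset or empty.
missing_integrations() {
  for var in PROMETHEUS_URL GITHUB_TOKEN DATADOG_API_KEY CONFLUENCE_BASE_URL; do
    # Indirect lookup of the variable named in $var; the names are fixed above.
    eval "value=\${$var:-}"
    [ -n "$value" ] || printf '%s\n' "$var"
  done
}
```

Running `missing_integrations` before an investigation gives a quick view of which data sources have no credentials or URL configured in the current shell.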

Best Practices

Use Specific Queries: Include namespace, deployment name, symptoms
Start with Claude Sonnet 4.0/4.5: Best accuracy for complex investigations
Enable Relevant Toolsets: Only enable what you need to reduce noise
Use Interactive Mode: For complex multi-step investigations
Set Up Runbooks: Provide context for known alert types
CI/CD Integration: Automate deployment failure analysis

Security Considerations

HolmesGPT uses read-only access (get, list, watch only)
Respects existing RBAC permissions
Never modifies, creates, or deletes resources
API keys stored in Kubernetes Secrets
Data not used for model training

Official Resources

Documentation: https://holmesgpt.dev/
GitHub: https://github.com/robusta-dev/holmesgpt
Helm Chart: https://github.com/robusta-dev/holmesgpt/tree/master/helm/holmes
Slack Community: Cloud Native Slack
