datadog-observability

Install

npx skills add https://github.com/bobmatnyc/claude-mpm-skills --skill datadog-observability

Datadog is a SaaS observability platform providing unified monitoring across infrastructure, applications, logs, and user experience. It offers AI-powered anomaly detection, 1000+ integrations, and OpenTelemetry compatibility.

Core Capabilities:

  • APM: Distributed tracing with automatic instrumentation for 8+ languages

  • Infrastructure: Host, container, and cloud service monitoring

  • Logs: Centralized collection with processing pipelines, configurable index retention, and archiving (the 15-month retention figure applies to metrics, not logs)

  • Metrics: Custom metrics via DogStatsD with cardinality management

  • Synthetics: Proactive API and browser testing from 29+ global locations

  • RUM: Frontend performance with Core Web Vitals and session replay

When to Use This Skill

Activate when:

  • Setting up production monitoring and observability

  • Implementing distributed tracing across microservices

  • Configuring log aggregation and analysis pipelines

  • Creating custom metrics and dashboards

  • Setting up alerting and anomaly detection

  • Optimizing Datadog costs

Do not use when:

  • You are building on an open-source stack (use Prometheus/Grafana instead)

  • Cost is the primary concern and the budget is limited

  • You need maximum customization rather than a managed solution

Quick Start

1. Install Datadog Agent

Docker (simplest):

docker run -d --name dd-agent \
  -e DD_API_KEY=<YOUR_API_KEY> \
  -e DD_SITE="datadoghq.com" \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  gcr.io/datadoghq/agent:7

Kubernetes (Helm):

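helm repo add datadog https://helm.datadoghq.com
helm repo update
# datadog.apm.enabled is deprecated in the chart; portEnabled opens APM on TCP port 8126
helm install datadog-agent datadog/datadog \
  --set datadog.apiKey=<YOUR_API_KEY> \
  --set datadog.apm.portEnabled=true \
  --set datadog.logs.enabled=true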

2. Instrument Your Application

Python:

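from ddtrace import patch_all, tracer

# Automatic instrumentation for common libraries.
# Call this before importing the libraries you want traced.
patch_all()

user_id = "12345"  # placeholder; in a real app this comes from the request

# Manual span for custom operations
with tracer.trace("custom.operation", service="my-service") as span:
    span.set_tag("user.id", user_id)
    # your code here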

Node.js:

// Must be first import
const tracer = require('dd-trace').init({
  service: 'my-service',
  env: 'production',
  version: '1.0.0',
});

3. Verify in Datadog UI

  • Go to Infrastructure > Host Map to verify agent

  • Go to APM > Services to see traced services

  • Go to Logs > Search to verify log collection
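
The checks above are done in the UI. As a scripted complement, the sketch below validates the API key against Datadog's key-validation endpoint (`GET /api/v1/validate` with a `DD-API-KEY` header). The function names are illustrative, and the `site` default assumes the US1 site used in the Docker example. It only confirms the key is accepted; it does not confirm that the Agent is reporting.

```python
import json
import urllib.error
import urllib.request


def build_validate_request(api_key: str, site: str = "datadoghq.com") -> urllib.request.Request:
    """Build (but do not send) a GET request to the key-validation endpoint."""
    url = f"https://api.{site}/api/v1/validate"
    return urllib.request.Request(url, headers={"DD-API-KEY": api_key}, method="GET")


def api_key_is_valid(api_key: str, site: str = "datadoghq.com") -> bool:
    """Send the request. Needs network access and a real key."""
    request = build_validate_request(api_key, site)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return bool(json.load(response).get("valid", False))
    except urllib.error.HTTPError:
        # A rejected key comes back as an HTTP error status.
        return False
```

Calling `api_key_is_valid(os.environ["DD_API_KEY"])` from a deploy script is one way to fail fast before the Agent is rolled out.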

Core Concepts

Tagging Strategy

Tags enable filtering, aggregation, and cost attribution. Use consistent tags across all telemetry.

Required Tags:

| Tag | Purpose | Example |
|-----|---------|---------|
| env | Environment | env:production |
| service | Service name | service:api-gateway |
| version | Deployment version | version:1.2.3 |
| team | Owning team | team:platform |
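
One way to keep these tags consistent is to derive them from a single source. The sketch below reads the standard `DD_ENV`, `DD_SERVICE`, and `DD_VERSION` environment variables and fails loudly when a required tag is missing. The `required_tags` helper and the way `team` is passed in are assumptions for illustration, not part of any Datadog library.

```python
import os

REQUIRED_TAG_KEYS = ("env", "service", "version", "team")


def required_tags(environ=None, team=None):
    """Return the required tags as 'key:value' strings, or raise if any is missing."""
    environ = os.environ if environ is None else environ
    values = {
        "env": environ.get("DD_ENV"),
        "service": environ.get("DD_SERVICE"),
        "version": environ.get("DD_VERSION"),
        "team": team,  # hypothetical: supplied by the caller, not a standard variable
    }
    missing = [key for key in REQUIRED_TAG_KEYS if not values[key]]
    if missing:
        raise ValueError(f"missing required tags: {', '.join(missing)}")
    return [f"{key}:{values[key]}" for key in REQUIRED_TAG_KEYS]
```

Running this check at startup surfaces a missing `service` or `version` tag before telemetry is emitted without it.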

Avoid High-Cardinality Tags:

  • User IDs, request IDs, timestamps

  • Pod IDs in Kubernetes

  • Build numbers, commit hashes
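
A minimal sketch of enforcing this rule in code: drop known high-cardinality keys before a custom metric is formatted as a DogStatsD datagram (`name:value|type|#tag:value,...`) and sent over UDP. The key names in the denylist and the helper functions are illustrative assumptions; port 8125 is the Agent's default DogStatsD port.

```python
import socket

# Hypothetical denylist; adjust to the tag keys used in your own code.
HIGH_CARDINALITY_KEYS = {
    "user_id", "request_id", "timestamp", "pod_id", "build_number", "commit_hash",
}


def safe_tags(tags: dict) -> list:
    """Drop high-cardinality keys and return sorted 'key:value' strings."""
    return sorted(f"{k}:{v}" for k, v in tags.items() if k not in HIGH_CARDINALITY_KEYS)


def format_metric(name: str, value, metric_type: str, tags: dict) -> str:
    """Build a DogStatsD datagram, e.g. 'page.views:1|c|#env:production'."""
    datagram = f"{name}:{value}|{metric_type}"
    tag_list = safe_tags(tags)
    if tag_list:
        datagram += "|#" + ",".join(tag_list)
    return datagram


def send_metric(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send to a local Agent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

High-cardinality values such as a user ID are usually better attached to spans or logs, where they do not multiply the number of metric time series.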

Unified Observability

Datadog correlates metrics, traces, and logs when they share the same env, service, and version tags:

  • Traces include span tags that link to metrics

  • Tracers inject trace and span IDs into logs for correlation

  • Dashboards combine all data sources
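
To make the log side of this correlation concrete, the sketch below writes JSON logs that carry `dd.trace_id` and `dd.span_id`, the attributes Datadog uses to link a log line to a trace. In practice the tracer can inject these for you (for example by setting `DD_LOGS_INJECTION=true`); here the IDs come from a hypothetical `get_ids` callable so the example runs without the tracer installed.

```python
import json
import logging


class TraceContextFilter(logging.Filter):
    """Attach the current trace and span IDs to every log record."""

    def __init__(self, get_ids):
        super().__init__()
        self._get_ids = get_ids  # hypothetical callable returning (trace_id, span_id)

    def filter(self, record):
        trace_id, span_id = self._get_ids()
        record.dd_trace_id = str(trace_id)
        record.dd_span_id = str(span_id)
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line with the correlation attributes."""

    def format(self, record):
        return json.dumps({
            "message": record.getMessage(),
            "level": record.levelname,
            "dd.trace_id": getattr(record, "dd_trace_id", "0"),
            "dd.span_id": getattr(record, "dd_span_id", "0"),
        })


def build_logger(name, stream, get_ids):
    """Create a logger that writes correlated JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(TraceContextFilter(get_ids))
    logger.handlers = [handler]
    return logger
```

With a real tracer, `get_ids` would return the IDs of the active span; with no active span, returning `(0, 0)` keeps the log schema stable.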

Best Practices

Start Simple

  • Install Agent with basic configuration

  • Enable automatic instrumentation

  • Verify data in Datadog UI

  • Add custom spans/metrics as needed

Progressive Enhancement

Basic Agent setup → APM tracing → Custom spans → Custom metrics → Profiling → RUM

Key Instrumentation Points

  • HTTP entry/exit points

  • Database queries

  • External service calls

  • Message queue operations

  • Business-critical flows
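
The points above share one pattern: wrap the call, time it, and record whether it failed. The sketch below shows that pattern with a plain decorator and a hypothetical `record` callback, so it runs without any Datadog library. In a real service the tracer's own span API (such as the `tracer.trace` context manager shown earlier) would normally play this role.

```python
import functools
import time


def timed(operation, record):
    """Decorator that reports (operation, elapsed_ms, error) to `record` after each call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            error = False
            try:
                return func(*args, **kwargs)
            except Exception:
                error = True
                raise  # never swallow the caller's exception
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                record(operation, elapsed_ms, error)

        return wrapper

    return decorator
```

Decorating a database helper with `@timed("db.query", record)` gives every query a duration and an error flag, which maps directly onto a span or a timing metric.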

Common Mistakes

  • High-cardinality tags: Using user IDs or request IDs as metric tags creates millions of unique time series, each billed as a custom metric

  • Missing log index quotas: Leads to unexpected bills from log volume spikes

  • Over-alerting: Creates alert fatigue; alert on symptoms, not causes

  • Missing service tags: Prevents correlation between metrics, traces, and logs

  • No sampling for high-volume traces: Ingests everything, causing cost explosion
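
To illustrate the last point, here is a conceptual sketch of deterministic head-based sampling: the keep/drop decision is a pure function of the trace ID, so every service that sees the same trace reaches the same decision. The hash constant and function name are illustrative and not necessarily what the Datadog tracers implement; in practice the rate is usually set through tracer configuration such as the `DD_TRACE_SAMPLE_RATE` environment variable.

```python
MAX_TRACE_ID = 2 ** 64
HASH_FACTOR = 1111111111111111111  # illustrative large odd multiplier


def keep_trace(trace_id: int, sample_rate: float) -> bool:
    """Return True if this trace should be kept at the given rate (0.0 to 1.0)."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    # Multiplicative hash spreads sequential IDs across the 64-bit range.
    hashed = (trace_id * HASH_FACTOR) % MAX_TRACE_ID
    return hashed < sample_rate * MAX_TRACE_ID
```

At a rate of 0.1 roughly one trace in ten is ingested, which cuts trace ingestion cost by about the same factor while keeping complete traces rather than partial ones.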

Complementary Skills

When using this skill, consider these related skills (if deployed):

  • docker: Container instrumentation patterns

  • kubernetes: K8s-native monitoring patterns

  • python/nodejs/go: Language-specific APM setup
