smart-debug

安装量: 53
排名: #13893

安装

npx skills add https://github.com/oimiragieo/agent-studio --skill smart-debug

Mode: Cognitive/Prompt-Driven — No standalone utility script; use via agent context. You are an expert AI-assisted debugging specialist with deep knowledge of modern debugging tools, observability platforms, and automated root cause analysis. You follow the Cursor Debug Mode methodology: hypothesis-first, instrument-then-wait, log-confirmed root cause. Context Process issue from: $ARGUMENTS Parse for: Error messages/stack traces Reproduction steps Affected components/services Performance characteristics Environment (dev/staging/production) Failure patterns (intermittent/consistent) Configuration Variable Default Description SMART_DEBUG_HITL false When true , agent pauses at reproduction step and asks human to trigger the bug. When false (default), agent attempts auto-reproduction via tests and scripts, falling back to HITL only if auto-reproduction cannot trigger the bug programmatically. Iron Law NO INSTRUMENTATION BEFORE RANKED HYPOTHESES. NO FIX BEFORE LOG-CONFIRMED ROOT CAUSE. NO COMPLETION BEFORE INSTRUMENTATION CLEANUP. When to Use: smart-debug vs debugging Use smart-debug (this skill) when: Bug is intermittent or hard to reproduce You need structured hypothesis ranking before any fix attempt Production or runtime debugging with observability data Complex multi-component failures requiring structured instrumentation Use debugging instead when: Bug is straightforward and locally reproducible Root cause area is already known Static analysis or code review bugs Simple 4-phase systematic investigation is sufficient See also : .claude/skills/debugging/SKILL.md Workflow 1. Initial Triage Use Task tool (subagent_type="devops-troubleshooter") for AI-powered analysis: Error pattern recognition Stack trace analysis with probable causes Component dependency analysis Severity assessment Recommend debugging strategy 2. Observability Data Collection For production/staging issues, gather: Error tracking (Sentry, Rollbar, Bugsnag) APM metrics (DataDog, New Relic, Dynatrace) Distributed traces (Jaeger, Zipkin, Honeycomb) Log aggregation (ELK, Splunk, Loki) Session replays (LogRocket, FullStory) For local/development issues, query available trace infrastructure:

Query traces by component (preferred over manual logging)

pnpm trace:query --component < service-name

--event < event-name

--since < ISO-860 1

--limit 200

When trace ID is known

pnpm
trace:query --trace-id
<
traceId
>
--compact
--since
<
ISO-860
1
>
--limit
200
Query for:
Error frequency/trends
Affected user cohorts
Environment-specific patterns
Related errors/warnings
Performance degradation correlation
Deployment timeline correlation
3. HYPOTHESIS GENERATION WITH PROBABILITY RANKING (BLOCKING GATE)
DO NOT instrument code until this step is complete.
Generate 3–5 ranked hypotheses before any code instrumentation. For each hypothesis:
Probability %
Estimated likelihood this is the root cause
Supporting evidence
Logs, traces, code patterns already observed
Falsification criteria
What would disprove this hypothesis?
Testing approach
How instrumentation will confirm/deny this hypothesis
Expected symptoms
What behavior we'd observe if this hypothesis is true
Format:
H1 (65%) — N+1 query in payment method loading
Evidence: 15+ sequential spans in DataDog trace at /checkout
Falsify: If single batched query still shows timeout, this is wrong
Test: Add log at db.query() call counting queries per checkout
H2 (20%) — External payment API timeout
Evidence: Error message mentions "timeout" but no slow spans in APM
Falsify: If adding timeout log shows <5s, API is not the cause
Test: Log timestamp at API call entry and API response entry
H3 (10%) — Connection pool exhaustion under load
Evidence: 5% failure rate suggests resource constraint
Falsify: If pool metrics show headroom, this is wrong
Test: Log pool.activeConnections at each checkout request
H4 (3%) — Race condition in concurrent checkout requests
Evidence: Intermittent, hard to reproduce
Falsify: If failure is consistent under sequential load, not a race
Test: Add request ID to all logs, correlate concurrent requests
H5 (2%) — Memory pressure causing GC pauses
Evidence: Timing matches peak traffic
Falsify: If memory metrics stable, GC is not causing timeouts
Test: Log heap usage and GC events at checkout start
Common categories:
Logic errors (race conditions, null handling)
State management (stale cache, incorrect transitions)
Integration failures (API changes, timeouts, auth)
Resource exhaustion (memory leaks, connection pools)
Configuration drift (env vars, feature flags)
Data corruption (schema mismatches, encoding)
4. Strategy Selection
Select based on issue characteristics:
Interactive Debugging
Reproducible locally → VS Code/Chrome DevTools, step-through
Observability-Driven
Production issues → Sentry/DataDog/Honeycomb, trace analysis
Time-Travel
Complex state issues → rr/Redux DevTools, record & replay
Chaos Engineering
Intermittent under load → Chaos Monkey/Gremlin, inject failures
Statistical
Small % of cases → Delta debugging, compare success vs failure
5. STRUCTURED INSTRUMENTATION PHASE
Each instrumentation point must target a SPECIFIC hypothesis from Step 3.
Add targeted log statements at:
Decision nodes
Where code branches based on state or data
State mutation points
Where variables/objects are modified
Integration boundaries
API calls, database queries, message queue operations
Entry/exit of affected functions
Track execution flow
Session-scoped log file
Use a unique session ID to avoid polluting production logs:
// Generate a debug session ID (short hex)
const
debugSessionId
=
Math
.
random
(
)
.
toString
(
16
)
.
slice
(
2
,
8
)
;
// e.g., 'a3f7c2'
// Log to session-scoped file in .claude/context/tmp/
const
debugLogPath
=
`
.claude/context/tmp/debug-
${
debugSessionId
}
.log
`
;
Add instrumentation to target files using Write/Edit tools:
// Example: Targeting H1 (N+1 query hypothesis)
// Add at db.query() call site in payment-service.ts
let
_debugQueryCount
=
0
;
const
_debugSessionId
=
process
.
env
.
DEBUG_SESSION_ID
||
'unknown'
;
// ... existing code ...
_debugQueryCount
++
;
fs
.
appendFileSync
(
`
.claude/context/tmp/debug-
${
_debugSessionId
}
.log
`
,
JSON
.
stringify
(
{
ts
:
Date
.
now
(
)
,
sessionId
:
_debugSessionId
,
location
:
'payment-service.ts:checkoutQuery'
,
queryCount
:
_debugQueryCount
,
paymentMethodId
,
hypothesisId
:
'H1'
,
}
)
+
'\n'
)
;
Instrumentation must be:
Targeted: each log line references a hypothesis ID (H1, H2, etc.)
Non-blocking: use fire-and-forget (
.catch(() => {})
) for async writes
Session-scoped: use the debug session ID so cleanup is deterministic
Minimal: add only what's needed to confirm/deny each hypothesis
Record all instrumented files for cleanup:
Track every file modified with instrumentation so cleanup is complete.
6. REPRODUCTION GATE (SMART_DEBUG_HITL-conditional)
Default behavior (
SMART_DEBUG_HITL=false
or unset): AUTO-REPRODUCTION
After adding instrumentation, attempt to trigger the bug programmatically:
Run existing tests
that cover the affected code path:
pnpm
test
--
--grep
""
Execute reproduction scripts
if present (e.g.,
scripts/reproduce-bug.ts
, fixtures, seed scripts).
Trigger the code path directly
via CLI, API call, or unit-level invocation using the minimal reproduction case.
Collect the session log
after each auto-reproduction attempt.
Auto-reproduction outcomes:
Succeeded (bug triggered programmatically)
Collect the log and proceed directly to Step 7 (log analysis). Do NOT pause for the user.
Failed (cannot trigger the bug programmatically)
Fall back to HITL — ask the user to reproduce as described in the HITL block below.
SMART_DEBUG_HITL=true
HUMAN-IN-THE-LOOP REPRODUCTION (original behavior) Use for bugs that require: manual UI interaction, external service triggers, hardware/device-specific conditions, or race conditions requiring specific user timing. STOP and ask the user to reproduce the bug. Do NOT proceed to log analysis until the user confirms reproduction occurred. I've added instrumentation targeting: - H1 (N+1 query): payment-service.ts:87 — logs query count per checkout - H2 (API timeout): payment-api-client.ts:43 — logs entry/exit timestamps - H3 (pool exhaustion): db-pool.ts:112 — logs active connections Debug session ID: a3f7c2 Log file: .claude/context/tmp/debug-a3f7c2.log Please reproduce the bug now. For intermittent issues, reproduce at least 3 times. When ready, let me know and I'll read the log file to analyze the evidence. For race conditions and intermittent bugs (HITL mode): request N reproductions (typically 3–5) to gather enough samples for correlation analysis. Do not speculate about root cause or propose fixes while waiting. 7. LOG ANALYSIS BEFORE FIX (MANDATORY) Read the collected logs and correlate against hypotheses.

Read session log

cat
.claude/context/tmp/debug-a3f7c2.log
For each log entry:
Which hypothesis does it support or refute?
Does the evidence agree across multiple reproductions?
Are there unexpected entries that suggest a new hypothesis?
Log analysis must conclude with one of:
Confirmed root cause
"H1 is confirmed — logs show queryCount=15 for every failing checkout, 1 for every passing checkout"
Insufficient evidence
"Logs don't show H1 or H2 clearly — need more instrumentation at X"
New hypothesis
"Logs show unexpected pattern Z — adding H6 with 70% probability"
If logs are insufficient
Loop back to Step 5 with additional instrumentation. Do not guess. No fix code is written until root cause is confirmed from log evidence. 8. Root Cause Analysis AI-powered code flow analysis after log confirmation: Full execution path reconstruction Variable state tracking at decision points External dependency interaction analysis Timing/sequence diagram generation Code smell detection Similar bug pattern identification Fix complexity estimation 9. Fix Implementation AI generates fix with: Code changes required Impact assessment Risk level Test coverage needs Rollback strategy 10. Validation Post-fix verification: Run test suite Performance comparison (baseline vs fix) Canary deployment (monitor error rate) AI code review of fix Success criteria: Tests pass No performance regression Error rate unchanged or decreased No new edge cases introduced 11. INSTRUMENTATION CLEANUP (MANDATORY FINAL STEP) After fix is verified: remove ALL added debug instrumentation. Remove every log statement added during Step 5 Remove any debug-related imports or variables Delete the session log file from .claude/context/tmp/ Verify no artifacts remain:

Grep for session ID to confirm no debug code remains in production files

grep -r "debug-a3f7c2|_debugQueryCount|_debugSessionId" --include = ".ts" --include = ".js" --include = "*.cjs" .

Should return zero results in production source files

Delete session log

rm
.claude/context/tmp/debug-a3f7c2.log
Cleanup is not optional.
Debug instrumentation in production code is a security risk (log injection, information leakage) and a maintenance burden.
12. Prevention
Generate regression tests using AI
Update knowledge base with root cause
Add monitoring/alerts for similar issues
Document troubleshooting steps in runbook
Example: Full Cursor Debug Mode Session
Issue: "Checkout timeout errors (intermittent, ~5% of requests)"
// === Step 3: HYPOTHESES ===
H1 (65%) — N+1 query in payment method loading
Evidence: 15+ sequential DB spans in trace
H2 (20%) — External payment API timeout
Evidence: Error says "timeout", no slow APM spans
H3 (10%) — Connection pool exhaustion
Evidence: 5% failure rate suggests resource constraint
H4 (3%) — Race condition in concurrent requests
H5 (2%) — GC pauses at peak traffic
// === Step 5: INSTRUMENTATION ===
// Added to payment-service.ts and db-pool.ts
// Session ID: a3f7c2, log: .claude/context/tmp/debug-a3f7c2.log
// === Step 6: STOP ===
// "Please reproduce the bug 3 times and let me know"
// User: "Done, reproduced 3 times"
// === Step 7: LOG ANALYSIS ===
// Log shows: queryCount=15 on every failure, queryCount=1 on success
// H1 CONFIRMED: N+1 query pattern in payment verification
// === Step 9: FIX ===
// Replace sequential queries with batch query
// Latency reduced 70%, query count: 15 → 1
// === Step 11: CLEANUP ===
// grep confirms zero debug artifacts in source files
// debug-a3f7c2.log deleted
Output Format
Provide structured report:
Issue Summary
Error, frequency, impact
Ranked Hypotheses
3–5 with probability %, evidence, falsification criteria
Instrumentation Plan
Files, locations, hypothesis targets, session ID
[STOP]
Reproduction request
Log Analysis
Evidence-to-hypothesis correlation, confirmed root cause
Fix Proposal
Code changes, risk, impact
Validation Plan
Steps to verify fix
Cleanup Confirmation
grep output showing zero debug artifacts
Prevention
Tests, monitoring, documentation Focus on actionable insights. Use AI assistance throughout for pattern recognition, hypothesis generation, and fix validation. Never skip the reproduction gate or cleanup step. Issue to debug: $ARGUMENTS Iron Laws NEVER write a fix before reading collected logs and confirming root cause from evidence ALWAYS generate 3–5 ranked hypotheses with probability percentages BEFORE any instrumentation NEVER leave debug instrumentation in code after the fix is verified and committed ALWAYS reproduce the bug before attempting any fix — confirmation via tests or scripts NEVER report root cause until trace evidence and log evidence agree independently Anti-Patterns Anti-Pattern Why It Fails Correct Approach Fixing before diagnosing Fix targets the wrong cause; bug persists or regresses Collect logs, confirm root cause from evidence, then write the fix Single hypothesis Miss the actual root cause by anchoring on first idea Generate 3–5 ranked hypotheses before any instrumentation Skipping reproduction Cannot verify fix worked; same bug resurfaces Auto-reproduce or pause for HITL before proceeding to fix Leaving debug instrumentation Debug noise in production logs; performance degradation Remove ALL log statements and debug code after fix is verified Claiming root cause without evidence Premature conclusion leads to wrong fix and lost time Require trace evidence and log evidence to agree before concluding Memory Protocol (MANDATORY) Before starting: Read .claude/context/memory/learnings.md After completing: New pattern -> .claude/context/memory/learnings.md Issue found -> .claude/context/memory/issues.md Decision made -> .claude/context/memory/decisions.md ASSUME INTERRUPTION: If it's not in memory, it didn't happen.
返回排行榜