- Recovery Skill
- When to Use
- Context window exhausted mid-workflow
- Session interrupted or lost
- Need to resume from last completed step
- Workflow state needs reconstruction
- Step 1: Identify Last Completed Step
- Check gate files for last successful validation:
- Location: .claude/context/history/gates/{workflow_id}/
- Find highest step number with validation_status: "pass"
- This is the last successfully completed step
- Review reasoning files for progress:
- Location: .claude/context/history/reasoning/{workflow_id}/
- Read reasoning files up to last completed step
- Extract context and decisions made
- Identify artifacts created:
- Check artifact registry: .claude/context/artifacts/registry-{workflow_id}.json
- List all artifacts created up to last step
- Verify artifact files exist
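The gate scan in Step 1 can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: it assumes each gate file is a JSON object with a numeric `step` field next to `validation_status`, and both function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def last_passed_step(gates: list[dict]) -> Optional[int]:
    """Return the highest step number whose gate passed, or None if none did."""
    # Assumption: each gate record has a numeric "step" field.
    passed = [
        g["step"]
        for g in gates
        if "step" in g and g.get("validation_status") == "pass"
    ]
    return max(passed, default=None)


def load_gates(workflow_id: str,
               root: str = ".claude/context/history/gates") -> list[dict]:
    """Read every gate file for a workflow (assumed: one JSON object per file)."""
    gate_dir = Path(root) / workflow_id
    return [json.loads(p.read_text()) for p in sorted(gate_dir.glob("*.json"))]
```

A caller would run `last_passed_step(load_gates(workflow_id))` and treat `None` as "no step has completed yet".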
- Step 2: Load Plan Documents
- Read plan document (stateless):
- Load plan-{workflow_id}.json from artifact registry
- Extract current workflow state
- Identify completed vs pending tasks
- Load relevant phase plan (if multi-phase):
- Check if project is multi-phase (exceeds phase_size_max_lines threshold)
- Load active phase plan: plan-{workflow_id}-phase-{n}.json
- Understand phase boundaries and dependencies
- Understand current state:
- Map completed tasks to plan
- Identify next steps
- Check for dependencies
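Mapping completed tasks to the plan and checking dependencies could look like the sketch below. The plan schema is an assumption made for illustration (a `tasks` list whose entries carry `id`, `status`, and an optional `depends_on` list). The real `plan-{workflow_id}.json` layout may differ.

```python
def split_tasks(plan: dict) -> tuple[list[dict], list[dict]]:
    """Partition plan tasks into (completed, pending)."""
    tasks = plan.get("tasks", [])  # assumed field name
    completed = [t for t in tasks if t.get("status") == "completed"]
    pending = [t for t in tasks if t.get("status") != "completed"]
    return completed, pending


def ready_tasks(plan: dict) -> list[str]:
    """Return ids of pending tasks whose dependencies are all completed."""
    completed, pending = split_tasks(plan)
    done_ids = {t["id"] for t in completed}
    return [
        t["id"]
        for t in pending
        if set(t.get("depends_on", [])) <= done_ids
    ]
```

Tasks returned by `ready_tasks` are the candidates for the next step. Anything else is still blocked on unfinished work.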
- Step 3: Context Recovery
- Load artifacts from last completed step:
- Read artifact registry
- Load all artifacts with validation_status: "pass"
- Verify artifact integrity
- Read reasoning files for context:
- Load reasoning files from completed steps
- Extract key decisions and context
- Understand workflow progression
- Reconstruct workflow state:
- Combine plan, artifacts, and reasoning
- Create recovery state document
- Validate state consistency
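One possible shape for the recovery state document is sketched below. The registry fields (`artifacts`, `path`, `step`, `workflow_id`) and the two consistency checks are assumptions chosen for illustration. The skill itself does not define what "consistent" means.

```python
import os
from typing import Callable


def build_recovery_state(
    last_step: int,
    plan: dict,
    registry: dict,
    decisions: list[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> dict:
    """Combine plan, artifacts, and reasoning into one recovery state."""
    # Keep only artifacts that passed validation at or before the last step.
    artifacts = [
        a for a in registry.get("artifacts", [])
        if a.get("validation_status") == "pass"
        and a.get("step", 0) <= last_step
    ]
    # Check 1 (assumed): every kept artifact file must exist on disk.
    problems = [
        f"missing artifact file: {a['path']}"
        for a in artifacts
        if not exists(a["path"])
    ]
    # Check 2 (assumed): plan and registry must describe the same workflow.
    if plan.get("workflow_id") != registry.get("workflow_id"):
        problems.append("plan and registry refer to different workflows")
    return {
        "workflow_id": registry.get("workflow_id"),
        "last_completed_step": last_step,
        "artifacts": [a["path"] for a in artifacts],
        "decisions": decisions,
        "problems": problems,
        "consistent": not problems,
    }
```

The `exists` parameter is injectable so the check can be exercised without touching the filesystem. When `consistent` is false, the entries in `problems` indicate which Error Handling case applies.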
- Step 4: Resume Execution
- Continue from next step:
- Identify next step after last completed
- Load step requirements from plan
- Prepare inputs for next step
- Planner updates plan status (stateless):
- Update plan-{workflow_id}.json with current status
- Mark completed steps
- Update progress tracking
- Orchestrator coordinates next agents:
- Pass recovered artifacts to next step
- Resume workflow execution
- Monitor for additional interruptions
- Failure Classification
- When a task fails, classify the failure type:
| Failure Type | Indicators | Recovery Action |
| --- | --- | --- |
| BROKEN_BUILD | Build errors, syntax errors, module not found | ROLLBACK + fix |
| VERIFICATION_FAILED | Test failures, validation errors, assertion errors | RETRY with fix (max 3 attempts) |
| CIRCULAR_FIX | Same error 3+ times, similar approaches repeated | SKIP or ESCALATE |
| CONTEXT_EXHAUSTED | Token limit reached, maximum length exceeded | Compress context, continue |
| UNKNOWN | No pattern match | RETRY once, then ESCALATE |
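The classification above can be approximated with simple substring matching. This is a rough sketch: the exact match strings are assumptions derived from the listed indicators, and `classify_failure` is a hypothetical helper. A real classifier would likely need patterns specific to the tools in use.

```python
# Indicator substrings per failure type (assumed; lowercase matching).
INDICATORS = [
    ("BROKEN_BUILD",
     ("build error", "syntax error", "syntaxerror", "module not found")),
    ("VERIFICATION_FAILED",
     ("test fail", "tests fail", "validation error",
      "assertion error", "assertionerror")),
    ("CONTEXT_EXHAUSTED", ("token limit", "maximum length")),
]

# Recovery actions copied from the classification table.
RECOVERY_ACTION = {
    "BROKEN_BUILD": "ROLLBACK + fix",
    "VERIFICATION_FAILED": "RETRY with fix (max 3 attempts)",
    "CIRCULAR_FIX": "SKIP or ESCALATE",
    "CONTEXT_EXHAUSTED": "Compress context, continue",
    "UNKNOWN": "RETRY once, then ESCALATE",
}


def classify_failure(error_text: str, same_error_count: int = 0) -> str:
    """Map an error message to one of the failure types in the table."""
    # The same error seen 3+ times takes priority over text indicators.
    if same_error_count >= 3:
        return "CIRCULAR_FIX"
    text = error_text.lower()
    for failure_type, needles in INDICATORS:
        if any(needle in text for needle in needles):
            return failure_type
    return "UNKNOWN"
```

`RECOVERY_ACTION[classify_failure(msg)]` then yields the action to take for a given message.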
- Circular Fix Detection
- Iron Law: If the same approach has been tried 3+ times without success, STOP.
- When circular fix is detected:
- Stop the current approach immediately
- Document what was tried (approaches, errors, files)
- Try fundamentally different approach (different library, different pattern, simpler implementation)
- If still failing, ESCALATE to human intervention
- Detection Algorithm:
- Extract keywords from current approach (excluding stop words)
- Compare with keywords from last 3 attempts
- If Jaccard similarity > 30% for 2+ attempts, flag as circular
- Example:
- Attempt 1: "Using async await for fetch"
- Attempt 2: "Using async/await with try-catch"
- Attempt 3: "Trying async await pattern again"
- => CIRCULAR FIX DETECTED - Stop and try callback pattern instead
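A minimal sketch of the detection algorithm, applied to the example attempts, might look like this. The stop-word list is an assumption: it includes filler words such as "using", "trying", and "again", because without them the example attempts score below the 30% threshold (2/7 and 2/8). The function names are hypothetical.

```python
import re

# Assumed stop words; tune for the descriptions your agents actually write.
STOP_WORDS = frozenset({
    "a", "an", "the", "and", "or", "for", "with", "to", "of", "in", "on",
    "using", "trying", "again",
})


def keywords(text: str) -> set[str]:
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_circular(current: str, previous: list[str],
                threshold: float = 0.3, min_matches: int = 2,
                window: int = 3) -> bool:
    """Flag the current approach if it resembles 2+ of the last 3 attempts."""
    cur = keywords(current)
    recent = previous[-window:]
    matches = sum(1 for p in recent if jaccard(cur, keywords(p)) > threshold)
    return matches >= min_matches
```

With this stop-word list, attempt 3 scores 0.5 against attempt 1 and 0.4 against attempt 2, so `is_circular` returns `True`. A description such as "Switch to callback pattern" shares no keywords with attempt 1 and only "pattern" with later ones, so it is not flagged.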
- Attempt Count Thresholds
| Failure Type | Max Attempts | Then Action |
| --- | --- | --- |
| VERIFICATION_FAILED | 3 | SKIP + ESCALATE |
| UNKNOWN | 2 | ESCALATE |
| BROKEN_BUILD | 1 | ROLLBACK (if good commit exists) |
| CIRCULAR_FIX | 0 | Immediately SKIP |
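One reading of these thresholds is sketched below: keep retrying while the number of attempts made is below the limit, then take the listed action. Falling back to ESCALATE when a broken build has no good commit is an assumption, since the table does not cover that case. `next_action` is a hypothetical helper.

```python
# (max attempts, action once the limit is reached), from the thresholds table.
THRESHOLDS = {
    "VERIFICATION_FAILED": (3, "SKIP + ESCALATE"),
    "UNKNOWN": (2, "ESCALATE"),
    "BROKEN_BUILD": (1, "ROLLBACK"),
    "CIRCULAR_FIX": (0, "SKIP"),
}


def next_action(failure_type: str, attempts_made: int,
                has_good_commit: bool = True) -> str:
    """Return RETRY while under the limit, otherwise the follow-up action."""
    if failure_type not in THRESHOLDS:
        raise ValueError(f"no attempt threshold defined for {failure_type}")
    limit, then_action = THRESHOLDS[failure_type]
    if attempts_made < limit:
        return "RETRY"
    # Assumption: with no good commit to roll back to, escalate instead.
    if then_action == "ROLLBACK" and not has_good_commit:
        return "ESCALATE"
    return then_action
```

CONTEXT_EXHAUSTED is deliberately absent because the table lists no attempt limit for it, so the sketch raises `ValueError` instead of guessing.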
- References
- See references/ for detailed patterns:
- failure-types.md - Failure classification details and indicators
- recovery-actions.md - Recovery action decision tree and execution
- merge-strategies.md - File merge strategies for multi-agent scenarios
- Recovery Validation Checklist
- Last completed step identified correctly
- Plan document loaded and validated
- All artifacts from completed steps available
- Reasoning files reviewed for context
- Workflow state reconstructed accurately
- No duplicate work will be performed
- Next step inputs prepared
- Recovery logged in reasoning file
- Error Handling
- Missing plan document: Request planner to recreate plan from requirements
- Missing artifacts: Request artifact recreation from source agent
- Corrupted artifacts: Request artifact recreation with validation
- Incomplete reasoning: Use artifact registry and gate files to reconstruct state
1. Check gate files for last completed step
ls .claude/context/history/gates/{workflow_id}/
2. Load plan document
cat .claude/context/artifacts/plan-{workflow_id}.json
3. Review reasoning files
cat .claude/context/history/reasoning/{workflow_id}/*.json
4. Resume from next step