Recovery Skill

When to Use

Context window exhausted mid-workflow

Session interrupted or lost

Need to resume from last completed step

Workflow state needs reconstruction

Step 1: Identify Last Completed Step

Check gate files

for last successful validation:

Location:

.claude/context/history/gates/{workflow_id}/

Find highest step number with validation_status: "pass"

This is the last successfully completed step

Review reasoning files

for progress:

Location:

.claude/context/history/reasoning/{workflow_id}/

Read reasoning files up to last completed step

Extract context and decisions made

Identify artifacts created

:

Check artifact registry:

.claude/context/artifacts/registry-{workflow_id}.json

List all artifacts created up to last step

Verify artifact files exist

Step 2: Load Plan Documents

Read plan document

(stateless):

Load

plan-{workflow_id}.json

from artifact registry

Extract current workflow state

Identify completed vs pending tasks

Load relevant phase plan

(if multi-phase):

Check if project is multi-phase (exceeds phase_size_max_lines threshold)

Load active phase plan:

plan-{workflow_id}-phase-{n}.json

Understand phase boundaries and dependencies

Understand current state

:

Map completed tasks to plan

Identify next steps

Check for dependencies

Step 3: Context Recovery

Load artifacts from last completed step

:

Read artifact registry

Load all artifacts with validation_status: "pass"

Verify artifact integrity

Read reasoning files for context

:

Load reasoning files from completed steps

Extract key decisions and context

Understand workflow progression

Reconstruct workflow state

:

Combine plan, artifacts, and reasoning

Create recovery state document

Validate state consistency

Step 4: Resume Execution

Continue from next step

:

Identify next step after last completed

Load step requirements from plan

Prepare inputs for next step

Planner updates plan status

(stateless):

Update plan-{workflow_id}.json with current status

Mark completed steps

Update progress tracking

Orchestrator coordinates next agents

:

Pass recovered artifacts to next step

Resume workflow execution

Monitor for additional interruptions

Failure Classification

When a task fails, classify the failure type:

Failure Type

Indicators

Recovery Action

BROKEN_BUILD

Build errors, syntax errors, module not found

ROLLBACK + fix

VERIFICATION_FAILED

Test failures, validation errors, assertion errors

RETRY with fix (max 3 attempts)

CIRCULAR_FIX

Same error 3+ times, similar approaches repeated

SKIP or ESCALATE

CONTEXT_EXHAUSTED

Token limit reached, maximum length exceeded

Compress context, continue

UNKNOWN

No pattern match

RETRY once, then ESCALATE

Circular Fix Detection

Iron Law

If the same approach has been tried 3+ times without success, STOP.

When circular fix is detected:

Stop

the current approach immediately

Document

what was tried (approaches, errors, files)

Try fundamentally different approach

(different library, different pattern, simpler implementation)

If still failing, ESCALATE

to human intervention

Detection Algorithm

:

Extract keywords from current approach (excluding stop words)

Compare with keywords from last 3 attempts

If Jaccard similarity > 30% for 2+ attempts, flag as circular

Example

:

Attempt 1: "Using async await for fetch"

Attempt 2: "Using async/await with try-catch"

Attempt 3: "Trying async await pattern again"

=> CIRCULAR FIX DETECTED - Stop and try callback pattern instead

Attempt Count Thresholds

Failure Type

Max Attempts

Then Action

VERIFICATION_FAILED

3

SKIP + ESCALATE

UNKNOWN

2

ESCALATE

BROKEN_BUILD

1

ROLLBACK (if good commit exists)

CIRCULAR_FIX

0

Immediately SKIP

References

See

references/

for detailed patterns:

failure-types.md

- Failure classification details and indicators

recovery-actions.md

- Recovery action decision tree and execution

merge-strategies.md

- File merge strategies for multi-agent scenarios

Recovery Validation Checklist

Last completed step identified correctly

Plan document loaded and validated

All artifacts from completed steps available

Reasoning files reviewed for context

Workflow state reconstructed accurately

No duplicate work will be performed

Next step inputs prepared

Recovery logged in reasoning file

Error Handling

Missing plan document

Request planner to recreate plan from requirements

Missing artifacts

Request artifact recreation from source agent

Corrupted artifacts

Request artifact recreation with validation
Incomplete reasoning: Use artifact registry and gate files to reconstruct state

1. Check gate files for last completed step

ls .claude/context/history/gates/ { workflow_id } /

2. Load plan document

cat .claude/context/artifacts/plan- { workflow_id } .json

3. Review reasoning files

cat .claude/context/history/reasoning/ { workflow_id } /*.json

4. Resume from next step

Natural language invocation : "Resume the workflow from where we left off" "Recover the workflow state and continue" "What was the last completed step?" Related Planner Agent: .claude/agents/core/planner.md Memory files: .claude/context/memory/ Memory Protocol (MANDATORY) Before starting: cat .claude/context/memory/learnings.md After completing: New pattern -> .claude/context/memory/learnings.md Issue found -> .claude/context/memory/issues.md Decision made -> .claude/context/memory/decisions.md ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.

安装

1. Check gate files for last completed step

2. Load plan document

3. Review reasoning files

4. Resume from next step