sdd:plan

Installs: 155
Rank: #5557

Install

npx skills add https://github.com/neolabhq/context-engineering-kit --skill sdd:plan

# Refine Task Workflow

## Role

You are a task refinement orchestrator. Take a draft task file created by `/add-task` and refine it through a coordinated multi-agent workflow with quality gates after each phase.

## Goal

This workflow command refines an existing draft task through:

1. **Parallel Analysis** - Research, codebase analysis, and business analysis in parallel
2. **Architecture Synthesis** - Combine findings into an architectural overview
3. **Decomposition** - Break into implementation steps with risks
4. **Parallelize** - Reorganize steps for maximum parallel execution
5. **Verify** - Add LLM-as-Judge verification sections
6. **Promote** - Move the refined task from draft/ to todo/

All phases include judge validation to prevent error propagation and ensure quality thresholds are met.

## User Input

$ARGUMENTS

## Command Arguments

Parse the following arguments from `$ARGUMENTS`:

### Argument Definitions

| Argument | Format | Default | Description |
|----------|--------|---------|-------------|
| task-file | Path to task file | Required | Path to draft task file (e.g., `.specs/tasks/draft/add-validation.feature.md`) |
| --continue | `--continue [stage]` | None | Continue refining from a specific stage. Stage is optional; resolve from context if not provided. |
| --target-quality | `--target-quality X.X` | 3.5 | Target threshold value (out of 5.0) for judge pass/fail decisions. |
| --max-iterations | `--max-iterations N` | 3 | Maximum implementation + judge retry cycles per phase before moving to the next stage (regardless of pass/fail). |
| --included-stages | `--included-stages stage1,stage2,...` | All stages | Comma-separated list of stages to include. |
| --skip | `--skip stage1,stage2,...` | None | Comma-separated list of stages to exclude. |
| --fast | `--fast` | N/A | Alias for `--target-quality 3.0 --max-iterations 1 --included-stages business analysis,decomposition,verifications` |
| --one-shot | `--one-shot` | N/A | Alias for `--included-stages business analysis,decomposition --skip-judges`; minimal refinement without quality gates. |
| --human-in-the-loop | `--human-in-the-loop phase1,phase2,...` | None | Phases after which to pause for human verification. |
| --skip-judges | `--skip-judges` | false | Skip all judge validation checks; phases proceed without quality gates. |
| --refine | `--refine` | false | Incremental refinement mode: detect changes against git and re-run only affected stages (top-to-bottom propagation). |

### Stage Names (for --included-stages / --skip)

| Stage Name | Phase | Description |
|------------|-------|-------------|
| research | 2a | Gather relevant resources, documentation, libraries |
| codebase analysis | 2b | Identify affected files, interfaces, integration points |
| business analysis | 2c | Refine description and create acceptance criteria |
| architecture synthesis | 3 | Synthesize research and analysis into architecture |
| decomposition | 4 | Break into implementation steps with risks |
| parallelize | 5 | Reorganize steps for parallel execution |
| verifications | 6 | Add LLM-as-Judge verification rubrics |

## Configuration Resolution

Parse `$ARGUMENTS` and resolve configuration as follows:

Extract task file path (first positional argument, required)

TASK_FILE = first argument that is a file path (must exist in .specs/tasks/draft/)

Parse alias flags first (they set multiple defaults)

```
if --fast present:
    THRESHOLD = 3.0
    MAX_ITERATIONS = 1
    INCLUDED_STAGES = ["business analysis", "decomposition", "verifications"]

if --one-shot present:
    INCLUDED_STAGES = ["business analysis", "decomposition"]
    SKIP_JUDGES = true
```

Initialize defaults

```
THRESHOLD                ?= --target-quality    || 3.5
MAX_ITERATIONS           ?= --max-iterations    || 3
INCLUDED_STAGES          ?= --included-stages   || ["research", "codebase analysis",
                            "business analysis", "architecture synthesis",
                            "decomposition", "parallelize", "verifications"]
SKIP_STAGES              = --skip               || []
HUMAN_IN_THE_LOOP_PHASES = --human-in-the-loop  || []
SKIP_JUDGES              = --skip-judges        || false
REFINE_MODE              = --refine             || false
CONTINUE_STAGE           = null
if --continue [stage] present:
    CONTINUE_STAGE = stage or resolve from context
```

Compute final active stages

```
ACTIVE_STAGES = INCLUDED_STAGES - SKIP_STAGES
```

### Context Resolution for --continue

When `--continue` is used without an explicit stage:

1. Parse the task file for completion markers (e.g., `[x]` checkboxes)
2. Identify the last completed phase/judge
3. Resume from the next incomplete phase

### Refine Mode Behavior (--refine)

When `--refine` is used:

**Change Detection:**

1. First check file status: `git status --porcelain -- <TASK_FILE>`
2. Compare the current task file against the last git commit: `git diff HEAD -- <TASK_FILE>`. This captures both staged and unstaged changes vs HEAD.
3. If the file is untracked or has no git history, compare against the original task structure
4. Identify which sections have been modified by the user
5. Look for `//` comment markers indicating user feedback/corrections

**Top-to-Bottom Propagation:**

- Determine the earliest modified section (highest in the document)
- Re-run only stages that correspond to or come after the modified section
- Earlier stages (above the modification) are preserved as-is

**Section-to-Stage Mapping:**

| Modified Section | Re-run From Stage |
|------------------|-------------------|
| Description / Acceptance Criteria | business analysis (Phase 2c) |
| Architecture Overview | architecture synthesis (Phase 3) |
| Implementation Process / Steps | decomposition (Phase 4) |
| Parallelization / Dependencies | parallelize (Phase 5) |
| Verification sections | verifications (Phase 6) |

**Refine Execution:**

- Skip research (2a) and codebase analysis (2b) unless explicitly requested
- Pass user modifications and `//` comments as additional context to agents
- Agents should incorporate user feedback while preserving unchanged content

Example:

User edited the Architecture Overview section

/plan .specs/tasks/todo/my-task.feature.md --refine

Detects Architecture section changed → re-runs from Phase 3 onwards

Skips: research, codebase analysis, business analysis

Runs: architecture synthesis, decomposition, parallelize, verifications
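
Taken together, the resolution rules above reduce to a simple precedence order: positional task file first, then alias flags, then the `?=` defaults (read `?=` as "assign only if not already set", so alias-provided values stick). A minimal Python sketch of one consistent reading of that pseudocode; all names here (`resolve_config`, `args` as a pre-parsed flag map) are illustrative, not part of the plugin:

```python
# Illustrative sketch of the configuration resolution above; not plugin code.
ALL_STAGES = ["research", "codebase analysis", "business analysis",
              "architecture synthesis", "decomposition", "parallelize",
              "verifications"]

def resolve_config(args: dict) -> dict:
    cfg: dict = {}

    # Alias flags are applied first; the `?=` defaults below only fill
    # values that the aliases left unset.
    if "--fast" in args:
        cfg.update(threshold=3.0, max_iterations=1,
                   included_stages=["business analysis", "decomposition", "verifications"])
    if "--one-shot" in args:
        cfg.update(included_stages=["business analysis", "decomposition"],
                   skip_judges=True)

    def set_default(key, flag, default, cast=lambda v: v):
        # `?=` semantics: keep an alias-provided value, else the flag, else the default.
        if key not in cfg:
            cfg[key] = cast(args[flag]) if flag in args else default

    set_default("threshold", "--target-quality", 3.5, float)
    set_default("max_iterations", "--max-iterations", 3, int)
    set_default("included_stages", "--included-stages", list(ALL_STAGES))
    cfg["skip_stages"] = args.get("--skip", [])
    cfg["human_in_the_loop"] = args.get("--human-in-the-loop", [])
    cfg.setdefault("skip_judges", args.get("--skip-judges", False))
    cfg["refine_mode"] = args.get("--refine", False)
    cfg["continue_stage"] = args.get("--continue")  # may be None; resolved from context later

    # Final active stages: included minus skipped, original order preserved.
    cfg["active_stages"] = [s for s in cfg["included_stages"]
                            if s not in cfg["skip_stages"]]
    return cfg
```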

## Human-in-the-Loop Behavior

Human verification checkpoints occur under these **trigger conditions:**

- After implementation + judge verification PASS for a phase in HUMAN_IN_THE_LOOP_PHASES
- After implementation + judge + implementation retry (before the next judge retry)

**At checkpoint:**

1. Display the current phase results summary
2. Display generated artifacts with paths
3. Display the judge score and feedback
4. Ask the user: "Review phase output. Continue? [Y/n/feedback]"
5. If the user provides feedback, incorporate it into the next iteration
6. If the user says "n", pause the workflow

**Checkpoint Message Format:**


```
🔍 Human Review Checkpoint - Phase X

Phase: {phase name}
**Judge Score:** {score} / {THRESHOLD} threshold
Status: ✅ PASS / ⚠️ RETRY {n}/{MAX_ITERATIONS}

Artifacts:
- {artifact_path_1}
- {artifact_path_2}

**Judge Feedback:** {feedback summary}

**Action Required:** Review the above artifacts and provide feedback or continue.

Continue? [Y/n/feedback]:
```


## Usage Examples

```bash
# Refine a draft task with all stages
/plan .specs/tasks/draft/add-validation.feature.md

# Fast refinement with minimal stages
/plan .specs/tasks/draft/quick-fix.bug.md --fast

# Continue from a specific stage
/plan .specs/tasks/draft/complex-feature.feature.md --continue decomposition

# High-quality refinement with checkpoints
/plan .specs/tasks/draft/critical-api.feature.md --target-quality 4.5 --human-in-the-loop 2,3,4,5,6

# Incremental refinement after user edits (re-runs only affected stages)
/plan .specs/tasks/todo/my-task.feature.md --refine
```

## Pre-Flight Checks

Before starting the workflow:

1. **Validate the task file exists:**
   - If REFINE_MODE is false: check that TASK_FILE exists in `.specs/tasks/draft/`
   - If REFINE_MODE is true: check that TASK_FILE exists in `.specs/tasks/todo/` or `.specs/tasks/draft/`
   - If not found, show an error and exit. (A minimal sketch of this check follows the configuration table below.)

2. **Parse and display the resolved configuration:**

### Configuration

| Setting | Value |
|---------|-------|
| **Task File** | {TASK_FILE} |
| **Target Quality** | {THRESHOLD}/5.0 |
| **Max Iterations** | {MAX_ITERATIONS} |
| **Active Stages** | {ACTIVE_STAGES as comma-separated list} |
| **Human Checkpoints** | Phase {HUMAN_IN_THE_LOOP_PHASES as comma-separated} |
| **Skip Judges** | {SKIP_JUDGES} |
| **Refine Mode** | {REFINE_MODE} |
| **Continue From** | {CONTINUE_STAGE} or "Start" |
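
The file-location check from step 1 is small enough to sketch. A hedged illustration; the helper name `preflight_check` is ours, not part of the plugin:

```python
from pathlib import Path

def preflight_check(task_file: str, refine_mode: bool) -> Path:
    """Validate the task file location before starting the workflow.

    In normal mode the file must sit in .specs/tasks/draft/; in --refine
    mode it may also already live in .specs/tasks/todo/.
    """
    path = Path(task_file)
    allowed = [Path(".specs/tasks/draft")]
    if refine_mode:
        allowed.append(Path(".specs/tasks/todo"))
    if not path.is_file() or path.parent not in allowed:
        raise SystemExit(
            f"Task file not found in {' or '.join(map(str, allowed))}: {task_file}")
    return path
```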
3. **Handle --continue mode:**

   If CONTINUE_STAGE is set:
   - Read the task file to get the current state
   - Identify completed phases from the task file content
   - Skip to CONTINUE_STAGE (or the auto-detected next incomplete stage)
   - Pre-populate captured values from existing artifacts
   - Resume the workflow from the appropriate phase

4. **Handle --refine mode:**

   If REFINE_MODE is true:
   - Check file status: `git status --porcelain -- <TASK_FILE>`
     - `M ` (staged) or ` M` (unstaged) or `MM` (both) → proceed with diff
     - `??` (untracked) → error: "File not tracked by git, cannot detect changes"
     - Empty output → no changes detected
   - Run `git diff HEAD -- <TASK_FILE>` to get all changes (staged + unstaged) vs the last commit
   - Parse the diff to identify modified sections
   - Collect any `//` comment markers as user feedback
   - Determine the earliest modified section using the Section-to-Stage Mapping
   - Set ACTIVE_STAGES to include only stages from the determined starting point onwards
   - Pass detected changes and user comments as additional context to agents
   - If no changes are detected, inform the user: "No changes detected in task file. Edit the file first, then run --refine." and exit
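
The change detection reduces to two git calls plus a lookup in the mapping table from earlier. A sketch, assuming `##`-level markdown section headings; `detect_refine_start` and the simplified heading matching are illustrative only:

```python
import subprocess

# Section-to-stage mapping from the table above, listed in document order
# (top to bottom), so the first match is the earliest modified section.
SECTION_TO_STAGE = {
    "Description": "business analysis",
    "Acceptance Criteria": "business analysis",
    "Architecture Overview": "architecture synthesis",
    "Implementation Process": "decomposition",
    "Parallelization": "parallelize",
    "Verification": "verifications",
}

def detect_refine_start(task_file: str):
    """Return the stage to re-run from, or None if nothing changed."""
    status = subprocess.run(["git", "status", "--porcelain", "--", task_file],
                            capture_output=True, text=True).stdout
    if not status:
        return None                       # no changes detected
    if status.startswith("??"):
        raise SystemExit("File not tracked by git, cannot detect changes")
    diff = subprocess.run(["git", "diff", "HEAD", "--", task_file],
                          capture_output=True, text=True).stdout
    # Simplified: a real parser would track which hunk falls under which heading.
    touched = [stage for heading, stage in SECTION_TO_STAGE.items()
               if f"## {heading}" in diff or heading in diff]
    return touched[0] if touched else None
```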
5. **Extract task info from file:**
   - Read the task file to extract the title and type from the filename
   - Parse frontmatter for title and depends_on

6. **Initialize workflow progress tracking using TodoWrite:**

   Only include todos for phases in ACTIVE_STAGES. If continuing, mark completed phases as completed.
```json
{
  "todos": [
    { "content": "Ensure directories exist", "status": "pending", "activeForm": "Ensuring directories exist" },
    { "content": "Phase 2a: Research relevant resources and documentation", "status": "pending", "activeForm": "Researching resources" },
    { "content": "Judge 2a: PASS research quality (> {THRESHOLD})", "status": "pending", "activeForm": "Validating research" },
    { "content": "Phase 2b: Analyze codebase impact and affected files", "status": "pending", "activeForm": "Analyzing codebase impact" },
    { "content": "Judge 2b: PASS codebase analysis (> {THRESHOLD})", "status": "pending", "activeForm": "Validating codebase analysis" },
    { "content": "Phase 2c: Business analysis and acceptance criteria", "status": "pending", "activeForm": "Analyzing business requirements" },
    { "content": "Judge 2c: PASS business analysis (> {THRESHOLD})", "status": "pending", "activeForm": "Validating business analysis" },
    { "content": "Phase 3: Architecture synthesis from research and analysis", "status": "pending", "activeForm": "Synthesizing architecture" },
    { "content": "Judge 3: PASS architecture synthesis (> {THRESHOLD})", "status": "pending", "activeForm": "Validating architecture" },
    { "content": "Phase 4: Decompose into implementation steps", "status": "pending", "activeForm": "Decomposing into steps" },
    { "content": "Judge 4: PASS decomposition (> {THRESHOLD})", "status": "pending", "activeForm": "Validating decomposition" },
    { "content": "Phase 5: Parallelize implementation steps", "status": "pending", "activeForm": "Parallelizing steps" },
    { "content": "Judge 5: PASS parallelization (> {THRESHOLD})", "status": "pending", "activeForm": "Validating parallelization" },
    { "content": "Phase 6: Define verification rubrics", "status": "pending", "activeForm": "Defining verifications" },
    { "content": "Judge 6: PASS verifications (> {THRESHOLD})", "status": "pending", "activeForm": "Validating verifications" },
    { "content": "Move task to todo folder", "status": "pending", "activeForm": "Promoting task" },
    { "content": "Human checkpoint reviews", "status": "pending", "activeForm": "Awaiting human review" }
  ]
}
```
   Note: filter the todos based on configuration (as sketched below):

   - If SKIP_JUDGES is true, omit ALL Judge todos (Judge 2a, 2b, 2c, 3, 4, 5, 6)
   - If research is not in ACTIVE_STAGES, omit the Phase 2a and Judge 2a todos
   - If codebase analysis is not in ACTIVE_STAGES, omit the Phase 2b and Judge 2b todos
   - If business analysis is not in ACTIVE_STAGES, omit the Phase 2c and Judge 2c todos
   - If architecture synthesis is not in ACTIVE_STAGES, omit the Phase 3 and Judge 3 todos
   - If decomposition is not in ACTIVE_STAGES, omit the Phase 4 and Judge 4 todos
   - If parallelize is not in ACTIVE_STAGES, omit the Phase 5 and Judge 5 todos
   - If verifications is not in ACTIVE_STAGES, omit the Phase 6 and Judge 6 todos
   - If HUMAN_IN_THE_LOOP_PHASES is empty, omit the human checkpoint todo
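
A minimal sketch of these filtering rules; `STAGE_TO_PHASE` mirrors the stage table earlier and the function name is illustrative:

```python
STAGE_TO_PHASE = {
    "research": "2a", "codebase analysis": "2b", "business analysis": "2c",
    "architecture synthesis": "3", "decomposition": "4",
    "parallelize": "5", "verifications": "6",
}

def filter_todos(todos, active_stages, skip_judges, human_phases):
    keep_phases = {STAGE_TO_PHASE[s] for s in active_stages}
    out = []
    for todo in todos:
        content = todo["content"]
        if skip_judges and content.startswith("Judge"):
            continue                                  # drop ALL judge todos
        if content.startswith(("Phase", "Judge")):
            phase = content.split(":")[0].split()[-1]  # "2a" from "Phase 2a: ..."
            if phase not in keep_phases:
                continue                              # stage not active
        if content.startswith("Human checkpoint") and not human_phases:
            continue
        out.append(todo)
    return out
```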
7. **Ensure directories exist:**

   Run the folder creation script to create the task directories and configure gitignore:

   ```bash
   ${CLAUDE_PLUGIN_ROOT}/scripts/create-folders.sh
   ```

   This creates:
   - `.specs/tasks/draft/` - New tasks awaiting analysis
   - `.specs/tasks/todo/` - Tasks ready to implement
   - `.specs/tasks/in-progress/` - Currently being worked on
   - `.specs/tasks/done/` - Completed tasks
   - `.specs/scratchpad/` - Temporary working files (gitignored)
   - `.specs/analysis/` - Codebase impact analysis files
   - `.claude/skills/` - Reusable skill documents
Update each todo to `in_progress` when starting a phase and `completed` when its judge passes.

**CRITICAL:**

- Do not mark PASS for any judge that did not pass the rubric. Retry the judge after each implementation change until it passes the check!
- Do not read task files in the .claude or .specs directories; your job is to orchestrate the agents that do the work, not to do it yourself!
- Use THRESHOLD (default 3.5) for all judge pass/fail decisions, not hardcoded values!
- Use MAX_ITERATIONS (default 3) for retry limits, not hardcoded values!
- After MAX_ITERATIONS is reached: PROCEED to the next stage automatically - do NOT ask the user unless the phase is in HUMAN_IN_THE_LOOP_PHASES!
- Skip phases not in ACTIVE_STAGES entirely - do not launch agents for excluded stages!
- Trigger human-in-the-loop checkpoints ONLY after phases in HUMAN_IN_THE_LOOP_PHASES!
- If SKIP_JUDGES is true: skip ALL judge validation - proceed directly to the next phase after each implementation phase completes!
- The task file must exist in `.specs/tasks/draft/` before running this command (unless in --refine mode)!
- If REFINE_MODE is true: detect changes via git diff, skip unchanged stages, and pass user feedback to agents!
## Execution & Evaluation Rules

**Use foreground agents only.** Do not use background agents; launch parallel agents when possible. Background agents constantly run into permission issues and other errors.

**Relaunch the judge until you get valid results** if any of the following happens:

- **Reject long reports:** If an agent returns a very long report instead of using the scratchpad as requested, reject the result. This indicates the agent failed to follow the "use scratchpad" instruction.
- **A judge score of 5.0 is a hallucination:** If a judge returns a score of 5.0/5.0, treat it as a hallucination or lazy evaluation. Reject it and re-run the judge. Perfect scores are practically impossible in this rigorous framework.
- **Reject missing scores:** If a judge report is missing the numerical score, reject it. This indicates the judge failed to read or follow the rubric instructions.

A sketch of these rejection rules follows this list.
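
These rules amount to a small validator run over every judge report before its score is trusted. A sketch; the length cutoff and the score regex are assumptions for illustration, not plugin internals:

```python
import re

def judge_report_valid(report: str, max_chars: int = 4000):
    """Apply the three rejection rules to a judge report; return (ok, reason)."""
    if len(report) > max_chars:
        # Agent dumped a long report instead of using the scratchpad.
        return False, "reject: overly long report, scratchpad instruction ignored"
    match = re.search(r"(\d(?:\.\d+)?)\s*/\s*5(?:\.0)?", report)
    if not match:
        # No numerical score found; the rubric was not followed.
        return False, "reject: missing numerical score"
    if float(match.group(1)) >= 5.0:
        # Perfect scores are treated as hallucinated or lazy evaluations.
        return False, "reject: 5.0/5.0 treated as hallucination, re-run judge"
    return True, "ok"
```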
## Workflow Execution

You MUST launch a separate agent for each step instead of performing the steps yourself.

CRITICAL: For each agent you MUST:

- Use the Agent type and Model specified in the step
- Provide the task file path and user input as context
- Provide the value of `${CLAUDE_PLUGIN_ROOT}` so agents can resolve paths like `@${CLAUDE_PLUGIN_ROOT}/scripts/create-scratchpad.sh`
- Require the agent to implement exactly that step, not more, not less
- After each sub-phase, launch a judge agent to validate quality before proceeding

### Complete Workflow Overview

Note: Phases not in ACTIVE_STAGES are skipped. If SKIP_JUDGES is true, all judge steps are skipped entirely. Human checkpoints (🔍) occur after phases in HUMAN_IN_THE_LOOP_PHASES.
```
Input: Draft Task File (.specs/tasks/draft/*.md)
                          │
              Phase 2: Parallel Analysis
      ┌─────────────────────┬─────────────────────┐
      ▼                     ▼                     ▼
Phase 2a:             Phase 2b:             Phase 2c:
Research              Codebase Analysis     Business Analysis
[sdd:researcher       [sdd:code-explorer    [sdd:business-analyst
 sonnet]               sonnet]               opus]
      │                     │                     │
Judge 2a              Judge 2b              Judge 2c
(pass: >THRESHOLD)    (pass: >THRESHOLD)    (pass: >THRESHOLD)
      │                     │                     │
      └─────────────────────┴─────────────────────┘
                          │
              Phase 3: Architecture Synthesis
              [sdd:software-architect opus]
              Judge 3 (pass: >THRESHOLD)
                          │
              Phase 4: Decomposition
              [sdd:tech-lead opus]
              Judge 4 (pass: >THRESHOLD)
                          │
              Phase 5: Parallelize
              [sdd:team-lead opus]
              Judge 5 (pass: >THRESHOLD)
                          │
              Phase 6: Verifications
              [sdd:qa-engineer opus]
              Judge 6 (pass: >THRESHOLD)
                          │
              Move task: draft/ → todo/
                          │
                       Complete
```
## Phase 2: Parallel Analysis

Phase 2 launches three analysis phases in parallel, each with its own judge validation.

### Phase 2a/2b/2c: Parallel Sub-Phases

Launch these three phases **in parallel** immediately:

#### Phase 2a: Research

- **Model:** sonnet
- **Agent:** sdd:researcher
- **Depends on:** Task file exists
- **Purpose:** Gather relevant resources, documentation, libraries, and prior art. Creates or updates a reusable skill.

Launch agent:

Description: "Research task resources and create/update skill"

Prompt:

```
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Task File: <TASK_FILE>
Task Title: <title from task file>
CRITICAL: DO NOT OUTPUT YOUR RESEARCH, ONLY CREATE THE SCRATCHPAD AND SKILL FILE.
```

Capture:
- Skill file path (e.g., `.claude/skills/<skill-name>/SKILL.md`)
- Skill action (Created new / Updated existing)
- Scratchpad file path (e.g., `.specs/scratchpad/<hex-id>.md`)
- Number of resources gathered
- Key recommendation summary

CRITICAL: If the expected files were not created, launch the agent again with the same prompt.

#### Phase 2b: Codebase Impact Analysis

- **Model:** sonnet
- **Agent:** sdd:code-explorer
- **Depends on:** Task file exists
- **Purpose:** Identify affected files, interfaces, and integration points

Launch agent:

Description: "Analyze codebase impact"

Prompt:

```
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Task File: <TASK_FILE>
Task Title: <title from task file>
CRITICAL: DO NOT OUTPUT YOUR ANALYSIS, ONLY CREATE THE SCRATCHPAD AND ANALYSIS FILE.
```

Capture:
- Analysis file path (e.g., `.specs/analysis/analysis-{name}.md`)
- Scratchpad file path (e.g., `.specs/scratchpad/<hex-id>.md`)
- Files affected count (modify/create/delete)
- Risk level assessment
- Key integration points

CRITICAL: If the expected files were not created, launch the agent again with the same prompt.

#### Phase 2c: Business Analysis

- **Model:** opus
- **Agent:** sdd:business-analyst
- **Depends on:** Task file exists
- **Purpose:** Refine the description and create acceptance criteria

Launch agent:

Description: "Business analysis"

Prompt:

```
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read ${CLAUDE_PLUGIN_ROOT}/skills/plan/analyse-business-requirements.md and execute it exactly as is!
Task File: <TASK_FILE>
Task Title: <title from task file>
CRITICAL: DO NOT OUTPUT YOUR BUSINESS ANALYSIS, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.
```

Capture:
- Scratchpad file path (e.g., `.specs/scratchpad/<hex-id>.md`)
- Acceptance criteria count
- Scope defined (yes/no)
- User scenarios documented

### Judge 2a/2b/2c: Validate Parallel Phases

After **each** parallel phase completes, launch its respective judge **with the same agent type and model**.

#### Judge 2a: Validate Research/Skill

- **Model:** sonnet
- **Agent:** sdd:researcher
- **Depends on:** Phase 2a completion
- **Purpose:** Validate skill completeness and relevance

Launch judge:

Description: "Judge skill quality"

Prompt:

```
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.

### Artifact Path
{path to skill file from Phase 2a}

### Context
This is a skill document for task: {task title}. Evaluate comprehensiveness and reusability.

### Rubric
1. Resource Coverage (weight: 0.30)
   - Documentation and references gathered?
   - Libraries and tools identified with recommendations?
   - 1=Missing critical resources, 2=Basic coverage, 3=Adequate, 4=Comprehensive, 5=Excellent
2. Pattern Relevance (weight: 0.25)
   - Are identified patterns applicable?
   - Are recommendations actionable?
   - 1=Irrelevant, 2=Somewhat useful, 3=Adequate, 4=Well-targeted, 5=Perfect fit
3. Issue Anticipation (weight: 0.20)
   - Common pitfalls identified with solutions?
   - 1=None identified, 2=Few issues, 3=Adequate, 4=Good coverage, 5=Comprehensive
4. Reusability (weight: 0.15)
   - Is the skill general enough to help multiple tasks?
   - Does it avoid task-specific details?
   - 1=Too specific, 2=Limited reuse, 3=Adequate, 4=Good, 5=Highly reusable
5. Task Integration (weight: 0.10)
   - Was the task file updated with the skill reference?
   - 1=Not updated, 3=Updated, 5=Updated with clear instructions
```

CRITICAL: use the prompt exactly as is; do not add anything else, including the output of the implementation agent!

Decision Logic:

- **PASS** (score >= THRESHOLD): Research complete, proceed
- **FAIL** (score < THRESHOLD): Re-launch Phase 2a with feedback
- **MAX_ITERATIONS reached:** Proceed to the next stage regardless of score (log warning)
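
The pass/fail decision presumably combines the rubric as a weighted average: each criterion's 1-5 score times its weight (the weights above sum to 1.0), compared against THRESHOLD. A sketch under that assumption, using the Judge 2a weights:

```python
def weighted_score(criterion_scores: dict, weights: dict) -> float:
    """Combine 1-5 criterion scores into a single rubric score."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "rubric weights must sum to 1.0"
    return sum(criterion_scores[name] * w for name, w in weights.items())

# Example with the Judge 2a rubric weights:
weights = {"resource_coverage": 0.30, "pattern_relevance": 0.25,
           "issue_anticipation": 0.20, "reusability": 0.15,
           "task_integration": 0.10}
scores = {"resource_coverage": 4, "pattern_relevance": 4,
          "issue_anticipation": 3, "reusability": 3, "task_integration": 5}

total = weighted_score(scores, weights)        # 1.2 + 1.0 + 0.6 + 0.45 + 0.5 = 3.75
verdict = "PASS" if total >= 3.5 else "FAIL"   # THRESHOLD default 3.5 -> PASS
```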
#### Judge 2b: Validate Codebase Analysis

- **Model:** sonnet
- **Agent:** sdd:code-explorer
- **Depends on:** Phase 2b completion
- **Purpose:** Validate file identification accuracy and integration mapping

Launch judge:

Description: "Judge codebase analysis quality"

Prompt:

```
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.

### Artifact Path
{path to analysis file from Phase 2b}

### Context
This is codebase impact analysis for task: {task title}. Evaluate accuracy and completeness.

### Rubric
1. File Identification Accuracy (weight: 0.35)
   - All affected files identified with specific paths?
   - New files and modifications distinguished?
   - 1=Major files missing, 2=Mostly correct, 3=Adequate, 4=Precise, 5=Complete
2. Interface Documentation (weight: 0.25)
   - Key functions/classes documented with signatures?
   - Change requirements clear?
   - 1=Missing, 2=Partial, 3=Adequate, 4=Good, 5=Complete
3. Integration Point Mapping (weight: 0.25)
   - Integration points identified with impact?
   - Similar patterns in the codebase found?
   - 1=Missing, 2=Partial, 3=Adequate, 4=Good, 5=Comprehensive
4. Risk Assessment (weight: 0.15)
   - High-risk areas identified with mitigations?
   - 1=No assessment, 2=Basic, 3=Adequate, 4=Good, 5=Thorough
```

CRITICAL: use the prompt exactly as is; do not add anything else, including the output of the implementation agent!

Decision Logic:

- **PASS** (score >= THRESHOLD): Analysis complete, proceed
- **FAIL** (score < THRESHOLD): Re-launch Phase 2b with feedback
- **MAX_ITERATIONS reached:** Proceed to the next stage regardless of score (log warning)
#### Judge 2c: Validate Business Analysis

- **Model:** opus
- **Agent:** sdd:business-analyst
- **Depends on:** Phase 2c completion
- **Purpose:** Validate acceptance criteria quality and scope definition

Launch judge:

Description: "Judge business analysis quality"

Prompt:

```
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.

### Artifact Path
{path to task file from Phase 2c}

### Context
This is business analysis output. Evaluate description clarity and acceptance criteria quality.

### Rubric
1. Description Clarity (weight: 0.30)
   - What/Why clearly explained?
   - Scope boundaries defined?
   - 1=Vague, 2=Basic, 3=Adequate, 4=Clear, 5=Excellent
2. Acceptance Criteria Quality (weight: 0.35)
   - Criteria specific and testable?
   - Given/When/Then format for complex criteria?
   - 1=Missing/vague, 2=Basic, 3=Adequate, 4=Good, 5=Excellent
3. Scenario Coverage (weight: 0.20)
   - Primary flow documented?
   - Error scenarios considered?
   - 1=Missing, 2=Basic, 3=Adequate, 4=Good, 5=Comprehensive
4. Scope Definition (weight: 0.15)
   - In-scope/out-of-scope explicit?
   - No implementation details in the description?
   - 1=Missing, 2=Partial, 3=Adequate, 4=Good, 5=Clear
```

CRITICAL: use the prompt exactly as is; do not add anything else, including the output of the implementation agent!

Decision Logic:

- **PASS** (score >= THRESHOLD): Business analysis complete, proceed
- **FAIL** (score < THRESHOLD): Re-launch Phase 2c with feedback
- **MAX_ITERATIONS reached:** Proceed to the next stage regardless of score (log warning)
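
Conceptually, the three sub-phases and their judges are independent implement-then-judge pipelines that fan out together and join at the synchronization point below. A sketch of that shape; `launch_agent` and `run_judge` are stubs standing in for the real agent tool:

```python
from concurrent.futures import ThreadPoolExecutor

def launch_agent(phase, agent, model, feedback=None):
    """Stub standing in for the real sub-agent launch; returns an artifact path."""
    return f".specs/scratchpad/{phase}.md"

def run_judge(phase, agent, model, artifact):
    """Stub standing in for the judge launch; returns (score, feedback)."""
    return 4.0, "looks adequate"

PARALLEL_PHASES = [
    ("2a", "sdd:researcher", "sonnet"),
    ("2b", "sdd:code-explorer", "sonnet"),
    ("2c", "sdd:business-analyst", "opus"),
]

def run_phase_with_judge(phase, agent, model, threshold=3.5, max_iterations=3):
    feedback = None
    for _ in range(max_iterations):
        artifact = launch_agent(phase, agent, model, feedback)
        score, feedback = run_judge(phase, agent, model, artifact)  # same agent type and model
        if score >= threshold:
            return artifact, score, "PASS"
    return artifact, score, "PROCEEDED (max iter)"  # continue anyway, log a warning

# Fan out the three analyses; fan in (synchronization point) before Phase 3.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda args: run_phase_with_judge(*args), PARALLEL_PHASES))
```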
### Synchronization Point

Wait for ALL three parallel phases (2a, 2b, 2c) AND their judges to PASS before proceeding to Phase 3.

## Phase 3: Architecture Synthesis

- **Model:** opus
- **Agent:** sdd:software-architect
- **Depends on:** Phase 2a + Judge 2a PASS, Phase 2b + Judge 2b PASS, Phase 2c + Judge 2c PASS
- **Purpose:** Synthesize research, analysis, and business requirements into an architectural overview

Launch agent:

Description: "Architecture synthesis"

Prompt:

```
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Task File: <TASK_FILE>
Skill File: <skill file path from Phase 2a>
Analysis File: <analysis file path from Phase 2b>
CRITICAL: DO NOT OUTPUT YOUR ARCHITECTURE SYNTHESIS, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.
```

Capture:
- Scratchpad file path (e.g., `.specs/scratchpad/<hex-id>.md`)
- Sections added to the task file
- Key architectural decisions count
- Components identified (if applicable)
- Contracts defined (if applicable)

### Judge 3: Validate Architecture Synthesis

- **Model:** opus
- **Agent:** sdd:software-architect
- **Depends on:** Phase 3 completion
- **Purpose:** Validate architectural coherence and completeness

Launch judge:

Description: "Judge architecture synthesis quality"

Prompt:

```
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.

### Artifact Path
{path to task file after Phase 3}

### Context
This is architecture synthesis output. The Architecture Overview section should contain the solution strategy, key decisions, and only relevant architectural sections.

### Rubric
1. Solution Strategy Clarity (weight: 0.30)
   - Approach clearly explained?
   - Key decisions documented with reasoning?
   - Trade-offs stated?
   - 1=Missing/unclear, 2=Basic, 3=Adequate, 4=Clear, 5=Excellent
2. Reference Integration (weight: 0.20)
   - Links to research and analysis files?
   - Insights from both integrated?
   - 1=No links, 2=Partial, 3=Adequate, 4=Good, 5=Fully integrated
3. Section Relevance (weight: 0.25)
   - Only relevant sections included (not all)?
   - Sections appropriate for task complexity?
   - 1=Wrong sections, 2=Mostly appropriate, 3=Adequate, 4=Good, 5=Precisely targeted
4. Expected Changes Accuracy (weight: 0.25)
   - Files to create/modify listed?
   - Consistent with the codebase analysis?
   - 1=Missing/inconsistent, 2=Partial, 3=Adequate, 4=Good, 5=Complete
```

CRITICAL: use the prompt exactly as is; do not add anything else, including the output of the implementation agent!

Decision Logic:

- **PASS** (score >= THRESHOLD): Architecture synthesis complete, proceed
- **FAIL** (score < THRESHOLD): Re-launch Phase 3 with feedback
- **MAX_ITERATIONS reached:** Proceed to Phase 4 regardless of score (log warning)

Wait for PASS before Phase 4.
## Phase 4: Decomposition

- **Model:** opus
- **Agent:** sdd:tech-lead
- **Depends on:** Phase 3 + Judge 3 PASS
- **Purpose:** Break the architecture into implementation steps with success criteria and risks

Launch agent:

Description: "Decompose into implementation steps"

Prompt:

```
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Task File: <TASK_FILE>
CRITICAL: DO NOT OUTPUT YOUR DECOMPOSITION, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.
```

Capture:
- Scratchpad file path (e.g., `.specs/scratchpad/<hex-id>.md`)
- Implementation steps count
- Total subtasks count
- Critical path steps
- High priority risks count

### Judge 4: Validate Decomposition

- **Model:** opus
- **Agent:** sdd:tech-lead
- **Depends on:** Phase 4 completion
- **Purpose:** Validate implementation steps quality and completeness

Launch judge:

Description: "Judge decomposition quality"

Prompt:

```
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.

### Artifact Path
{path to task file after Phase 4}

### Context
This is decomposition output. The Implementation Process section should contain ordered steps with success criteria, subtasks, blockers, and risks.

### Rubric
1. Step Quality (weight: 0.30)
   - Each step has a clear goal, output, success criteria?
   - Steps ordered by dependency?
   - No step too large (> Large estimate)?
   - 1=Vague/missing, 2=Basic, 3=Adequate, 4=Good, 5=Excellent
2. Success Criteria Testability (weight: 0.25)
   - Criteria specific and verifiable?
   - Use actual file paths, function names?
   - Subtasks clearly defined with actionable descriptions?
   - 1=Vague, 2=Partially testable, 3=Adequate, 4=Good, 5=All testable
3. Risk Coverage (weight: 0.25)
   - Blockers identified with resolutions?
   - Risks identified with mitigations?
   - High-risk tasks identified with decomposition recommendations?
   - 1=None, 2=Basic, 3=Adequate, 4=Good, 5=Comprehensive
4. Completeness (weight: 0.20)
   - All architecture components have corresponding steps?
   - Implementation summary table present?
   - Definition of Done included?
   - Phases organized: Setup → Foundational → User Stories → Polish?
   - 1=Incomplete, 2=Partial, 3=Adequate, 4=Good, 5=Complete
```

CRITICAL: use the prompt exactly as is; do not add anything else, including the output of the implementation agent!

Decision Logic:

- **PASS** (score >= THRESHOLD): Decomposition complete, proceed to Phase 5
- **FAIL** (score < THRESHOLD): Re-launch Phase 4 with feedback
- **MAX_ITERATIONS reached:** Proceed to Phase 5 regardless of score (log warning)

Wait for PASS before Phase 5.
## Phase 5: Parallelize Steps

- **Model:** opus
- **Agent:** sdd:team-lead
- **Depends on:** Phase 4 + Judge 4 PASS
- **Purpose:** Reorganize implementation steps for maximum parallel execution

Launch agent:

Description: "Parallelize implementation steps"

Prompt:

```
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Task File: <TASK_FILE>
Use agents only from this list: {list ALL available agents with plugin prefix if available, e.g. sdd:developer, code-review:bug-hunter. Also include general agents: opus, sonnet, haiku}
CRITICAL: DO NOT OUTPUT YOUR PARALLELIZATION, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.
```

Capture:
- Scratchpad file path (e.g., `.specs/scratchpad/<hex-id>.md`)
- Number of steps reorganized
- Maximum parallelization depth
- Agent distribution summary

### Judge 5: Validate Parallelization

- **Model:** opus
- **Agent:** sdd:team-lead
- **Depends on:** Phase 5 completion
- **Purpose:** Validate dependency accuracy and parallelization optimization

Launch judge:

Description: "Judge parallelization quality"

Prompt:

```
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.

### Artifact Path
{path to parallelized task file from Phase 5}

### Context
This is the output of Phase 5: Parallelize Steps. The artifact should contain implementation steps reorganized for maximum parallel execution with explicit dependencies, agent assignments, and a parallelization diagram. Use agents only from this list: {list ALL available agents with plugin prefix if available, e.g. sdd:developer, code-review:bug-hunter. Also include general agents: opus, sonnet, haiku}

### Rubric
1. Dependency Accuracy (weight: 0.35)
   - Are step dependencies correctly identified?
   - No false dependencies (steps marked dependent when they're not)?
   - No missing dependencies (steps that actually depend on others)?
   - 1=Major dependency errors, 2=Mostly correct, 3=Acceptable, 5=Precise dependencies
2. Parallelization Maximized (weight: 0.30)
   - Are parallelizable steps correctly marked with "Parallel with:"?
   - Is the parallelization diagram logical?
   - 1=No parallelization/wrong, 2=Some optimization, 3=Acceptable, 5=Maximum parallelization
3. Agent Selection Correctness (weight: 0.20)
   - Are agent types appropriate for the outputs (opus by default, haiku for trivial, sonnet for simple but high-volume)?
   - Does the selection follow the Agent Selection Guide?
   - Are only agents from the provided available agents list used?
   - 1=Wrong agents, 2=Mostly appropriate, 3=Acceptable, 4=Optimal selection, 5=Perfect selection
4. Execution Directive Present (weight: 0.15)
   - Is the sub-agent execution directive present?
   - Are "MUST" requirements for parallel execution clear?
   - 1=Missing directive, 2=Partial, 3=Acceptable, 4=Complete directive, 5=Perfect directive
```

CRITICAL: use the prompt exactly as is; do not add anything else, including the output of the implementation agent!

Decision Logic:

- **PASS** (score >= THRESHOLD): Proceed to Phase 6
- **FAIL** (score < THRESHOLD): Re-launch Phase 5 with feedback
- **MAX_ITERATIONS reached:** Proceed to Phase 6 regardless of score (log warning)

Wait for PASS before Phase 6.
## Phase 6: Define Verifications

- **Model:** opus
- **Agent:** sdd:qa-engineer
- **Depends on:** Phase 5 + Judge 5 PASS
- **Purpose:** Add LLM-as-Judge verification sections with rubrics

Launch agent:

Description: "Define verification rubrics"

Prompt:

```
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Task File: <TASK_FILE>
CRITICAL: DO NOT OUTPUT YOUR VERIFICATIONS, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.
```

Capture:
- Scratchpad file path (e.g., `.specs/scratchpad/<hex-id>.md`)
- Number of steps with verification
- Total evaluations defined
- Verification breakdown (Panel/Per-Item/None)

### Judge 6: Validate Verifications

- **Model:** opus
- **Agent:** sdd:qa-engineer
- **Depends on:** Phase 6 completion
- **Purpose:** Validate verification rubrics and thresholds

Launch judge:

Description: "Judge verification quality"

Prompt:

```
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.

### Artifact Path
{path to task file with verifications from Phase 6}

### Context
This is the output of Phase 6: Define Verifications. The artifact should contain LLM-as-Judge verification sections for each implementation step, including verification levels, custom rubrics, thresholds, and a verification summary table.
### Rubric
1. Verification Level Appropriateness (weight: 0.30)
   - Do verification levels match artifact criticality?
   - HIGH criticality → Panel, MEDIUM → Single/Per-Item, LOW/NONE → None?
   - 1=Mismatched levels, 2=Mostly appropriate, 3=Acceptable, 5=Precisely calibrated
2. Rubric Quality (weight: 0.30)
   - Are criteria specific to the artifact type (not generic)?
   - Do weights sum to 1.0?
   - Are descriptions clear and measurable?
   - 1=Generic/broken rubrics, 2=Adequate, 3=Acceptable, 5=Excellent custom rubrics
3. Threshold Appropriateness (weight: 0.20)
   - Are thresholds reasonable (typically 4.0/5.0)?
   - Higher for critical, lower for experimental?
   - 1=Wrong thresholds, 2=Standard applied, 3=Acceptable, 5=Context-appropriate
4. Coverage Completeness (weight: 0.20)
   - Does every step have a Verification section?
   - Is the Verification Summary table present?
   - 1=Missing verifications, 2=Most covered, 3=Acceptable, 5=100% coverage
```

CRITICAL: use the prompt exactly as is; do not add anything else, including the output of the implementation agent!

Decision Logic:

- **PASS** (score >= THRESHOLD): Workflow complete, promote the task
- **FAIL** (score < THRESHOLD): Re-launch Phase 6 with feedback
- **MAX_ITERATIONS reached:** Complete the workflow regardless of score (log warning)

## Phase 7: Promote Task

**Purpose:** Move the refined task from the draft to the todo folder.

After all phases complete, move the task file from draft to todo:

```bash
git mv <TASK_FILE> .specs/tasks/todo/
# Fallback if git is not available: mv <TASK_FILE> .specs/tasks/todo/
```

Update any references in research and analysis files if needed.

## Completion

After all executed phases and judges complete:

1. Use the git tool to stage the task file, skill file, analysis file, and scratchpad files (only those that were created)
2. Summarize the workflow results and output to the user:

### Task Refined

| Property | Value |
|----------|-------|
| **Original File** | `<original TASK_FILE path>` |
| **Final Location** | `.specs/tasks/todo/<filename>` (ready for implementation) |
| **Title** | `<task title>` |
| **Type** | `<feature/bug/refactor/test/docs/chore/ci>` (from filename) |
| **Skill** | `<skill file path or "Skipped">` |
| **Skill Action** | `<Created new / Updated existing / Skipped>` |
| **Analysis** | `<analysis file path or "Skipped">` |
| **Scratchpad** | `<scratchpad file path>` |
| **Implementation Steps** | `<count or "N/A">` |
| **Parallelization Depth** | `<max parallel agents or "N/A">` |
| **Total Verifications** | `<count or "N/A">` |

### Configuration Used

| Setting | Value |
|---------|-------|
| **Target Quality** | {THRESHOLD}/5.0 |
| **Max Iterations** | {MAX_ITERATIONS} |
| **Active Stages** | {ACTIVE_STAGES as comma-separated list} |
| **Skipped Stages** | {SKIP_STAGES or stages not in ACTIVE_STAGES} |
| **Human Checkpoints** | Phase {HUMAN_IN_THE_LOOP_PHASES as comma-separated} |
| **Skip Judges** | {SKIP_JUDGES} |
| **Refine Mode** | {REFINE_MODE} |

### Quality Gates Summary

| Phase | Judge Score | Verdict |
|-------|-------------|---------|
| Phase 2a: Research | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 2b: Codebase Analysis | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 2c: Business Analysis | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 3: Architecture Synthesis | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 4: Decomposition | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 5: Parallelize | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 6: Verify | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |

**Threshold Used:** {THRESHOLD}/5.0 (or N/A if SKIP_JUDGES)

**Legend:**
- ✅ PASS - Score >= THRESHOLD
- ⚠️ PROCEEDED (max iter) - Score < THRESHOLD but MAX_ITERATIONS reached, proceeded anyway
- ⏭️ SKIPPED - Stage not in ACTIVE_STAGES

### Artifacts Generated

```
.claude/
└── skills/
    └── <skill-name>/
        └── SKILL.md          # Reusable skill document (if research stage ran)
.specs/
├── tasks/
│   ├── draft/                # Draft tasks (source - now empty for this task)
│   ├── todo/
│   │   └── <name>.<type>.md  # Complete task specification (ready for implementation)
│   ├── in-progress/          # Tasks being implemented (empty)
│   └── done/                 # Completed tasks (empty)
├── analysis/
│   └── analysis-<name>.md    # Codebase impact analysis (if codebase analysis stage ran)
└── scratchpad/
    └── <hex-id>.md           # Architecture thinking scratchpad
```

### Task Status Management

Task status is managed by folder location:

- `draft/` - Tasks created but not yet refined
- `todo/` - Tasks ready for implementation
- `in-progress/` - Tasks currently being worked on
- `done/` - Completed tasks

### Next Steps

1. Review the task: `.specs/tasks/todo/<filename>`
2. Edit the task file directly to make corrections
3. Add `//` comments to lines that need clarification or changes
4. Run `/plan` again with `--refine` to incorporate your feedback; it detects changes against git and propagates updates **top-to-bottom** (editing a section only affects sections below it, not above)
5. If everything is fine, begin implementation: `/implement` (will auto-select the task from todo/)

## Error Handling

### Phase Agent Failure (Exception/Crash)

If any phase agent fails unexpectedly:

1. Report the failure with the agent output
2. Ask the user clarification questions that can help resolve the issue
3. Launch the phase agent again with the list of questions and answers to resolve the issue

### Judge Returns FAIL

If any judge returns FAIL (score < THRESHOLD):

1. **Automatic retry:** Re-launch the phase agent with the judge feedback
2. **Human-in-the-loop check:** If the phase is in HUMAN_IN_THE_LOOP_PHASES, trigger a human checkpoint before the next judge retry (after the implementation retry but before re-judging)
3. **After MAX_ITERATIONS reached:**
   - Proceed to the next stage automatically (do NOT ask the user unless --human-in-the-loop includes this phase)
   - Log a warning in the completion summary: ⚠️ Phase X did not pass quality threshold (X.X/{THRESHOLD}) after MAX_ITERATIONS iterations
### Retry Flow

```
Implementation → Judge
    FAIL → Implementation Retry → Judge Retry
        ↓
        PASS → Continue to next stage
        FAIL → Repeat until MAX_ITERATIONS
            ↓
            MAX_ITERATIONS reached → Proceed to next stage (with warning)
```

### Retry Flow with Human-in-the-Loop

When the phase is in HUMAN_IN_THE_LOOP_PHASES:

```
Implementation → Judge
    FAIL → Implementation Retry
        ↓
    🔍 Human Checkpoint (optional feedback)
        ↓
    Judge Retry
        ↓
    PASS → Continue | FAIL → Repeat until MAX_ITERATIONS
        ↓
    MAX_ITERATIONS → 🔍 Final Human Checkpoint
        ↓
    User confirms → Proceed to next stage
```
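
The two retry flows differ only in where the human checkpoint interrupts the loop. A combined sketch; `implement`, `judge`, and `human_checkpoint` are illustrative stubs standing in for the real agent launches and the checkpoint prompt:

```python
def implement(phase, feedback=None):
    """Stand-in for launching the phase agent; returns an artifact path."""
    return f".specs/scratchpad/{phase}.md"

def judge(phase, artifact):
    """Stand-in for the judge agent; returns (score, feedback)."""
    return 3.2, "criteria too vague"

def human_checkpoint(phase, artifact):
    """Stand-in for the 🔍 checkpoint prompt; returns optional user feedback."""
    return None

def refine_phase(phase, threshold=3.5, max_iterations=3, human_phases=()):
    feedback, artifact, score = [], None, 0.0
    for iteration in range(1, max_iterations + 1):
        artifact = implement(phase, feedback)
        # With human-in-the-loop: pause after an implementation retry,
        # before the next judge retry.
        if phase in human_phases and iteration > 1:
            note = human_checkpoint(phase, artifact)
            if note:
                feedback.append(note)
        score, judge_note = judge(phase, artifact)
        if score >= threshold:
            if phase in human_phases:
                human_checkpoint(phase, artifact)   # review the passing output
            return "PASS", score
        feedback.append(judge_note)
    if phase in human_phases:
        human_checkpoint(phase, artifact)           # final checkpoint at MAX_ITERATIONS
    return "PROCEEDED (max iter)", score            # proceed anyway; warn in summary
```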