Task Quality KPI Framework
Overview
The Task Quality KPI Framework provides objective, quantitative metrics for evaluating task implementation quality.
Key Architecture: KPIs are auto-generated by a hook - you read the results, not run scripts.

```
┌─────────────────────────────────────────────────────────────┐
│  HOOK (auto-executes)                                       │
│  Trigger: PostToolUse on TASK-*.md                          │
│  Script: task-kpi-analyzer.py                               │
│  Output: TASK-XXX--kpi.json                                 │
├─────────────────────────────────────────────────────────────┤
│  SKILL / AGENT (reads output)                               │
│  Input: TASK-XXX--kpi.json                                  │
│  Action: Make evaluation decisions                          │
└─────────────────────────────────────────────────────────────┘
```

Why This Architecture?

| Problem | Solution |
|---------|----------|
| Skills can't execute scripts | Hook auto-runs on file save |
| Subjective review_status | Quantitative 0-10 scores |
| "Looks good to me" | Evidence-based evaluation |
| Binary pass/fail | Graduated quality levels |

KPI File Location

After any task file modification, find KPI data at:

```
docs/specs/[ID]/tasks/TASK-XXX--kpi.json
```

KPI Categories

```
┌─────────────────────────────────────────────────────────────┐
│  OVERALL SCORE (0-10)                                       │
├─────────────────────────────────────────────────────────────┤
│  Spec Compliance (30%)                                      │
│  ├── Acceptance Criteria Met (0-10)                         │
│  ├── Requirements Coverage (0-10)                           │
│  └── No Scope Creep (0-10)                                  │
├─────────────────────────────────────────────────────────────┤
│  Code Quality (25%)                                         │
│  ├── Static Analysis (0-10)                                 │
│  ├── Complexity (0-10)                                      │
│  └── Patterns Alignment (0-10)                              │
├─────────────────────────────────────────────────────────────┤
│  Test Coverage (25%)                                        │
│  ├── Unit Tests Present (0-10)                              │
│  ├── Test/Code Ratio (0-10)                                 │
│  └── Coverage Percentage (0-10)                             │
├─────────────────────────────────────────────────────────────┤
│  Contract Fulfillment (20%)                                 │
│  ├── Provides Verified (0-10)                               │
│  └── Expects Satisfied (0-10)                               │
└─────────────────────────────────────────────────────────────┘
```

Category Weights

| Category | Weight | Why |
|----------|--------|-----|
| Spec Compliance | 30% | Most important - did we build what was asked? |
| Code Quality | 25% | Technical excellence |
| Test Coverage | 25% | Verification and confidence |
| Contract Fulfillment | 20% | Integration with other tasks |

When to Use

- Reading KPI data for task quality evaluation
- Understanding quality metrics and scoring breakdown
- Deciding whether to iterate or approve based on quantitative data
- Integrating KPI checks into automated loops (agents_loop.py)
- Generating evidence-based evaluation reports

Instructions

1. Reading KPI Data (Primary Use)

DO NOT run scripts - read the auto-generated file:

```
Read the KPI file: docs/specs/001-feature/tasks/TASK-001--kpi.json
```

2. Understanding the Data

The KPI file contains:

```json
{
  "task_id": "TASK-001",
  "evaluated_at": "2026-01-15T10:30:00Z",
  "overall_score": 8.2,
  "passed_threshold": true,
  "threshold": 7.5,
  "kpi_scores": [
    {
      "category": "Spec Compliance",
      "weight": 30,
      "score": 8.5,
      "weighted_score": 2.55,
      "metrics": {
        "acceptance_criteria_met": 9.0,
        "requirements_coverage": 8.0,
        "no_scope_creep": 8.5
      },
      "evidence": [
        "Acceptance criteria: 9/10 checked",
        "Requirements coverage: 8/10"
      ]
    }
  ],
  "recommendations": [
    "Code Quality: Moderate improvements possible"
  ],
  "summary": "Score: 8.2/10 - PASSED"
}
```

3. Making Decisions

Use overall_score and passed_threshold:

```
IF passed_threshold == true:
  → Task meets quality standards
  → Approve and proceed

IF passed_threshold == false:
  → Task needs improvement
  → Check recommendations for specific targets
  → Create fix specification
```

Integration with Workflow

In Task Review (evaluator-agent)

Review Process

1. Read KPI file: TASK-XXX--kpi.json
2. Extract overall_score and kpi_scores
3. Read task file to validate
4. Generate evaluation report
5. Decision based on passed_threshold

In agents_loop

```python
import json

# spec_path, task_id, advance_state, create_fix_task and log_warning are
# assumed to be defined by the surrounding agents_loop code.

# Check KPI file exists
kpi_path = spec_path / "tasks" / f"{task_id}--kpi.json"

if kpi_path.exists():
    kpi_data = json.loads(kpi_path.read_text())

    if kpi_data["passed_threshold"]:
        # Quality threshold met
        advance_state("update_done")
    else:
        # Need more work
        fix_targets = kpi_data["recommendations"]
        create_fix_task(fix_targets)
        advance_state("fix")
else:
    # KPI not generated yet - task may not be implemented
    log_warning("No KPI data found")
```

Multi-Iteration Loop

Instead of max 3 retries, iterate until quality threshold met:

```
Iteration 1: Score 6.2 → FAILED → Fix: Improve test coverage
Iteration 2: Score 7.1 → FAILED → Fix: Refactor complex functions
Iteration 3: Score 7.8 → PASSED → Proceed
```

Each iteration updates the KPI file automatically on task save.

Threshold Guidelines

| Score | Quality Level | Action |
|-------|---------------|--------|
| 9.0-10.0 | Exceptional | Approve, document best practices |
| 8.0-8.9 | Good | Approve with minor notes |
| 7.0-7.9 | Acceptable | Approve only if the score meets the configured threshold (e.g. 7.5) |
| 6.0-6.9 | Below Standard | Request specific improvements |
| < 6.0 | Poor | Significant rework required |

Recommended Thresholds

| Project Type | Threshold | Rationale |
|--------------|-----------|-----------|
| Production MVP | 8.0 | High quality required |
| Internal Tool | 7.0 | Good enough |
| Prototype | 6.0 | Functional over perfect |
| Critical System | 8.5 | No compromises |

Metric Details

Spec Compliance Metrics

Acceptance Criteria Met

- Calculates: (checked_criteria / total_criteria) * 10
- Source: Task file checkbox count
- Example: 9/10 checked = 9.0

Requirements Coverage

- Calculates: Count of REQ-IDs this task covers
- Source: traceability-matrix.md
- Example: 4 requirements covered = 8.0

No Scope Creep

- Calculates: (implemented_files / expected_files) * 10
- Source: Task "Files to Create" vs actual files
- Penalizes: Missing files or unexpected additions

Code Quality Metrics

Static Analysis

- Java: Maven Checkstyle
- TypeScript: ESLint
- Python: ruff
- Score: 10 if passes, 5 if issues found

Complexity

- Calculates: Functions >50 lines
- Score: 10 - (long_functions_ratio * 5)
- Penalizes: Large, complex functions

Patterns Alignment

- Checks: Knowledge Graph patterns
- Source: knowledge-graph.json
- Validates: Implementation follows project patterns

Test Coverage Metrics

Unit Tests Present

- Calculates: min(10, test_files * 5)
- Example: 2 test files = maximum score
- Penalizes: Missing tests

Test/Code Ratio

- Calculates: (test_count / code_count) * 10
- Example: 1:1 ratio = 10/10
- Ideal: At least 1 test file per code file

Coverage Percentage

- Source: Coverage reports (JaCoCo, lcov, etc.)
- Calculates: coverage_percent / 10
- Example: 80% coverage = 8.0

Contract Fulfillment Metrics

Provides Verified

- Checks: Files exist and export expected symbols
- Source: Task "provides" frontmatter
- Validates: Contract satisfied

Expects Satisfied

- Checks: Dependencies provide required files/symbols
- Source: Task "expects" frontmatter
- Validates: Prerequisites met

When KPI File is Missing

If TASK-XXX--kpi.json doesn't exist:

- Task was never modified - Hook runs on file save
- Hook failed - Check Claude Code logs
- Task is new - Save the file first to trigger hook

DO NOT try to calculate KPIs manually. The hook runs automatically when:

- Task file is saved (Write tool)
- Task file is edited (Edit tool)

Best Practices

1. Always Check KPI File Exists

Before evaluating:

```
Check if KPI file exists:
docs/specs/[ID]/tasks/TASK-XXX--kpi.json

If missing:
- Task may not be implemented yet
- Ask user to save the task file first
```

2. Trust the Metrics

The KPIs are objective. Only override with documented evidence:

- Critical security issue not in metrics
- Logic error not caught by static analysis
- Exceptional quality not measured

3. Iterate on Low KPIs

Target specific categories:

```
❌ "Fix code quality issues"
✅ "Improve Code Quality KPI from 5.2 to 7.0:
   - Complexity: Refactor processData() (5→8)
   - Patterns: Add error handling (6→8)"
```

4. Track KPI Trends

Monitor quality over time:

```
Sprint 1: Average KPI 6.8
Sprint 2: Average KPI 7.3 (+0.5)
Sprint 3: Average KPI 7.9 (+0.6)
```

Troubleshooting

KPI File Not Generated

Check:

- Hook enabled in hooks.json
- Task file name matches pattern TASK-*.md
- File was actually saved (not just viewed)

KPI Scores Seem Wrong

Validate:

- Check evidence field for data sources
- Verify files exist at expected paths
- Some metrics need build tools (Maven, npm)

Low Scores Despite Good Code

Possible causes:

- Missing test files
- No coverage report generated
- Acceptance criteria not checked
- Lint rules too strict

Fix the root cause, not just the score.
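The "Iterate on Low KPIs" practice can be sketched in code. The following is a minimal illustration, assuming the KPI file has the JSON shape shown under "Understanding the Data"; the helper names `load_kpi` and `fix_targets` are hypothetical and are not part of agents_loop.py or the hook. It only reads the hook's output and never recalculates scores.

```python
import json
from pathlib import Path
from typing import Optional


def load_kpi(kpi_path: Path) -> Optional[dict]:
    """Return the parsed KPI data, or None if the hook has not written the file yet."""
    if not kpi_path.exists():
        return None
    return json.loads(kpi_path.read_text(encoding="utf-8"))


def fix_targets(kpi: dict, target: Optional[float] = None) -> list:
    """List metrics scoring below `target` (default: the file's threshold), worst first."""
    goal = kpi["threshold"] if target is None else target
    low = []
    for category in kpi["kpi_scores"]:
        for metric, score in category["metrics"].items():
            if score < goal:
                label = f"{category['category']} / {metric}: {score} -> {goal}"
                low.append((score, label))
    # Sort by score so the weakest metric is addressed first.
    return [label for _, label in sorted(low)]


# Illustrative data only; the field names follow the sample KPI file.
sample = {
    "threshold": 7.5,
    "kpi_scores": [
        {
            "category": "Spec Compliance",
            "metrics": {
                "acceptance_criteria_met": 6.0,
                "requirements_coverage": 8.0,
                "no_scope_creep": 7.0,
            },
        }
    ],
}

for line in fix_targets(sample):
    print(line)
```

Each returned line names one category, one metric, and the gap to close, which is the level of specificity the fix request should carry. A missing file yields `None` from `load_kpi`, matching the guidance to save the task file rather than compute KPIs by hand.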
Examples

Example 1: Reading KPI Data

```
Read the KPI file to evaluate task quality:
docs/specs/001-feature/tasks/TASK-042--kpi.json

Based on the data:
- Overall score: 6.8/10 (below threshold)
- Lowest KPI: Test Coverage (5.0/10)
- Recommendation: Add unit tests

Decision: REQUEST FIXES - target Test Coverage improvement
```

Example 2: Iteration Decision

```
Iteration 1 KPI: Score 6.2 → FAILED
- Spec Compliance: 7.0 ✓
- Code Quality: 5.5 ✗
- Test Coverage: 6.0 ✗

Fix targets:
1. Refactor complex functions (Code Quality)
2. Add test coverage (Test Coverage)

Iteration 2 KPI: Score 7.8 → PASSED ✓
```

Example 3: agents_loop Integration

```python
import json

# In agents_loop, after implementation step.
# spec_dir, task_id and advance_state are assumed to be defined by the
# surrounding agents_loop code.
kpi_file = spec_dir / "tasks" / f"{task_id}--kpi.json"

if kpi_file.exists():
    kpi = json.loads(kpi_file.read_text())

    if kpi["passed_threshold"]:
        print(f"✅ Task passed quality check: {kpi['overall_score']}/10")
        advance_state("update_done")
    else:
        print(f"❌ Task failed quality check: {kpi['overall_score']}/10")
        print("Recommendations:")
        for rec in kpi["recommendations"]:
            print(f"  - {rec}")
        advance_state("fix")
```

References

- evaluator-agent.md - Agent that uses KPI data for evaluation
- hooks.json - Hook configuration for auto-generation
- task-kpi-analyzer.py - Hook script (do not execute directly)
- agents_loop.py - Orchestrator that reads KPI for decisions

