response-rater

Installs: 55
Rank: #13492

Install

npx skills add https://github.com/oimiragieo/agent-studio --skill response-rater

Response Rater Skill

Step 1: Define Rating Rubric

Use the appropriate rubric for the content type:

For Plans:

| Dimension | Weight | Description |
|---|---|---|
| Completeness | 20% | All required sections present |
| Feasibility | 20% | Plan is realistic and achievable |
| Risk Mitigation | 20% | Risks identified with mitigations |
| Agent Coverage | 20% | Appropriate agents assigned |
| Integration | 20% | Fits with existing systems |

For Responses:

| Dimension | Weight | Description |
|---|---|---|
| Correctness | 25% | Technically accurate |
| Completeness | 25% | Addresses all requirements |
| Clarity | 25% | Easy to understand |
| Actionability | 25% | Provides clear next steps |

Step 2: Evaluate Each Dimension

Score each dimension 1-10:

Dimension Scores

Completeness: 8/10

Has objectives, steps, and timeline

Missing risk assessment section

Feasibility: 7/10

Most steps are achievable

Step 3 timeline is aggressive

Risk Mitigation: 5/10

Only 1 risk identified

No mitigation strategies

Agent Coverage: 9/10

All steps have assigned agents

Good agent-task matching

Integration: 8/10

Uses existing APIs

Minor compatibility concerns

Step 3: Calculate Overall Score

Average the weighted scores:

Overall = (8×0.2) + (7×0.2) + (5×0.2) + (9×0.2) + (8×0.2) = 7.4/10

Step 4: Generate Recommendations

Provide actionable improvements:
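One way to keep recommendations ordered by impact is a priority-tagged list that sorts high-impact items first. This is an illustrative sketch, not part of the skill itself; the field names and priority ordering are assumptions.

```python
# Illustrative sketch: recommendations tagged with priority so
# high-impact items (those affecting the pass/fail threshold) sort first.
# Field names and the PRIORITY_ORDER mapping are assumptions.
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}

recommendations = [
    {"priority": "Medium", "dimension": "Feasibility",
     "change": "Extend Step 3 timeline by 2 days"},
    {"priority": "High", "dimension": "Risk Mitigation",
     "change": "Add risk assessment section with 3-5 risks"},
    {"priority": "Low", "dimension": "Completeness",
     "change": "Add success metrics for each step"},
]

# Sort so High-priority items are addressed first.
recommendations.sort(key=lambda r: PRIORITY_ORDER[r["priority"]])
print([r["priority"] for r in recommendations])  # → ['High', 'Medium', 'Low']
```

Note that each entry names the dimension it addresses and the concrete change, matching the skill's requirement that recommendations never be vague.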

Recommendations

High Priority
1. Add risk assessment section with 3-5 risks
2. Include mitigation strategies for each risk

Medium Priority
3. Extend Step 3 timeline by 2 days
4. Add fallback plan for external API dependency

Low Priority
5. Add success metrics for each step

Step 5: Make Pass/Fail Decision

Apply minimum score thresholds:

| Task Type | Minimum Score |
|---|---|
| Standard | 7/10 |
| Enterprise | 8/10 |
| Critical | 9/10 |
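The weighted average from Step 3 and the threshold gate from Step 5 can be sketched together in a few lines. This is a minimal sketch assuming the plans rubric above; the function and variable names are illustrative.

```python
# Minimal sketch of Steps 3 and 5: weighted overall score plus a
# threshold check. Names are illustrative; weights and thresholds
# come from the rubric and threshold tables above.
PLAN_WEIGHTS = {
    "Completeness": 0.20, "Feasibility": 0.20, "Risk Mitigation": 0.20,
    "Agent Coverage": 0.20, "Integration": 0.20,
}
THRESHOLDS = {"Standard": 7.0, "Enterprise": 8.0, "Critical": 9.0}

def overall_score(scores: dict, weights: dict) -> float:
    """Weighted sum of dimension scores, rounded to one decimal."""
    return round(sum(scores[d] * w for d, w in weights.items()), 1)

def decide(score: float, task_type: str = "Standard") -> str:
    """Apply the minimum-score threshold for the task type."""
    return "PASS" if score >= THRESHOLDS[task_type] else "FAIL"

# The worked example above: 8, 7, 5, 9, 8 across the five dimensions.
scores = {"Completeness": 8, "Feasibility": 7, "Risk Mitigation": 5,
          "Agent Coverage": 9, "Integration": 8}
print(overall_score(scores, PLAN_WEIGHTS))  # → 7.4
print(decide(7.4, "Standard"))              # → PASS
print(decide(7.4, "Enterprise"))            # → FAIL
```

The same 7.4 score that passes the Standard gate would fail an Enterprise review, which is why the task type must be fixed before scoring begins.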

Decision

**Score**: 7.4/10

**Threshold**: 7/10 (Standard)

**Result**: PASS ✅
Plan approved with recommendations for improvement.
Best Practices

- **Consistent Rubrics**: Use same dimensions for comparable content
- **Clear Thresholds**: Define pass/fail criteria upfront
- **Actionable Feedback**: Provide specific improvements
- **Prioritize Issues**: Order recommendations by impact
- **Document Rationale**: Explain score justifications

Example prompt: Rate this plan for user authentication implementation

Response:

Plan Rating: User Authentication

Scores

Completeness: 8/10 - Good structure, missing tests section

Feasibility: 9/10 - Realistic timeline and approach

Risk Mitigation: 6/10 - Only 1 risk identified

Agent Coverage: 9/10 - Good agent assignments

Integration: 8/10 - Compatible with existing auth

Overall: 8.0/10 ✅ PASS
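The 8.0 overall above follows from the same weighted average as Step 3; a quick check (the score list order mirrors the dimensions above):

```python
# Quick check of the worked example: equal 20% weights over the five
# plan dimensions reproduce the 8.0 overall score.
scores = [8, 9, 6, 9, 8]  # Completeness, Feasibility, Risk Mitigation,
                          # Agent Coverage, Integration
overall = round(sum(s * 0.2 for s in scores), 1)
print(overall)         # → 8.0
print(overall >= 7.0)  # Standard threshold → True
```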

Top Recommendations
1. Add risk section with security and dependency risks
2. Include test plan for each authentication flow
3. Add rollback procedure for failed deployment

Iron Laws

- ALWAYS use the same rubric dimensions when rating comparable content — inconsistent dimensions make scores meaningless and prevent valid comparison across sessions.
- NEVER issue a pass/fail decision without documenting score justification for each dimension — unjustified scores cannot be reviewed, challenged, or improved.
- ALWAYS apply defined minimum thresholds (7/10 standard, 8/10 enterprise, 9/10 critical) — ad-hoc thresholds produce inconsistent approval gates that erode trust in the rating system.
- NEVER provide vague recommendations — every recommendation must reference the specific dimension it addresses and state the concrete change required.
- ALWAYS prioritize recommendations by impact — high-priority items that would materially improve the score must be clearly distinguished from low-impact suggestions.
Anti-Patterns

| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Using different rubric dimensions for comparable content | Scores cannot be compared across sessions; the rating loses its evaluative value | Always use the same rubric (plans rubric for plans, responses rubric for responses) |
| Omitting score justification for individual dimensions | Scores without justification cannot be reviewed, verified, or acted upon | Document specific evidence for each dimension score (what was present, what was missing) |
| Setting thresholds arbitrarily per session | Inconsistent thresholds invalidate the pass/fail gate; teams lose confidence in approvals | Always apply the defined thresholds: 7/10 standard, 8/10 enterprise, 9/10 critical |
| Providing vague recommendations ("improve quality", "add more detail") | Vague feedback cannot be acted upon; no change results from the review | Reference the specific dimension, score gap, and required concrete change for each recommendation |
| Listing recommendations without priority ordering | Equal-weight feedback causes raters to address low-impact items first | Always order by impact: High (affects pass/fail threshold) before Medium before Low |

Memory Protocol (MANDATORY)

Before starting: cat .claude/context/memory/learnings.md

After completing:
- New pattern -> .claude/context/memory/learnings.md
- Issue found -> .claude/context/memory/issues.md
- Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.
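The memory protocol's read-before, append-after cycle can be exercised with plain file I/O; a hedged sketch (the paths come from the protocol above, the entry text is illustrative):

```python
# Sketch of the memory protocol cycle. Paths come from the protocol;
# the appended entry text is an illustrative example.
from pathlib import Path

mem = Path(".claude/context/memory")
mem.mkdir(parents=True, exist_ok=True)  # ensure the memory directory exists
learnings = mem / "learnings.md"
learnings.touch()                       # create the file on first run

# Before starting: review prior learnings
print(learnings.read_text())

# After completing: append a new pattern so it survives a context reset
with learnings.open("a") as f:
    f.write("- Plans rubric reused verbatim for comparable plans\n")
```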
