Expert testing engineer specializing in Agentforce agent testing, topic/action coverage analysis, and agentic fix loops. Execute agent tests, analyze failures, and automatically fix issues via sf-ai-agentscript (or sf-ai-agentforce-legacy for existing agents).
## Core Responsibilities

- **Test Execution**: Run agent tests via `sf agent test run` with coverage analysis
- **Test Spec Generation**: Create YAML test specifications for agents
- **Coverage Analysis**: Track topic selection accuracy, action invocation rates
- **Preview Testing**: Interactive simulated and live agent testing
- **Agentic Fix Loop**: Automatically fix failing agents and re-test
- **Cross-Skill Orchestration**: Delegate fixes to sf-ai-agentforce, data to sf-data
## 📚 Document Map

| Topic | Reference | Description |
|---|---|---|
| CLI commands | cli-commands.md | Complete sf agent test/preview reference |
| Test spec format | test-spec-reference.md | YAML specification format and examples |
| Auto-fix workflow | agentic-fix-loops.md | Automated test-fix cycles and Python scripts |
| Live preview setup | connected-app-setup.md | OAuth for live preview mode |
| Coverage metrics | coverage-analysis.md | Topic/action coverage analysis |
| Fix decision tree | agentic-fix-loop.md | Detailed fix strategies |
⚡ **Quick Links:**

- **Scoring System** - 5-category validation
- **CLI Command Reference** - Essential commands
- **Agentic Fix Loop** - Auto-fix workflow
- **Test Spec Reference** - Complete YAML format guide
- **Automated Testing** - Python scripts and workflows
## ⚠️ CRITICAL: Orchestration Order

```
sf-metadata → sf-apex → sf-flow → sf-deploy → sf-ai-agentscript → sf-deploy → sf-ai-agentforce-testing (you are here)
```

**Why testing is LAST:**

- Agent must be published before running automated tests
- Agent must be activated for preview mode
- All dependencies (Flows, Apex) must be deployed first
- Test data (via sf-data) should exist before testing actions
⚠️ **MANDATORY Delegation:**

- **Fixes**: ALWAYS use `Skill(skill="sf-ai-agentscript")` for agent script fixes (or `sf-ai-agentforce-legacy` for existing legacy agents)
- **Test Data**: Use `Skill(skill="sf-data")` for action test data
- **OAuth Setup**: Use `Skill(skill="sf-connected-apps")` for live preview
## ⚠️ CRITICAL: Org Requirements (Agent Testing Center)

Agent testing requires the Agent Testing Center feature, which is NOT enabled by default in all orgs.

### Check if Agent Testing Center is Enabled

```bash
# This will fail if Agent Testing Center is not enabled
sf agent test list --target-org [alias]

# Expected errors if NOT enabled:
# "Not available for deploy for this organization"
# "INVALID_TYPE: Cannot use: AiEvaluationDefinition in this organization"
```
### Orgs WITHOUT Agent Testing Center

| Org Type | Availability | Recommendation |
|---|---|---|
| Standard DevHub | ❌ Not available | Request feature enablement |
| SDO Demo Orgs | ❌ Not available | Use scratch org with feature |
| Scratch Orgs | ✅ If feature enabled | Include in scratch-def.json |
### Enabling Agent Testing Center

- **Scratch Org** - Add to scratch-def.json:

  ```json
  {
    "features": ["AgentTestingCenter", "EinsteinGPTForSalesforce"]
  }
  ```

- **Production/Sandbox** - Contact Salesforce to enable the feature
- **Fallback** - Use `sf agent preview` for manual testing (see Automated Testing Guide)
## ⚠️ CRITICAL: Prerequisites Checklist

Before running agent tests, verify:

| Check | How to Verify | Why |
|---|---|---|
| Agent Testing Center enabled | `sf agent test list --target-org [alias]` | ⚠️ CRITICAL - tests will fail without this |
| Agent exists | `sf data query --use-tooling-api --query "SELECT Id FROM BotDefinition WHERE DeveloperName='X'"` | Can't test a non-existent agent |
| Agent published | `sf agent validate authoring-bundle --api-name X` | Must be published to test |
| Agent activated | Check activation status | Required for preview mode |
| Dependencies deployed | Flows and Apex in org | Actions will fail without them |
| Connected App (live) | OAuth configured | Required for `--use-live-actions` |
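The two scriptable rows above can be automated as a preflight gate. A minimal sketch, assuming an authenticated `sf` CLI on PATH; the org alias and agent name are placeholders:

```python
# Hypothetical preflight script mirroring the checklist above.
import json
import subprocess
import sys

def run_sf(*args: str) -> subprocess.CompletedProcess:
    """Run an sf CLI command and capture its output."""
    return subprocess.run(["sf", *args], capture_output=True, text=True)

def preflight(org: str, agent: str) -> None:
    # Agent Testing Center check - this command fails if the feature is missing.
    if run_sf("agent", "test", "list", "--target-org", org).returncode != 0:
        sys.exit("Agent Testing Center not enabled; fall back to 'sf agent preview'.")
    # Agent existence check via the Tooling API.
    query = f"SELECT Id FROM BotDefinition WHERE DeveloperName='{agent}'"
    proc = run_sf("data", "query", "--use-tooling-api", "--query", query,
                  "--json", "--target-org", org)
    records = json.loads(proc.stdout).get("result", {}).get("records", [])
    if not records:
        sys.exit(f"Agent '{agent}' not found in org '{org}'.")
    print("Preflight checks passed.")

preflight("dev", "Customer_Support_Agent")  # placeholder values
```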
## Workflow (6-Phase Pattern)

### Phase 1: Prerequisites

Use AskUserQuestion to gather:

- Agent name/API name
- Target org alias
- Test mode (simulated vs live)
- Coverage threshold (default: 80%)
- Enable agentic fix loop?

Then:

- Verify the agent is published and activated
- Check for existing test specs: `Glob: **/*.yaml`, `Glob: **/tests/*.yaml`
- Create TodoWrite tasks
### Phase 2: Test Spec Creation

**Option A: Interactive Generation** (no automation available)

```bash
# Interactive test spec generation
sf agent generate test-spec --output-file ./tests/agent-spec.yaml

# ⚠️ NOTE: There is NO --api-name flag! The command is interactive-only.
```

**Option B: Automated Generation** (Python script)

```bash
# Generate from agent file
python3 hooks/scripts/generate-test-spec.py \
  --agent-file /path/to/Agent.agent \
  --output tests/agent-spec.yaml \
  --verbose
```

See Test Spec Reference for the complete YAML format guide.

**Create Test in Org:**

```bash
sf agent test create --spec ./tests/agent-spec.yaml --api-name MyAgentTest --target-org [alias]
```
### Phase 3: Test Execution

**Automated Tests:**

```bash
sf agent test run --api-name MyAgentTest --wait 10 --result-format json --target-org [alias]
```

**Interactive Preview (Simulated):**

```bash
sf agent preview --api-name AgentName --output-dir ./logs --target-org [alias]
```

**Interactive Preview (Live):**

```bash
sf agent preview --api-name AgentName --use-live-actions --client-app AppName --apex-debug --target-org [alias]
```
### Phase 4: Results Analysis

Parse the test results JSON and display a formatted summary:

```
📊 AGENT TEST RESULTS
════════════════════════════════════════════════════════════════
Agent: Customer_Support_Agent
Org: my-sandbox
Duration: 45.2s
Mode: Simulated

SUMMARY
───────────────────────────────────────────────────────────────
✅ Passed: 18
❌ Failed: 2
⏭️ Skipped: 0
📈 Topic Selection: 95%
🎯 Action Invocation: 90%

FAILED TESTS
───────────────────────────────────────────────────────────────
❌ test_complex_order_inquiry
   Utterance: "What's the status of orders 12345 and 67890?"
   Expected: get_order_status invoked 2 times
   Actual: get_order_status invoked 1 time
   Category: ACTION_INVOCATION_COUNT_MISMATCH

COVERAGE SUMMARY
───────────────────────────────────────────────────────────────
Topics Tested: 4/5 (80%) ⚠️
Actions Tested: 6/8 (75%) ⚠️
Guardrails Tested: 3/3 (100%) ✅
```
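A small parser can produce that summary. A minimal sketch: the field names (`testCases`, `status`, `utterance`) are illustrative assumptions about the `--result-format json` payload, not its documented schema:

```python
# Illustrative results parser; the JSON field names below are assumptions.
import json

def summarize(results_path: str) -> None:
    with open(results_path) as f:
        data = json.load(f)
    cases = data.get("result", {}).get("testCases", [])  # assumed key names
    passed = [c for c in cases if c.get("status") == "COMPLETED"]
    failed = [c for c in cases if c.get("status") != "COMPLETED"]
    print(f"✅ Passed: {len(passed)}  ❌ Failed: {len(failed)}")
    for case in failed:
        print(f"  ❌ {case.get('name')}: {case.get('utterance', '')}")

summarize("./test-results.json")  # placeholder path
```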
### Phase 5: Agentic Fix Loop

When tests fail, automatically fix via sf-ai-agentscript:

| Failure Category | Root Cause | Fix Strategy |
|---|---|---|
| TOPIC_NOT_MATCHED | Topic description doesn't match utterance | Add keywords to topic description |
| ACTION_NOT_INVOKED | Action description not triggered | Improve action description |
| WRONG_ACTION_SELECTED | Wrong action chosen | Differentiate descriptions |
| ACTION_FAILED | Flow/Apex error | Delegate to sf-flow or sf-apex |
| GUARDRAIL_NOT_TRIGGERED | System instructions permissive | Add explicit guardrails |

**Auto-Fix Command Example:**

```
Skill(skill="sf-ai-agentscript", args="Fix agent [AgentName] - Error: [category] - [details]")
```
See Agentic Fix Loops Guide for:

- Complete decision tree
- Detailed fix strategies for each error type
- Cross-skill orchestration workflow
- Python scripts for automated testing
- Example fix loop executions
### Phase 6: Coverage Improvement

If coverage < threshold:

- Identify untested topics/actions from the results (see the sketch below)
- Add test cases to the spec YAML
- Update the test: `sf agent test create --spec ./tests/agent-spec.yaml --force-overwrite`
- Re-run: `sf agent test run --api-name MyAgentTest --wait 10`
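The first step can be scripted by diffing the agent's topic/action inventory against the spec. A sketch, assuming the working spec keys (`expectedTopic`/`expectedActions`, see Known Issues below); the inventory sets and paths are placeholders:

```python
# Hypothetical coverage-gap finder: agent inventory vs. tested topics/actions.
import yaml  # pip install pyyaml

def coverage_gaps(spec_path: str, agent_topics: set, agent_actions: set):
    with open(spec_path) as f:
        spec = yaml.safe_load(f)
    cases = spec.get("testCases", [])
    tested_topics = {c.get("expectedTopic") for c in cases}
    tested_actions = {a for c in cases for a in c.get("expectedActions", [])}
    return agent_topics - tested_topics, agent_actions - tested_actions

untested_topics, untested_actions = coverage_gaps(
    "./tests/agent-spec.yaml",
    agent_topics={"product_faq", "book_search", "order_status"},   # placeholders
    agent_actions={"search_catalog", "get_order_status"},          # placeholders
)
print("Untested topics:", untested_topics)
print("Untested actions:", untested_actions)
```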
## Scoring System (100 Points)

| Category | Points | Criteria |
|---|---|---|
| Topic Selection Coverage | 25 | All topics have test cases; various phrasings tested |
| Action Invocation | 25 | All actions tested with valid inputs/outputs |
| Edge Case Coverage | 20 | Negative tests; empty inputs; special characters; boundaries |
| Test Spec Quality | 15 | Proper YAML; descriptions provided; categories assigned |
| Agentic Fix Success | 15 | Auto-fixes resolve issues within 3 attempts |

**Scoring Thresholds:**

```
⭐⭐⭐⭐⭐ 90-100 pts → Production Ready
⭐⭐⭐⭐   80-89 pts  → Good, minor improvements
⭐⭐⭐     70-79 pts  → Acceptable, needs work
⭐⭐      60-69 pts  → Below standard
⭐        <60 pts    → BLOCKED - Major issues
```
## ⛔ TESTING GUARDRAILS (MANDATORY)

**BEFORE running tests, verify:**

| Check | How | Why |
|---|---|---|
| Agent published | `sf agent list --target-org [alias]` | Can't test an unpublished agent |
| Agent activated | Check status | Preview requires activation |
| Flows deployed | `sf org list metadata --metadata-type Flow` | Actions need Flows |
| Connected App (live) | Check OAuth | Live mode requires auth |

**NEVER do these:**

| Never Do | Consequence | Instead |
|---|---|---|
| Test an unpublished agent | Tests fail silently | Publish first: `sf agent publish authoring-bundle` |
| Skip simulated testing | Live mode hides logic bugs | Always test simulated first |
| Ignore guardrail tests | Security gaps in production | Always test harmful/off-topic inputs |
| Use a single phrasing per topic | Misses routing failures | Test 3+ phrasings per topic (see the sketch after this table) |
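The 3+ phrasings rule can be enforced mechanically by fanning utterance variants out into test cases. A sketch with made-up phrasings and names, emitting the working spec format described under Known Issues:

```python
# Sketch: fan out multiple phrasings per topic so a single phrasing never
# masks a routing failure. All phrasings and names here are examples.
import yaml  # pip install pyyaml

PHRASINGS = {
    "order_status": [
        "Where is my order?",
        "Track order 12345",
        "Has my package shipped yet?",
    ],
}

test_cases = [
    {"utterance": utterance, "expectedTopic": topic}
    for topic, utterances in PHRASINGS.items()
    for utterance in utterances
]
spec = {"subjectType": "AGENT", "subjectName": "My_Agent", "testCases": test_cases}
with open("./tests/phrasing-spec.yaml", "w") as f:
    yaml.safe_dump(spec, f, sort_keys=False)
```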
## CLI Command Reference

### Test Lifecycle Commands

| Command | Purpose | Example |
|---|---|---|
| `sf agent generate test-spec` | Create test YAML | `sf agent generate test-spec --output-dir ./tests` |
| `sf agent test create` | Deploy test to org | `sf agent test create --spec ./tests/spec.yaml --target-org alias` |
| `sf agent test run` | Execute tests | `sf agent test run --api-name Test --wait 10 --target-org alias` |
| `sf agent test results` | Get results | `sf agent test results --job-id ID --result-format json` |
| `sf agent test resume` | Resume async test | `sf agent test resume --use-most-recent --target-org alias` |
| `sf agent test list` | List test runs | `sf agent test list --target-org alias` |
### Preview Commands

| Command/Flag | Purpose | Example |
|---|---|---|
| `sf agent preview` | Interactive testing | `sf agent preview --api-name Agent --target-org alias` |
| `--use-live-actions` | Use real Flows/Apex | `sf agent preview --use-live-actions --client-app App` |
| `--output-dir` | Save transcripts | `sf agent preview --output-dir ./logs` |
| `--apex-debug` | Capture debug logs | `sf agent preview --apex-debug` |
### Result Formats

| Format | Use Case | Flag |
|---|---|---|
| human | Terminal display (default) | `--result-format human` |
| json | CI/CD parsing | `--result-format json` |
| junit | Test reporting | `--result-format junit` |
| tap | Test Anything Protocol | `--result-format tap` |
## Test Spec Quick Reference

**Basic Template:**

```yaml
subjectType: AGENT
subjectName: <Agent_Name>
testCases:
  # Topic routing
  - utterance: "What's on your menu?"
    expectation:
      topic: product_faq
      actionSequence: []

  # Action invocation
  - utterance: "Search for Harry Potter books"
    expectation:
      topic: book_search
      actionSequence:
        - search_catalog

  # Edge case
  - utterance: ""
    expectation:
      graceful_handling: true
```

For the complete YAML format reference, see Test Spec Reference.
## Cross-Skill Integration

**Required Delegations:**

| Task | Skill | Invocation |
|---|---|---|
| Fix agent script | sf-ai-agentscript | `Skill(skill="sf-ai-agentscript", args="Fix...")` |
| Create test data | sf-data | `Skill(skill="sf-data", args="Create...")` |
| Fix failing Flow | sf-flow | `Skill(skill="sf-flow", args="Fix...")` |
| Setup OAuth | sf-connected-apps | `Skill(skill="sf-connected-apps", args="Create...")` |
| Analyze debug logs | sf-debug | `Skill(skill="sf-debug", args="Analyze...")` |

For the complete orchestration workflow, see Agentic Fix Loops.
## Automated Testing (Python Scripts)

This skill includes Python scripts for fully automated agent testing:

| Script | Purpose |
|---|---|
| generate-test-spec.py | Parse .agent files, generate YAML test specs |
| run-automated-tests.py | Orchestrate the full test workflow with fix suggestions |

**Quick Usage:**

```bash
# Generate test spec from agent file
python3 hooks/scripts/generate-test-spec.py \
  --agent-file /path/to/Agent.agent \
  --output specs/Agent-tests.yaml

# Run full automated workflow
python3 hooks/scripts/run-automated-tests.py \
  --agent-name MyAgent \
  --agent-dir /path/to/project \
  --target-org dev
```

For complete documentation, see the Agentic Fix Loops Guide.
## Templates Reference

| Template | Purpose | Location |
|---|---|---|
| basic-test-spec.yaml | Quick start (3-5 tests) | templates/ |
| comprehensive-test-spec.yaml | Full coverage (20+ tests) | templates/ |
| guardrail-tests.yaml | Security/safety scenarios | templates/ |
| escalation-tests.yaml | Human handoff scenarios | templates/ |
| standard-test-spec.yaml | Reference format | templates/ |
## 💡 Key Insights

| Issue | Symptom | Solution |
|---|---|---|
| `sf agent test create` fails | "Required fields are missing: [MasterLabel]" | Use `sf agent generate test-spec` (interactive) or the UI instead |
| Tests fail silently | No results returned | Agent not published - run `sf agent publish authoring-bundle` |
| Topic not matched | Wrong topic selected | Add keywords to topic description (see Fix Loops) |
| Action not invoked | Action never called | Improve action description, add explicit reference |
| Live preview 401 | Authentication error | Connected App not configured - use sf-connected-apps |
| Async tests stuck | Job never completes | Use `sf agent test resume --use-most-recent` |
| Empty responses | Agent doesn't respond | Check the agent is activated |
| Agent Testing Center unavailable | "INVALID_TYPE" error | Use `sf agent preview` as fallback |
| Topic expectation empty | Test always passes topic check | Bug in CLI YAML→XML conversion; use interactive mode |
| ⚠️ `--use-most-recent` broken | "Nonexistent flag" error on `sf agent test results` | Use `--job-id` explicitly - the flag is documented but NOT implemented |
| Topic name mismatch | Expected GeneralCRM, got MigrationDefaultTopic | Standard Salesforce copilots route to MigrationDefaultTopic - verify actual topic names from the first test run |
| Test data missing | "No matching records" in outcome | Verify test utterances reference records that actually exist in the org (e.g., "Edge Communications" not "Acme") |
| Action assertion passes unexpectedly | Expected [A], actual [A, B] but marked PASS | Action matching uses SUPERSET logic - actual can have MORE actions than expected and still pass (see the sketch below) |
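The superset rule in the last row reduces to a subset check on the expected actions. A one-function sketch:

```python
# Superset matching as described above: the assertion passes when every
# expected action appears in the actual invocation list.
def actions_pass(expected: list, actual: list) -> bool:
    return set(expected).issubset(set(actual))

# Extra actions in `actual` still pass:
print(actions_pass(["IdentifyRecordByName"],
                   ["IdentifyRecordByName", "QueryRecords"]))  # True
```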
## 🔄 Two Fix Strategies

When agent tests fail, there are TWO valid approaches:

| Scenario | Approach | When to Use |
|---|---|---|
| Custom Agent (you control it) | Fix the agent via sf-ai-agentforce | Topic descriptions or action configurations need adjustment |
| Managed/Standard Agent (Salesforce copilot) | Fix the test expectations in YAML | Test expectations don't match actual agent behavior |

**Decision Flow:**

```
Test Failed → Can you modify the agent?
              │
     ┌────────┴────────┐
     ↓                 ↓
    YES                NO
     ↓                 ↓
 Fix Agent        Fix Test Spec
(sf-ai-agentforce) (update YAML)
```

**Example: Fixing Test Expectations**

```yaml
# BEFORE (wrong expectations)
expectedTopic: GeneralCRM
expectedActions:
  - IdentifyRecordByName
  - GetRecordDetails

# AFTER (matches actual behavior)
expectedTopic: MigrationDefaultTopic
expectedActions:
  - IdentifyRecordByName
  - QueryRecords
```
## 🔄 Automated Test-Fix Loop

**NEW in v1.1.0** | Claude Code can now orchestrate fully automated test-fix cycles

### Overview

The test-fix loop enables Claude Code to:

- **Run tests** → `sf agent test run` with JSON output
- **Analyze failures** → Parse results and categorize issues
- **Fix agent** → Invoke the `sf-ai-agentforce` skill to apply fixes
- **Retest** → Loop until all tests pass or max retries (3) reached
- **Escalate** → Skip unfixable tests and continue with the others

### Quick Start

```bash
# Run the test-fix loop
./hooks/scripts/test-fix-loop.sh Test_Agentforce_v1 AgentforceTesting 3

# Exit codes:
# 0 = All tests passed
# 1 = Fixes needed (Claude Code should invoke sf-ai-agentforce)
# 2 = Max attempts reached, escalate to human
# 3 = Error (org unreachable, test not found, etc.)
```
### Claude Code Integration

When Claude Code runs the test-fix loop:

```
USER: Run automated test-fix loop for Coral_Cloud_Agent

CLAUDE CODE:
1. bash hooks/scripts/test-fix-loop.sh Test_Agentforce_v1 AgentforceTesting
2. If exit code 1 (FIX_NEEDED):
   - Parse failure details from output
   - Invoke: Skill(skill="sf-ai-agentscript", args="Fix topic X: add keyword Y")
   - Re-run: CURRENT_ATTEMPT=2 bash hooks/scripts/test-fix-loop.sh ...
3. Repeat until exit code 0 (success) or 2 (max retries)
```
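The same cycle can be driven by a plain Python wrapper when Claude Code is not in the loop. A sketch: the exit codes match the Quick Start above, while `apply_fixes` is a placeholder for the skill invocation:

```python
# Hypothetical driver for the test-fix loop script.
import os
import subprocess

def apply_fixes(test_name: str) -> None:
    # Placeholder: in practice, delegate to the sf-ai-agentscript skill.
    print(f"Invoke Skill(skill='sf-ai-agentscript', args='Fix {test_name} ...')")

def test_fix_loop(test_name: str, org: str, max_attempts: int = 3) -> int:
    for attempt in range(1, max_attempts + 1):
        env = {**os.environ, "CURRENT_ATTEMPT": str(attempt)}
        code = subprocess.run(
            ["bash", "hooks/scripts/test-fix-loop.sh", test_name, org, str(max_attempts)],
            env=env,
        ).returncode
        if code == 0:
            return 0                 # all tests passed
        if code == 1:
            apply_fixes(test_name)   # FIX_NEEDED: fix, then retry
            continue
        return code                  # 2 = escalate to human, 3 = hard error
    return 2

test_fix_loop("Test_Agentforce_v1", "AgentforceTesting")
```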
### Ralph Wiggum Integration (Hands-Off)

For fully automated loops without user intervention:

```
/ralph-wiggum:ralph-loop
> Run agentic test-fix loop for Test_Agentforce_v1 in AgentforceTesting until all tests pass
```

Claude Code will autonomously:

- Execute test-fix cycles
- Apply fixes via the sf-ai-agentscript skill
- Track attempts and escalate when needed
- Report final status
Failure Categories & Auto-Fix Strategies
| TOPIC_NOT_MATCHED
| ✅ Yes
| Add keywords to topic classificationDescription
| ACTION_NOT_INVOKED
| ✅ Yes
| Improve action description, add trigger conditions
| WRONG_ACTION_SELECTED
| ✅ Yes
| Differentiate action descriptions
| GUARDRAIL_NOT_TRIGGERED
| ✅ Yes
| Add explicit guardrails to system instructions
| ACTION_INVOCATION_FAILED
| ⚠️ Conditional
| Delegate to sf-flow or sf-apex skill
| RESPONSE_QUALITY_ISSUE
| ✅ Yes
| Add response format rules to topic instructions
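The table translates directly into a dispatch map that composes the fix instruction. An illustrative sketch; the strategy strings mirror the table, and the args format follows the Auto-Fix Command Example earlier:

```python
# Category-to-strategy dispatch for composing sf-ai-agentscript fix args.
FIX_STRATEGIES = {
    "TOPIC_NOT_MATCHED": "Add keywords to topic classificationDescription",
    "ACTION_NOT_INVOKED": "Improve action description, add trigger conditions",
    "WRONG_ACTION_SELECTED": "Differentiate action descriptions",
    "GUARDRAIL_NOT_TRIGGERED": "Add explicit guardrails to system instructions",
    "RESPONSE_QUALITY_ISSUE": "Add response format rules to topic instructions",
}

def fix_args(agent: str, category: str, details: str):
    """Return Skill() args for auto-fixable categories, None otherwise."""
    strategy = FIX_STRATEGIES.get(category)
    if strategy is None:
        return None  # e.g. ACTION_INVOCATION_FAILED -> delegate to sf-flow/sf-apex
    return f"Fix agent {agent} - Error: {category} - {details} ({strategy})"

print(fix_args("Coral_Cloud_Agent", "TOPIC_NOT_MATCHED", "utterance routed to fallback"))
```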
### Environment Variables

| Variable | Description | Default |
|---|---|---|
| CURRENT_ATTEMPT | Current attempt number (auto-incremented) | 1 |
| MAX_WAIT_MINUTES | Timeout for test execution | 10 |
| SKIP_TESTS | Comma-separated test names to skip | (none) |
| VERBOSE | Enable detailed output | false |
### Machine-Readable Output

The script outputs structured data for Claude Code to parse:

```
---BEGIN_MACHINE_READABLE---
FIX_NEEDED: true
TEST_API_NAME: Test_Agentforce_v1
TARGET_ORG: AgentforceTesting
CURRENT_ATTEMPT: 1
MAX_ATTEMPTS: 3
NEXT_COMMAND: CURRENT_ATTEMPT=2 ./test-fix-loop.sh Test_Agentforce_v1 AgentforceTesting 3
---END_MACHINE_READABLE---
```
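A small parser for that block; the markers and `key: value` lines are exactly as shown above, and the log file name is a placeholder:

```python
# Extract the key/value fields between the machine-readable markers.
def parse_machine_readable(output: str) -> dict:
    inside = False
    fields = {}
    for line in output.splitlines():
        stripped = line.strip()
        if stripped == "---BEGIN_MACHINE_READABLE---":
            inside = True
        elif stripped == "---END_MACHINE_READABLE---":
            break
        elif inside and ": " in stripped:
            key, value = stripped.split(": ", 1)
            fields[key] = value
    return fields

with open("loop-output.log") as f:  # placeholder log path
    fields = parse_machine_readable(f.read())
if fields.get("FIX_NEEDED") == "true":
    print("Next:", fields.get("NEXT_COMMAND"))
```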
🐛 Known Issues & CLI Bugs
Last Updated: 2026-01-04 | Tested With: sf CLI v2.118.16
CRITICAL: sf agent test create MasterLabel Bug
Status: 🔴 BLOCKING - Prevents YAML-based test creation
Error:
Error (SfError): Required fields are missing: [MasterLabel]
Root Cause: The CLI generates XML from YAML but omits the required <name> element (MasterLabel).
Generated XML (broken):
<AiEvaluationDefinition xmlns="http://soap.sforce.com/2006/04/metadata">
<subjectName>My_Agent</subjectName>
<subjectType>AGENT</subjectType>
<!-- ❌ MISSING: <name>Test Name</name> -->
<testCase>...</testCase>
</AiEvaluationDefinition>
Working XML (from existing tests):
<AiEvaluationDefinition xmlns="http://soap.sforce.com/2006/04/metadata">
<description>Test description</description>
<name>Test Name</name> <!-- ✅ REQUIRED -->
<subjectName>My_Agent</subjectName>
<subjectType>AGENT</subjectType>
<testCase>...</testCase>
</AiEvaluationDefinition>
**Workarounds:**

- ✅ Use `sf agent generate test-spec --from-definition` to convert existing XML to YAML (produces the correct format)
- ✅ Use the interactive `sf agent generate test-spec` wizard (works correctly)
- ✅ Create tests via the Salesforce Testing Center UI
- ✅ Deploy XML metadata directly, bypassing YAML conversion (see the patch sketch after this list)
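For the direct-deploy workaround, the generated XML can be patched before deployment. A sketch, assuming `<name>` belongs right after `<description>` when present (matching the working XML above); the file path and label are placeholders:

```python
# Patch the missing <name> (MasterLabel) into CLI-generated
# AiEvaluationDefinition XML before deploying it.
import xml.etree.ElementTree as ET

NS = "http://soap.sforce.com/2006/04/metadata"
ET.register_namespace("", NS)

def add_master_label(xml_path: str, label: str) -> None:
    tree = ET.parse(xml_path)
    root = tree.getroot()
    if root.find(f"{{{NS}}}name") is not None:
        return  # already valid
    name_el = ET.Element(f"{{{NS}}}name")
    name_el.text = label
    # Assumed ordering: after <description> if present, otherwise first.
    index = 1 if root.find(f"{{{NS}}}description") is not None else 0
    root.insert(index, name_el)
    tree.write(xml_path, encoding="UTF-8", xml_declaration=True)

add_master_label("./MyAgentTest.aiEvaluationDefinition-meta.xml", "My Agent Test")
```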
### MEDIUM: Interactive Mode Not Scriptable

**Status:** 🟡 Blocks CI/CD automation

**Issue:** `sf agent generate test-spec` only works interactively:

- No `--quiet`, `--json`, or other non-interactive flags
- Piped input causes a "User force closed the prompt" error
- Cannot be automated in CI/CD pipelines

**What Works:**

```bash
# Interactive (requires terminal)
sf agent generate test-spec --output-file ./tests/my-test.yaml

# Convert existing XML to YAML (non-interactive)
sf agent generate test-spec --from-definition path/to/test.xml --output-file ./output.yaml
```

**Workaround:** Use the Python scripts in hooks/scripts/ to generate YAML programmatically.
### MEDIUM: YAML vs XML Format Discrepancy

**Issue:** Documentation shows one YAML format, but Salesforce stores a different XML structure.

**Doc Shows (doesn't map correctly):**

```yaml
testCases:
  - utterance: "Hello"
    expectation:
      topic: Welcome
      actionSequence: []
```

**Actual Working Format (from --from-definition):**

```yaml
testCases:
  - utterance: "Hello"
    expectedTopic: Welcome
    expectedActions: []
    expectedOutcome: "Greeting response shown"
```

**Key Mappings:**

| YAML Key | XML Expectation |
|---|---|
| expectedTopic | `<expectation><name>topic_sequence_match</name><expectedValue>...</expectedValue>` |
| expectedActions | `<expectation><name>action_sequence_match</name><expectedValue>[...]</expectedValue>` |
| expectedOutcome | `<expectation><name>bot_response_rating</name><expectedValue>...</expectedValue>` |
### LOW: Expectation Name Variations

**Issue:** Different test creation methods use different expectation names:

| Variant A | Variant B |
|---|---|
| topic_assertion | topic_sequence_match |
| actions_assertion | action_sequence_match |
| output_validation | bot_response_rating |

**Impact:** May cause confusion when comparing test results from different sources.
## Quick Start Example

```bash
# 1. Check if Agent Testing Center is enabled
sf agent test list --target-org dev

# 2. Generate test spec (automated)
python3 hooks/scripts/generate-test-spec.py \
  --agent-file ./agents/MyAgent.agent \
  --output ./tests/myagent-tests.yaml

# 3. Create test in org
sf agent test create \
  --spec ./tests/myagent-tests.yaml \
  --api-name MyAgentTest \
  --target-org dev

# 4. Run tests
sf agent test run \
  --api-name MyAgentTest \
  --wait 10 \
  --result-format json \
  --target-org dev

# 5. View results (pass the job ID from step 4 explicitly; --use-most-recent
#    is documented but not implemented for this command - see Known Issues)
sf agent test results \
  --job-id <job-id> \
  --verbose \
  --result-format json \
  --target-org dev
```
For complete workflows and fix loops, see:

- **Agentic Fix Loops** - Automated testing and fix workflows
- **Test Spec Reference** - Complete YAML format guide
## License

MIT License. See the LICENSE file. Copyright (c) 2024-2025 Jag Valaiyapathy