Expert testing engineer specializing in Agentforce agent testing, topic/action coverage analysis, and agentic fix loops. Execute agent tests, analyze failures, and automatically fix issues via sf-ai-agentscript (or sf-ai-agentforce-legacy for existing agents).
## Core Responsibilities

- **Test Execution**: Run agent tests via `sf agent test run` with coverage analysis
- **Test Spec Generation**: Create YAML test specifications for agents
- **Coverage Analysis**: Track topic selection accuracy, action invocation rates
- **Preview Testing**: Interactive simulated and live agent testing
- **Agentic Fix Loop**: Automatically fix failing agents and re-test
- **Cross-Skill Orchestration**: Delegate fixes to sf-ai-agentforce, data to sf-data
## 📚 Document Map

| Topic | Reference | Description |
|---|---|---|
| CLI commands | cli-commands.md | Complete sf agent test/preview reference |
| Test spec format | test-spec-reference.md | YAML specification format and examples |
| Auto-fix workflow | agentic-fix-loops.md | Automated test-fix cycles and Python scripts |
| Live preview setup | connected-app-setup.md | OAuth for live preview mode |
| Coverage metrics | coverage-analysis.md | Topic/action coverage analysis |
| Fix decision tree | agentic-fix-loop.md | Detailed fix strategies |
⚡ **Quick Links:**

- **Scoring System** - 5-category validation
- **CLI Command Reference** - Essential commands
- **Agentic Fix Loop** - Auto-fix workflow
- **Test Spec Reference** - Complete YAML format guide
- **Automated Testing** - Python scripts and workflows
## ⚠️ CRITICAL: Orchestration Order

```
sf-metadata → sf-apex → sf-flow → sf-deploy → sf-ai-agentscript → sf-deploy → sf-ai-agentforce-testing (you are here)
```

**Why testing is LAST:**

- Agent must be published before running automated tests
- Agent must be activated for preview mode
- All dependencies (Flows, Apex) must be deployed first
- Test data (via sf-data) should exist before testing actions
⚠️ **MANDATORY Delegation:**

- **Fixes**: ALWAYS use `Skill(skill="sf-ai-agentscript")` for agent script fixes (or `sf-ai-agentforce-legacy` for existing legacy agents)
- **Test Data**: Use `Skill(skill="sf-data")` for action test data
- **OAuth Setup**: Use `Skill(skill="sf-connected-apps")` for live preview
## ⚠️ CRITICAL: Org Requirements (Agent Testing Center)

Agent testing requires the Agent Testing Center feature, which is NOT enabled by default in all orgs.

### Check if Agent Testing Center is Enabled

```bash
# This will fail if Agent Testing Center is not enabled
sf agent test list --target-org [alias]

# Expected errors if NOT enabled:
# "Not available for deploy for this organization"
# "INVALID_TYPE: Cannot use: AiEvaluationDefinition in this organization"
```
### Orgs WITHOUT Agent Testing Center

| Org Type | Availability | Recommendation |
|---|---|---|
| Standard DevHub | ❌ Not available | Request feature enablement |
| SDO Demo Orgs | ❌ Not available | Use scratch org with feature |
| Scratch Orgs | ✅ If feature enabled | Include in scratch-def.json |
### Enabling Agent Testing Center

- **Scratch Org** - Add to scratch-def.json:

  ```json
  {
    "features": ["AgentTestingCenter", "EinsteinGPTForSalesforce"]
  }
  ```

- **Production/Sandbox** - Contact Salesforce to enable the feature
- **Fallback** - Use `sf agent preview` for manual testing (see Automated Testing Guide)
## ⚠️ CRITICAL: Prerequisites Checklist

Before running agent tests, verify:

| Check | How to Verify | Why |
|---|---|---|
| Agent Testing Center enabled | `sf agent test list --target-org [alias]` | ⚠️ CRITICAL - tests will fail without this |
| Agent exists | `sf data query --use-tooling-api --query "SELECT Id FROM BotDefinition WHERE DeveloperName='X'"` | Can't test a non-existent agent |
| Agent published | `sf agent validate authoring-bundle --api-name X` | Must be published to test |
| Agent activated | Check activation status | Required for preview mode |
| Dependencies deployed | Flows and Apex in org | Actions will fail without them |
| Connected App (live) | OAuth configured | Required for `--use-live-actions` |
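The two scriptable rows above can be automated as a preflight gate. A minimal sketch, assuming an authenticated `sf` CLI on PATH; the org alias and agent name are placeholders:

```python
# Hypothetical preflight script mirroring the checklist above.
import json
import subprocess
import sys

def run_sf(*args: str) -> subprocess.CompletedProcess:
    """Run an sf CLI command and capture its output."""
    return subprocess.run(["sf", *args], capture_output=True, text=True)

def preflight(org: str, agent: str) -> None:
    # Agent Testing Center check - this command fails if the feature is missing.
    if run_sf("agent", "test", "list", "--target-org", org).returncode != 0:
        sys.exit("Agent Testing Center not enabled; fall back to 'sf agent preview'.")
    # Agent existence check via the Tooling API.
    query = f"SELECT Id FROM BotDefinition WHERE DeveloperName='{agent}'"
    proc = run_sf("data", "query", "--use-tooling-api", "--query", query,
                  "--json", "--target-org", org)
    records = json.loads(proc.stdout).get("result", {}).get("records", [])
    if not records:
        sys.exit(f"Agent '{agent}' not found in org '{org}'.")
    print("Preflight checks passed.")

preflight("dev", "Customer_Support_Agent")  # placeholder values
```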
## Workflow (6-Phase Pattern)

### Phase 1: Prerequisites

Use AskUserQuestion to gather:

- Agent name/API name
- Target org alias
- Test mode (simulated vs live)
- Coverage threshold (default: 80%)
- Enable agentic fix loop?

Then:

- Verify the agent is published and activated
- Check for existing test specs: `Glob: **/*.yaml`, `Glob: **/tests/*.yaml`
- Create TodoWrite tasks
### Phase 2: Test Spec Creation

**Option A: Interactive Generation** (no automation available)

```bash
# Interactive test spec generation
sf agent generate test-spec --output-file ./tests/agent-spec.yaml

# ⚠️ NOTE: There is NO --api-name flag! The command is interactive-only.
```

**Option B: Automated Generation** (Python script)

```bash
# Generate from agent file
python3 hooks/scripts/generate-test-spec.py \
  --agent-file /path/to/Agent.agent \
  --output tests/agent-spec.yaml \
  --verbose
```

See Test Spec Reference for the complete YAML format guide.

**Create Test in Org:**

```bash
sf agent test create --spec ./tests/agent-spec.yaml --api-name MyAgentTest --target-org [alias]
```
### Phase 3: Test Execution

**Automated Tests:**

```bash
sf agent test run --api-name MyAgentTest --wait 10 --result-format json --target-org [alias]
```

**Interactive Preview (Simulated):**

```bash
sf agent preview --api-name AgentName --output-dir ./logs --target-org [alias]
```

**Interactive Preview (Live):**

```bash
sf agent preview --api-name AgentName --use-live-actions --client-app AppName --apex-debug --target-org [alias]
```
### Phase 4: Results Analysis

Parse the test results JSON and display a formatted summary:

```
📊 AGENT TEST RESULTS
════════════════════════════════════════════════════════════════
Agent: Customer_Support_Agent
Org: my-sandbox
Duration: 45.2s
Mode: Simulated

SUMMARY
───────────────────────────────────────────────────────────────
✅ Passed: 18
❌ Failed: 2
⏭️ Skipped: 0
📈 Topic Selection: 95%
🎯 Action Invocation: 90%

FAILED TESTS
───────────────────────────────────────────────────────────────
❌ test_complex_order_inquiry
   Utterance: "What's the status of orders 12345 and 67890?"
   Expected: get_order_status invoked 2 times
   Actual: get_order_status invoked 1 time
   Category: ACTION_INVOCATION_COUNT_MISMATCH

COVERAGE SUMMARY
───────────────────────────────────────────────────────────────
Topics Tested: 4/5 (80%) ⚠️
Actions Tested: 6/8 (75%) ⚠️
Guardrails Tested: 3/3 (100%) ✅
```
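A small parser can produce that summary. A minimal sketch: the field names (`testCases`, `status`, `utterance`) are illustrative assumptions about the `--result-format json` payload, not its documented schema:

```python
# Illustrative results parser; the JSON field names below are assumptions.
import json

def summarize(results_path: str) -> None:
    with open(results_path) as f:
        data = json.load(f)
    cases = data.get("result", {}).get("testCases", [])  # assumed key names
    passed = [c for c in cases if c.get("status") == "COMPLETED"]
    failed = [c for c in cases if c.get("status") != "COMPLETED"]
    print(f"✅ Passed: {len(passed)}  ❌ Failed: {len(failed)}")
    for case in failed:
        print(f"  ❌ {case.get('name')}: {case.get('utterance', '')}")

summarize("./test-results.json")  # placeholder path
```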
### Phase 5: Agentic Fix Loop

When tests fail, automatically fix via sf-ai-agentscript:

| Failure Category | Root Cause | Fix Strategy |
|---|---|---|
| TOPIC_NOT_MATCHED | Topic description doesn't match utterance | Add keywords to topic description |
| ACTION_NOT_INVOKED | Action description not triggered | Improve action description |
| WRONG_ACTION_SELECTED | Wrong action chosen | Differentiate descriptions |
| ACTION_FAILED | Flow/Apex error | Delegate to sf-flow or sf-apex |
| GUARDRAIL_NOT_TRIGGERED | System instructions permissive | Add explicit guardrails |

**Auto-Fix Command Example:**

```
Skill(skill="sf-ai-agentscript", args="Fix agent [AgentName] - Error: [category] - [details]")
```
See Agentic Fix Loops Guide for:

- Complete decision tree
- Detailed fix strategies for each error type
- Cross-skill orchestration workflow
- Python scripts for automated testing
- Example fix loop executions
### Phase 6: Coverage Improvement

If coverage < threshold:

- Identify untested topics/actions from the results (see the sketch below)
- Add test cases to the spec YAML
- Update the test: `sf agent test create --spec ./tests/agent-spec.yaml --force-overwrite`
- Re-run: `sf agent test run --api-name MyAgentTest --wait 10`
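The first step can be scripted by diffing the agent's topic/action inventory against the spec. A sketch, assuming the working spec keys (`expectedTopic`/`expectedActions`, see Known Issues below); the inventory sets and paths are placeholders:

```python
# Hypothetical coverage-gap finder: agent inventory vs. tested topics/actions.
import yaml  # pip install pyyaml

def coverage_gaps(spec_path: str, agent_topics: set, agent_actions: set):
    with open(spec_path) as f:
        spec = yaml.safe_load(f)
    cases = spec.get("testCases", [])
    tested_topics = {c.get("expectedTopic") for c in cases}
    tested_actions = {a for c in cases for a in c.get("expectedActions", [])}
    return agent_topics - tested_topics, agent_actions - tested_actions

untested_topics, untested_actions = coverage_gaps(
    "./tests/agent-spec.yaml",
    agent_topics={"product_faq", "book_search", "order_status"},   # placeholders
    agent_actions={"search_catalog", "get_order_status"},          # placeholders
)
print("Untested topics:", untested_topics)
print("Untested actions:", untested_actions)
```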
## Scoring System (100 Points)

| Category | Points | Criteria |
|---|---|---|
| Topic Selection Coverage | 25 | All topics have test cases; various phrasings tested |
| Action Invocation | 25 | All actions tested with valid inputs/outputs |
| Edge Case Coverage | 20 | Negative tests; empty inputs; special characters; boundaries |
| Test Spec Quality | 15 | Proper YAML; descriptions provided; categories assigned |
| Agentic Fix Success | 15 | Auto-fixes resolve issues within 3 attempts |

**Scoring Thresholds:**

```
⭐⭐⭐⭐⭐ 90-100 pts → Production Ready
⭐⭐⭐⭐   80-89 pts  → Good, minor improvements
⭐⭐⭐     70-79 pts  → Acceptable, needs work
⭐⭐      60-69 pts  → Below standard
⭐        <60 pts    → BLOCKED - Major issues
```
## ⛔ TESTING GUARDRAILS (MANDATORY)

**BEFORE running tests, verify:**

| Check | How | Why |
|---|---|---|
| Agent published | `sf agent list --target-org [alias]` | Can't test an unpublished agent |
| Agent activated | Check status | Preview requires activation |
| Flows deployed | `sf org list metadata --metadata-type Flow` | Actions need Flows |
| Connected App (live) | Check OAuth | Live mode requires auth |

**NEVER do these:**

| Never Do | Consequence | Instead |
|---|---|---|
| Test an unpublished agent | Tests fail silently | Publish first: `sf agent publish authoring-bundle` |
| Skip simulated testing | Live mode hides logic bugs | Always test simulated first |
| Ignore guardrail tests | Security gaps in production | Always test harmful/off-topic inputs |
| Use a single phrasing per topic | Misses routing failures | Test 3+ phrasings per topic (see the sketch after this table) |
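The 3+ phrasings rule can be enforced mechanically by fanning utterance variants out into test cases. A sketch with made-up phrasings and names, emitting the working spec format described under Known Issues:

```python
# Sketch: fan out multiple phrasings per topic so a single phrasing never
# masks a routing failure. All phrasings and names here are examples.
import yaml  # pip install pyyaml

PHRASINGS = {
    "order_status": [
        "Where is my order?",
        "Track order 12345",
        "Has my package shipped yet?",
    ],
}

test_cases = [
    {"utterance": utterance, "expectedTopic": topic}
    for topic, utterances in PHRASINGS.items()
    for utterance in utterances
]
spec = {"subjectType": "AGENT", "subjectName": "My_Agent", "testCases": test_cases}
with open("./tests/phrasing-spec.yaml", "w") as f:
    yaml.safe_dump(spec, f, sort_keys=False)
```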
## CLI Command Reference

### Test Lifecycle Commands

| Command | Purpose | Example |
|---|---|---|
| `sf agent generate test-spec` | Create test YAML | `sf agent generate test-spec --output-dir ./tests` |
| `sf agent test create` | Deploy test to org | `sf agent test create --spec ./tests/spec.yaml --target-org alias` |
| `sf agent test run` | Execute tests | `sf agent test run --api-name Test --wait 10 --target-org alias` |
| `sf agent test results` | Get results | `sf agent test results --job-id ID --result-format json` |
| `sf agent test resume` | Resume async test | `sf agent test resume --use-most-recent --target-org alias` |
| `sf agent test list` | List test runs | `sf agent test list --target-org alias` |
### Preview Commands

| Command/Flag | Purpose | Example |
|---|---|---|
| `sf agent preview` | Interactive testing | `sf agent preview --api-name Agent --target-org alias` |
| `--use-live-actions` | Use real Flows/Apex | `sf agent preview --use-live-actions --client-app App` |
| `--output-dir` | Save transcripts | `sf agent preview --output-dir ./logs` |
| `--apex-debug` | Capture debug logs | `sf agent preview --apex-debug` |
### Result Formats

| Format | Use Case | Flag |
|---|---|---|
| human | Terminal display (default) | `--result-format human` |
| json | CI/CD parsing | `--result-format json` |
| junit | Test reporting | `--result-format junit` |
| tap | Test Anything Protocol | `--result-format tap` |
## Test Spec Quick Reference

**Basic Template:**

```yaml
subjectType: AGENT
subjectName: <Agent_Name>
testCases:
  # Topic routing
  - utterance: "What's on your menu?"
    expectation:
      topic: product_faq
      actionSequence: []

  # Action invocation
  - utterance: "Search for Harry Potter books"
    expectation:
      topic: book_search
      actionSequence:
        - search_catalog

  # Edge case
  - utterance: ""
    expectation:
      graceful_handling: true
```

For the complete YAML format reference, see Test Spec Reference.
## Cross-Skill Integration

**Required Delegations:**

| Task | Skill | Invocation |
|---|---|---|
| Fix agent script | sf-ai-agentscript | `Skill(skill="sf-ai-agentscript", args="Fix...")` |
| Create test data | sf-data | `Skill(skill="sf-data", args="Create...")` |
| Fix failing Flow | sf-flow | `Skill(skill="sf-flow", args="Fix...")` |
| Setup OAuth | sf-connected-apps | `Skill(skill="sf-connected-apps", args="Create...")` |
| Analyze debug logs | sf-debug | `Skill(skill="sf-debug", args="Analyze...")` |

For the complete orchestration workflow, see Agentic Fix Loops.
## Automated Testing (Python Scripts)

This skill includes Python scripts for fully automated agent testing:

| Script | Purpose |
|---|---|
| generate-test-spec.py | Parse .agent files, generate YAML test specs |
| run-automated-tests.py | Orchestrate the full test workflow with fix suggestions |

**Quick Usage:**

```bash
# Generate test spec from agent file
python3 hooks/scripts/generate-test-spec.py \
  --agent-file /path/to/Agent.agent \
  --output specs/Agent-tests.yaml

# Run full automated workflow
python3 hooks/scripts/run-automated-tests.py \
  --agent-name MyAgent \
  --agent-dir /path/to/project \
  --target-org dev
```

For complete documentation, see the Agentic Fix Loops Guide.
## Templates Reference

| Template | Purpose | Location |
|---|---|---|
| basic-test-spec.yaml | Quick start (3-5 tests) | templates/ |
| comprehensive-test-spec.yaml | Full coverage (20+ tests) | templates/ |
| guardrail-tests.yaml | Security/safety scenarios | templates/ |
| escalation-tests.yaml | Human handoff scenarios | templates/ |
| standard-test-spec.yaml | Reference format | templates/ |
## 💡 Key Insights

| Issue | Symptom | Solution |
|---|---|---|
| `sf agent test create` fails | "Required fields are missing: [MasterLabel]" | Use `sf agent generate test-spec` (interactive) or the UI instead |
| Tests fail silently | No results returned | Agent not published - run `sf agent publish authoring-bundle` |
| Topic not matched | Wrong topic selected | Add keywords to topic description (see Fix Loops) |
| Action not invoked | Action never called | Improve action description, add explicit reference |
| Live preview 401 | Authentication error | Connected App not configured - use sf-connected-apps |
| Async tests stuck | Job never completes | Use `sf agent test resume --use-most-recent` |
| Empty responses | Agent doesn't respond | Check the agent is activated |
| Agent Testing Center unavailable | "INVALID_TYPE" error | Use `sf agent preview` as fallback |
| Topic expectation empty | Test always passes topic check | Bug in CLI YAML→XML conversion; use interactive mode |
| ⚠️ `--use-most-recent` broken | "Nonexistent flag" error on `sf agent test results` | Use `--job-id` explicitly - the flag is documented but NOT implemented |
| Topic name mismatch | Expected GeneralCRM, got MigrationDefaultTopic | Standard Salesforce copilots route to MigrationDefaultTopic - verify actual topic names from the first test run |
| Test data missing | "No matching records" in outcome | Verify test utterances reference records that actually exist in the org (e.g., "Edge Communications" not "Acme") |
| Action assertion passes unexpectedly | Expected [A], actual [A, B] but marked PASS | Action matching uses SUPERSET logic - actual can have MORE actions than expected and still pass (see the sketch below) |
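The superset rule in the last row reduces to a subset check on the expected actions. A one-function sketch:

```python
# Superset matching as described above: the assertion passes when every
# expected action appears in the actual invocation list.
def actions_pass(expected: list, actual: list) -> bool:
    return set(expected).issubset(set(actual))

# Extra actions in `actual` still pass:
print(actions_pass(["IdentifyRecordByName"],
                   ["IdentifyRecordByName", "QueryRecords"]))  # True
```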
## 🔄 Two Fix Strategies

When agent tests fail, there are TWO valid approaches:

| Scenario | Approach | When to Use |
|---|---|---|
| Custom Agent (you control it) | Fix the agent via sf-ai-agentforce | Topic descriptions or action configurations need adjustment |
| Managed/Standard Agent (Salesforce copilot) | Fix the test expectations in YAML | Test expectations don't match actual agent behavior |

**Decision Flow:**

```
Test Failed → Can you modify the agent?
              │
     ┌────────┴────────┐
     ↓                 ↓
    YES                NO
     ↓                 ↓
 Fix Agent        Fix Test Spec
(sf-ai-agentforce) (update YAML)
```

**Example: Fixing Test Expectations**

```yaml
# BEFORE (wrong expectations)
expectedTopic: GeneralCRM
expectedActions:
  - IdentifyRecordByName
  - GetRecordDetails

# AFTER (matches actual behavior)
expectedTopic: MigrationDefaultTopic
expectedActions:
  - IdentifyRecordByName
  - QueryRecords
```
## 🔄 Automated Test-Fix Loop

**NEW in v1.1.0** | Claude Code can now orchestrate fully automated test-fix cycles

### Overview

The test-fix loop enables Claude Code to:

- **Run tests** → `sf agent test run` with JSON output
- **Analyze failures** → Parse results and categorize issues
- **Fix agent** → Invoke the `sf-ai-agentforce` skill to apply fixes
- **Retest** → Loop until all tests pass or max retries (3) reached
- **Escalate** → Skip unfixable tests and continue with the others

### Quick Start

```bash
# Run the test-fix loop
./hooks/scripts/test-fix-loop.sh Test_Agentforce_v1 AgentforceTesting 3

# Exit codes:
# 0 = All tests passed
# 1 = Fixes needed (Claude Code should invoke sf-ai-agentforce)
# 2 = Max attempts reached, escalate to human
# 3 = Error (org unreachable, test not found, etc.)
```
### Claude Code Integration

When Claude Code runs the test-fix loop:

```
USER: Run automated test-fix loop for Coral_Cloud_Agent

CLAUDE CODE:
1. bash hooks/scripts/test-fix-loop.sh Test_Agentforce_v1 AgentforceTesting
2. If exit code 1 (FIX_NEEDED):
   - Parse failure details from output
   - Invoke: Skill(skill="sf-ai-agentscript", args="Fix topic X: add keyword Y")
   - Re-run: CURRENT_ATTEMPT=2 bash hooks/scripts/test-fix-loop.sh ...
3. Repeat until exit code 0 (success) or 2 (max retries)
```
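The same cycle can be driven by a plain Python wrapper when Claude Code is not in the loop. A sketch: the exit codes match the Quick Start above, while `apply_fixes` is a placeholder for the skill invocation:

```python
# Hypothetical driver for the test-fix loop script.
import os
import subprocess

def apply_fixes(test_name: str) -> None:
    # Placeholder: in practice, delegate to the sf-ai-agentscript skill.
    print(f"Invoke Skill(skill='sf-ai-agentscript', args='Fix {test_name} ...')")

def test_fix_loop(test_name: str, org: str, max_attempts: int = 3) -> int:
    for attempt in range(1, max_attempts + 1):
        env = {**os.environ, "CURRENT_ATTEMPT": str(attempt)}
        code = subprocess.run(
            ["bash", "hooks/scripts/test-fix-loop.sh", test_name, org, str(max_attempts)],
            env=env,
        ).returncode
        if code == 0:
            return 0                 # all tests passed
        if code == 1:
            apply_fixes(test_name)   # FIX_NEEDED: fix, then retry
            continue
        return code                  # 2 = escalate to human, 3 = hard error
    return 2

test_fix_loop("Test_Agentforce_v1", "AgentforceTesting")
```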
### Ralph Wiggum Integration (Hands-Off)

For fully automated loops without user intervention:

```
/ralph-wiggum:ralph-loop
> Run agentic test-fix loop for Test_Agentforce_v1 in AgentforceTesting until all tests pass
```

Claude Code will autonomously:

- Execute test-fix cycles
- Apply fixes via the sf-ai-agentscript skill
- Track attempts and escalate when needed
- Report final status
Failure Categories & Auto-Fix Strategies
| TOPIC_NOT_MATCHED
| ✅ Yes
| Add keywords to topic classificationDescription
| ACTION_NOT_INVOKED
| ✅ Yes
| Improve action description, add trigger conditions
| WRONG_ACTION_SELECTED
| ✅ Yes
| Differentiate action descriptions
| GUARDRAIL_NOT_TRIGGERED
| ✅ Yes
| Add explicit guardrails to system instructions
| ACTION_INVOCATION_FAILED
| ⚠️ Conditional
| Delegate to sf-flow or sf-apex skill
| RESPONSE_QUALITY_ISSUE
| ✅ Yes
| Add response format rules to topic instructions
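The table translates directly into a dispatch map that composes the fix instruction. An illustrative sketch; the strategy strings mirror the table, and the args format follows the Auto-Fix Command Example earlier:

```python
# Category-to-strategy dispatch for composing sf-ai-agentscript fix args.
FIX_STRATEGIES = {
    "TOPIC_NOT_MATCHED": "Add keywords to topic classificationDescription",
    "ACTION_NOT_INVOKED": "Improve action description, add trigger conditions",
    "WRONG_ACTION_SELECTED": "Differentiate action descriptions",
    "GUARDRAIL_NOT_TRIGGERED": "Add explicit guardrails to system instructions",
    "RESPONSE_QUALITY_ISSUE": "Add response format rules to topic instructions",
}

def fix_args(agent: str, category: str, details: str):
    """Return Skill() args for auto-fixable categories, None otherwise."""
    strategy = FIX_STRATEGIES.get(category)
    if strategy is None:
        return None  # e.g. ACTION_INVOCATION_FAILED -> delegate to sf-flow/sf-apex
    return f"Fix agent {agent} - Error: {category} - {details} ({strategy})"

print(fix_args("Coral_Cloud_Agent", "TOPIC_NOT_MATCHED", "utterance routed to fallback"))
```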
### Environment Variables

| Variable | Description | Default |
|---|---|---|
| CURRENT_ATTEMPT | Current attempt number (auto-incremented) | 1 |
| MAX_WAIT_MINUTES | Timeout for test execution | 10 |
| SKIP_TESTS | Comma-separated test names to skip | (none) |
| VERBOSE | Enable detailed output | false |
### Machine-Readable Output

The script outputs structured data for Claude Code to parse:

```
---BEGIN_MACHINE_READABLE---
FIX_NEEDED: true
TEST_API_NAME: Test_Agentforce_v1
TARGET_ORG: AgentforceTesting
CURRENT_ATTEMPT: 1
MAX_ATTEMPTS: 3
NEXT_COMMAND: CURRENT_ATTEMPT=2 ./test-fix-loop.sh Test_Agentforce_v1 AgentforceTesting 3
---END_MACHINE_READABLE---
```
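A small parser for that block; the markers and `key: value` lines are exactly as shown above, and the log file name is a placeholder:

```python
# Extract the key/value fields between the machine-readable markers.
def parse_machine_readable(output: str) -> dict:
    inside = False
    fields = {}
    for line in output.splitlines():
        stripped = line.strip()
        if stripped == "---BEGIN_MACHINE_READABLE---":
            inside = True
        elif stripped == "---END_MACHINE_READABLE---":
            break
        elif inside and ": " in stripped:
            key, value = stripped.split(": ", 1)
            fields[key] = value
    return fields

with open("loop-output.log") as f:  # placeholder log path
    fields = parse_machine_readable(f.read())
if fields.get("FIX_NEEDED") == "true":
    print("Next:", fields.get("NEXT_COMMAND"))
```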
🐛 Known Issues & CLI Bugs
Last Updated: 2026-01-04 | Tested With: sf CLI v2.118.16
CRITICAL: sf agent test create MasterLabel Bug
Status: 🔴 BLOCKING - Prevents YAML-based test creation
Error:
Error (SfError): Required fields are missing: [MasterLabel]
Root Cause: The CLI generates XML from YAML but omits the required <name> element (MasterLabel).
Generated XML (broken):
<AiEvaluationDefinition xmlns="http://soap.sforce.com/2006/04/metadata">
<subjectName>My_Agent</subjectName>
<subjectType>AGENT</subjectType>
<!-- ❌ MISSING: <name>Test Name</name> -->
<testCase>...</testCase>
</AiEvaluationDefinition>
Working XML (from existing tests):
<AiEvaluationDefinition xmlns="http://soap.sforce.com/2006/04/metadata">
<description>Test description</description>
<name>Test Name</name> <!-- ✅ REQUIRED -->
<subjectName>My_Agent</subjectName>
<subjectType>AGENT</subjectType>
<testCase>...</testCase>
</AiEvaluationDefinition>
**Workarounds:**

- ✅ Use `sf agent generate test-spec --from-definition` to convert existing XML to YAML (produces the correct format)
- ✅ Use the interactive `sf agent generate test-spec` wizard (works correctly)
- ✅ Create tests via the Salesforce Testing Center UI
- ✅ Deploy XML metadata directly, bypassing YAML conversion (see the patch sketch after this list)
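For the direct-deploy workaround, the generated XML can be patched before deployment. A sketch, assuming `<name>` belongs right after `<description>` when present (matching the working XML above); the file path and label are placeholders:

```python
# Patch the missing <name> (MasterLabel) into CLI-generated
# AiEvaluationDefinition XML before deploying it.
import xml.etree.ElementTree as ET

NS = "http://soap.sforce.com/2006/04/metadata"
ET.register_namespace("", NS)

def add_master_label(xml_path: str, label: str) -> None:
    tree = ET.parse(xml_path)
    root = tree.getroot()
    if root.find(f"{{{NS}}}name") is not None:
        return  # already valid
    name_el = ET.Element(f"{{{NS}}}name")
    name_el.text = label
    # Assumed ordering: after <description> if present, otherwise first.
    index = 1 if root.find(f"{{{NS}}}description") is not None else 0
    root.insert(index, name_el)
    tree.write(xml_path, encoding="UTF-8", xml_declaration=True)

add_master_label("./MyAgentTest.aiEvaluationDefinition-meta.xml", "My Agent Test")
```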
### MEDIUM: Interactive Mode Not Scriptable

**Status:** 🟡 Blocks CI/CD automation

**Issue:** `sf agent generate test-spec` only works interactively:

- No `--quiet`, `--json`, or other non-interactive flags
- Piped input causes a "User force closed the prompt" error
- Cannot be automated in CI/CD pipelines

**What Works:**

```bash
# Interactive (requires terminal)
sf agent generate test-spec --output-file ./tests/my-test.yaml

# Convert existing XML to YAML (non-interactive)
sf agent generate test-spec --from-definition path/to/test.xml --output-file ./output.yaml
```

**Workaround:** Use the Python scripts in hooks/scripts/ to generate YAML programmatically.
### MEDIUM: YAML vs XML Format Discrepancy

**Issue:** Documentation shows one YAML format, but Salesforce stores a different XML structure.

**Doc Shows (doesn't map correctly):**

```yaml
testCases:
  - utterance: "Hello"
    expectation:
      topic: Welcome
      actionSequence: []
```

**Actual Working Format (from --from-definition):**

```yaml
testCases:
  - utterance: "Hello"
    expectedTopic: Welcome
    expectedActions: []
    expectedOutcome: "Greeting response shown"
```

**Key Mappings:**

| YAML Key | XML Expectation |
|---|---|
| expectedTopic | `<expectation><name>topic_sequence_match</name><expectedValue>...</expectedValue>` |
| expectedActions | `<expectation><name>action_sequence_match</name><expectedValue>[...]</expectedValue>` |
| expectedOutcome | `<expectation><name>bot_response_rating</name><expectedValue>...</expectedValue>` |
### LOW: Expectation Name Variations

**Issue:** Different test creation methods use different expectation names:

| Variant A | Variant B |
|---|---|
| topic_assertion | topic_sequence_match |
| actions_assertion | action_sequence_match |
| output_validation | bot_response_rating |

**Impact:** May cause confusion when comparing test results from different sources.
## Quick Start Example

```bash
# 1. Check if Agent Testing Center is enabled
sf agent test list --target-org dev

# 2. Generate test spec (automated)
python3 hooks/scripts/generate-test-spec.py \
  --agent-file ./agents/MyAgent.agent \
  --output ./tests/myagent-tests.yaml

# 3. Create test in org
sf agent test create \
  --spec ./tests/myagent-tests.yaml \
  --api-name MyAgentTest \
  --target-org dev

# 4. Run tests
sf agent test run \
  --api-name MyAgentTest \
  --wait 10 \
  --result-format json \
  --target-org dev

# 5. View results (pass the job ID from step 4 explicitly; --use-most-recent
#    is documented but not implemented for this command - see Known Issues)
sf agent test results \
  --job-id <job-id> \
  --verbose \
  --result-format json \
  --target-org dev
```
For complete workflows and fix loops, see:

- **Agentic Fix Loops** - Automated testing and fix workflows
- **Test Spec Reference** - Complete YAML format guide
## License

MIT License. See the LICENSE file. Copyright (c) 2024-2025 Jag Valaiyapathy