This skill helps you create agentic outside-in tests that verify application behavior from an external user's perspective without any knowledge of internal implementation. Using the gadugi-agentic-test framework, you write declarative YAML scenarios that AI agents execute, observe, and validate.
**Key Principle:** Tests describe WHAT should happen, not HOW it's implemented. Agents figure out the execution details.
## When to Use This Skill [LEVEL 1]

### Perfect For
- **Smoke Tests**: Quick validation that critical user flows work
- **Behavior-Driven Testing**: Verify features from the user's perspective
- **Cross-Platform Testing**: Same test logic for CLI, TUI, Web, and Electron
- **Refactoring Safety**: Tests remain valid when the implementation changes
- **AI-Powered Testing**: Let agents handle complex interactions
- **Documentation as Tests**: YAML scenarios double as executable specs
### Use This Skill When
- Starting a new project and defining expected behaviors
- Refactoring code and needing tests that won't break with internal changes
- Testing user-facing applications (CLI tools, TUIs, web apps, desktop apps)
- Writing acceptance criteria that can be automatically verified
- Creating tests that non-developers can read and understand
- Catching regressions in critical user workflows
- Testing complex multi-step interactions
### Don't Use This Skill When
- You need unit tests for internal functions (use test-gap-analyzer instead)
- You're testing performance or load characteristics
- You need precise timing or concurrency control
- You're testing non-interactive batch processes
- Implementation details matter more than behavior
## Core Concepts [LEVEL 1]

### Outside-In Testing Philosophy

**Traditional Inside-Out Testing:**
```python
# Tightly coupled to implementation
def test_calculator_add():
    calc = Calculator()
    result = calc.add(2, 3)
    assert result == 5
    assert calc.history == [(2, 3, 5)]  # Knows internal state
```
**Agentic Outside-In Testing:**
```yaml
# Implementation-agnostic behavior verification
scenario:
  name: "Calculator Addition"
  steps:
    - action: launch
      target: "./calculator"
    - action: send_input
      value: "add 2 3"
    - action: verify_output
      contains: "Result: 5"
```
**Benefits:**
- Tests survive refactoring (internal changes don't break tests)
- Readable by non-developers (YAML is declarative)
- Platform-agnostic (same structure for CLI/TUI/Web/Electron)
- AI agents handle complexity (navigation, timing, screenshots)
### The Gadugi Agentic Test Framework [LEVEL 2]
Gadugi-agentic-test is a Python framework that:
- Parses YAML test scenarios with declarative steps
- Dispatches to specialized agents (CLI, TUI, Web, Electron agents)
- Executes actions (launch, input, click, wait, verify)
- Collects evidence (screenshots, logs, output captures)
- Validates outcomes against expected results
- Generates reports with evidence trails
**Architecture:**

```
YAML Scenario → Scenario Loader → Agent Dispatcher → Execution Engine
                                        ↓
              [CLI Agent, TUI Agent, Web Agent, Electron Agent]
                                        ↓
                     Observers → Comprehension Agent
                                        ↓
                                Evidence Report
```
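The dispatch loop at the heart of this pipeline can be sketched in a few lines of Python. This is an illustrative sketch only, not the framework's real API: `run_scenario`, the agent interface, and the result shape are all assumptions.

```python
def run_scenario(scenario: dict, agents: dict) -> dict:
    """Sketch of the dispatch → execute → report pipeline.

    `scenario` is the parsed YAML mapping; `agents` maps an application
    type ("cli", "tui", "web", "electron") to an agent object whose
    `execute(step)` returns True/False. All names here are hypothetical.
    """
    agent = agents[scenario["type"]]  # dispatch by application type
    evidence = []
    for step in scenario["steps"]:
        passed = agent.execute(step)  # the agent interprets the declarative step
        evidence.append({"action": step["action"], "passed": passed})
        if not passed:
            # Stop at the first failing step, keeping the evidence trail.
            return {"status": "FAILED", "evidence": evidence}
    return {"status": "PASSED", "evidence": evidence}
```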
## Progressive Disclosure Levels [LEVEL 1]
This skill teaches testing in three levels:
- **Level 1: Fundamentals** - Basic single-action tests, simple verification
- **Level 2: Intermediate** - Multi-step flows, conditional logic, error handling
- **Level 3: Advanced** - Custom agents, visual regression, performance validation
Each example is marked with its level. Start at Level 1 and progress as needed.
## Quick Start [LEVEL 1]

### Installation
```shell
# Install gadugi-agentic-test framework
pip install gadugi-agentic-test

# Verify installation
gadugi-agentic-test --version
```
### Your First Test (CLI Example)

Create `test-hello.yaml`:
```yaml
scenario:
  name: "Hello World CLI Test"
  description: "Verify CLI prints greeting"
  type: cli

  prerequisites:
    - "./hello-world executable exists"

  steps:
    - action: launch
      target: "./hello-world"
    - action: verify_output
      contains: "Hello, World!"
    - action: verify_exit_code
      expected: 0
```
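The `./hello-world` target can be any executable that prints the greeting and exits with code 0. Any language works; as a hypothetical stand-in, a Python version might be:

```python
#!/usr/bin/env python3
# hello-world: minimal target for the smoke test above (illustrative only)

def greet() -> str:
    return "Hello, World!"

# Printing the greeting satisfies verify_output; a normal exit satisfies
# verify_exit_code (a Python script exits 0 unless it raises or calls sys.exit).
print(greet())
```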
Run the test:

```shell
gadugi-agentic-test run test-hello.yaml
```
Output:

```
✓ Scenario: Hello World CLI Test
✓ Step 1: Launched ./hello-world
✓ Step 2: Output contains "Hello, World!"
✓ Step 3: Exit code is 0

PASSED (3/3 steps successful)
Evidence saved to: ./evidence/test-hello-20250116-093045/
```
## Understanding the YAML Structure [LEVEL 1]
Every test scenario has this structure:
```yaml
scenario:
  name: "Descriptive test name"
  description: "What this test verifies"
  type: cli | tui | web | electron

  # Optional metadata
  tags: [smoke, critical, auth]
  timeout: 30s

  # What must be true before test runs
  prerequisites:
    - "Condition 1"
    - "Condition 2"

  # The test steps (executed sequentially)
  steps:
    - action: action_name
      parameter1: value1
      parameter2: value2
    - action: verify_something
      expected: value

  # Optional cleanup
  cleanup:
    - action: stop_application
```
## Application Types and Agents [LEVEL 2]

### CLI Applications [LEVEL 1]

**Use Case:** Command-line tools, scripts, build tools, package managers

**Supported Actions:**
-
launch- Start the CLI program -
send_input- Send text or commands via stdin -
send_signal- Send OS signals (SIGINT, SIGTERM) -
wait_for_output- Wait for specific text in stdout/stderr -
verify_output- Check stdout/stderr contains/matches expected text -
verify_exit_code- Validate process exit code -
capture_output- Save output for later verification
Example (see `examples/cli/calculator-basic.yaml`):
```yaml
scenario:
  name: "CLI Calculator Basic Operations"
  type: cli
  steps:
    - action: launch
      target: "./calculator"
      args: ["--mode", "interactive"]
    - action: send_input
      value: "add 5 3\n"
    - action: verify_output
      contains: "Result: 8"
      timeout: 2s
    - action: send_input
      value: "multiply 4 7\n"
    - action: verify_output
      contains: "Result: 28"
    - action: send_input
      value: "exit\n"
    - action: verify_exit_code
      expected: 0
```
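A toy `./calculator` that would satisfy this scenario might look as follows. This is an illustrative sketch, not framework code; the command grammar and the `Result:` output format simply mirror the YAML above. A real script would finish with `sys.exit(repl())`.

```python
#!/usr/bin/env python3
# calculator: toy interactive CLI matching the scenario above (illustrative)
import sys

OPS = {"add": lambda a, b: a + b, "multiply": lambda a, b: a * b}

def evaluate(line: str) -> str:
    """Turn a command like 'add 5 3' into 'Result: 8'."""
    op, a, b = line.split()
    return f"Result: {OPS[op](float(a), float(b)):g}"

def repl(stream=sys.stdin) -> int:
    """Read commands line by line; 'exit' returns exit code 0."""
    for line in stream:
        line = line.strip()
        if line == "exit":
            return 0
        print(evaluate(line))
    return 0
```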
### TUI Applications [LEVEL 1]

**Use Case:** Terminal user interfaces (htop, vim, tmux, custom dashboard TUIs)

**Supported Actions:**
-
launch- Start TUI application -
send_keypress- Send keyboard input (arrow keys, enter, ctrl+c, etc.) -
wait_for_screen- Wait for specific text to appear on screen -
verify_screen- Check screen contents match expectations -
capture_screenshot- Save terminal screenshot (ANSI art) -
navigate_menu- Navigate menu structures -
fill_form- Fill TUI form fields
Example (see `examples/tui/file-manager-navigation.yaml`):
```yaml
scenario:
  name: "TUI File Manager Navigation"
  type: tui
  steps:
    - action: launch
      target: "./file-manager"
    - action: wait_for_screen
      contains: "File Manager v1.0"
      timeout: 3s
    - action: send_keypress
      value: "down"
      times: 3
    - action: verify_screen
      contains: "> documents/"
      description: "Third item should be selected"
    - action: send_keypress
      value: "enter"
    - action: wait_for_screen
      contains: "documents/"
      timeout: 2s
    - action: capture_screenshot
      save_as: "documents-view.txt"
```
### Web Applications [LEVEL 1]

**Use Case:** Web apps, dashboards, SPAs, admin panels

**Supported Actions:**
-
navigate- Go to URL -
click- Click element by selector or text -
type- Type into input fields -
wait_for_element- Wait for element to appear -
verify_element- Check element exists/contains text -
verify_url- Validate current URL -
screenshot- Capture browser screenshot -
scroll- Scroll page or element
Example (see `examples/web/dashboard-smoke-test.yaml`):
```yaml
scenario:
  name: "Dashboard Smoke Test"
  type: web
  steps:
    - action: navigate
      url: "http://localhost:3000/dashboard"
    - action: wait_for_element
      selector: "h1.dashboard-title"
      timeout: 5s
    - action: verify_element
      selector: "h1.dashboard-title"
      contains: "Analytics Dashboard"
    - action: verify_element
      selector: ".widget-stats"
      count: 4
      description: "Should have 4 stat widgets"
    - action: click
      selector: "button.refresh-data"
    - action: wait_for_element
      selector: ".loading-spinner"
      disappears: true
      timeout: 10s
    - action: screenshot
      save_as: "dashboard-loaded.png"
```
### Electron Applications [LEVEL 2]

**Use Case:** Desktop apps built with Electron (VS Code, Slack, Discord clones)

**Supported Actions:**
-
launch- Start Electron app -
window_action- Interact with windows (focus, minimize, close) -
menu_click- Click application menu items -
dialog_action- Handle native dialogs (open file, save, confirm) -
ipc_send- Send IPC message to main process -
verify_window- Check window state/properties -
All web actions (since Electron uses Chromium)
Example (see `examples/electron/single-window-basic.yaml`):
```yaml
scenario:
  name: "Electron Single Window Test"
  type: electron
  steps:
    - action: launch
      target: "./dist/my-app"
      wait_for_window: true
      timeout: 10s
    - action: verify_window
      title: "My Application"
      visible: true
    - action: menu_click
      path: ["File", "New Document"]
    - action: wait_for_element
      selector: ".document-editor"
    - action: type
      selector: ".document-editor"
      value: "Hello from test"
    - action: menu_click
      path: ["File", "Save"]
    - action: dialog_action
      type: save_file
      filename: "test-document.txt"
    - action: verify_window
      title_contains: "test-document.txt"
```
## Test Scenario Anatomy [LEVEL 2]

### Metadata Section
```yaml
scenario:
  name: "Clear descriptive name"
  description: "Detailed explanation of what this test verifies"
  type: cli | tui | web | electron

  # Optional fields
  tags: [smoke, regression, auth, payment]
  priority: high | medium | low
  timeout: 60s          # Overall scenario timeout
  retry_on_failure: 2   # Retry count

  # Environment requirements
  environment:
    variables:
      API_URL: "http://localhost:8080"
      DEBUG: "true"
    files:
      - "./config.json must exist"
```
### Prerequisites
Prerequisites are conditions that must be true before the test runs. The framework validates these before execution.
```yaml
prerequisites:
  - "./application binary exists"
  - "Port 8080 is available"
  - "Database is running"
  - "User account test@example.com exists"
  - "File ./test-data.json exists"
```
If prerequisites fail, the test is skipped (not failed).
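The skip-versus-fail distinction could be implemented along these lines. In this sketch, `check` is a hypothetical callable that evaluates a single prerequisite string, and the result shape is an assumption:

```python
def gate_prerequisites(prerequisites, check):
    """Return a SKIPPED result if any prerequisite is unmet, else None.

    A skipped test is reported differently from a failed one: the
    behavior under test was never exercised, so no verdict is possible.
    """
    unmet = [p for p in prerequisites if not check(p)]
    if unmet:
        return {"status": "SKIPPED", "reason": f"unmet prerequisites: {unmet}"}
    return None  # all prerequisites hold; proceed to the steps
```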
### Steps
Steps execute sequentially. Each step has:
- `action`: Required - the action to perform
- Parameters: Action-specific parameters
- `description`: Optional - human-readable explanation
- `timeout`: Optional - step-specific timeout
- `continue_on_failure`: Optional - don't fail the scenario if this step fails
```yaml
steps:
  # Simple action
  - action: launch
    target: "./app"

  # Action with multiple parameters
  - action: verify_output
    contains: "Success"
    timeout: 5s
    description: "App should print success message"

  # Continue even if this fails
  - action: click
    selector: ".optional-button"
    continue_on_failure: true
```
## Verification Actions [LEVEL 1]

Verification actions check expected outcomes. They fail the test if expectations aren't met.

**Common Verifications:**
```yaml
# CLI: Check output contains text
- action: verify_output
  contains: "Expected text"

# CLI: Check output matches regex
- action: verify_output
  matches: "Result: \\d+"

# CLI: Check exit code
- action: verify_exit_code
  expected: 0

# Web/TUI: Check element exists
- action: verify_element
  selector: ".success-message"

# Web/TUI: Check element contains text
- action: verify_element
  selector: "h1"
  contains: "Welcome"

# Web: Check URL
- action: verify_url
  equals: "http://localhost:3000/dashboard"

# Web: Check element count
- action: verify_element
  selector: ".list-item"
  count: 5

# Electron: Check window state
- action: verify_window
  title: "My App"
  visible: true
  focused: true
```
### Cleanup Section
Cleanup runs after all steps complete (success or failure). Use for teardown actions.
```yaml
cleanup:
  - action: stop_application
    force: true
  - action: delete_file
    path: "./temp-test-data.json"
  - action: reset_database
    connection: "test_db"
```
## Advanced Patterns [LEVEL 2]

### Conditional Logic
Execute steps based on conditions:
```yaml
steps:
  - action: launch
    target: "./app"
  - action: verify_output
    contains: "Login required"
    id: login_check

  # Only run if login_check passed
  - action: send_input
    value: "login admin password123\n"
    condition: login_check.passed
```
### Variables and Templating [LEVEL 2]
Define variables and use them throughout the scenario:
```yaml
scenario:
  name: "Test with Variables"
  type: cli
  variables:
    username: "testuser"
    api_url: "http://localhost:8080"
  steps:
    - action: launch
      target: "./app"
      args: ["--api", "${api_url}"]
    - action: send_input
      value: "login ${username}\n"
    - action: verify_output
      contains: "Welcome, ${username}!"
```
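This style of substitution can be sketched with Python's standard-library `string.Template`, which happens to use the same `${name}` placeholder syntax; the framework's actual templating rules may differ:

```python
from string import Template

def expand(value: str, variables: dict) -> str:
    """Replace ${name} placeholders in a step parameter (sketch)."""
    return Template(value).substitute(variables)

# Example: expanding a step value before it is sent to the application
expanded = expand("Welcome, ${username}!", {"username": "testuser"})
```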
### Loops and Repetition [LEVEL 2]
Repeat actions multiple times:
```yaml
steps:
  - action: launch
    target: "./app"

  # Repeat action N times
  - action: send_keypress
    value: "down"
    times: 5

  # Loop over list
  - action: send_input
    value: "${item}\n"
    for_each:
      - "apple"
      - "banana"
      - "cherry"
```
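A runner might flatten `times` and `for_each` into a list of concrete steps before execution. The sketch below is an assumption about how that expansion could work, reusing the field names from the YAML above:

```python
from string import Template

def expand_step(step: dict) -> list:
    """Flatten `times` / `for_each` into concrete repeated steps (sketch)."""
    if "for_each" in step:
        base = {k: v for k, v in step.items() if k != "for_each"}
        # each iteration binds ${item} to one element of the list
        return [
            {**base, "value": Template(base["value"]).substitute(item=item)}
            for item in step["for_each"]
        ]
    base = {k: v for k, v in step.items() if k != "times"}
    return [dict(base) for _ in range(step.get("times", 1))]
```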
### Error Handling [LEVEL 2]
Handle expected errors gracefully:
```yaml
steps:
  - action: send_input
    value: "invalid command\n"

  # Verify error message appears
  - action: verify_output
    contains: "Error: Unknown command"
    expected_failure: true

  # App should still be running
  - action: verify_running
    expected: true
```
### Multi-Step Workflows [LEVEL 2]
Complex scenarios with multiple phases:
```yaml
scenario:
  name: "E-commerce Purchase Flow"
  type: web
  steps:
    # Phase 1: Authentication
    - action: navigate
      url: "http://localhost:3000/login"
    - action: type
      selector: "#username"
      value: "test@example.com"
    - action: type
      selector: "#password"
      value: "password123"
    - action: click
      selector: "button[type=submit]"
    - action: wait_for_url
      contains: "/dashboard"

    # Phase 2: Product Selection
    - action: navigate
      url: "http://localhost:3000/products"
    - action: click
      text: "Add to Cart"
      nth: 1
    - action: verify_element
      selector: ".cart-badge"
      contains: "1"

    # Phase 3: Checkout
    - action: click
      selector: ".cart-icon"
    - action: click
      text: "Proceed to Checkout"
    - action: fill_form
      fields:
        "#shipping-address": "123 Test St"
        "#city": "Testville"
        "#zip": "12345"
    - action: click
      selector: "#place-order"
    - action: wait_for_element
      selector: ".order-confirmation"
      timeout: 10s
    - action: verify_element
      selector: ".order-number"
      exists: true
```
## Level 3: Advanced Topics [LEVEL 3]

### Custom Comprehension Agents
The framework uses AI agents to interpret application output and determine if tests pass. You can customize these agents for domain-specific logic.
**Default Comprehension Agent:**
- Observes raw output (text, HTML, screenshots)
- Applies general reasoning to verify expectations
- Returns pass/fail with explanation
Custom Comprehension Agent (see `examples/custom-agents/custom-comprehension-agent.yaml`):
```yaml
scenario:
  name: "Financial Dashboard Test with Custom Agent"
  type: web

  # Define custom comprehension logic
  comprehension_agent:
    model: "gpt-4"
    system_prompt: |
      You are a financial data validator. When verifying dashboard content:
      1. All monetary values must use proper formatting ($1,234.56)
      2. Percentages must include % symbol
      3. Dates must be in MM/DD/YYYY format
      4. Negative values must be red
      5. Chart data must be logically consistent
      Be strict about formatting and data consistency.
    examples:
      - input: "Total Revenue: 45000"
        output: "FAIL - Missing currency symbol and comma separator"
      - input: "Total Revenue: $45,000.00"
        output: "PASS - Correctly formatted"

  steps:
    - action: navigate
      url: "http://localhost:3000/financial-dashboard"
    - action: verify_element
      selector: ".revenue-widget"
      use_custom_comprehension: true
      description: "Revenue should be properly formatted"
```
### Visual Regression Testing [LEVEL 3]
Compare screenshots against baseline images:
```yaml
scenario:
  name: "Visual Regression - Homepage"
  type: web
  steps:
    - action: navigate
      url: "http://localhost:3000"
    - action: wait_for_element
      selector: ".page-loaded"
    - action: screenshot
      save_as: "homepage.png"
    - action: visual_compare
      screenshot: "homepage.png"
      baseline: "./baselines/homepage-baseline.png"
      threshold: 0.05  # 5% difference allowed
      highlight_differences: true
```
### Performance Validation [LEVEL 3]
Measure and validate performance metrics:
```yaml
scenario:
  name: "Performance - Dashboard Load Time"
  type: web
  performance:
    metrics:
      - page_load_time
      - first_contentful_paint
      - time_to_interactive
  steps:
    - action: navigate
      url: "http://localhost:3000/dashboard"
      measure_timing: true
    - action: verify_performance
      metric: page_load_time
      less_than: 3000  # 3 seconds
    - action: verify_performance
      metric: first_contentful_paint
      less_than: 1500  # 1.5 seconds
```
### Multi-Window Coordination (Electron) [LEVEL 3]
Test applications with multiple windows:
```yaml
scenario:
  name: "Multi-Window Chat Application"
  type: electron
  steps:
    - action: launch
      target: "./chat-app"
    - action: menu_click
      path: ["Window", "New Chat"]
    - action: verify_window
      count: 2
    - action: window_action
      window: 1
      action: focus
    - action: type
      selector: ".message-input"
      value: "Hello from window 1"
    - action: click
      selector: ".send-button"
    - action: window_action
      window: 2
      action: focus
    - action: wait_for_element
      selector: ".message"
      contains: "Hello from window 1"
      timeout: 5s
```
### IPC Testing (Electron) [LEVEL 3]

Test inter-process communication (IPC) between the renderer and main processes:
```yaml
scenario:
  name: "Electron IPC Communication"
  type: electron
  steps:
    - action: launch
      target: "./my-app"
    - action: ipc_send
      channel: "get-system-info"
    - action: ipc_expect
      channel: "system-info-reply"
      timeout: 3s
    - action: verify_ipc_payload
      contains:
        platform: "darwin"
        arch: "x64"
```
### Custom Reporters [LEVEL 3]
Generate custom test reports:
```yaml
scenario:
  name: "Test with Custom Reporting"
  type: cli
  reporting:
    format: custom
    template: "./report-template.html"
    include:
      - screenshots
      - logs
      - timing_data
      - video_recording
    email:
      enabled: true
      recipients: ["team@example.com"]
      on_failure_only: true
  steps:
    # ... test steps ...
```
## Framework Integration [LEVEL 2]

### Running Tests
Single test:

```shell
gadugi-agentic-test run test-scenario.yaml
```

Multiple tests:

```shell
gadugi-agentic-test run tests/*.yaml
```

With options:

```shell
gadugi-agentic-test run test.yaml \
  --verbose \
  --evidence-dir ./test-evidence \
  --retry 2 \
  --timeout 60s
```
### CI/CD Integration

GitHub Actions (`.github/workflows/agentic-tests.yml`):
```yaml
name: Agentic Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install gadugi-agentic-test
        run: pip install gadugi-agentic-test
      - name: Run tests
        run: gadugi-agentic-test run tests/agentic/*.yaml
      - name: Upload evidence
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: test-evidence
          path: ./evidence/
```
### Evidence Collection
The framework automatically collects evidence for debugging:
```
evidence/
  scenario-name-20250116-093045/
  ├── scenario.yaml         # Original test scenario
  ├── execution-log.json    # Detailed execution log
  ├── screenshots/          # All captured screenshots
  │   ├── step-1.png
  │   ├── step-3.png
  │   └── step-5.png
  ├── output-captures/      # CLI/TUI output
  │   ├── stdout.txt
  │   └── stderr.txt
  ├── timing.json           # Performance metrics
  └── report.html           # Human-readable report
```
## Best Practices [LEVEL 2]

### 1. Start Simple, Add Complexity
Begin with basic smoke tests, then add detail:
```yaml
# Level 1: Basic smoke test
steps:
  - action: launch
    target: "./app"
  - action: verify_output
    contains: "Ready"

# Level 2: Add interaction
steps:
  - action: launch
    target: "./app"
  - action: send_input
    value: "command\n"
  - action: verify_output
    contains: "Success"

# Level 3: Add error handling and edge cases
steps:
  - action: launch
    target: "./app"
  - action: send_input
    value: "invalid\n"
  - action: verify_output
    contains: "Error"
  - action: send_input
    value: "command\n"
  - action: verify_output
    contains: "Success"
```
### 2. Use Descriptive Names and Descriptions
```yaml
# Bad
scenario:
  name: "Test 1"
  steps:
    - action: click
      selector: "button"

# Good
scenario:
  name: "User Login Flow - Valid Credentials"
  description: "Verifies user can log in with valid email and password"
  steps:
    - action: click
      selector: "button[type=submit]"
      description: "Submit login form"
```
### 3. Verify Critical Paths Only
Don't test every tiny detail. Focus on user-facing behavior:
```yaml
# Bad - Tests implementation details
- action: verify_element
  selector: ".internal-cache-status"
  contains: "initialized"

# Good - Tests user-visible behavior
- action: verify_element
  selector: ".welcome-message"
  contains: "Welcome back"
```
### 4. Use Prerequisites for Test Dependencies
```yaml
scenario:
  name: "User Profile Edit"
  prerequisites:
    - "User testuser@example.com exists"
    - "User is logged in"
    - "Database is seeded with test data"
  steps:
    # Test assumes prerequisites are met
    - action: navigate
      url: "/profile"
```
### 5. Keep Tests Independent
Each test should set up its own state and clean up:
```yaml
scenario:
  name: "Create Document"
  steps:
    # Create test user (don't assume exists)
    - action: api_call
      endpoint: "/api/users"
      method: POST
      data: { email: "test@example.com" }
```