This skill helps you create agentic outside-in tests that verify application behavior from an external user's perspective without any knowledge of internal implementation. Using the gadugi-agentic-test framework, you write declarative YAML scenarios that AI agents execute, observe, and validate.
**Key Principle:** Tests describe WHAT should happen, not HOW it's implemented. Agents figure out the execution details.
## When to Use This Skill [LEVEL 1]

### Perfect For
- **Smoke Tests**: Quick validation that critical user flows work
- **Behavior-Driven Testing**: Verify features from the user's perspective
- **Cross-Platform Testing**: Same test logic for CLI, TUI, Web, and Electron
- **Refactoring Safety**: Tests remain valid when the implementation changes
- **AI-Powered Testing**: Let agents handle complex interactions
- **Documentation as Tests**: YAML scenarios double as executable specs
### Use This Skill When
- Starting a new project and defining expected behaviors
- Refactoring code and needing tests that won't break with internal changes
- Testing user-facing applications (CLI tools, TUIs, web apps, desktop apps)
- Writing acceptance criteria that can be automatically verified
- Creating tests that non-developers can read and understand
- Catching regressions in critical user workflows
- Testing complex multi-step interactions
### Don't Use This Skill When
- You need unit tests for internal functions (use test-gap-analyzer instead)
- You're testing performance or load characteristics
- You need precise timing or concurrency control
- You're testing non-interactive batch processes
- Implementation details matter more than behavior
## Core Concepts [LEVEL 1]

### Outside-In Testing Philosophy

**Traditional Inside-Out Testing:**
```python
# Tightly coupled to implementation
def test_calculator_add():
    calc = Calculator()
    result = calc.add(2, 3)
    assert result == 5
    assert calc.history == [(2, 3, 5)]  # Knows internal state
```
**Agentic Outside-In Testing:**
```yaml
# Implementation-agnostic behavior verification
scenario:
  name: "Calculator Addition"
  steps:
    - action: launch
      target: "./calculator"
    - action: send_input
      value: "add 2 3"
    - action: verify_output
      contains: "Result: 5"
```
**Benefits:**
- Tests survive refactoring (internal changes don't break tests)
- Readable by non-developers (YAML is declarative)
- Platform-agnostic (same structure for CLI/TUI/Web/Electron)
- AI agents handle complexity (navigation, timing, screenshots)
### The Gadugi Agentic Test Framework [LEVEL 2]
Gadugi-agentic-test is a Python framework that:
- Parses YAML test scenarios with declarative steps
- Dispatches to specialized agents (CLI, TUI, Web, Electron agents)
- Executes actions (launch, input, click, wait, verify)
- Collects evidence (screenshots, logs, output captures)
- Validates outcomes against expected results
- Generates reports with evidence trails
**Architecture:**

```
YAML Scenario → Scenario Loader → Agent Dispatcher → Execution Engine
                                        ↓
              [CLI Agent, TUI Agent, Web Agent, Electron Agent]
                                        ↓
                     Observers → Comprehension Agent
                                        ↓
                                Evidence Report
```
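The dispatch loop at the heart of this pipeline can be sketched in a few lines of Python. This is an illustrative sketch only, not the framework's real API: `run_scenario`, the agent interface, and the result shape are all assumptions.

```python
def run_scenario(scenario: dict, agents: dict) -> dict:
    """Sketch of the dispatch → execute → report pipeline.

    `scenario` is the parsed YAML mapping; `agents` maps an application
    type ("cli", "tui", "web", "electron") to an agent object whose
    `execute(step)` returns True/False. All names here are hypothetical.
    """
    agent = agents[scenario["type"]]  # dispatch by application type
    evidence = []
    for step in scenario["steps"]:
        passed = agent.execute(step)  # the agent interprets the declarative step
        evidence.append({"action": step["action"], "passed": passed})
        if not passed:
            # Stop at the first failing step, keeping the evidence trail.
            return {"status": "FAILED", "evidence": evidence}
    return {"status": "PASSED", "evidence": evidence}
```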
## Progressive Disclosure Levels [LEVEL 1]
This skill teaches testing in three levels:
- **Level 1: Fundamentals** - Basic single-action tests, simple verification
- **Level 2: Intermediate** - Multi-step flows, conditional logic, error handling
- **Level 3: Advanced** - Custom agents, visual regression, performance validation
Each example is marked with its level. Start at Level 1 and progress as needed.
## Quick Start [LEVEL 1]

### Installation
```shell
# Install gadugi-agentic-test framework
pip install gadugi-agentic-test

# Verify installation
gadugi-agentic-test --version
```
### Your First Test (CLI Example)

Create `test-hello.yaml`:
```yaml
scenario:
  name: "Hello World CLI Test"
  description: "Verify CLI prints greeting"
  type: cli

  prerequisites:
    - "./hello-world executable exists"

  steps:
    - action: launch
      target: "./hello-world"
    - action: verify_output
      contains: "Hello, World!"
    - action: verify_exit_code
      expected: 0
```
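The `./hello-world` target can be any executable that prints the greeting and exits with code 0. Any language works; as a hypothetical stand-in, a Python version might be:

```python
#!/usr/bin/env python3
# hello-world: minimal target for the smoke test above (illustrative only)

def greet() -> str:
    return "Hello, World!"

# Printing the greeting satisfies verify_output; a normal exit satisfies
# verify_exit_code (a Python script exits 0 unless it raises or calls sys.exit).
print(greet())
```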
Run the test:

```shell
gadugi-agentic-test run test-hello.yaml
```
Output:

```
✓ Scenario: Hello World CLI Test
✓ Step 1: Launched ./hello-world
✓ Step 2: Output contains "Hello, World!"
✓ Step 3: Exit code is 0

PASSED (3/3 steps successful)
Evidence saved to: ./evidence/test-hello-20250116-093045/
```
## Understanding the YAML Structure [LEVEL 1]
Every test scenario has this structure:
```yaml
scenario:
  name: "Descriptive test name"
  description: "What this test verifies"
  type: cli | tui | web | electron

  # Optional metadata
  tags: [smoke, critical, auth]
  timeout: 30s

  # What must be true before test runs
  prerequisites:
    - "Condition 1"
    - "Condition 2"

  # The test steps (executed sequentially)
  steps:
    - action: action_name
      parameter1: value1
      parameter2: value2
    - action: verify_something
      expected: value

  # Optional cleanup
  cleanup:
    - action: stop_application
```
## Application Types and Agents [LEVEL 2]

### CLI Applications [LEVEL 1]

**Use Case:** Command-line tools, scripts, build tools, package managers

**Supported Actions:**
-
launch- Start the CLI program -
send_input- Send text or commands via stdin -
send_signal- Send OS signals (SIGINT, SIGTERM) -
wait_for_output- Wait for specific text in stdout/stderr -
verify_output- Check stdout/stderr contains/matches expected text -
verify_exit_code- Validate process exit code -
capture_output- Save output for later verification
Example (see `examples/cli/calculator-basic.yaml`):
```yaml
scenario:
  name: "CLI Calculator Basic Operations"
  type: cli
  steps:
    - action: launch
      target: "./calculator"
      args: ["--mode", "interactive"]
    - action: send_input
      value: "add 5 3\n"
    - action: verify_output
      contains: "Result: 8"
      timeout: 2s
    - action: send_input
      value: "multiply 4 7\n"
    - action: verify_output
      contains: "Result: 28"
    - action: send_input
      value: "exit\n"
    - action: verify_exit_code
      expected: 0
```
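A toy `./calculator` that would satisfy this scenario might look as follows. This is an illustrative sketch, not framework code; the command grammar and the `Result:` output format simply mirror the YAML above. A real script would finish with `sys.exit(repl())`.

```python
#!/usr/bin/env python3
# calculator: toy interactive CLI matching the scenario above (illustrative)
import sys

OPS = {"add": lambda a, b: a + b, "multiply": lambda a, b: a * b}

def evaluate(line: str) -> str:
    """Turn a command like 'add 5 3' into 'Result: 8'."""
    op, a, b = line.split()
    return f"Result: {OPS[op](float(a), float(b)):g}"

def repl(stream=sys.stdin) -> int:
    """Read commands line by line; 'exit' returns exit code 0."""
    for line in stream:
        line = line.strip()
        if line == "exit":
            return 0
        print(evaluate(line))
    return 0
```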
### TUI Applications [LEVEL 1]

**Use Case:** Terminal user interfaces (htop, vim, tmux, custom dashboard TUIs)

**Supported Actions:**
-
launch- Start TUI application -
send_keypress- Send keyboard input (arrow keys, enter, ctrl+c, etc.) -
wait_for_screen- Wait for specific text to appear on screen -
verify_screen- Check screen contents match expectations -
capture_screenshot- Save terminal screenshot (ANSI art) -
navigate_menu- Navigate menu structures -
fill_form- Fill TUI form fields
Example (see `examples/tui/file-manager-navigation.yaml`):
```yaml
scenario:
  name: "TUI File Manager Navigation"
  type: tui
  steps:
    - action: launch
      target: "./file-manager"
    - action: wait_for_screen
      contains: "File Manager v1.0"
      timeout: 3s
    - action: send_keypress
      value: "down"
      times: 3
    - action: verify_screen
      contains: "> documents/"
      description: "Third item should be selected"
    - action: send_keypress
      value: "enter"
    - action: wait_for_screen
      contains: "documents/"
      timeout: 2s
    - action: capture_screenshot
      save_as: "documents-view.txt"
```
### Web Applications [LEVEL 1]

**Use Case:** Web apps, dashboards, SPAs, admin panels

**Supported Actions:**
-
navigate- Go to URL -
click- Click element by selector or text -
type- Type into input fields -
wait_for_element- Wait for element to appear -
verify_element- Check element exists/contains text -
verify_url- Validate current URL -
screenshot- Capture browser screenshot -
scroll- Scroll page or element
Example (see `examples/web/dashboard-smoke-test.yaml`):
```yaml
scenario:
  name: "Dashboard Smoke Test"
  type: web
  steps:
    - action: navigate
      url: "http://localhost:3000/dashboard"
    - action: wait_for_element
      selector: "h1.dashboard-title"
      timeout: 5s
    - action: verify_element
      selector: "h1.dashboard-title"
      contains: "Analytics Dashboard"
    - action: verify_element
      selector: ".widget-stats"
      count: 4
      description: "Should have 4 stat widgets"
    - action: click
      selector: "button.refresh-data"
    - action: wait_for_element
      selector: ".loading-spinner"
      disappears: true
      timeout: 10s
    - action: screenshot
      save_as: "dashboard-loaded.png"
```
### Electron Applications [LEVEL 2]

**Use Case:** Desktop apps built with Electron (VS Code, Slack, Discord clones)

**Supported Actions:**
-
launch- Start Electron app -
window_action- Interact with windows (focus, minimize, close) -
menu_click- Click application menu items -
dialog_action- Handle native dialogs (open file, save, confirm) -
ipc_send- Send IPC message to main process -
verify_window- Check window state/properties -
All web actions (since Electron uses Chromium)
Example (see `examples/electron/single-window-basic.yaml`):
```yaml
scenario:
  name: "Electron Single Window Test"
  type: electron
  steps:
    - action: launch
      target: "./dist/my-app"
      wait_for_window: true
      timeout: 10s
    - action: verify_window
      title: "My Application"
      visible: true
    - action: menu_click
      path: ["File", "New Document"]
    - action: wait_for_element
      selector: ".document-editor"
    - action: type
      selector: ".document-editor"
      value: "Hello from test"
    - action: menu_click
      path: ["File", "Save"]
    - action: dialog_action
      type: save_file
      filename: "test-document.txt"
    - action: verify_window
      title_contains: "test-document.txt"
```
## Test Scenario Anatomy [LEVEL 2]

### Metadata Section
```yaml
scenario:
  name: "Clear descriptive name"
  description: "Detailed explanation of what this test verifies"
  type: cli | tui | web | electron

  # Optional fields
  tags: [smoke, regression, auth, payment]
  priority: high | medium | low
  timeout: 60s          # Overall scenario timeout
  retry_on_failure: 2   # Retry count

  # Environment requirements
  environment:
    variables:
      API_URL: "http://localhost:8080"
      DEBUG: "true"
    files:
      - "./config.json must exist"
```
### Prerequisites
Prerequisites are conditions that must be true before the test runs. The framework validates these before execution.
```yaml
prerequisites:
  - "./application binary exists"
  - "Port 8080 is available"
  - "Database is running"
  - "User account test@example.com exists"
  - "File ./test-data.json exists"
```
If prerequisites fail, the test is skipped (not failed).
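The skip-versus-fail distinction could be implemented along these lines. In this sketch, `check` is a hypothetical callable that evaluates a single prerequisite string, and the result shape is an assumption:

```python
def gate_prerequisites(prerequisites, check):
    """Return a SKIPPED result if any prerequisite is unmet, else None.

    A skipped test is reported differently from a failed one: the
    behavior under test was never exercised, so no verdict is possible.
    """
    unmet = [p for p in prerequisites if not check(p)]
    if unmet:
        return {"status": "SKIPPED", "reason": f"unmet prerequisites: {unmet}"}
    return None  # all prerequisites hold; proceed to the steps
```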
### Steps
Steps execute sequentially. Each step has:
- `action`: Required - the action to perform
- Parameters: Action-specific parameters
- `description`: Optional - human-readable explanation
- `timeout`: Optional - step-specific timeout
- `continue_on_failure`: Optional - don't fail the scenario if this step fails
```yaml
steps:
  # Simple action
  - action: launch
    target: "./app"

  # Action with multiple parameters
  - action: verify_output
    contains: "Success"
    timeout: 5s
    description: "App should print success message"

  # Continue even if this fails
  - action: click
    selector: ".optional-button"
    continue_on_failure: true
```
## Verification Actions [LEVEL 1]

Verification actions check expected outcomes. They fail the test if expectations aren't met.

**Common Verifications:**
```yaml
# CLI: Check output contains text
- action: verify_output
  contains: "Expected text"

# CLI: Check output matches regex
- action: verify_output
  matches: "Result: \\d+"

# CLI: Check exit code
- action: verify_exit_code
  expected: 0

# Web/TUI: Check element exists
- action: verify_element
  selector: ".success-message"

# Web/TUI: Check element contains text
- action: verify_element
  selector: "h1"
  contains: "Welcome"

# Web: Check URL
- action: verify_url
  equals: "http://localhost:3000/dashboard"

# Web: Check element count
- action: verify_element
  selector: ".list-item"
  count: 5

# Electron: Check window state
- action: verify_window
  title: "My App"
  visible: true
  focused: true
```
### Cleanup Section
Cleanup runs after all steps complete (success or failure). Use for teardown actions.
```yaml
cleanup:
  - action: stop_application
    force: true
  - action: delete_file
    path: "./temp-test-data.json"
  - action: reset_database
    connection: "test_db"
```
## Advanced Patterns [LEVEL 2]

### Conditional Logic
Execute steps based on conditions:
```yaml
steps:
  - action: launch
    target: "./app"
  - action: verify_output
    contains: "Login required"
    id: login_check

  # Only run if login_check passed
  - action: send_input
    value: "login admin password123\n"
    condition: login_check.passed
```
### Variables and Templating [LEVEL 2]
Define variables and use them throughout the scenario:
```yaml
scenario:
  name: "Test with Variables"
  type: cli
  variables:
    username: "testuser"
    api_url: "http://localhost:8080"
  steps:
    - action: launch
      target: "./app"
      args: ["--api", "${api_url}"]
    - action: send_input
      value: "login ${username}\n"
    - action: verify_output
      contains: "Welcome, ${username}!"
```
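This style of substitution can be sketched with Python's standard-library `string.Template`, which happens to use the same `${name}` placeholder syntax; the framework's actual templating rules may differ:

```python
from string import Template

def expand(value: str, variables: dict) -> str:
    """Replace ${name} placeholders in a step parameter (sketch)."""
    return Template(value).substitute(variables)

# Example: expanding a step value before it is sent to the application
expanded = expand("Welcome, ${username}!", {"username": "testuser"})
```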
### Loops and Repetition [LEVEL 2]
Repeat actions multiple times:
```yaml
steps:
  - action: launch
    target: "./app"

  # Repeat action N times
  - action: send_keypress
    value: "down"
    times: 5

  # Loop over list
  - action: send_input
    value: "${item}\n"
    for_each:
      - "apple"
      - "banana"
      - "cherry"
```
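A runner might flatten `times` and `for_each` into a list of concrete steps before execution. The sketch below is an assumption about how that expansion could work, reusing the field names from the YAML above:

```python
from string import Template

def expand_step(step: dict) -> list:
    """Flatten `times` / `for_each` into concrete repeated steps (sketch)."""
    if "for_each" in step:
        base = {k: v for k, v in step.items() if k != "for_each"}
        # each iteration binds ${item} to one element of the list
        return [
            {**base, "value": Template(base["value"]).substitute(item=item)}
            for item in step["for_each"]
        ]
    base = {k: v for k, v in step.items() if k != "times"}
    return [dict(base) for _ in range(step.get("times", 1))]
```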
### Error Handling [LEVEL 2]
Handle expected errors gracefully:
```yaml
steps:
  - action: send_input
    value: "invalid command\n"

  # Verify error message appears
  - action: verify_output
    contains: "Error: Unknown command"
    expected_failure: true

  # App should still be running
  - action: verify_running
    expected: true
```
### Multi-Step Workflows [LEVEL 2]
Complex scenarios with multiple phases:
```yaml
scenario:
  name: "E-commerce Purchase Flow"
  type: web
  steps:
    # Phase 1: Authentication
    - action: navigate
      url: "http://localhost:3000/login"
    - action: type
      selector: "#username"
      value: "test@example.com"
    - action: type
      selector: "#password"
      value: "password123"
    - action: click
      selector: "button[type=submit]"
    - action: wait_for_url
      contains: "/dashboard"

    # Phase 2: Product Selection
    - action: navigate
      url: "http://localhost:3000/products"
    - action: click
      text: "Add to Cart"
      nth: 1
    - action: verify_element
      selector: ".cart-badge"
      contains: "1"

    # Phase 3: Checkout
    - action: click
      selector: ".cart-icon"
    - action: click
      text: "Proceed to Checkout"
    - action: fill_form
      fields:
        "#shipping-address": "123 Test St"
        "#city": "Testville"
        "#zip": "12345"
    - action: click
      selector: "#place-order"
    - action: wait_for_element
      selector: ".order-confirmation"
      timeout: 10s
    - action: verify_element
      selector: ".order-number"
      exists: true
```
## Level 3: Advanced Topics [LEVEL 3]

### Custom Comprehension Agents
The framework uses AI agents to interpret application output and determine if tests pass. You can customize these agents for domain-specific logic.
**Default Comprehension Agent:**
- Observes raw output (text, HTML, screenshots)
- Applies general reasoning to verify expectations
- Returns pass/fail with explanation
Custom Comprehension Agent (see `examples/custom-agents/custom-comprehension-agent.yaml`):
```yaml
scenario:
  name: "Financial Dashboard Test with Custom Agent"
  type: web

  # Define custom comprehension logic
  comprehension_agent:
    model: "gpt-4"
    system_prompt: |
      You are a financial data validator. When verifying dashboard content:
      1. All monetary values must use proper formatting ($1,234.56)
      2. Percentages must include % symbol
      3. Dates must be in MM/DD/YYYY format
      4. Negative values must be red
      5. Chart data must be logically consistent
      Be strict about formatting and data consistency.
    examples:
      - input: "Total Revenue: 45000"
        output: "FAIL - Missing currency symbol and comma separator"
      - input: "Total Revenue: $45,000.00"
        output: "PASS - Correctly formatted"

  steps:
    - action: navigate
      url: "http://localhost:3000/financial-dashboard"
    - action: verify_element
      selector: ".revenue-widget"
      use_custom_comprehension: true
      description: "Revenue should be properly formatted"
```
### Visual Regression Testing [LEVEL 3]
Compare screenshots against baseline images:
```yaml
scenario:
  name: "Visual Regression - Homepage"
  type: web
  steps:
    - action: navigate
      url: "http://localhost:3000"
    - action: wait_for_element
      selector: ".page-loaded"
    - action: screenshot
      save_as: "homepage.png"
    - action: visual_compare
      screenshot: "homepage.png"
      baseline: "./baselines/homepage-baseline.png"
      threshold: 0.05  # 5% difference allowed
      highlight_differences: true
```
### Performance Validation [LEVEL 3]
Measure and validate performance metrics:
```yaml
scenario:
  name: "Performance - Dashboard Load Time"
  type: web
  performance:
    metrics:
      - page_load_time
      - first_contentful_paint
      - time_to_interactive
  steps:
    - action: navigate
      url: "http://localhost:3000/dashboard"
      measure_timing: true
    - action: verify_performance
      metric: page_load_time
      less_than: 3000  # 3 seconds
    - action: verify_performance
      metric: first_contentful_paint
      less_than: 1500  # 1.5 seconds
```
### Multi-Window Coordination (Electron) [LEVEL 3]
Test applications with multiple windows:
```yaml
scenario:
  name: "Multi-Window Chat Application"
  type: electron
  steps:
    - action: launch
      target: "./chat-app"
    - action: menu_click
      path: ["Window", "New Chat"]
    - action: verify_window
      count: 2
    - action: window_action
      window: 1
      action: focus
    - action: type
      selector: ".message-input"
      value: "Hello from window 1"
    - action: click
      selector: ".send-button"
    - action: window_action
      window: 2
      action: focus
    - action: wait_for_element
      selector: ".message"
      contains: "Hello from window 1"
      timeout: 5s
```
### IPC Testing (Electron) [LEVEL 3]

Test inter-process communication (IPC) between the renderer and main processes:
```yaml
scenario:
  name: "Electron IPC Communication"
  type: electron
  steps:
    - action: launch
      target: "./my-app"
    - action: ipc_send
      channel: "get-system-info"
    - action: ipc_expect
      channel: "system-info-reply"
      timeout: 3s
    - action: verify_ipc_payload
      contains:
        platform: "darwin"
        arch: "x64"
```
### Custom Reporters [LEVEL 3]
Generate custom test reports:
```yaml
scenario:
  name: "Test with Custom Reporting"
  type: cli
  reporting:
    format: custom
    template: "./report-template.html"
    include:
      - screenshots
      - logs
      - timing_data
      - video_recording
    email:
      enabled: true
      recipients: ["team@example.com"]
      on_failure_only: true
  steps:
    # ... test steps ...
```
## Framework Integration [LEVEL 2]

### Running Tests
Single test:

```shell
gadugi-agentic-test run test-scenario.yaml
```

Multiple tests:

```shell
gadugi-agentic-test run tests/*.yaml
```

With options:

```shell
gadugi-agentic-test run test.yaml \
  --verbose \
  --evidence-dir ./test-evidence \
  --retry 2 \
  --timeout 60s
```
### CI/CD Integration

GitHub Actions (`.github/workflows/agentic-tests.yml`):
```yaml
name: Agentic Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install gadugi-agentic-test
        run: pip install gadugi-agentic-test
      - name: Run tests
        run: gadugi-agentic-test run tests/agentic/*.yaml
      - name: Upload evidence
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: test-evidence
          path: ./evidence/
```
### Evidence Collection
The framework automatically collects evidence for debugging:
```
evidence/
  scenario-name-20250116-093045/
  ├── scenario.yaml         # Original test scenario
  ├── execution-log.json    # Detailed execution log
  ├── screenshots/          # All captured screenshots
  │   ├── step-1.png
  │   ├── step-3.png
  │   └── step-5.png
  ├── output-captures/      # CLI/TUI output
  │   ├── stdout.txt
  │   └── stderr.txt
  ├── timing.json           # Performance metrics
  └── report.html           # Human-readable report
```
## Best Practices [LEVEL 2]

### 1. Start Simple, Add Complexity
Begin with basic smoke tests, then add detail:
```yaml
# Level 1: Basic smoke test
steps:
  - action: launch
    target: "./app"
  - action: verify_output
    contains: "Ready"

# Level 2: Add interaction
steps:
  - action: launch
    target: "./app"
  - action: send_input
    value: "command\n"
  - action: verify_output
    contains: "Success"

# Level 3: Add error handling and edge cases
steps:
  - action: launch
    target: "./app"
  - action: send_input
    value: "invalid\n"
  - action: verify_output
    contains: "Error"
  - action: send_input
    value: "command\n"
  - action: verify_output
    contains: "Success"
```
### 2. Use Descriptive Names and Descriptions
```yaml
# Bad
scenario:
  name: "Test 1"
  steps:
    - action: click
      selector: "button"

# Good
scenario:
  name: "User Login Flow - Valid Credentials"
  description: "Verifies user can log in with valid email and password"
  steps:
    - action: click
      selector: "button[type=submit]"
      description: "Submit login form"
```
### 3. Verify Critical Paths Only
Don't test every tiny detail. Focus on user-facing behavior:
```yaml
# Bad - Tests implementation details
- action: verify_element
  selector: ".internal-cache-status"
  contains: "initialized"

# Good - Tests user-visible behavior
- action: verify_element
  selector: ".welcome-message"
  contains: "Welcome back"
```
### 4. Use Prerequisites for Test Dependencies
```yaml
scenario:
  name: "User Profile Edit"
  prerequisites:
    - "User testuser@example.com exists"
    - "User is logged in"
    - "Database is seeded with test data"
  steps:
    # Test assumes prerequisites are met
    - action: navigate
      url: "/profile"
```
### 5. Keep Tests Independent
Each test should set up its own state and clean up:
```yaml
scenario:
  name: "Create Document"
  steps:
    # Create test user (don't assume exists)
    - action: api_call
      endpoint: "/api/users"
      method: POST
      data: { email: "test@example.com" }
```