Pentest Validation
```yaml
target_url: https://staging.app.com   # REQUIRED for Tier 2-3
source_repo: ./src                    # REQUIRED for Tier 1+
exploitation_tier: 2                  # 1=pattern-only, 2=payload-test, 3=full-exploit
vuln_types:                           # Which pipelines to run
  - injection                         # SQL, NoSQL, command injection
  - xss                               # Reflected, stored, DOM XSS
  - auth                              # Auth bypass, session, JWT
  - ssrf                              # URL scheme abuse, metadata
max_cost_usd: 15                      # Budget cap per run
timeout_minutes: 30                   # Time cap per run
require_authorization: true           # MUST confirm target ownership
no_production: true                   # Block production URLs
production_patterns:                  # URL patterns to block
  - ".prod."
  - "api.*"
  - "www.*"
```
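The config comments imply a few checks that can run before any agent is dispatched. Below is a minimal sketch that assumes the YAML has already been parsed into a plain object. `validateConfig` and its error messages are hypothetical and are not part of the skill.

```javascript
// Hypothetical pre-flight check for the pentest validation config.
// Assumes the YAML has already been parsed into a plain object.
const KNOWN_VULN_TYPES = new Set(["injection", "xss", "auth", "ssrf"]);

function validateConfig(cfg) {
  const errors = [];

  // Tier must be 1 (pattern-only), 2 (payload-test) or 3 (full-exploit).
  if (![1, 2, 3].includes(cfg.exploitation_tier)) {
    errors.push("exploitation_tier must be 1, 2 or 3");
  }
  // Source code is needed at every tier.
  if (!cfg.source_repo) {
    errors.push("source_repo is required for Tier 1+");
  }
  // A live target is only needed once payloads are sent (Tier 2-3).
  if (cfg.exploitation_tier >= 2 && !cfg.target_url) {
    errors.push("target_url is required for Tier 2-3");
  }
  // Every requested pipeline must be one the skill provides.
  for (const type of cfg.vuln_types ?? []) {
    if (!KNOWN_VULN_TYPES.has(type)) errors.push(`unknown vuln_type: ${type}`);
  }
  // Budget and time caps must be positive numbers.
  if (!(cfg.max_cost_usd > 0)) errors.push("max_cost_usd must be > 0");
  if (!(cfg.timeout_minutes > 0)) errors.push("timeout_minutes must be > 0");

  return errors;
}
```

For the example config this returns an empty array. Dropping `target_url` while keeping `exploitation_tier: 2` produces only the Tier 2-3 error.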
Safeguards (Mandatory)

Authorization Gate

Every pentest validation run MUST:

- Display the target URL and exploitation tier to the user
- Require explicit confirmation: "I own/authorized testing of this target"
- Log the authorization with a timestamp
- Block the run if the target URL matches a production pattern
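The four gate steps can be sketched as a pure function that an orchestrator calls before dispatching any work. This is a hypothetical sketch: `authorizationGate`, `isProductionUrl` and the return shape are illustrative names.

It also assumes `production_patterns` are hostname globs matched as substrings. Under that reading, `api.*` also blocks `staging-api.example.com`, which errs toward blocking.

```javascript
// Hypothetical authorization gate; names and return shape are illustrative.
const REQUIRED_CONFIRMATION = "I own/authorized testing of this target";

// Convert a glob such as "api.*" into a case-insensitive RegExp.
// Matching is unanchored (substring), so the gate errs toward blocking.
function globToRegExp(glob) {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except *
    .replace(/\*/g, ".*"); // * matches any run of characters
  return new RegExp(escaped, "i");
}

function isProductionUrl(targetUrl, patterns) {
  const host = new URL(targetUrl).hostname;
  return patterns.some((p) => globToRegExp(p).test(host));
}

function authorizationGate(cfg, userConfirmation, now = new Date()) {
  // Shown to the user before they confirm.
  const prompt = `Target: ${cfg.target_url}\nExploitation tier: ${cfg.exploitation_tier}`;

  // Production targets are refused even if the user confirms.
  if (cfg.no_production && isProductionUrl(cfg.target_url, cfg.production_patterns)) {
    return { allowed: false, prompt, reason: "target matches a production pattern" };
  }
  // Only the exact confirmation phrase counts as authorization.
  if (cfg.require_authorization && userConfirmation !== REQUIRED_CONFIRMATION) {
    return { allowed: false, prompt, reason: "authorization not confirmed" };
  }
  // Timestamped record for the audit log.
  return {
    allowed: true,
    prompt,
    log: { target: cfg.target_url, tier: cfg.exploitation_tier, authorizedAt: now.toISOString() },
  };
}
```

The production check runs before the confirmation check, so a production target can never be authorized by typing the phrase.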

What This Skill Does NOT Do

- Full autonomous reconnaissance (Nmap, Subfinder)
- Zero-day exploit development
- Attack targets without explicit authorization
- Test production systems
- Store actual exfiltrated data (only proof of access)
- Social engineering or phishing simulation
- Port scanning or service discovery

Validation Pipelines

Injection Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| SQL injection | String concat in query | `' OR '1'='1` response diff | `UNION SELECT` data extraction |
| NoSQL injection | `$where`, `$gt` in query | Operator injection test | Collection enumeration |
| Command injection | `exec()`, `system()` calls | Command delimiter test | Reverse shell proof |
| LDAP injection | String concat in filter | Wildcard injection | Directory enumeration |
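A Tier 1 pass only needs source text, so it can be approximated with line-level patterns. The rules below are illustrative heuristics, not the skill's actual rule set.

For example, the `exec(` rule also matches `RegExp.prototype.exec`. Tier 1 output is best treated as a candidate list for Tier 2 payload tests.

```javascript
// Hypothetical Tier 1 (pattern-only) rules for the injection pipeline.
// These regexes are rough heuristics and will produce false positives.
const INJECTION_RULES = [
  // SQL keyword followed later by a closing quote and string concatenation.
  { type: "sql", re: /\b(SELECT|INSERT|UPDATE|DELETE)\b[^;\n]*["'`]\s*\+\s*\w/i },
  // MongoDB-style operators used as object keys.
  { type: "nosql", re: /["']?\$(where|gt)["']?\s*:/ },
  // Shell execution calls.
  { type: "command", re: /\b(exec|system)\s*\(/ },
  // LDAP filter such as "(uid=" + user + ")".
  { type: "ldap", re: /\(\s*\w+=["'`]?\s*\+\s*\w/ },
];

// Return one finding per (line, rule) match, with 1-based line numbers.
function scanSource(text) {
  const findings = [];
  text.split("\n").forEach((line, i) => {
    for (const rule of INJECTION_RULES) {
      if (rule.re.test(line)) findings.push({ type: rule.type, line: i + 1 });
    }
  });
  return findings;
}
```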

XSS Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| Reflected XSS | No output encoding | Payload reflection | Browser JS execution via Playwright |
| Stored XSS | `innerHTML` assignment | Payload stored + retrieved | Cookie theft PoC |
| DOM XSS | `document.write(location)` | Fragment injection | DOM manipulation proof |
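The Tier 2 reflection step comes down to asking whether a uniquely marked payload comes back verbatim or HTML-encoded. A minimal sketch follows; the marker format and function names are hypothetical.

It only inspects an HTML body context. A raw reflection is a signal, not proof, which is why Tier 3 confirms execution in a real browser.

```javascript
// Hypothetical Tier 2 reflection check for XSS payloads.
function htmlEncode(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// A unique marker lets a later browser check look for window.__xss_<marker>.
function makePayload(marker) {
  return `<script>window.__xss_${marker}=1</script>`;
}

function classifyReflection(responseBody, payload) {
  if (responseBody.includes(payload)) return "reflected-raw"; // echoed unencoded
  if (responseBody.includes(htmlEncode(payload))) return "reflected-encoded"; // output encoding applied
  return "not-reflected";
}
```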

Auth Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| JWT `none` | No algorithm validation | Modified JWT accepted | Admin access with forged token |
| Session fixation | No session rotation | Pre-set session reused | Cross-user session hijack |
| Credential stuffing | No rate limiting | 100 attempts unblocked | Valid credential discovery |
| IDOR | No authorization check | Access other user data | Full CRUD on foreign resources |
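For the JWT `none` row, the Tier 2 payload is an unsigned token with tampered claims. This sketch builds one from an existing token. `forgeNoneToken` is a hypothetical name, and sending the token to a protected endpoint is left to the caller.

It assumes Node.js 15.7+ for the `base64url` Buffer encoding.

```javascript
// Hypothetical Tier 2 "JWT none" payload builder (Node.js 15.7+ for base64url).
function b64urlJson(obj) {
  return Buffer.from(JSON.stringify(obj)).toString("base64url");
}

function decodeJwtPayload(token) {
  const [, payload] = token.split(".");
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}

// Keep the original claims, apply overrides, declare alg "none",
// and leave the signature segment empty (note the trailing dot).
function forgeNoneToken(originalToken, claimOverrides) {
  const claims = { ...decodeJwtPayload(originalToken), ...claimOverrides };
  return `${b64urlJson({ alg: "none", typ: "JWT" })}.${b64urlJson(claims)}.`;
}
```

If a protected endpoint accepts `forgeNoneToken(token, { role: "admin" })`, the finding moves from "No algorithm validation" to "Modified JWT accepted".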

SSRF Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| Internal URL | User-controlled URL fetch | `http://169.254.169.254` | Cloud metadata extraction |
| DNS rebinding | URL validation bypass | Rebind to internal IP | Internal service access |
| Protocol smuggling | URL scheme not restricted | `file:///etc/passwd` | File content in response |
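The Internal URL and Protocol smuggling rows both reduce to checking what a user-supplied URL points at. This sketch flags non-HTTP schemes and literal IPv4 hosts in loopback, private and link-local ranges, including the `169.254.169.254` metadata address. The names are hypothetical.

It does not handle IPv6. It also cannot catch DNS rebinding, which requires checking the resolved address at connection time.

```javascript
// Hypothetical SSRF target classifier (IPv4 literals only).
function ipv4ToInt(host) {
  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return null;
  const octets = m.slice(1).map(Number);
  if (octets.some((o) => o > 255)) return null;
  // Multiply instead of bit-shifting to avoid 32-bit sign issues.
  return octets.reduce((acc, o) => acc * 256 + o, 0);
}

const BLOCKED_RANGES = [
  ["127.0.0.0", 8], // loopback
  ["10.0.0.0", 8], // private
  ["172.16.0.0", 12], // private
  ["192.168.0.0", 16], // private
  ["169.254.0.0", 16], // link-local, includes cloud metadata 169.254.169.254
];

function inCidr(ipInt, base, bits) {
  const start = ipv4ToInt(base);
  return ipInt >= start && ipInt < start + 2 ** (32 - bits);
}

function classifySsrfTarget(rawUrl) {
  const url = new URL(rawUrl); // WHATWG parser normalizes forms like http://2130706433/
  if (!["http:", "https:"].includes(url.protocol)) return "blocked-scheme";
  if (url.hostname === "localhost") return "blocked-internal";
  const ip = ipv4ToInt(url.hostname);
  if (ip !== null && BLOCKED_RANGES.some(([b, n]) => inCidr(ip, b, n))) return "blocked-internal";
  return "allowed";
}
```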

Agent Coordination

Orchestration Pattern

```javascript
// Phase 1: Recon (parallel scans)
const phase1Results = await Task("Security Scan", {
  target: "./src",
  layers: { sast: true, dast: true, dependencies: true, secrets: true }
}, "qe-security-scanner");

// Phase 2: Analysis (parallel review)
// Assumes each Task resolves to an array of findings; flatten both reviews into one list.
const phase2Results = (await Promise.all([
  Task("Code Security Review", {
    findings: phase1Results,
    depth: "comprehensive"
  }, "qe-security-reviewer"),
  Task("Compliance Audit", {
    findings: phase1Results,
    frameworks: ["owasp-top-10"]
  }, "qe-security-auditor")
])).flat();

// Phase 3: Validation (graduated exploitation)
const phase3Results = await Task("Exploit Validation", {
  findings: [...phase1Results, ...phase2Results],
  target_url: "https://staging.app.com",
  exploitation_tier: 2,
  vuln_types: ["injection", "xss", "auth", "ssrf"],
  max_cost_usd: 15,
  timeout_minutes: 30
}, "qe-pentest-validator");

// Phase 4: Report ("No Exploit, No Report" gate)
await Task("Security Quality Gate", {
  findings: phase3Results.confirmedFindings,
  gate: "no-exploit-no-report",
  require_poc: true
}, "qe-quality-gate");
```

Finding Classification

| Status | Meaning | Action |
|---|---|---|
| `confirmed-exploitable` | Exploitation succeeded with PoC | Report with evidence |
| `likely-exploitable` | Partial exploitation, defenses detected | Report with caveats |
| `not-exploitable` | All exploitation attempts failed | Filter from report |
| `inconclusive` | WAF/defense blocked, unclear if vulnerable | Report for manual review |
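The classification table can be read as a precedence rule over a finding's exploitation attempts. This is a hypothetical sketch that assumes each attempt records an `outcome` of `success`, `partial`, `blocked` or `failed`, plus an optional `poc`.

A success without a captured PoC is treated as likely rather than confirmed. A finding with no attempts is treated as inconclusive.

```javascript
// Hypothetical mapping from attempt outcomes to the four statuses.
function classifyFinding(attempts) {
  if (attempts.length === 0) return "inconclusive"; // nothing ran, nothing proven
  if (attempts.some((a) => a.outcome === "success" && a.poc)) return "confirmed-exploitable";
  // Success without a captured PoC, or partial success past some defenses.
  if (attempts.some((a) => a.outcome === "success" || a.outcome === "partial")) return "likely-exploitable";
  if (attempts.some((a) => a.outcome === "blocked")) return "inconclusive"; // WAF/defense in the way
  return "not-exploitable"; // every attempt failed outright
}

const ACTIONS = {
  "confirmed-exploitable": "report-with-evidence",
  "likely-exploitable": "report-with-caveats",
  "inconclusive": "manual-review",
  "not-exploitable": "filter",
};

// "No Exploit, No Report": classify, attach the action, drop filtered findings.
function buildReport(findings) {
  return findings
    .map((f) => {
      const status = classifyFinding(f.attempts);
      return { id: f.id, status, action: ACTIONS[status] };
    })
    .filter((f) => f.action !== "filter");
}
```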

Exploit Playbook Memory

Namespace Structure

```text
aqe/pentest/
  playbook/
    exploit/{vuln_type}/{tech_stack}/{technique}
    bypass/{defense_type}/{technique}
    payload/{vuln_type}/{variant}
  results/
    validation-{timestamp}
  poc/
    {finding_id}-poc
```
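Building keys through small helpers keeps every writer on the same layout. This is a hypothetical sketch that assumes `results/` and `poc/` sit alongside `playbook/` under `aqe/pentest/`.

The `keys` object and `seg` validator are illustrative. They are not part of any memory API.

```javascript
// Hypothetical key builders for the aqe/pentest/ memory namespace.
const ROOT = "aqe/pentest";

// Reject empty segments and slashes so one value cannot spill into another level.
function seg(value) {
  const s = String(value).trim();
  if (!s || s.includes("/")) throw new Error(`invalid key segment: "${value}"`);
  return s;
}

const keys = {
  exploit: (vulnType, techStack, technique) =>
    `${ROOT}/playbook/exploit/${seg(vulnType)}/${seg(techStack)}/${seg(technique)}`,
  bypass: (defenseType, technique) =>
    `${ROOT}/playbook/bypass/${seg(defenseType)}/${seg(technique)}`,
  payload: (vulnType, variant) =>
    `${ROOT}/playbook/payload/${seg(vulnType)}/${seg(variant)}`,
  result: (timestamp) => `${ROOT}/results/validation-${seg(timestamp)}`,
  poc: (findingId) => `${ROOT}/poc/${seg(findingId)}-poc`,
};
```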

Learning Loop

- Before validation: query the playbook for known patterns matching the findings
- During validation: try known payloads first (higher success rate)
- After validation: store new successful patterns with confidence scores
- Over time: the agent converges on the most effective payloads per tech stack

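The "try known payloads first" and "store with confidence scores" steps need some scoring rule. This sketch uses a Laplace-smoothed success rate, (successes + 1) / (attempts + 2). That rule is an assumed choice, not one the skill specifies, and the entry shape is hypothetical.

```javascript
// Hypothetical playbook entry: { payload, successes, attempts }.
// Laplace smoothing keeps a single lucky hit from outranking a proven payload.
function confidence(entry) {
  return (entry.successes + 1) / (entry.attempts + 2);
}

// During validation: try known payloads in descending confidence order.
function rankPayloads(entries) {
  return [...entries].sort((a, b) => confidence(b) - confidence(a));
}

// After validation: fold one attempt's outcome back in (returns a new array).
function recordAttempt(entries, payload, succeeded) {
  const existing = entries.find((e) => e.payload === payload);
  const base = existing ?? { payload, successes: 0, attempts: 0 };
  const updated = {
    ...base,
    attempts: base.attempts + 1,
    successes: base.successes + (succeeded ? 1 : 0),
  };
  return existing
    ? entries.map((e) => (e === existing ? updated : e))
    : [...entries, updated];
}
```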
Cost Optimization
Estimated Cost by Scenario
| Scenario | Tier Mix | Findings | Est. Cost | Est. Time |
|---|---|---|---|---|
| PR check (source only) | 100% Tier 1 | 5 | $0 | <5s |
| Sprint validation | 70% T1, 30% T2 | 15 | $2-5 | 5-10 min |
| Release validation | 40% T1, 40% T2, 20% T3 | 25 | $8-15 | 15-30 min |
| Full pentest | 20% T1, 30% T2, 50% T3 | 40 | $15-30 | 30-60 min |
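Because cost and time grow with the tier mix, a run has to decide which findings fit under `max_cost_usd` and `timeout_minutes`. This greedy sketch schedules cheaper tiers first and defers whatever would break either cap.

The per-tier cost and time figures are caller-supplied estimates, not numbers published by the skill. Time is summed sequentially, which overestimates when pipelines run in parallel.

```javascript
// Hypothetical budget/time planner for one validation run.
// tierEstimates: { [tier]: { usd, minutes } }, supplied by the caller.
function planRun(findings, tierEstimates, { maxCostUsd, timeoutMinutes }) {
  // Cheapest tiers first (stable sort keeps input order within a tier).
  const ordered = [...findings].sort((a, b) => a.tier - b.tier);
  const planned = [];
  const deferred = [];
  let cost = 0;
  let minutes = 0;

  for (const f of ordered) {
    const est = tierEstimates[f.tier];
    if (cost + est.usd <= maxCostUsd && minutes + est.minutes <= timeoutMinutes) {
      planned.push(f.id);
      cost += est.usd;
      minutes += est.minutes;
    } else {
      deferred.push(f.id); // would exceed the budget or the time cap
    }
  }
  return { planned, deferred, cost, minutes };
}
```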
Cost vs Shannon Comparison
| Metric | Shannon | AQE Pentest Validation |
|---|---|---|
| Cost per run | ~$50 | $5-15 (graduated tiers) |
| Runtime | 60-90 min | 15-30 min (parallel pipelines) |
| False positive rate | Low (exploit-proven) | Low (same exploit-proven principle) |
| Learning | None (static prompts) | ReasoningBank playbook |
Success Metrics
| Metric | Target | Measurement |
|---|---|---|
| False positive reduction | 60% of findings eliminated | Pre/post validator comparison |
| Exploit confirmation rate | 80% of confirmed findings truly exploitable | Manual PoC verification |
| Cost per run | <$15 USD | Token tracking per pipeline |
| Time per run | <30 minutes | Execution time metrics |
| Playbook growth | 100+ patterns after 6 months | Memory namespace count |
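The per-run targets in the Success Metrics table can be checked mechanically once a run's counts are recorded. This sketch reads "60% of findings eliminated" and "80% ... truly exploitable" as at-least thresholds. That reading and the input field names are assumptions.

Playbook growth is a long-term metric, so it is left out of the sketch.

```javascript
// Hypothetical per-run check against the Success Metrics targets.
function evaluateRun({ findingsBefore, findingsAfter, confirmed, manuallyVerified, costUsd, minutes }) {
  // Share of pre-validation findings removed by the validator.
  const fpReduction = findingsBefore === 0 ? null : (findingsBefore - findingsAfter) / findingsBefore;
  // Share of confirmed findings that manual PoC verification upheld.
  const confirmationRate = confirmed === 0 ? null : manuallyVerified / confirmed;
  return {
    fpReduction,
    confirmationRate,
    meetsFpTarget: fpReduction !== null && fpReduction >= 0.6,
    meetsConfirmationTarget: confirmationRate !== null && confirmationRate >= 0.8,
    meetsCostTarget: costUsd < 15,
    meetsTimeTarget: minutes < 30,
  };
}
```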