backtest-expert

Installs: 213
Rank: #4106

Install

npx skills add https://github.com/tradermonty/claude-trading-skills --skill backtest-expert
Backtest Expert
Systematic approach to backtesting trading strategies based on professional methodology that prioritizes robustness over optimistic results.
Core Philosophy
Goal
Find strategies that "break the least", not strategies that "profit the most" on paper.
Principle
Add friction, stress test assumptions, and see what survives. If a strategy holds up under pessimistic conditions, it's more likely to work in live trading.
When to Use This Skill
Use this skill when:
Developing or validating systematic trading strategies
Evaluating whether a trading idea is robust enough for live implementation
Troubleshooting why a backtest might be misleading
Learning proper backtesting methodology
Avoiding common pitfalls (curve-fitting, look-ahead bias, survivorship bias)
Assessing parameter sensitivity and regime dependence
Setting realistic expectations for slippage and execution costs
Prerequisites
Python 3.9+ (for evaluation script)
No API keys required
No external data dependencies — metrics are user-provided
Workflow
1. State the Hypothesis
Define the edge in one sentence.
Example
"Stocks that gap up >3% on earnings and pull back to previous day's close within first hour provide mean-reversion opportunity."
If you can't articulate the edge clearly, don't proceed to testing.
2. Codify Rules with Zero Discretion
Define with complete specificity:
Entry
Exact conditions, timing, price type
Exit
Stop loss, profit target, time-based exit
Position sizing
Fixed dollar amount, % of portfolio, or volatility-adjusted
Filters
Market cap, volume, sector, volatility conditions
Universe
What instruments are eligible
Critical
No subjective judgment allowed. Every decision must be rule-based and unambiguous.
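One way to enforce zero discretion is to write the rules down as data before any test runs. The sketch below is illustrative only: the `StrategyRules` class, its field names, and the `position_size` helper are hypothetical and are not part of this skill's scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyRules:
    """Hypothetical rule container: every field is explicit and testable."""
    entry_condition: str       # exact condition, e.g. "gap_up_pct > 3.0"
    entry_price_type: str      # e.g. "limit" or "market"
    stop_loss_pct: float       # stop distance as % of entry price
    profit_target_pct: float   # target distance as % of entry price
    max_hold_minutes: int      # time-based exit
    risk_per_trade_pct: float  # position sizing as % of portfolio equity
    min_avg_volume: int        # liquidity filter
    universe: str              # eligible instruments


def position_size(rules, equity, entry_price):
    """Whole shares such that a stop-out loses about risk_per_trade_pct of equity."""
    risk_dollars = equity * rules.risk_per_trade_pct / 100
    stop_distance = entry_price * rules.stop_loss_pct / 100
    return int(risk_dollars // stop_distance)
```

Freezing the dataclass keeps the rules from being adjusted mid-test, which is the point of this step.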
3. Run Initial Backtest
Test over:
Minimum 5 years (preferably 10+)
Multiple market regimes (bull, bear, high/low volatility)
Realistic costs
Commissions + conservative slippage
Examine initial results for basic viability. If fundamentally broken, iterate on hypothesis.
4. Stress Test the Strategy
This is where 80% of testing time should be spent.
Parameter sensitivity:
Test stop loss at 50%, 75%, 100%, 125%, 150% of baseline
Test profit target at 80%, 90%, 100%, 110%, 120% of baseline
Vary entry/exit timing by ±15-30 minutes
Look for "plateaus" of stable performance, not narrow spikes
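The stop-loss and profit-target variations listed above can be generated as a grid rather than typed by hand. This is a minimal sketch; the function name `sensitivity_grid` and the dictionary keys are assumptions for illustration, and running each combination through a backtester is left to your own tooling.

```python
from itertools import product

# Multipliers of the baseline value, taken from the list above.
STOP_MULTIPLIERS = (0.50, 0.75, 1.00, 1.25, 1.50)
TARGET_MULTIPLIERS = (0.80, 0.90, 1.00, 1.10, 1.20)


def sensitivity_grid(base_stop_pct, base_target_pct):
    """Return every stop/target combination to backtest (5 x 5 = 25 runs)."""
    return [
        {
            "stop_pct": round(base_stop_pct * s, 4),
            "target_pct": round(base_target_pct * t, 4),
        }
        for s, t in product(STOP_MULTIPLIERS, TARGET_MULTIPLIERS)
    ]
```

Backtesting all 25 combinations, instead of only the baseline, is what makes a plateau or a spike visible.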
Execution friction:
Increase slippage to 1.5-2x typical estimates
Model worst-case fills (buy at ask+1 tick, sell at bid-1 tick)
Add realistic order rejection scenarios
Test with pessimistic commission structures
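The worst-case fill and inflated-slippage rules above translate into a few lines of code. The helper names and signatures below are hypothetical, and the sketch assumes you have bid, ask, and tick size available for each simulated order.

```python
def worst_case_fill(side, bid, ask, tick_size, ticks=1):
    """Pessimistic fill price: buy above the ask, sell below the bid."""
    if side == "buy":
        return ask + ticks * tick_size
    if side == "sell":
        return bid - ticks * tick_size
    raise ValueError(f"unknown side: {side!r}")


def stressed_slippage(typical_slippage, multiplier=2.0):
    """Scale a typical slippage estimate by 1.5-2x for stress testing."""
    if not 1.5 <= multiplier <= 2.0:
        raise ValueError("multiplier should be in the 1.5-2.0 range")
    return typical_slippage * multiplier
```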
Time robustness:
Analyze year-by-year performance
Require positive expectancy in majority of years
Ensure strategy doesn't rely on 1-2 exceptional periods
Test in different market regimes separately
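A year-by-year breakdown can be computed directly from a trade list. This sketch assumes each trade is a `(year, return_pct)` tuple; that layout and both function names are illustrative assumptions, not the format used by the evaluation script.

```python
from collections import defaultdict


def expectancy_by_year(trades):
    """Average return per trade for each calendar year."""
    by_year = defaultdict(list)
    for year, return_pct in trades:
        by_year[year].append(return_pct)
    return {year: sum(r) / len(r) for year, r in sorted(by_year.items())}


def majority_positive(yearly):
    """True if more than half of the years have positive expectancy."""
    positive = sum(1 for value in yearly.values() if value > 0)
    return positive > len(yearly) / 2
```

If removing the single best year flips `majority_positive` to false, the strategy likely depends on one exceptional period.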
Sample size:
Absolute minimum: 30 trades
Preferred: 100+ trades
High confidence: 200+ trades
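The thresholds above can be paired with a simple significance check. The tier labels and function names below are hypothetical, and the t-statistic is the standard one-sample form (mean divided by standard error), which assumes roughly independent trades.

```python
import math


def sample_size_tier(total_trades):
    """Map a trade count onto the thresholds listed above."""
    if total_trades < 30:
        return "insufficient"
    if total_trades < 100:
        return "minimum"
    if total_trades < 200:
        return "preferred"
    return "high confidence"


def t_statistic(mean_return, std_return, total_trades):
    """One-sample t-statistic of the mean per-trade return against zero."""
    return mean_return / (std_return / math.sqrt(total_trades))
```

For the same mean and standard deviation, the t-statistic grows with the square root of the trade count, which is why small edges need many trades.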
5. Out-of-Sample Validation
Walk-forward analysis:
Optimize on training period (e.g., Year 1-3)
Test on validation period (Year 4)
Roll forward and repeat
Compare in-sample vs out-of-sample performance
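The rolling train/validate schedule described above can be generated mechanically. This is a sketch under the assumption that data is split by whole calendar years; `walk_forward_windows` is a hypothetical helper name.

```python
def walk_forward_windows(years, train_size=3, test_size=1):
    """Return (train_years, test_years) pairs, rolling forward by test_size."""
    windows = []
    start = 0
    while start + train_size + test_size <= len(years):
        train = years[start:start + train_size]
        test = years[start + train_size:start + train_size + test_size]
        windows.append((train, test))
        start += test_size
    return windows
```

With years 2015 through 2020 and the defaults, this yields three windows: optimize on 2015-2017 and test on 2018, then 2016-2018 and 2019, then 2017-2019 and 2020.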
Warning signs:
Out-of-sample <50% of in-sample performance
Need frequent parameter re-optimization
Parameters change dramatically between periods
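The first warning sign can be checked numerically once both figures are expressed in the same metric (for example, expectancy per trade). The function names and the choice of metric are assumptions for illustration.

```python
def oos_ratio(in_sample, out_of_sample):
    """Out-of-sample performance as a fraction of in-sample performance."""
    if in_sample <= 0:
        raise ValueError("in-sample performance must be positive to compare")
    return out_of_sample / in_sample


def is_degraded(in_sample, out_of_sample, threshold=0.5):
    """True when out-of-sample falls below 50% of in-sample by default."""
    return oos_ratio(in_sample, out_of_sample) < threshold
```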
6. Evaluate Results
Questions to answer:
Does edge survive pessimistic assumptions?
Is performance stable across parameter variations?
Does strategy work in multiple market regimes?
Is sample size sufficient for statistical confidence?
Are results realistic, not "too good to be true"?
Decision criteria:
Deploy
Survives all stress tests with acceptable performance
Refine
Core logic sound but needs parameter adjustment
Abandon
Fails stress tests or relies on fragile assumptions
Use the evaluation script for a structured, quantitative assessment:
python3 skills/backtest-expert/scripts/evaluate_backtest.py \
  --total-trades 150 \
  --win-rate 62 \
  --avg-win-pct 1.8 \
  --avg-loss-pct 1.2 \
  --max-drawdown-pct 15 \
  --years-tested 8 \
  --num-parameters 3 \
  --slippage-tested \
  --output-dir reports/
The script scores across 5 dimensions (Sample Size, Expectancy, Risk Management, Robustness, Execution Realism), detects red flags, and outputs a Deploy/Refine/Abandon verdict.
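To sanity-check the inputs before running the script, per-trade expectancy can be worked out by hand from win rate, average win, and average loss. The sketch below uses the textbook formulas; it is not taken from `evaluate_backtest.py`, whose internal scoring may differ.

```python
def expectancy_pct(win_rate_pct, avg_win_pct, avg_loss_pct):
    """Expected return per trade: p * avg_win - (1 - p) * avg_loss."""
    p = win_rate_pct / 100
    return p * avg_win_pct - (1 - p) * avg_loss_pct


def profit_factor(win_rate_pct, avg_win_pct, avg_loss_pct):
    """Expected gross profit divided by expected gross loss per trade."""
    p = win_rate_pct / 100
    return (p * avg_win_pct) / ((1 - p) * avg_loss_pct)


# Metrics from the example command above: 62% win rate, 1.8% avg win, 1.2% avg loss.
print(round(expectancy_pct(62, 1.8, 1.2), 2))  # 0.66
```

With the example metrics, expectancy is 0.62 × 1.8 − 0.38 × 1.2 = 0.66% per trade before any additional friction.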
Key Testing Principles
Punish the Strategy
Add friction everywhere:
Commissions higher than reality
Slippage 1.5-2x typical
Worst-case fills
Order rejections
Partial fills
Rationale
Strategies that survive pessimistic assumptions often outperform in live trading.
Seek Plateaus, Not Peaks
Look for parameter ranges where performance is stable, not optimal values that create performance spikes.
Good
Strategy profitable with stop loss anywhere from 1.5% to 3.0%
Bad
Strategy only works with stop loss at exactly 2.13%
Stable performance indicates genuine edge; narrow optima suggest curve-fitting.
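One rough way to tell a plateau from a peak is to measure how many adjacent parameter values stay profitable. The sketch below assumes results are `(parameter_value, expectancy)` pairs from your own sweep; the function names and the three-point minimum are illustrative choices, not a rule from this skill.

```python
def longest_profitable_run(results, min_expectancy=0.0):
    """Longest run of adjacent parameter values with expectancy above the floor."""
    best, current = [], []
    for value, expectancy in sorted(results):
        if expectancy > min_expectancy:
            current.append(value)
            if len(current) > len(best):
                best = list(current)
        else:
            current = []
    return best


def is_plateau(results, min_points=3):
    """True if at least min_points neighbouring values are all profitable."""
    return len(longest_profitable_run(results)) >= min_points
```

A sweep where only 2.13% is profitable returns a run of length one and fails the check, while a sweep profitable from 1.5% to 3.0% passes.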
Test All Cases, Not Cherry-Picked Examples
Wrong approach
Study hand-picked "market leaders" that worked
Right approach
Test every stock that met criteria, including those that failed
Selective examples create survivorship bias and overestimate strategy quality.
Separate Idea Generation from Validation
Intuition
Useful for generating hypotheses
Validation
Must be purely data-driven
Never let attachment to an idea influence interpretation of test results.
Common Failure Patterns
Recognize these patterns early to save time:
Parameter sensitivity
Only works with exact parameter values
Regime-specific
Great in some years, terrible in others
Slippage sensitivity
Unprofitable when realistic costs added
Small sample
Too few trades for statistical confidence
Look-ahead bias
"Too good to be true" results
Over-optimization
Many parameters, poor out-of-sample results
See references/failed_tests.md for detailed examples and diagnostic framework.
Output
reports/backtest_eval_.json: structured evaluation with per-dimension scores, red flags, and verdict
reports/backtest_eval_.md: human-readable report with dimension table, key metrics, and red flag details
Resources
Methodology Reference
File: references/methodology.md
When to read
For detailed guidance on specific testing techniques.
Contents:
Stress testing methods
Parameter sensitivity analysis
Slippage and friction modeling
Sample size requirements
Market regime classification
Common biases and pitfalls (survivorship, look-ahead, curve-fitting, etc.)
Failed Tests Reference
File: references/failed_tests.md
When to read
When a strategy fails tests, or when learning from past mistakes.
Contents:
Why failures are valuable
Common failure patterns with examples
Case study documentation framework
Red flags checklist for evaluating backtests
Critical Reminders
Time allocation
Spend 20% generating ideas, 80% trying to break them.
Context-free requirement
If a strategy requires "perfect context" to work, it's not robust enough for systematic trading.
Red flag
If backtest results look too good (>90% win rate, minimal drawdowns, perfect timing), audit carefully for look-ahead bias or data issues.
Tool limitations
Understand your backtesting platform's quirks (interpolation methods, handling of low liquidity, data alignment issues).
Statistical significance
Small edges require large sample sizes to prove. 5% edge per trade needs 100+ trades to distinguish from luck.
Discretionary vs Systematic Differences
This skill focuses on systematic/quantitative backtesting where:
All rules are codified in advance
No discretion or "feel" in execution
Testing happens on all historical examples, not cherry-picked cases
Context (news, macro) is deliberately stripped out
Discretionary traders study differently; this skill may not apply to setups requiring subjective judgment.