# A/B Test Analysis

Evaluate A/B test results with statistical rigor and translate findings into clear product decisions.
## Context

You are analyzing A/B test results for $ARGUMENTS.

If the user provides data files (CSV, Excel, or analytics exports), read and analyze them directly. Generate Python scripts for statistical calculations when needed.
## Instructions

1. **Understand the experiment**:
   - What was the hypothesis?
   - What was changed (the variant)?
   - What is the primary metric? Any guardrail metrics?
   - How long did the test run?
   - What was the traffic split?
2. **Validate the test setup**:
   - **Sample size**
     - Is the sample large enough for the expected effect size?
     - Use the formula: n = (Z_{α/2}² × 2 × p × (1 − p)) / MDE² (add a Z_β term to target a specific power)
     - Flag if the test is underpowered (< 80% power)
   - **Duration**
     - Did the test run for at least 1–2 full business cycles?
   - **Randomization**
     - Is there any evidence of sample ratio mismatch (SRM)?
   - **Novelty/primacy effects**
     - Was there enough time to wash out initial behavior changes?
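The sample-size formula and the SRM check above can be scripted with only the standard library. This is a minimal sketch, not a fixed API: the function names are illustrative, and the Z_β term extends the formula above so it targets a chosen power (with power = 0.5, Z_β = 0 and it reduces to the simpler version shown).

```python
import math
from statistics import NormalDist

def required_sample_size(p, mde, alpha=0.05, power=0.80):
    """Per-group sample size to detect an absolute lift of `mde`
    over a baseline conversion rate `p`. Includes the Z_beta term
    so the estimate targets `power`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = (z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / mde ** 2
    return math.ceil(n)

def srm_pvalue(n_control, n_variant, expected_split=0.5):
    """Chi-squared test (1 df) for sample ratio mismatch. A tiny
    p-value (< 0.001 is a common threshold) means the observed split
    is very unlikely under the intended split, so flag the test."""
    total = n_control + n_variant
    observed = [n_control, n_variant]
    expected = [total * expected_split, total * (1 - expected_split)]
    chi2 = sum((obs - exp) ** 2 / exp
               for obs, exp in zip(observed, expected))
    # Chi-squared(1 df) survival function via the normal distribution
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
```

For example, detecting a 1-point absolute lift on a 5% baseline at 80% power requires roughly 7,500 users per arm, while a 5,200/4,800 split under a 50/50 design gives an SRM p-value well below 0.001.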
3. **Calculate statistical significance**:
   - **Conversion rate**: for control and variant
   - **Relative lift**: (variant - control) / control × 100
   - **p-value**: using a two-tailed z-test or chi-squared test
   - **Confidence interval**: 95% CI for the difference
   - **Statistical significance**: is p < 0.05?
   - **Practical significance**: is the lift meaningful for the business?

   If the user provides raw data, generate and run a Python script to calculate these.

4. **Check guardrail metrics**: Did any guardrail metrics (revenue, engagement, page load time) degrade? A winning primary metric with degraded guardrails may not be a true win.

5. **Interpret results**:

   | Outcome | Recommendation |
   |---|---|
   | Significant positive lift, no guardrail issues | Ship it: roll out to 100% |
   | Significant positive lift, guardrail concerns | Investigate: understand trade-offs before shipping |
   | Not significant, positive trend | Extend the test: need more data or a larger effect |
   | Not significant, flat | Stop the test: no meaningful difference detected |
   | Significant negative lift | Don't ship: revert to control and analyze why |

6. **Provide the analysis summary**:
## A/B Test Results: [Test Name]

**Hypothesis**: [What we expected]

**Duration**: [X days] | **Sample**: [N control / M variant]

| Metric | Control | Variant | Lift | p-value | Significant? |
|---|---|---|---|---|---|
| [Primary] | X% | Y% | +Z% | 0.0X | Yes/No |
| [Guardrail] | ... | ... | ... | ... | ... |

**Recommendation**: [Ship / Extend / Stop / Investigate]

**Reasoning**: [Why]

**Next steps**: [What to do]

Think step by step. Save as markdown. Generate Python scripts for calculations if raw data is provided.

## Further Reading

- A/B Testing 101 + Examples
- Testing Product Ideas: The Ultimate Validation Experiments Library
- Are You Tracking the Right Metrics?