Question 1

What is statistical significance in A/B testing?

Accepted Answer

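Statistical significance tells you how surprising the observed difference between your control and variant would be if there were no real difference between them. A result is typically called statistically significant at 95% confidence (p < 0.05): if the two versions truly performed the same, a difference at least this large would show up by chance less than 5% of the time. That is not the same as a 95% probability that the variant is better. A 99% confidence threshold (p < 0.01) is used for high-stakes decisions. Running an A/B test for too short a period or with too little traffic leaves it underpowered: real effects are often missed, and the results that do cross the threshold are more likely to be false positives or to overstate the lift.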

Question 2

How long should I run an A/B test?

Accepted Answer

Run your test for at least one full business cycle (typically 1–2 weeks) to account for day-of-week effects. Do not let the end date depend on when results reach significance: stopping as soon as you see 95% confidence is the 'peeking problem', and it inflates the false positive rate because every extra look is another chance for noise to cross the threshold. Use a sample size calculator before you start to determine the minimum visitors needed per variant to detect your minimum detectable effect at 95% confidence, then run until you have reached both that sample size and the end of the business cycle.
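To make the peeking problem concrete, here is a minimal simulation sketch of an A/A test, where both arms share the same true conversion rate so every "significant" result is a false positive. The function names, traffic numbers, 10% conversion rate, and checking interval are all illustrative assumptions, and the pooled two-proportion z-test is just one common choice of test.

```python
import math
import random

def p_value(conv_a, conv_b, n):
    """Two-sided p-value for a pooled two-proportion z-test, n visitors per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled == 0.0 or pooled == 1.0:
        return 1.0  # no variation yet, so no evidence of a difference
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (conv_b - conv_a) / n / std_err
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_false_positive_rates(trials=500, visitors=2000, check_every=100,
                                  rate=0.10, alpha=0.05, seed=1):
    """Compare 'stop at the first significant peek' with one look at the end."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(trials):
        conv_a = conv_b = 0
        crossed_at_any_peek = False
        for n in range(1, visitors + 1):
            # Both arms convert at the same true rate (an A/A test).
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n % check_every == 0 and p_value(conv_a, conv_b, n) < alpha:
                crossed_at_any_peek = True
        peeking_hits += crossed_at_any_peek
        fixed_hits += p_value(conv_a, conv_b, visitors) < alpha
    return peeking_hits / trials, fixed_hits / trials

peeking_rate, fixed_rate = simulate_false_positive_rates()
print(f"False positive rate with peeking:        {peeking_rate:.1%}")
print(f"False positive rate with one final look: {fixed_rate:.1%}")
```

The single final look should land near the nominal 5%, while stopping at the first significant peek should come out several times higher. The exact figures depend on the seed and on how often you check.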

Question 3

What is a p-value and how do I interpret it?

Accepted Answer

The p-value is the probability of observing a difference this large (or larger) between your variants if the null hypothesis (no real difference) were true. p = 0.05 means that, if there were no real difference, a result at least this extreme would occur 5% of the time; it corresponds to the 95% confidence threshold. p = 0.01 means 1% of the time and corresponds to 99% confidence. A lower p-value is stronger evidence against the null hypothesis. A p-value is NOT the probability that the result is random chance, and it does NOT tell you the size of the effect or whether it's practically meaningful: a statistically significant 0.1% conversion lift may not be worth acting on.
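As a rough illustration of where a p-value comes from, the sketch below computes a two-sided p-value with a pooled two-proportion z-test, one common choice for conversion data. The function name and the visitor and conversion counts are hypothetical, and the normal approximation it relies on assumes reasonably large samples.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under the null hypothesis both arms share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # P(|Z| >= |z|) for a standard normal Z, i.e. a result at least this extreme.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts, for illustration only.
z, p = two_proportion_p_value(conv_a=300, n_a=10_000, conv_b=360, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The p-value here answers only "how unusual is this gap if nothing changed?". The effect size (3.0% versus 3.6% in the hypothetical data) has to be read separately.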

Question 4

What is the minimum sample size for an A/B test?

Accepted Answer

Sample size depends on three inputs: (1) Baseline conversion rate: your current control performance; (2) Minimum detectable effect (MDE): the smallest lift you'd bother to act on (typically 10–20% relative improvement); (3) Desired confidence and power: usually 95% confidence and 80% power. At a 3% baseline conversion rate and a 20% relative MDE (detecting a lift from 3% to 3.6%), the standard two-sided normal approximation gives roughly 14,000 visitors per variant. At a smaller MDE of 10% (3% to 3.3%), you'd need roughly 53,000 per variant, because the required sample grows with the inverse square of the effect you want to detect. Underpowered tests produce unreliable results.
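The three inputs above can be combined in a short calculation. This is a minimal sketch using the textbook two-sided normal approximation for comparing two proportions. The function name is hypothetical, and different calculators use different approximations (one-sided tests, pooled variance, continuity corrections), so their outputs will not match every tool exactly.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, confidence=0.95, power=0.80):
    """Approximate visitors needed per variant for a two-sided two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # rate the variant must reach
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

for mde in (0.20, 0.10):
    n = sample_size_per_variant(baseline=0.03, relative_mde=mde)
    print(f"3% baseline, {mde:.0%} relative MDE: about {n:,} visitors per variant")
```

Halving the MDE roughly quadruples the required sample, since the absolute difference is squared in the denominator.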

Confidence	p-value threshold	Meaning	Use when
80%	p < 0.20	Weak: 1 in 5 false positive rate when there is no real difference	Exploratory tests, low-stakes decisions
90%	p < 0.10	Moderate: 1 in 10 false positive rate when there is no real difference	Low-traffic sites needing faster decisions
95%	p < 0.05	Standard: 1 in 20 false positive rate when there is no real difference	Most A/B tests, industry default
99%	p < 0.01	Strong: 1 in 100 false positive rate when there is no real difference	High-stakes changes, irreversible decisions
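Each confidence level in the table corresponds to a significance threshold (alpha) and, for a two-sided z-test, a critical z-score the test statistic must exceed. The sketch below derives those values; the function name is hypothetical and the two-sided z-test is an assumption about the kind of test being run.

```python
from statistics import NormalDist

def two_sided_threshold(confidence):
    """Return (alpha, critical |z|) for a two-sided z-test at this confidence level."""
    alpha = 1 - confidence
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return alpha, z_critical

for confidence in (0.80, 0.90, 0.95, 0.99):
    alpha, z_critical = two_sided_threshold(confidence)
    print(f"{confidence:.0%} confidence: p < {alpha:.2f}, |z| > {z_critical:.3f}")
```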
