What is an A/B test significance calculator?
This tool tells you whether the difference between two variants in an A/B test is statistically significant or could simply be due to random chance. It uses a two-proportion z-test, the standard method for comparing conversion rates between a control (A) and a variant (B).
How to use it
Enter the number of conversions and the total number of visitors for each variant. The calculator returns the z-score, the two-tailed p-value, and the confidence level. A confidence of 95% or higher (p-value ≤ 0.05) is the common threshold for declaring a winner.
The formula explained
First the pooled proportion is computed as \(\bar{p} = (x_1 + x_2) / (n_1 + n_2)\). The standard error is \(\sqrt{\bar{p}(1-\bar{p})(1/n_1 + 1/n_2)}\). The z-score is the difference between the two observed rates divided by this standard error:
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}\,(1-\bar{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}$$where
$$\left\{ \begin{aligned} \hat{p}_1 &= \dfrac{\text{Conversions A}}{\text{Visitors A}} \\ \hat{p}_2 &= \dfrac{\text{Conversions B}}{\text{Visitors B}} \\ \bar{p} &= \dfrac{\text{Conv. A} + \text{Conv. B}}{\text{Visitors A} + \text{Visitors B}} \end{aligned} \right.$$The p-value is derived from the standard normal distribution (two-tailed), and confidence equals \((1 - \text{p-value}) \times 100\%\).
Worked example
Variant A: 120 conversions out of 1,000 (12%). Variant B: 150 out of 1,000 (15%). Pooled \(\bar{p} = 270/2000 = 0.135\). \(\text{SE} = \sqrt{0.135 \times 0.865 \times (0.001 + 0.001)} \approx 0.01528\). \(z = (0.12 - 0.15) / 0.01528 \approx -1.963\). The two-tailed p-value \(\approx 0.0496\), giving about 95% confidence — a borderline significant result.
FAQ
What confidence level should I aim for? 95% is the industry standard, meaning a 5% chance the result is a false positive.
Does sample size matter? Yes. Small samples produce large p-values even for real differences; let tests run until each variant has enough visitors.
Why two-tailed? A two-tailed test detects a difference in either direction (B better or worse than A), which is the safer default.