CI for the mean: x̄ ± z·(σ/√n) when σ is known; use s and Student's t when σ must be estimated (essential for small samples).
CI = x̄ ± critical × (s/√n). The auto rule picks z when n ≥ 30 (large-sample, CLT) and Student's t with df = n−1 otherwise. The 95 % interval contains the true mean in ~95 % of repeated samples — it is NOT a probability statement about a single fixed parameter.
A sample mean is a point estimate — a single number drawn from one particular sample. Re-run the survey, the trial, or the production batch and you will almost certainly get a different number. The whole purpose of inferential statistics is to quantify how far that wandering can plausibly take you, and a confidence interval (CI) is the tool that does it. Instead of telling your reader "average satisfaction is 7.4 out of 10" — which sounds precise but is silently wrong about its own precision — a CI says "average satisfaction is 7.4, and the true population mean almost certainly sits somewhere between 6.8 and 8.0". That second sentence is the one decision-makers need: it tells them whether the difference between two products, two cohorts, or two batches is likely real or likely noise.
CIs scale gracefully with everything you change. Bigger sample? The interval shrinks. More variability in the underlying data? It widens. Higher confidence required? It widens again. A regulator demanding 99.9 % certainty before approving a drug will get a wider interval — and need a much bigger trial — than a marketer happy with 90 %. The math makes the trade-off explicit.
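The confidence-level trade-off is visible directly in the multipliers; a quick stdlib check (the quantiles below are standard-normal critical values, applicable when n is large):

```python
# Higher confidence demands a larger multiplier, hence a wider interval.
from statistics import NormalDist

for conf in (0.90, 0.95, 0.999):
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided critical value
    print(f"{conf:.1%} confidence -> z = {z:.3f}")
# 90.0% -> 1.645, 95.0% -> 1.960, 99.9% -> 3.291
```

Going from 90 % to 99.9 % confidence roughly doubles the interval width at the same sample size.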
The interval is also the right vocabulary for talking to non-statisticians. People understand "between 6.8 and 8.0" instinctively; they have to be coached into reading p-values. That readability is why every credible scientific journal, every audit report, and every A/B-testing dashboard now reports CIs alongside (or in place of) hypothesis tests.
For the mean of a quantitative variable the two-sided confidence interval is:
CI = x̄ ± critical × (s / √n)
where x̄ is the sample mean, s the sample standard deviation, n the sample size, and critical a multiplier read from a probability table.
Two distributions supply the multiplier: the standard normal (z) when σ is known or the sample is large (n ≥ 30, where the CLT applies), and Student's t with df = n − 1 when σ is estimated from a small sample.
The quantity s / √n is the standard error of the mean — the standard deviation of x̄ across hypothetical repeated samples. Multiplying it by the critical value scales it to whatever confidence level you picked.
Enter the sample standard deviation (in a spreadsheet, =STDEV.S(...) or the older =STDEV(...)) and leave the distribution setting on auto unless you have a specific reason to override it. auto picks z when n ≥ 30 and t with df = n − 1 below that threshold. Force z only when σ is genuinely known a priori (rare); force t to stay conservative on small-to-moderate samples even past the n = 30 threshold. The result panel shows the lower and upper bounds, the margin of error, the critical value used, the standard error, and which distribution did the work.
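The auto rule can be sketched in a few lines. `mean_ci` is a hypothetical helper, and the small t-table (97.5th percentile, selected df) is copied from a standard table rather than computed; real code would use something like scipy.stats.t.ppf for arbitrary df and confidence levels:

```python
# Minimal sketch of the z-vs-t auto rule, 95 % confidence only.
import math
from statistics import NormalDist

# Two-sided 97.5th-percentile t critical values (df -> t) from a standard
# table; covers only the df needed for this sketch's small-sample branch.
T_975 = {9: 2.262, 14: 2.145, 19: 2.093, 24: 2.064, 29: 2.045}

def mean_ci(xbar, s, n, conf=0.95):
    """Return (lower, upper) CI for the mean: z if n >= 30, else t."""
    se = s / math.sqrt(n)                  # standard error of the mean
    if n >= 30:
        p = 1 - (1 - conf) / 2             # e.g. 0.975 for a 95 % CI
        crit = NormalDist().inv_cdf(p)     # z: large-sample / CLT branch
    else:
        crit = T_975[n - 1]                # t with df = n - 1 (table lookup)
    margin = crit * se
    return xbar - margin, xbar + margin
```

For example, `mean_ci(7.4, 1.2, 15)` uses the t branch with df = 14, while `mean_ci(7.4, 1.2, 1000)` switches to z.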
A clinical trial measures recovery time on 15 patients, finds a mean of 7.4 days with a sample standard deviation of 1.2 days, and wants a 95 % CI. With n = 15 the calculator uses t with df = 14: standard error = 1.2 / √15 = 0.3098, critical = 2.145, margin = 0.665. CI = [6.74, 8.06] days.
Now suppose the same numbers come from a much bigger trial of n = 1000 patients. Standard error drops to 1.2 / √1000 = 0.0379, the calculator switches to z (n ≥ 30), critical = 1.96, margin = 0.074. CI = [7.33, 7.47]. Same point estimate, but an interval nearly nine times tighter — sample size buys precision.
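Both calculations can be checked in a few lines of stdlib Python (the t critical value 2.145 for df = 14 is taken from a standard table):

```python
# Reproduces both trials above; 2.145 is t(0.975, df = 14) from a t-table.
import math
from statistics import NormalDist

s = 1.2
m15 = 2.145 * s / math.sqrt(15)                            # t branch, n = 15
m1000 = NormalDist().inv_cdf(0.975) * s / math.sqrt(1000)  # z branch, n = 1000
print(f"n=15:   [{7.4 - m15:.2f}, {7.4 + m15:.2f}]")       # [6.74, 8.06]
print(f"n=1000: [{7.4 - m1000:.2f}, {7.4 + m1000:.2f}]")   # [7.33, 7.47]
```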
A 95 % CI is not "a 95 % probability that the population mean is in this interval". The frequentist interpretation is "if we repeated the sampling process indefinitely, 95 % of the intervals we constructed would contain the true mean". The parameter is fixed; the interval is random. People who want the probability statement need a Bayesian credible interval instead.
A narrow CI is not the same as an accurate one. If your sampling method is biased — convenience samples, self-selected respondents, dropouts — the CI will be tight but centred on the wrong number. Statistics quantifies sampling error, not measurement error or selection bias. A million-respondent online poll can still be wildly off if the respondents do not look like the population.
CIs assume the data come from a roughly symmetric distribution (or that n is large enough for the CLT to kick in). Heavily skewed data — incomes, response times, biological reaction strengths — should be transformed (log) before applying the formula, or analysed with a non-parametric method like a bootstrap CI.
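A percentile bootstrap needs no symmetry assumption, only resampling. This is a minimal sketch: `bootstrap_ci_mean` is an illustrative name, and B = 10,000 resamples is an assumption, not a universal default:

```python
# Percentile-bootstrap CI for the mean of skewed data (stdlib only).
import random
from statistics import mean

def bootstrap_ci_mean(data, conf=0.95, B=10_000, seed=0):
    rng = random.Random(seed)           # fixed seed for reproducibility
    n = len(data)
    # Mean of each resample-with-replacement, sorted for percentile lookup.
    means = sorted(mean(rng.choices(data, k=n)) for _ in range(B))
    lo_i = int((1 - conf) / 2 * B)
    hi_i = int((1 + conf) / 2 * B) - 1
    return means[lo_i], means[hi_i]

# Skewed toy sample (income-like: a few large values drag the mean up).
skewed = [1, 1, 2, 2, 3, 3, 4, 5, 8, 20]
lo, hi = bootstrap_ci_mean(skewed)
```

The resulting interval is typically asymmetric around the sample mean, which is exactly what skewed data calls for.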
A CI for the mean is not a prediction interval for an individual observation. The CI tells you where the mean lives; the prediction interval tells you where the next single value is likely to fall. It is much wider: its margin uses s·√(1 + 1/n) in place of s/√n, which makes it √(n + 1) times larger.
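The width difference is easy to verify for the n = 15 trial, a sketch using the df = 14 table value 2.145:

```python
import math

s, n, t = 1.2, 15, 2.145
ci_margin = t * s / math.sqrt(n)           # half-width for the MEAN
pi_margin = t * s * math.sqrt(1 + 1 / n)   # half-width for the NEXT observation
ratio = pi_margin / ci_margin              # sqrt(n + 1) = 4 exactly for n = 15
```

A prediction interval four times as wide as the CI, from the very same numbers, is a useful reminder of which question each interval answers.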
If you compute many CIs simultaneously — comparing 20 product variants, screening 100 genes — the family-wise error rate balloons. A Bonferroni correction widens each interval (confidence 1 − α/m instead of 1 − α) to keep the overall confidence level honest; the Benjamini-Hochberg procedure is a less conservative alternative that controls the false discovery rate rather than the family-wise rate.
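A Bonferroni adjustment can be sketched directly. `bonferroni_z` is an illustrative name; it returns the per-interval z multiplier for m simultaneous intervals at family-wise level α:

```python
from statistics import NormalDist

def bonferroni_z(m, alpha=0.05):
    # Each of the m intervals is built at confidence 1 - alpha/m,
    # so the chance that ANY interval misses stays at most alpha.
    return NormalDist().inv_cdf(1 - alpha / (2 * m))

z_single = bonferroni_z(1)    # ≈ 1.96, the usual 95 % multiplier
z_twenty = bonferroni_z(20)   # ≈ 3.02: each interval is ~54 % wider
```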
Finally, a CI says nothing about clinical or business significance. A perfectly significant 0.1-point improvement in user satisfaction may be statistically real and economically irrelevant. Always read the bounds in domain units before acting on them.
Choosing the right interval is half the analysis. Mis-applying a mean CI where a proportion CI or a prediction interval was needed is one of the most common errors in applied statistics — and one of the easiest to avoid once the distinctions above are clear.