CI for the mean: x̄ ± z·(σ/√n) when σ is known; use s and Student's t when σ must be estimated (essential for small samples).
CI = x̄ ± critical × (s/√n). The auto rule picks z when n ≥ 30 (large-sample, CLT) and Student's t with df = n−1 otherwise. The 95 % interval contains the true mean in ~95 % of repeated samples — it is NOT a probability statement about a single fixed parameter.
A sample mean is a point estimate — a single number drawn from one particular sample. Re-run the survey, the trial, or the production batch and you will almost certainly get a different number. The whole purpose of inferential statistics is to quantify how far that wandering can plausibly take you, and a confidence interval (CI) is the tool that does it. Instead of telling your reader "average satisfaction is 7.4 out of 10" — which sounds precise but is silently wrong about its own precision — a CI says "average satisfaction is 7.4, and the true population mean almost certainly sits somewhere between 6.8 and 8.0". That second sentence is the one decision-makers need: it tells them whether the difference between two products, two cohorts, or two batches is likely real or likely noise.
CIs scale gracefully with everything you change. Bigger sample? The interval shrinks. More variability in the underlying data? It widens. Higher confidence required? It widens again. A regulator demanding 99.9 % certainty before approving a drug will get a wider interval — and need a much bigger trial — than a marketer happy with 90 %. The math makes the trade-off explicit.
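The confidence-level trade-off is visible directly in the multipliers; a quick stdlib check (the quantiles below are standard-normal critical values, applicable when n is large):

```python
# Higher confidence demands a larger multiplier, hence a wider interval.
from statistics import NormalDist

for conf in (0.90, 0.95, 0.999):
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided critical value
    print(f"{conf:.1%} confidence -> z = {z:.3f}")
# 90.0% -> 1.645, 95.0% -> 1.960, 99.9% -> 3.291
```

Going from 90 % to 99.9 % confidence roughly doubles the interval width at the same sample size.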
The interval is also the right vocabulary for talking to non-statisticians. People understand "between 6.8 and 8.0" instinctively; they have to be coached into reading p-values. That readability is why every credible scientific journal, every audit report, and every A/B-testing dashboard now reports CIs alongside (or in place of) hypothesis tests.
For the mean of a quantitative variable the two-sided confidence interval is:
CI = x̄ ± critical × (s / √n)
where x̄ is the sample mean, s the sample standard deviation, n the sample size, and critical a multiplier read from a probability table.
Two distributions supply the multiplier: the standard normal (z) when σ is known or the sample is large (n ≥ 30, where the CLT applies), and Student's t with df = n − 1 when σ is estimated from a small sample.
The quantity s / √n is the standard error of the mean — the standard deviation of x̄ across hypothetical repeated samples. Multiplying it by the critical value scales it to whatever confidence level you picked.
Enter the sample standard deviation (in a spreadsheet, =STDEV.S(...) or the older =STDEV(...)) and leave the distribution setting on auto unless you have a specific reason to override it. auto picks z when n ≥ 30 and t with df = n − 1 below that threshold. Force z only when σ is genuinely known a priori (rare); force t to stay conservative on small-to-moderate samples even past the n = 30 threshold. The result panel shows the lower and upper bounds, the margin of error, the critical value used, the standard error, and which distribution did the work.
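The auto rule can be sketched in a few lines. `mean_ci` is a hypothetical helper, and the small t-table (97.5th percentile, selected df) is copied from a standard table rather than computed; real code would use something like scipy.stats.t.ppf for arbitrary df and confidence levels:

```python
# Minimal sketch of the z-vs-t auto rule, 95 % confidence only.
import math
from statistics import NormalDist

# Two-sided 97.5th-percentile t critical values (df -> t) from a standard
# table; covers only the df needed for this sketch's small-sample branch.
T_975 = {9: 2.262, 14: 2.145, 19: 2.093, 24: 2.064, 29: 2.045}

def mean_ci(xbar, s, n, conf=0.95):
    """Return (lower, upper) CI for the mean: z if n >= 30, else t."""
    se = s / math.sqrt(n)                  # standard error of the mean
    if n >= 30:
        p = 1 - (1 - conf) / 2             # e.g. 0.975 for a 95 % CI
        crit = NormalDist().inv_cdf(p)     # z: large-sample / CLT branch
    else:
        crit = T_975[n - 1]                # t with df = n - 1 (table lookup)
    margin = crit * se
    return xbar - margin, xbar + margin
```

For example, `mean_ci(7.4, 1.2, 15)` uses the t branch with df = 14, while `mean_ci(7.4, 1.2, 1000)` switches to z.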
A clinical trial measures recovery time on 15 patients, finds a mean of 7.4 days with a sample standard deviation of 1.2 days, and wants a 95 % CI. With n = 15 the calculator uses t with df = 14: standard error = 1.2 / √15 = 0.3098, critical = 2.145, margin = 0.665. CI = [6.74, 8.06] days.
Now suppose the same numbers come from a much bigger trial of n = 1000 patients. Standard error drops to 1.2 / √1000 = 0.0379, the calculator switches to z (n ≥ 30), critical = 1.96, margin = 0.074. CI = [7.33, 7.47]. Same point estimate, but an interval nearly nine times tighter — sample size buys precision.
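Both calculations can be checked in a few lines of stdlib Python (the t critical value 2.145 for df = 14 is taken from a standard table):

```python
# Reproduces both trials above; 2.145 is t(0.975, df = 14) from a t-table.
import math
from statistics import NormalDist

s = 1.2
m15 = 2.145 * s / math.sqrt(15)                            # t branch, n = 15
m1000 = NormalDist().inv_cdf(0.975) * s / math.sqrt(1000)  # z branch, n = 1000
print(f"n=15:   [{7.4 - m15:.2f}, {7.4 + m15:.2f}]")       # [6.74, 8.06]
print(f"n=1000: [{7.4 - m1000:.2f}, {7.4 + m1000:.2f}]")   # [7.33, 7.47]
```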
A 95 % CI is not "a 95 % probability that the population mean is in this interval". The frequentist interpretation is "if we repeated the sampling process indefinitely, 95 % of the intervals we constructed would contain the true mean". The parameter is fixed; the interval is random. People who want the probability statement need a Bayesian credible interval instead.
A narrow CI is not the same as an accurate one. If your sampling method is biased — convenience samples, self-selected respondents, dropouts — the CI will be tight but centred on the wrong number. Statistics quantifies sampling error, not measurement error or selection bias. A million-respondent online poll can still be wildly off if the respondents do not look like the population.
CIs assume the data come from a roughly symmetric distribution (or that n is large enough for the CLT to kick in). Heavily skewed data — incomes, response times, biological reaction strengths — should be transformed (log) before applying the formula, or analysed with a non-parametric method like a bootstrap CI.
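A percentile bootstrap needs no symmetry assumption, only resampling. This is a minimal sketch: `bootstrap_ci_mean` is an illustrative name, and B = 10,000 resamples is an assumption, not a universal default:

```python
# Percentile-bootstrap CI for the mean of skewed data (stdlib only).
import random
from statistics import mean

def bootstrap_ci_mean(data, conf=0.95, B=10_000, seed=0):
    rng = random.Random(seed)           # fixed seed for reproducibility
    n = len(data)
    # Mean of each resample-with-replacement, sorted for percentile lookup.
    means = sorted(mean(rng.choices(data, k=n)) for _ in range(B))
    lo_i = int((1 - conf) / 2 * B)
    hi_i = int((1 + conf) / 2 * B) - 1
    return means[lo_i], means[hi_i]

# Skewed toy sample (income-like: a few large values drag the mean up).
skewed = [1, 1, 2, 2, 3, 3, 4, 5, 8, 20]
lo, hi = bootstrap_ci_mean(skewed)
```

The resulting interval is typically asymmetric around the sample mean, which is exactly what skewed data calls for.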
A CI for the mean is not a prediction interval for an individual observation. The CI tells you where the mean lives; the prediction interval tells you where the next single value is likely to fall. It is much wider: its margin uses s·√(1 + 1/n) in place of s/√n, which makes it √(n + 1) times larger.
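The width difference is easy to verify for the n = 15 trial, a sketch using the df = 14 table value 2.145:

```python
import math

s, n, t = 1.2, 15, 2.145
ci_margin = t * s / math.sqrt(n)           # half-width for the MEAN
pi_margin = t * s * math.sqrt(1 + 1 / n)   # half-width for the NEXT observation
ratio = pi_margin / ci_margin              # sqrt(n + 1) = 4 exactly for n = 15
```

A prediction interval four times as wide as the CI, from the very same numbers, is a useful reminder of which question each interval answers.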
If you compute many CIs simultaneously — comparing 20 product variants, screening 100 genes — the family-wise error rate balloons. A Bonferroni correction widens each interval (confidence 1 − α/m instead of 1 − α) to keep the overall confidence level honest; the Benjamini-Hochberg procedure is a less conservative alternative that controls the false discovery rate rather than the family-wise rate.
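A Bonferroni adjustment can be sketched directly. `bonferroni_z` is an illustrative name; it returns the per-interval z multiplier for m simultaneous intervals at family-wise level α:

```python
from statistics import NormalDist

def bonferroni_z(m, alpha=0.05):
    # Each of the m intervals is built at confidence 1 - alpha/m,
    # so the chance that ANY interval misses stays at most alpha.
    return NormalDist().inv_cdf(1 - alpha / (2 * m))

z_single = bonferroni_z(1)    # ≈ 1.96, the usual 95 % multiplier
z_twenty = bonferroni_z(20)   # ≈ 3.02: each interval is ~54 % wider
```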
Finally, a CI says nothing about clinical or business significance. A perfectly significant 0.1-point improvement in user satisfaction may be statistically real and economically irrelevant. Always read the bounds in domain units before acting on them.
Choosing the right interval is half the analysis. Mis-applying a mean CI where a proportion CI or a prediction interval was needed is one of the most common errors in applied statistics — and one of the easiest to avoid once the distinctions above are clear.