Statistics

Z-score calculator

Z-score, percentile and p-value for a value in a normal distribution.

01 Inputs · 02 Results: z-score, percentile, two-tailed p-value, P(X ≤ x), P(X > x).
Chart: standard normal curve — shaded area = P(X ≤ x).

Z = (x − μ) / σ. Probabilities computed from the standard normal CDF (Abramowitz & Stegun erf approximation, accurate to 10⁻⁷). Two-tailed p-value = 2 · min(P below, P above) — relevant for symmetric hypothesis tests.

03How it works

Why this calculation

The z-score is the most widely used standardization in statistics: it transforms an observation into a number of standard deviations from the population mean, allowing direct comparison across distributions with different units. A grade of 78 in one class and 85 in another aren't comparable until you know each class's mean and spread; their z-scores are. A z-score also hooks directly into the standard normal distribution: about 95 % of observations fall within ±1.96 σ, and the 68–95–99.7 rule puts 99.7 % within ±3 σ. Six Sigma quality control, IQ scores, paediatric growth charts (height/weight), psychometric testing, and most A/B testing all rest on z-score arithmetic. This calculator computes the z-score from x, μ and σ, plus the percentile, the one-tail probabilities, and the two-tailed p-value, with a normal-curve visualization shading the area below the observation.

The formula

Z-score: z = (x − μ) / σ.

x is the observation, μ the mean, σ the standard deviation (must be > 0). z is dimensionless.

CDF of the standard normal, Φ(z): the probability that a standard normal random variable is ≤ z. Computed via the erf approximation (Abramowitz & Stegun 7.1.26): erf(x) ≈ 1 − (a₁t + a₂t² + a₃t³ + a₄t⁴ + a₅t⁵) e^(−x²), where t = 1 / (1 + p·x), with p = 0.3275911 and the five aᵢ constants as published. The formula holds for x ≥ 0; negative arguments use the odd symmetry erf(−x) = −erf(x). Then Φ(z) = ½(1 + erf(z / √2)). Maximum absolute error about 1.5 × 10⁻⁷.
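As a sketch, the A&S 7.1.26 approximation can be implemented directly (function names here are illustrative, not the calculator's actual code; Python's built-in math.erf agrees with it to within the stated error):

```python
import math

# Abramowitz & Stegun 7.1.26 constants
P = 0.3275911
A = [0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429]

def erf_approx(x: float) -> float:
    """erf via A&S 7.1.26; the polynomial form is valid for x >= 0,
    negative arguments use odd symmetry. Max absolute error ~1.5e-7."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    t = 1.0 / (1.0 + P * x)
    # Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5
    poly = t * (A[0] + t * (A[1] + t * (A[2] + t * (A[3] + t * A[4]))))
    return sign * (1.0 - poly * math.exp(-x * x))

def phi(z: float) -> float:
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf_approx(z / math.sqrt(2.0)))
```

The Horner form avoids computing five separate powers of t and is how this polynomial is usually evaluated in practice.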

Outputs:

  • z: the standardized value.
  • Percentile: Φ(z) × 100, the percent of the population with values ≤ x.
  • P(X ≤ x): the percentile / 100.
  • P(X > x): 1 − Φ(z).
  • Two-tailed p-value: 2 × min(Φ(z), 1 − Φ(z)) — the probability of seeing |z| or larger by chance, useful for hypothesis testing.

Interpretation bands: |z| < 0.5 very typical; 0.5–1 typical; 1–2 notable; 2–3 unusual; > 3 extreme.
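Taken together, the outputs and bands reduce to a few lines. A minimal sketch (function and key names are illustrative, using Python's built-in math.erf for the CDF):

```python
import math

def z_report(x: float, mu: float, sigma: float) -> dict:
    """z-score, percentile, one-tail probabilities, two-tailed p-value."""
    if sigma <= 0:
        raise ValueError("sigma must be > 0")
    z = (x - mu) / sigma
    p_below = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Phi(z)
    p_above = 1.0 - p_below
    return {
        "z": z,
        "percentile": 100.0 * p_below,
        "p_below": p_below,
        "p_above": p_above,
        "p_two_tailed": 2.0 * min(p_below, p_above),
    }

def band(z: float) -> str:
    """Verbal interpretation bands from the table above."""
    a = abs(z)
    if a < 0.5: return "very typical"
    if a < 1:   return "typical"
    if a < 2:   return "notable"
    if a < 3:   return "unusual"
    return "extreme"
```

For example, z_report(120, 100, 15) gives z ≈ 1.333 and percentile ≈ 90.88, matching the worked example below.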

How to use

Enter the observation x, the population mean μ, and the standard deviation σ. The result panel shows z (headline), percentile, two-tailed p-value, both one-tail probabilities, and a verbal interpretation. The chart draws the standard normal curve, shades the area below z, and marks the z-position with a vertical line.

Worked example

IQ score 120, μ = 100, σ = 15 (standard IQ scaling).

  • z = (120 − 100) / 15 = +1.333.
  • Φ(z) = 0.9088 → 90.88 %ile.
  • P(X ≤ 120) = 90.88 %; P(X > 120) = 9.12 %.
  • Two-tailed p-value: 2 × 9.12 % = 18.24 %.
  • Interpretation: Notable (1 < |z| < 2).

A test grade of 78, μ = 70, σ = 10.

  • z = +0.8.
  • Φ(0.8) = 0.7881 → 78.81 %ile.

Outlier: 245, μ = 100, σ = 25.

  • z = (245 − 100) / 25 = +5.8.
  • Φ(5.8) ≈ 1 at display precision; the upper tail P(Z > 5.8) ≈ 3.3 × 10⁻⁹.
  • Interpretation: Extreme — an event of this magnitude under the assumed normal model is essentially impossible; in practice such a value usually signals a data error or a non-normal distribution.
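The three examples above can be checked numerically; a quick sanity sketch using the standard library's math.erf and math.erfc:

```python
import math

def phi(z):   # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tail(z):  # P(Z > z); the erfc form stays accurate for large z
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# IQ 120 with mu=100, sigma=15
z1 = (120 - 100) / 15
print(round(z1, 3), round(100 * phi(z1), 2))   # 1.333 90.88

# grade 78 with mu=70, sigma=10
z2 = (78 - 70) / 10
print(round(100 * phi(z2), 2))                 # 78.81

# outlier 245 with mu=100, sigma=25
z3 = (245 - 100) / 25
print(f"{tail(z3):.1e}")                       # 3.3e-09
```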

Pitfalls

Normality assumption. The percentile and p-value require the underlying distribution to be normal. For non-normal data (skewed, heavy-tailed, multimodal), z-scores are still computable but their probabilistic interpretation breaks. Real income, web request times, and stock returns are not normal — the z-score for "the 99th percentile of returns" can be much smaller than the +2.33 the normal model predicts.

Population vs sample σ. The formula uses the population σ. If you have a sample and used the sample standard deviation s, you're really computing a t-statistic (Student's t-distribution), not a z. For large samples (n > 30) the two are nearly identical; for small samples use t-tables explicitly.

Outlier sensitivity. Both μ and σ are sensitive to outliers — a single extreme value can pull μ and inflate σ, distorting all z-scores. Robust alternatives (median, MAD) are less affected.
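A sketch of the modified z-score built from those robust alternatives (the 0.6745 factor rescales MAD so it estimates σ under normality; the function name and the |score| > 3.5 flag threshold follow the Iglewicz–Hoaglin convention and are illustrative):

```python
import statistics

def modified_z_scores(data):
    """Modified z-score: 0.6745 * (x - median) / MAD.
    Robust to outliers, unlike the mean/sigma version."""
    med = statistics.median(data)
    mad = statistics.median(abs(x - med) for x in data)
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (x - med) / mad for x in data]
```

On [10, 11, 12, 13, 100], the median (12) and MAD (1) are untouched by the outlier, so 100 gets a huge modified z-score while the ordinary mean/σ would have been dragged toward it.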

Approximation limits. The erf approximation in the calc is accurate to 10⁻⁷; for deep-tail probabilities (z > 6), use specialized libraries (mpmath, scipy.stats.norm).

Two-tailed vs one-tailed p-value. Two-tailed: P(|Z| ≥ |z|), used when the alternative hypothesis is "different from μ". One-tailed: P(Z ≥ z) or P(Z ≤ z), used when the alternative is "greater than μ" or "less than μ" specifically. The calc shows both; pick the one that matches your hypothesis.

Multiple testing. If you compute z-scores on 100 observations and ask "any with |z| > 2?", you'd expect ~5 by pure chance even under the null. Bonferroni or FDR correction is needed for multiple comparisons.
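The back-of-envelope "~5 by chance" figure checks out directly, and a Bonferroni-adjusted cutoff can be derived the same way (a sketch; the adjustment assumes the 100 tests are independent):

```python
import math

def tail_two(z):  # P(|Z| >= z), two-tailed
    return math.erfc(z / math.sqrt(2.0))

n = 100
expected_flags = n * tail_two(2.0)
print(round(expected_flags, 2))   # 4.55 expected |z| > 2 under the null

# Bonferroni: family-wise 5% over 100 tests means each test needs
# p < 0.05 / 100 = 5e-4, which corresponds to roughly |z| > 3.48.
```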

Standardization is not transformation to normal. z-scoring a non-normal variable doesn't make it normal; it just shifts and rescales. The shape stays.

σ = 0 edge case. If σ = 0, the calc rejects (no variance, no z). All observations are at μ.

Very large |z|. In double precision (as used by JS), the tail probability underflows for |z| beyond roughly 38, and the displayed percentile saturates at 0 % or 100 % well before that. Real-world z-scores rarely exceed 8.
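In that regime, computing the upper tail via erfc rather than as 1 − Φ(z) stays accurate much longer, because it avoids catastrophic cancellation when Φ(z) rounds to 1. A sketch of the difference:

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) via erfc: stable in the tail where 1 - Phi(z) cancels to 0."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

z = 9.0
# Naive route: Phi(9) rounds to exactly 1.0, so the subtraction returns 0
naive = 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
print(naive)          # 0.0
print(upper_tail(z))  # ~1.1e-19, still representable
```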

Confidence intervals vs z-scores. A 95 % CI uses z = 1.96; a 99 % uses z = 2.576. These are quantiles, not observations — the calc takes observations and computes z, not the inverse.
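The two directions can be seen side by side with Python's standard-library statistics.NormalDist, which provides both the CDF and its inverse (a sketch of the distinction, not this calculator's code):

```python
from statistics import NormalDist

std = NormalDist()  # mu=0, sigma=1

# Forward (what this calculator does): observation -> z -> probability
z = (120 - 100) / 15
print(std.cdf(z))            # ≈ 0.9088

# Inverse (quantile function): coverage -> critical z
print(std.inv_cdf(0.975))    # ≈ 1.95996, the 95% two-sided CI quantile
print(std.inv_cdf(0.995))    # ≈ 2.57583, the 99% two-sided CI quantile
```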

Variations

  • Inverse-CDF (quantile function): given a percentile, find the corresponding z (the inverse of what this calculator does).
  • Student's t-statistic: the small-sample analog using the sample standard deviation s.
  • Sample-size calculator: uses z-scores for confidence intervals.
  • Anomaly detection: z-scores or modified z-scores (median + MAD) flag outliers.
  • Six Sigma Cpk index: process capability index based on z-distance to spec limits.

Related calculators