From charlesreid1

Revision as of 01:34, 24 June 2026 by Admin

The statistical description of data is the art of collapsing a pile of numbers into something a human can actually reason about. You just generated ten million random draws from your Monte Carlo simulation. Congrats. Now what? You can't stare at ten million floats. You need the five numbers that matter, the shape of the distribution, and an honest assessment of how much you don't know.

This page is the bridge between generating random numbers (Numerics/Random Numbers) and doing actual inference (Numerics/Classification and Inference). We're not testing hypotheses or fitting models yet — we're just describing what's in front of us, competently.

Why "Just Describing" Is Harder Than It Looks

Descriptive statistics has a branding problem. It sounds like the boring chapter everyone skips. But honestly, most data disasters happen right here: someone blindly reports the mean and standard deviation for a distribution with fat tails, or uses Pearson correlation on a relationship that's shaped like a parabola, or bins a histogram with the wrong width and invents structure that isn't there.

The central tension in descriptive statistics is this: no real dataset is normal. The mean, the variance, the correlation coefficient — these are all elegant summaries if your data is Gaussian. When it's not (and it never is), you need tools that don't shatter on contact with reality.

The Moments: Your Distribution's Fingerprint

If a distribution were a person, the moments would be their driver's license photo, height, build, and whether they slouch. You don't need all the moments to recognize someone, but the first four get you surprisingly far.

Raw Moments and Central Moments

The $ k $-th raw moment about zero:

$ m_k = \frac{1}{n} \sum_{i=1}^{n} x_i^k $

The $ k $-th central moment (about the sample mean $ \bar{x} $):

$ \mu_k = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^k $

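The first raw moment is the mean, but higher raw moments mix location and spread together (for example, $ m_2 = \mu_2 + \bar{x}^2 $), so they're mostly intermediate quantities. The central moments subtract off the location: $ \mu_2 $ is the variance, and $ \mu_3 $ and $ \mu_4 $, once normalized by the standard deviation, describe shape. And shape is where everything interesting lives.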
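As a minimal sketch, the two moment formulas translate directly into NumPy. The gamma-distributed test data and the helper names raw_moment and central_moment are illustrative choices, not anything standard; the results are cross-checked against SciPy's stats.moment, stats.skew, and stats.kurtosis, which use the same 1/n convention by default.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.0, size=100_000)   # right-skewed test data

def raw_moment(x, k):
    """k-th raw moment about zero: the average of x**k."""
    return np.mean(x ** k)

def central_moment(x, k):
    """k-th central moment about the sample mean, with the 1/n convention."""
    return np.mean((x - np.mean(x)) ** k)

m1 = raw_moment(x, 1)                                   # the mean
mu2, mu3, mu4 = (central_moment(x, k) for k in (2, 3, 4))
skewness = mu3 / mu2 ** 1.5                             # gamma_1
excess_kurtosis = mu4 / mu2 ** 2 - 3                    # gamma_2

# SciPy agrees: stats.moment is a central moment; skew/kurtosis default to 1/n
print(np.isclose(mu3, stats.moment(x, 3)))
print(np.isclose(skewness, stats.skew(x)), np.isclose(excess_kurtosis, stats.kurtosis(x)))
print(f"skew={skewness:.3f} (theory {np.sqrt(2):.3f}), "
      f"excess kurtosis={excess_kurtosis:.3f} (theory 3)")
```

For a gamma distribution with shape 2 the population skewness is $ \sqrt{2} $ and the excess kurtosis is 3, so the printed estimates give a rough sense of how well 100,000 samples pin each one down.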

The First Four: A Field Guide

| Moment | What it is | What it tells you | Fragile to... |
|---|---|---|---|
| 1st (Mean) | $ \bar{x} = \frac{1}{n}\sum x_i $: center of mass | Where the data "lives" | Outliers. A single bad value at $ 10^9 $ ruins your whole afternoon. |
| 2nd (Variance) | $ \sigma^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 $: spread | How wide the distribution is | Outliers, squared. Even worse than the mean. |
| 3rd (Skewness) | $ \gamma_1 = \frac{\mu_3}{\sigma^3} $: asymmetry | Whether the tail leans left or right. $ \gamma_1 > 0 $ means right-skewed (long tail to the right; income data nearly always looks like this). | Small samples; needs $ n \gg 100 $ to stabilize. |
| 4th (Kurtosis) | $ \gamma_2 = \frac{\mu_4}{\sigma^4} - 3 $: tail weight (excess kurtosis) | How much probability lives in the tails vs. the center. Normal = 0. Positive = fatter tails than normal (e.g. Student's t with more than 4 degrees of freedom; for the Cauchy, and for t with 4 or fewer degrees of freedom, the fourth moment doesn't exist and kurtosis is undefined). Negative = thinner tails (uniform, symmetric beta such as Beta(0.5, 0.5)). | Needs massive samples. Seriously, $ n=200 $ is a joke for kurtosis estimation. |

The "-3" in kurtosis is the "excess" convention. Some software reports plain (Pearson) kurtosis (3 for normal), some reports excess (0 for normal). Always check which convention your library uses. SciPy's kurtosis defaults to excess. NumPy doesn't have a built-in kurtosis function at all. Pandas df.kurtosis() gives excess, but with a small-sample bias correction that SciPy only applies if you pass bias=False. This has caused real confusion in real meetings.
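A quick way to see the conventions side by side, assuming a recent SciPy and pandas (the normal test data is arbitrary). SciPy's fisher flag switches between excess and non-excess kurtosis, and its bias flag toggles the small-sample correction that pandas applies by default:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=500)

excess_biased = stats.kurtosis(x)                 # Fisher (excess), 1/n moments: near 0
pearson_style = stats.kurtosis(x, fisher=False)   # non-excess: near 3
excess_unbiased = stats.kurtosis(x, bias=False)   # excess, small-sample corrected
pandas_kurt = pd.Series(x).kurt()                 # pandas: excess, small-sample corrected

print(excess_biased, pearson_style, excess_unbiased, pandas_kurt)
```

The last two numbers should match each other, and the first two should differ by exactly 3; the gap between the first and third shrinks as the sample grows.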

Higher Moments: Mostly a Theoretical Curiosity

The 5th moment and beyond are almost never used in practice. They're unstable to estimate (needing absurd sample sizes) and rarely have intuitive interpretations. If you find yourself computing the 6th moment of something, you've probably made a wrong turn.

The Location Question: Where Is Your Data?

"The average is..." is possibly the most common statistical utterance on Earth. But "average" means at least three different things, and picking the wrong one is a classic blunder.

Mean, Median, Mode: Pick Your Fighter

| Measure | Definition | Best when... | Worst when... |
|---|---|---|---|
| Mean | $ \bar{x} = \frac{1}{n}\sum x_i $ | Data is symmetric, no/few outliers, you need mathematical tractability | Skewed data, heavy tails, you care about "typical" not "expected" |
| Median | Middle value (50th percentile) | Skewed data, outliers present, you want "the typical value" | You need to do algebra with the result |
| Mode | Most frequent value | Discrete data, multimodal distributions | Continuous data without binning; can be ambiguous or nonexistent |

The mean is the minimizer of squared error: $ \bar{x} = \arg\min_c \sum (x_i - c)^2 $. The median minimizes absolute error: $ \text{median} = \arg\min_c \sum |x_i - c| $. This is why the median doesn't care about outliers — the absolute error penalty doesn't escalate with distance. It's honestly kind of beautiful.
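A brute-force check of both argmin claims, assuming a fine grid of candidate centers is good enough (it's a demonstration, not how you'd actually compute a median). The five-point dataset with one wild value is made up for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # one wild value
c = np.linspace(0, 100, 100_001)             # candidate centers, spaced 0.001 apart

# Total loss for every candidate c (rows) against every data point (columns)
sq_loss = ((x[None, :] - c[:, None]) ** 2).sum(axis=1)
abs_loss = np.abs(x[None, :] - c[:, None]).sum(axis=1)

c_sq = c[np.argmin(sq_loss)]    # lands on the mean (22)
c_abs = c[np.argmin(abs_loss)]  # lands on the median (3)
print(c_sq, np.mean(x))
print(c_abs, np.median(x))
```

Moving the outlier from 100 to 10,000 would drag the squared-error minimizer along with it, while the absolute-error minimizer stays at 3.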

Trimmed and Winsorized Means

If you want a compromise between the mean's efficiency and the median's robustness, you trim:

  • Trimmed mean: Throw away the top and bottom $ \alpha\% $ of the data, then take the mean of the rest. For $ \alpha=0 $ you get the mean; for $ \alpha \to 50\% $ you approach the median. The 10% trimmed mean is a solid default.
  • Winsorized mean: Instead of throwing away extreme values, cap them at a percentile threshold. Less wasteful of data than trimming.
import numpy as np
from scipy import stats

x = np.random.standard_t(df=3, size=1000)  # Heavy-tailed

print(f"Mean:           {np.mean(x):.3f}")       # Jumpy with heavy tails
print(f"Median:         {np.median(x):.3f}")     # Stable
print(f"10% Trimmed:    {stats.trim_mean(x, 0.1):.3f}")  # Best of both
print(f"Winsorized 10%: {stats.mstats.winsorize(x, limits=0.1).mean():.3f}")

The trimmed mean is especially useful when you know contamination is present (bad measurements, fat-fingered data entry) but you don't want to throw away too much signal.

Geometric and Harmonic Means

Not just curiosities — these show up in real problems:

  • Geometric mean: $ \left(\prod x_i\right)^{1/n} $. Use for multiplicative processes: growth rates, investment returns, concentrations. If you average ratios, this is the one you want.
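  • Harmonic mean: $ n / \sum (1/x_i) $. Use for rates where the numerator is held fixed and the denominator varies: average speed over equal-distance segments driven at different speeds, average cost per unit when you spend the same amount each time, or the F1 score in machine learning (the harmonic mean of precision and recall).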
import numpy as np
from scipy.stats import gmean, hmean

rates = np.array([1.05, 1.12, 0.98, 1.07, 1.15])  # Annual growth factors
print(f"Arithmetic mean: {rates.mean():.4f}")   # Wrong for compounding
print(f"Geometric mean:  {gmean(rates):.4f}")    # Correct
# Harmonic mean: when you have rates per unit
speeds = np.array([60, 40, 30])  # mph over equal-distance segments
print(f"Hmean speed: {hmean(speeds):.1f} mph")  # The right average speed

People reach for the arithmetic mean by default and it's wrong surprisingly often. If your quantity is a rate, a ratio, or a growth factor, think twice.

The Spread Question: How Spread Out Is Your Data?

Knowing the center is useless without knowing the spread. A mean of 50 with a standard deviation of 1 is a totally different universe from a mean of 50 with a standard deviation of 200.

Standard Deviation and Variance

The variance $ \sigma^2 $ is the average squared deviation from the mean. The standard deviation $ \sigma $ is its square root.

The sample variance uses Bessel's correction: $ s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2 $. This makes it an unbiased estimator of the population variance. NumPy's np.var(x) uses $ n $ by default (population); np.var(x, ddof=1) uses $ n-1 $ (sample). Pandas df.var() defaults to $ n-1 $. Yes, this inconsistency is annoying.
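The three conventions on a small dataset whose squared deviations from the mean sum to exactly 32, so the divisors are easy to read off (the numbers are an arbitrary textbook-style example):

```python
import numpy as np
import pandas as pd

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # mean 5, squared deviations sum to 32

pop_var = np.var(x)               # 32 / 8 = 4.0       (NumPy default, ddof=0)
sample_var = np.var(x, ddof=1)    # 32 / 7 = 4.571...  (Bessel's correction)
pandas_var = pd.Series(x).var()   # 32 / 7 again: pandas defaults to ddof=1
print(pop_var, sample_var, pandas_var)
```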

Standard deviation is not robust. A single outlier inflates it dramatically. If your data might have contamination, report the IQR or MAD alongside (or instead of) the standard deviation.

IQR: The Interquartile Range

$ \text{IQR} = Q_3 - Q_1 $ — the range of the middle 50% of the data. Robust to outliers, easy to interpret, and the foundation of the box plot.

The IQR relates to the standard deviation under normality: $ \text{IQR} \approx 1.349\sigma $. If your data's ratio of IQR to standard deviation is far from 1.349, that's a quick diagnostic that normality is off the table.
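A sketch of that diagnostic, assuming large simulated samples from a normal and from a Student's t with 3 degrees of freedom (both arbitrary choices). The ratio lands near 1.349 for the normal and well below it for the heavy-tailed t, whose standard deviation is inflated by its tails:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
samples = {
    "normal": rng.normal(size=100_000),
    "t (df=3)": rng.standard_t(df=3, size=100_000),
}

ratios = {}
for name, data in samples.items():
    ratios[name] = stats.iqr(data) / np.std(data, ddof=1)
    print(f"{name:9s} IQR/s = {ratios[name]:.3f}")  # ~1.349 only if roughly normal
```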

MAD: Median Absolute Deviation

$ \text{MAD} = \text{median}(|x_i - \text{median}(x)|) $

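This is honestly my favorite robust scale estimator. It's the median's answer to standard deviation. Under normality, $ \text{MAD} \approx 0.6745\sigma $, so multiplying by $ 1/0.6745 \approx 1.4826 $ turns it into an estimate of $ \sigma $. SciPy's median_abs_deviation returns the raw (unscaled) MAD by default; pass scale='normal' to get the rescaled version: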

import numpy as np
from scipy import stats
from scipy.stats import median_abs_deviation

x = np.random.standard_cauchy(size=1000)  # Extreme heavy tails

print(f"Std:        {np.std(x):.3f}")           # Ridiculous, might be huge
print(f"IQR:        {stats.iqr(x):.3f}")          # Stable
print(f"MAD (raw):  {median_abs_deviation(x):.3f}")   # Stable; SciPy's default is scale=1.0
print(f"MAD (norm): {median_abs_deviation(x, scale='normal'):.3f}")  # ×1.4826, matches σ for normal data

For data with even moderate tails, the MAD is way more informative than the standard deviation. If your std is 100 and your MAD is 2, you've got outliers — not a wide distribution.

Range and Min/Max: The Fragile Extremes

The minimum and maximum are the most fragile statistics in existence. They're useful for data validation (is anything outside [0, 100] when it should be?) but never for inference. A single bad data point changes them completely. If you're tempted to normalize by the range ($ x' = (x - \min)/(\max - \min) $), ask yourself whether you trust those two numbers.
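To make the point concrete, here is a hypothetical five-value dataset, min-max scaled before and after one fat-fingered entry is appended. The helper minmax_scale is just the formula above written out:

```python
import numpy as np

def minmax_scale(x):
    """x' = (x - min) / (max - min)"""
    return (x - x.min()) / (x.max() - x.min())

clean = np.array([10.0, 12.0, 15.0, 18.0, 20.0])
dirty = np.append(clean, 1000.0)   # one fat-fingered entry

scaled_clean = minmax_scale(clean)        # [0, 0.2, 0.5, 0.8, 1]
scaled_dirty = minmax_scale(dirty)[:5]    # the same five values, now all below ~0.0102
print(scaled_clean)
print(scaled_dirty)
```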

Quantiles and Order Statistics

The $ p $-th quantile $ Q(p) $ is the value below which fraction $ p $ of the data falls. The median is $ Q(0.5) $, quartiles are $ Q(0.25), Q(0.5), Q(0.75) $, and percentiles cover the rest.

The Quantile Definition War

There are at least nine different methods for computing sample quantiles, and they disagree on small datasets. Seriously. NumPy defaults to linear interpolation (method 7 in the literature), SciPy's percentile-based helpers such as scipy.stats.iqr use method 7 by default too, and so does R. Other tools differ: Excel's PERCENTILE.EXC, for example, uses method 6. For large $ n $ they all converge; for small $ n $ you can get different answers. Worth knowing about, mostly so you don't panic when another tool gives you a slightly different quartile than Python.

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
print(np.percentile(x, 25))  # 2.25 (default: linear interpolation)

# All 9 methods (the method= argument needs NumPy 1.22+), if you want to start a fight:
methods = ['inverted_cdf', 'averaged_inverted_cdf', 'closest_observation',
           'interpolated_inverted_cdf', 'hazen', 'weibull', 'linear',
           'median_unbiased', 'normal_unbiased']
for m in methods:
    print(f"  {m:30s}: {np.percentile(x, 25, method=m):.3f}")

The Five-Number Summary

This is my go-to first pass at any dataset, ever:

  1. Minimum
  2. $ Q_1 $ (25th percentile)
  3. Median ($ Q_2 $)
  4. $ Q_3 $ (75th percentile)
  5. Maximum

Add the mean to make it a six-number summary and you've got enough to diagnose skewness, outliers, and whether the distribution is even remotely symmetric. This is Tukey's five-number summary, and it's worth more than most fancy models.

import numpy as np
from scipy import stats

def five_number_summary(x):
    return {
        'min': np.min(x),
        'Q1': np.percentile(x, 25),
        'median': np.median(x),
        'Q3': np.percentile(x, 75),
        'max': np.max(x),
        # Bonus:
        'mean': np.mean(x),
        'IQR': stats.iqr(x),
    }

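Box plots visualize this summary: a box plot is a five-number summary with a mustache. One caveat: most plotting libraries (matplotlib's boxplot included) draw the whiskers only to the most extreme points within 1.5×IQR of the quartiles, not to the min and max, and plot anything beyond them as individual points. Respect the box plot, but check where its whiskers end.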

Shape: Skewness and Kurtosis in Practice

Skewness: Which Way Does It Lean?

Right-skewed (positive): the mean is greater than the median. Classic examples: income, response times, anybody's reaction to a new feature. The distribution has a long right tail — most values cluster on the left, a few extreme values pull the mean to the right.

Left-skewed (negative): the mean is less than the median. Less common but not rare: age at death in developed countries (most deaths cluster at older ages, but a left tail of premature deaths exists), or the time to failure of a component that usually wears out late but occasionally fails early.

A quick heuristic: if $ |\bar{x} - \text{median}| / \sigma > 0.2 $, skewness is probably worth paying attention to.
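The heuristic is easy to compute. The helper name mean_median_gap is ours, and the simulated normal and exponential samples are illustrative. For an exponential distribution the population value is $ 1 - \ln 2 \approx 0.307 $, comfortably over the 0.2 line:

```python
import numpy as np

def mean_median_gap(x):
    """Hypothetical helper: (mean - median) / sample std, the heuristic above."""
    return (np.mean(x) - np.median(x)) / np.std(x, ddof=1)

rng = np.random.default_rng(3)
gaps = {
    "normal": mean_median_gap(rng.normal(size=50_000)),
    "exponential": mean_median_gap(rng.exponential(scale=1.0, size=50_000)),
}
for name, g in gaps.items():
    flag = "worth a look" if abs(g) > 0.2 else "fine"
    print(f"{name:12s} {g:+.3f}  ({flag})")
```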

Kurtosis: What Lives in Your Tails?

High kurtosis ("leptokurtic" if you want to sound fancy) means the tails are fatter than a normal distribution. This is everywhere in real data — financial returns, internet traffic, earthquake magnitudes, word frequencies. The normal distribution is a mathematical convenience, not a fact about the world.

Low kurtosis ("platykurtic") means thinner tails than normal — the data is more concentrated near the mean. Uniform distributions and some bounded beta distributions fall here.

If your data has high kurtosis, the standard deviation is lying to you. It's being inflated by rare extreme events, and it makes the distribution look wider than it actually is for the typical case. This is why you want the IQR and MAD as backup.

import numpy as np
from scipy import stats
from scipy.stats import median_abs_deviation

x_normal = np.random.normal(0, 1, 10000)
x_t = np.random.standard_t(df=5, size=10000)

for name, data in [('Normal', x_normal), ("Student's t (df=5)", x_t)]:
    print(f"{name}:")
    print(f"  Skewness: {stats.skew(data):.4f}")
    print(f"  Kurtosis: {stats.kurtosis(data):.4f} (excess)")
    print(f"  σ={np.std(data):.3f}, IQR={stats.iqr(data):.3f}, MAD={median_abs_deviation(data, scale='normal'):.3f}")
    print()

Multivariate Descriptions: When Data Comes in Pairs (or More)

Covariance and Correlation

Covariance: $ \text{Cov}(X, Y) = \frac{1}{n-1}\sum (x_i - \bar{x})(y_i - \bar{y}) $

Pearson correlation: $ \rho_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} $

Pearson correlation measures linear relationship. That word linear is doing a lot of work. A perfect parabola ($ Y = X^2 $ on a symmetric domain) has zero Pearson correlation. Anscombe's quartet, four datasets with nearly identical means, variances, correlations, and regression lines, should be mandatory viewing for anyone who computes a correlation coefficient. Go look it up: the four datasets look completely different once you plot them, and the visualizations are hilarious.
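A quick numerical check of the parabola claim, using evenly spaced grids (an arbitrary choice). Restricting the same curve to $ [0, 1] $, where it is monotonic, gives a Pearson correlation near 0.97, which is exactly why "linear" and "symmetric domain" matter:

```python
import numpy as np

x_sym = np.linspace(-1, 1, 201)   # symmetric domain
r_sym = np.corrcoef(x_sym, x_sym ** 2)[0, 1]

x_pos = np.linspace(0, 1, 101)    # same curve, monotonic half only
r_pos = np.corrcoef(x_pos, x_pos ** 2)[0, 1]

print(f"Y = X^2 on [-1, 1]: r = {r_sym:.2e}")   # zero up to rounding error
print(f"Y = X^2 on [0, 1]:  r = {r_pos:.3f}")   # strongly positive
```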

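import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=500)
df = pd.DataFrame({'x': x,
                   'y': np.exp(x),                 # monotonic but nonlinear in x
                   'z': rng.normal(size=500)})     # unrelated noise
print(df.corr())                     # Pearson by default
print(df.corr(method='spearman'))    # Rank correlation: handles monotonic nonlinearities
print(df.corr(method='kendall'))     # Another rank correlation, more robust to outliers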

Rank Correlations: Spearman and Kendall

If Pearson is "how linear is this relationship," Spearman is "how monotonic is this relationship." Spearman correlation is just Pearson on the ranks — it captures any relationship where $ Y $ consistently goes up when $ X $ goes up, regardless of whether it's a straight line or a curve.
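To see that Spearman really is Pearson on ranks, here's a sketch on simulated data with a monotonic but strongly curved relationship (the exponential curve and noise level are arbitrary). scipy.stats.rankdata supplies the ranks:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.uniform(0, 5, size=500)
y = np.exp(x) + rng.normal(scale=0.1, size=500)   # monotonic, very curved

pearson_r = np.corrcoef(x, y)[0, 1]
spearman_by_hand = np.corrcoef(stats.rankdata(x), stats.rankdata(y))[0, 1]
spearman_scipy, _ = stats.spearmanr(x, y)

print(f"Pearson on raw values:  {pearson_r:.3f}")
print(f"Pearson on ranks:       {spearman_by_hand:.3f}")
print(f"scipy.stats.spearmanr:  {spearman_scipy:.3f}")
```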

Kendall's $ \tau $ is based on concordant/discordant pairs and is even more robust to outliers. Use it when your data has weird extreme values and you still want to detect association.
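Kendall's $ \tau $ straight from its definition: count concordant pairs (both variables move in the same direction) and discordant pairs (opposite directions), then normalize by the number of pairs. This $ O(n^2) $ loop is for illustration only; the function name kendall_tau_a is ours, it assumes there are no ties, and the single planted outlier is there to show how little one absurd value moves $ \tau $ compared with Pearson's $ r $:

```python
import numpy as np
from itertools import combinations
from scipy import stats

def kendall_tau_a(x, y):
    """Kendall's tau from concordant/discordant pair counts (tau-a; assumes no ties)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs

rng = np.random.default_rng(5)
x = rng.normal(size=60)
y = x + rng.normal(scale=0.5, size=60)
y[0] = 1e6   # one absurd value

tau_hand = kendall_tau_a(x, y)
tau_scipy, _ = stats.kendalltau(x, y)   # tau-b by default; equals tau-a without ties
print(f"tau (by hand): {tau_hand:.3f}, tau (SciPy): {tau_scipy:.3f}")
print(f"Pearson r:     {np.corrcoef(x, y)[0, 1]:.3f}")   # wrecked by the single outlier
```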

Rule of thumb: Always compute Spearman alongside Pearson. If they're dramatically different, your relationship is nonlinear and Pearson is misleading you.

Correlation Is Not Causation (But You Knew That)

The most repeated phrase in statistics, and people still get it wrong. Two things can be correlated because:

  1. $ X $ causes $ Y $
  2. $ Y $ causes $ X $
  3. $ Z $ causes both $ X $ and $ Y $
  4. Coincidence (especially in high-dimensional datasets)
  5. You conditioned on a collider and created a spurious correlation (Berkson's paradox)

No correlation coefficient can distinguish these. You need a causal model (do-calculus, Pearl's framework) or an experiment. This page is about description, not causation — but you should know where the boundary is.
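Berkson's paradox is easy to manufacture in simulation. In this sketch two variables are independent by construction, and the selection rule (keep only cases whose sum exceeds 1.5) is an arbitrary stand-in for any filter that depends on both:

```python
import numpy as np

rng = np.random.default_rng(6)
a = rng.normal(size=100_000)
b = rng.normal(size=100_000)   # independent of a by construction

r_all = np.corrcoef(a, b)[0, 1]                           # about 0

selected = (a + b) > 1.5       # condition on a collider: a filter that depends on both
r_selected = np.corrcoef(a[selected], b[selected])[0, 1]  # clearly negative

print(f"everyone:      r = {r_all:+.3f}")
print(f"selected only: r = {r_selected:+.3f}")
```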

Histograms: Binning Is a Design Decision

A histogram is a piecewise-constant density estimate. The choice of bin width and bin origin completely changes what you see. Too few bins and you miss structure. Too many bins and you see noise that looks like structure. The bin width problem is a microcosm of the bias-variance tradeoff — and it's worth getting right.

Bin Width Rules of Thumb

| Rule | Formula | Best for... |
|---|---|---|
| Sturges' | $ k = \lceil \log_2 n + 1 \rceil $ | Quick and dirty; oversmooths (too few bins) for large or non-normal samples |
| Scott's | $ h = 3.5\sigma / n^{1/3} $ | Normal-ish data |
| Freedman-Diaconis | $ h = 2 \cdot \text{IQR} / n^{1/3} $ | Robust to outliers; this is my default |
| Square root | $ k = \lceil \sqrt{n} \rceil $ | When you have no other ideas |

The Freedman-Diaconis rule is honestly the best general-purpose choice. It uses the IQR instead of the standard deviation, so it doesn't blow up when your data has heavy tails.

import numpy as np
import matplotlib.pyplot as plt

x = np.random.exponential(scale=2, size=5000)

# Let NumPy pick bins automatically ('auto' uses the smaller bin width of the Sturges and Freedman-Diaconis rules)
plt.hist(x, bins='auto', density=True, alpha=0.7)

# Or specify the rule explicitly:
from scipy.stats import iqr
h = 2 * iqr(x) / (len(x) ** (1/3))
n_bins_fd = int(np.ceil((x.max() - x.min()) / h))
plt.hist(x, bins=n_bins_fd, density=True, alpha=0.7)
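NumPy can also report the bin edges each named rule would produce, without plotting anything, via np.histogram_bin_edges. This sketch assumes a seeded exponential sample like the one in the histogram snippet and checks that the manual Freedman-Diaconis bin count agrees with NumPy's 'fd' rule:

```python
import numpy as np
from scipy.stats import iqr

rng = np.random.default_rng(7)
x = rng.exponential(scale=2, size=5000)

# Manual Freedman-Diaconis bin count
h = 2 * iqr(x) / len(x) ** (1 / 3)
n_bins_fd = int(np.ceil((x.max() - x.min()) / h))

# Bin counts implied by NumPy's named rules
n_bins = {rule: len(np.histogram_bin_edges(x, bins=rule)) - 1
          for rule in ["sturges", "scott", "fd", "sqrt", "auto"]}
print(n_bins)
print(n_bins_fd, n_bins["fd"])   # should agree
```

On a sample this size Sturges asks for far fewer bins than Freedman-Diaconis, and 'auto' follows whichever rule gives the narrower bins.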

The Bin Origin Problem

Shift your bins by half a bin width and the histogram changes, sometimes dramatically. This is an artifact, not a property of the data. Kernel density estimation (next section) removes the bin-origin dependence entirely by smoothing with a continuous kernel instead of hard bin edges.
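A small demonstration on simulated normal data (a bin width of 0.5, chosen arbitrarily): the same sample binned on two grids that differ only by a half-bin offset gives different-looking count vectors.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(size=200)
width = 0.5

edges_a = np.arange(-4.0, 4.5, width)   # bin edges at -4.0, -3.5, ..., 4.0
edges_b = edges_a + width / 2           # same width, shifted by half a bin

counts_a, _ = np.histogram(x, bins=edges_a)
counts_b, _ = np.histogram(x, bins=edges_b)
print(counts_a)
print(counts_b)   # same data, same width, different-looking shape
```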

Kernel Density Estimation: The Histogram, Upgraded

Kernel density estimation replaces the hard bin edges of a histogram with a smooth kernel (usually Gaussian) centered at each data point. The result is a continuous, differentiable density estimate that doesn't depend on bin origin. It's usually a better choice than a histogram for visualization and for any downstream computation that needs a density.

from scipy.stats import gaussian_kde
import numpy as np

x = np.random.gamma(shape=2, scale=2, size=5000)
kde = gaussian_kde(x)

x_grid = np.linspace(0, 20, 200)
density = kde(x_grid)

# The bandwidth is the crucial parameter — too small = wiggly, too large = oversmoothed
# For 1-D data the kernel's standard deviation is factor × sample std (ddof=1)
print(f"Bandwidth (Scott's rule): {np.sqrt(kde.covariance[0, 0]):.4f}")
print(f"Bandwidth factor:        {kde.covariance_factor():.4f}")

Bandwidth selection is the art here. Too small and you're essentially plotting the data itself (overfitting). Too large and you smear out real features (underfitting). Scott's rule and Silverman's rule of thumb are the standard defaults, but cross-validation can tune it further if you really care.
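A sketch comparing SciPy's two named bandwidth rules with a couple of hand-picked scalar factors (0.05 and 0.5 are arbitrary). For one-dimensional data, gaussian_kde's factor multiplies the sample standard deviation to give the kernel's standard deviation, and the named rules reduce to simple functions of $ n $:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(9)
x = rng.gamma(shape=2, scale=2, size=5000)
n = len(x)

for bw in ["scott", "silverman", 0.05, 0.5]:
    kde = gaussian_kde(x, bw_method=bw)
    kernel_sd = np.sqrt(kde.covariance[0, 0])   # = kde.factor * np.std(x, ddof=1) in 1-D
    print(f"bw_method={str(bw):9s}  factor={kde.factor:.4f}  kernel sd={kernel_sd:.4f}")

scott_factor = gaussian_kde(x).factor                            # default rule
silverman_factor = gaussian_kde(x, bw_method="silverman").factor
print(np.isclose(scott_factor, n ** (-1 / 5)))                   # Scott: n^(-1/5)
print(np.isclose(silverman_factor, (n * 3 / 4) ** (-1 / 5)))     # Silverman: (3n/4)^(-1/5)
```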

Pandas: Statistical Descriptions in One Line

Pandas gives you describe() — the single most useful line of code in exploratory data analysis:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'normal': np.random.normal(0, 1, 10000),
    'exponential': np.random.exponential(2, 10000),
    't_dist': np.random.standard_t(df=5, size=10000),
})

print(df.describe())
# Output includes: count, mean, std, min, 25%, 50%, 75%, max
# For non-normal data, pay attention to the gap between mean and 50%

# For more percentiles:
print(df.describe(percentiles=[0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99]))

describe() with custom percentiles is your first line of defense against bad data. If the 1st percentile is -5000 and the 99th is +5000 on data you expect to be roughly in [-3, 3], something is wrong — either the data is contaminated or your expectations are.

Statistical Descriptions of Monte Carlo Output

This is where the connection to Numerics/Random Numbers pays off. When you've run a Monte Carlo simulation, you get back an array of estimates. Describing that array tells you about bias, variance, and convergence.

The Monte Carlo Summary

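import numpy as np
from scipy import stats

def monte_carlo_summary(estimates, true_value=None):
    """Summarize a vector of Monte Carlo estimates."""
    estimates = np.asarray(estimates, dtype=float)  # so lists work in the RMSE arithmetic too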
    summary = {
        'n_samples': len(estimates),
        'mean': np.mean(estimates),
        'std': np.std(estimates, ddof=1),
        'sem': stats.sem(estimates),  # Standard error of the mean
        'median': np.median(estimates),
        'IQR': stats.iqr(estimates),
        'skewness': stats.skew(estimates),
        'kurtosis': stats.kurtosis(estimates),
    }
    if true_value is not None:
        summary['bias'] = summary['mean'] - true_value
        summary['rmse'] = np.sqrt(np.mean((estimates - true_value)**2))
    return summary

The SEM (standard error of the mean) is $ s / \sqrt{n} $, where $ s $ is the sample standard deviation of the estimates; this is how precise your Monte Carlo average is. It decreases as $ 1/\sqrt{n} $, which is why Monte Carlo converges so slowly. To halve your error you need 4× the samples. This is the fundamental pain of Monte Carlo and the reason people invest so much in variance reduction.
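The $ 1/\sqrt{n} $ scaling is easy to see empirically. This sketch uses the classic dartboard estimate of $ \pi $ as the Monte Carlo problem (an illustrative choice, with a hypothetical helper pi_draws); each fourfold increase in samples roughly halves the SEM.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)

def pi_draws(n):
    """Hypothetical helper: n i.i.d. draws whose mean estimates pi."""
    u, v = rng.random(n), rng.random(n)
    return 4.0 * (u ** 2 + v ** 2 <= 1.0)   # 4 if the dart lands in the quarter circle, else 0

sems = []
for n in [10_000, 40_000, 160_000]:
    draws = pi_draws(n)
    sems.append(stats.sem(draws))
    print(f"n={n:7d}  estimate={draws.mean():.4f}  SEM={sems[-1]:.5f}")
```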

Convergence Diagnostics

Plot the running mean as samples accumulate. If it hasn't stabilized, your simulation hasn't converged:

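import numpy as np
estimates = np.random.exponential(scale=2.0, size=100_000)  # hypothetical stand-in for your MC output
running_mean = np.cumsum(estimates) / np.arange(1, len(estimates) + 1)
# Plot this. If it's still trending at the end, keep simulating.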

Also useful: the running standard error. If the SEM is still large relative to the effect size you care about, you need more samples.
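One way to compute the running standard error in a single vectorized pass uses cumulative sums of $ x $ and $ x^2 $. This is a sketch: the exponential stand-in for estimates is hypothetical, and the sum-of-squares formula can lose precision when the mean is huge relative to the spread (Welford's online algorithm is the numerically safer alternative).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
estimates = rng.exponential(scale=2.0, size=20_000)  # hypothetical per-run MC outputs

k = np.arange(1, len(estimates) + 1)
s1 = np.cumsum(estimates)         # running sum
s2 = np.cumsum(estimates ** 2)    # running sum of squares
running_mean = s1 / k

# Running sample variance (ddof=1); undefined for the very first sample
running_var = np.full(len(estimates), np.nan)
running_var[1:] = (s2[1:] - s1[1:] ** 2 / k[1:]) / (k[1:] - 1)
running_sem = np.sqrt(running_var / k)

print(f"final mean = {running_mean[-1]:.4f}, final SEM = {running_sem[-1]:.5f}")
```

Plotting running_sem next to the running mean shows whether the remaining uncertainty is already small relative to the effect you care about.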

The Bootstrap: Describing Uncertainty Without Formulas

The bootstrap is one of those ideas that feels like cheating but is mathematically sound. To get a confidence interval for almost any statistic, resample your data with replacement, recompute the statistic, and look at the distribution of results. No closed-form standard errors and no parametric assumptions about the sampling distribution; it does still assume your observations are independent and that your sample is large enough to stand in for the population.

import numpy as np
from scipy import stats

def bootstrap_ci(data, statistic, n_bootstrap=10000, alpha=0.05):
    """
    Bootstrap confidence interval for an arbitrary statistic.
    Returns (lower, upper) bounds.
    """
    n = len(data)
    boot_stats = np.empty(n_bootstrap)
    for i in range(n_bootstrap):
        sample = np.random.choice(data, size=n, replace=True)
        boot_stats[i] = statistic(sample)
    return np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Example: CI for the skewness of a distribution
x = np.random.exponential(scale=2, size=200)
ci = bootstrap_ci(x, stats.skew, n_bootstrap=10000)
print(f"Skewness: {stats.skew(x):.3f}, 95% CI: [{ci[0]:.3f}, {ci[1]:.3f}]")

The percentile bootstrap (what's shown above) is the simplest variant and works well for most problems. The BCa (bias-corrected and accelerated) bootstrap is more accurate but also more involved — scipy.stats.bootstrap has it built in (SciPy 1.7+).
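For the BCa variant, here is a minimal sketch with scipy.stats.bootstrap, assuming SciPy 1.7 or newer (the exponential sample and 2,000 resamples are arbitrary). The data are passed as a tuple of samples, and stats.skew works directly because it accepts an axis argument, which lets SciPy vectorize the resampling:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
x = rng.exponential(scale=2, size=200)

# (x,) because bootstrap takes a sequence of samples; BCa is also SciPy's default method
res = stats.bootstrap((x,), stats.skew, confidence_level=0.95,
                      n_resamples=2000, method="BCa")
ci = res.confidence_interval
print(f"Skewness: {stats.skew(x):.3f}, 95% BCa CI: [{ci.low:.3f}, {ci.high:.3f}]")
print(f"Bootstrap standard error: {res.standard_error:.3f}")
```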

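The bootstrap is honestly the answer to "how do I get an error bar on [anything]?" Just bootstrap it. It's embarrassingly parallel, assumption-light, and works for most smooth statistics you can compute. It is known to misbehave for extremes such as the sample maximum, so don't bootstrap the max.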

Practical Workflow: The First Five Minutes With a New Dataset

When I get a new dataset, here's the (Python) ritual I run before doing anything else:

import numpy as np
import pandas as pd
from scipy import stats
from scipy.stats import median_abs_deviation

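def first_look(x, name="data"):
    """The five-minute diagnostic."""
    x = np.asarray(x, dtype=float)
    n_missing = int(np.isnan(x).sum())
    x = x[~np.isnan(x)]  # drop NaNs, otherwise every statistic below comes out nan
    print(f"=== {name} ===")
    print(f"  n = {len(x) + n_missing} ({n_missing} missing)")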

    desc = {
        'mean': np.mean(x),
        'median': np.median(x),
        'std': np.std(x, ddof=1),
        'IQR': stats.iqr(x),
        'MAD': median_abs_deviation(x),
        'skewness': stats.skew(x),
        'kurtosis': stats.kurtosis(x),
        'min': np.min(x),
        'max': np.max(x),
        '1%': np.percentile(x, 1),
        '99%': np.percentile(x, 99),
    }
    for k, v in desc.items():
        print(f"  {k:12s}: {v:.4f}")

    # Quick outlier flag: values beyond 3×IQR from quartiles
    q1, q3 = np.percentile(x, [25, 75])
    iqr_val = q3 - q1
    lower = q1 - 3 * iqr_val
    upper = q3 + 3 * iqr_val
    n_outliers = np.sum((x < lower) | (x > upper))
    if n_outliers > 0:
        print(f"  ⚠ {n_outliers} outlier(s) beyond 3×IQR (lower={lower:.2f}, upper={upper:.2f})")

    # Skew check
    if abs(desc['skewness']) > 1:
        print(f"  ⚠ Substantial skewness ({desc['skewness']:.2f}) — mean ≠ median ≠ typical")
    if desc['kurtosis'] > 2:
        print(f"  ⚠ High kurtosis ({desc['kurtosis']:.2f}) — heavy tails, σ is inflated")
    if abs(desc['mean'] - desc['median']) / (desc['std'] + 1e-10) > 0.2:
        print(f"  ⚠ Mean-median gap suggests skewness or outliers")

This isn't fancy. It's just systematic. The key is getting the five-number summary, the moments, and a few automatic sanity checks before you proceed to modeling. A shocking number of modeling failures would be caught by these five minutes.

Robustness: The Central Tension, Revisited

Here's the thesis of this whole page: most classical statistics are designed for normal distributions, and real data is not normal. The tools that survive contact with reality are the robust ones — median over mean, IQR/MAD over standard deviation, Spearman/Kendall over Pearson, and the bootstrap over asymptotic formulas.

A robust statistic has these properties:

  • Bounded influence function: A single bad data point can't arbitrarily corrupt the estimate. The mean has unbounded influence; the median has bounded influence.
  • High breakdown point: The breakdown point is the fraction of contaminated data needed to make the estimate arbitrarily wrong. The median's breakdown point is 50% (you need to corrupt half the data to break it). The mean's is $ 1/n $, which goes to 0% as $ n $ grows (a single bad point breaks it).
  • Efficiency under the assumed model: Robust estimators are typically less efficient than classical ones when the data is normal. This is the tradeoff — you pay a small premium in the ideal case for massive protection in the realistic case. Worth it, almost always.
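The breakdown points can be checked by brute force. In this sketch (simulated clean data centered at 10; the corruption value $ 10^9 $ is arbitrary), a single bad value wrecks the mean, while the median of 101 points survives 50 corrupted values and only breaks at 51:

```python
import numpy as np

rng = np.random.default_rng(13)
clean = rng.normal(loc=10, scale=1, size=101)

for n_bad in [0, 1, 25, 50, 51]:
    x = clean.copy()
    x[:n_bad] = 1e9   # replace the first n_bad values with garbage
    print(f"{n_bad:2d}/101 corrupted: mean = {np.mean(x):.3g}, median = {np.median(x):.3g}")
```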

Summary Table: Which Statistic When

| You want to know... | If data is roughly normal | If data is skewed / heavy-tailed / contaminated |
|---|---|---|
| Center | Mean | Median, or trimmed mean |
| Spread | Standard deviation | IQR or MAD |
| Association | Pearson $ r $ | Spearman $ \rho $ or Kendall $ \tau $ |
| Shape | Skewness & kurtosis | Skewness & kurtosis (but need larger sample), or just look at a histogram/KDE |
| Confidence interval | $ \bar{x} \pm t \cdot \text{SE} $ | Bootstrap percentile CI |
| Outlier detection | $ 3\sigma $ rule | $ 1.5 \times \text{IQR} $ rule (Tukey's fences) |

See Also