Numerics/Interpolation Extrapolation
From charlesreid1
Interpolation & Extrapolation is about constructing new data points from a discrete set of known data points. Interpolation fills gaps *within* the known range; extrapolation ventures *outside* it — which is almost always a terrible idea.
The One-Sentence Opinion
Use cubic splines for smooth data, linear interpolation when you're in a hurry or the data is dense, and never trust an extrapolation unless you have a physical model backing it up. Polynomial interpolation of high degree is a trap — Runge's phenomenon will ruin your day.
Interpolation vs. Extrapolation
| Interpolation | Extrapolation | |
|---|---|---|
| What | Estimate \(y\) for \(x\) inside the convex hull of known points | Estimate \(y\) for \(x\) outside the known range |
| Reliability | Generally trustworthy with a sensible method | Fragile, unreliable, sometimes catastrophically wrong |
| Use case | Upsampling, regridding, filling missing data | Forecasting, extending a trend — but you probably shouldn't |
| Opinion | Go ahead | Don't. Or at least know exactly why you're doing it. |
Linear Interpolation
The simplest method: connect the dots with straight lines. Between \((x_i, y_i)\) and \((x_{i+1}, y_{i+1})\):
\[ y(x) = y_i + \frac{(y_{i+1} - y_i)(x - x_i)}{x_{i+1} - x_i} \]
When to use it
- Densely sampled data (point spacing much smaller than feature scale)
- You don't care about derivative continuity
- Quick visualization
When not to
- Sparse data with curvature — you'll get visible kinks
- You need a smooth derivative
- Almost never for extrapolation (the slope at the last segment is arbitrary)
Python
import numpy as np
def linear_interpolate(x_known, y_known, x_query):
"""Linear interpolation. x_known must be sorted."""
idx = np.searchsorted(x_known, x_query)
# Clamp to valid range
idx = np.clip(idx, 1, len(x_known) - 1)
x_left, x_right = x_known[idx - 1], x_known[idx]
y_left, y_right = y_known[idx - 1], y_known[idx]
t = (x_query - x_left) / (x_right - x_left)
return y_left + t * (y_right - y_left)
Or just use NumPy:
y_query = np.interp(x_query, x_known, y_known)
np.interp is fast, vectorized, and does exactly what you think. It's the right tool about 40% of the time.
Polynomial Interpolation
Fit a single polynomial \(P_n(x)\) of degree \(n-1\) through \(n\) points. The polynomial exists and is unique, but that doesn't mean it's a good idea.
Vandermonde (don't do this)
The obvious approach — solve \(V \mathbf{c} = \mathbf{y}\) where \(V_{ij} = x_i^{\,j}\) — is numerically unstable for anything beyond degree 5 or 6. The Vandermonde matrix is ill-conditioned.
Lagrange Form
\[ P(x) = \sum_{i=0}^{n-1} y_i \prod_{j \neq i} \frac{x - x_j}{x_i - x_j} \]
Mathematically elegant, computationally inefficient (\(O(n^2)\) per evaluation), and prone to massive oscillations between points — Runge's phenomenon.
Runge's Phenomenon
If you interpolate \(f(x) = 1/(1 + 25x^2)\) on \([-1,1]\) with equispaced points, a high-degree polynomial goes berserk near the endpoints. The cure is *not* more points — that makes it worse. The cure is don't use high-degree polynomials.
Python (for when you must)
from numpy.polynomial import Polynomial
p = Polynomial.fit(x_known, y_known, deg=len(x_known)-1)
y_query = p(x_query)
Polynomial.fit uses a shifted-and-scaled basis internally, which helps with conditioning but does nothing to fix Runge's phenomenon.
When to use polynomial interpolation
- Degree ≤ 3, and you know what you're doing
- You're deriving a numerical method (e.g., Gaussian quadrature nodes)
- That's about it
Cubic Splines — The Workhorse
A cubic spline fits piecewise cubic polynomials that match in value, first derivative, and second derivative at each knot. The result is visually smooth, computationally cheap to evaluate, and free from Runge-style oscillations.
Varieties
- Natural spline
- \(f = 0\) at endpoints. Simple, but can wobble near boundaries.
- Clamped / "complete" spline
- You specify \(f'\) at endpoints if you know it. Uncommon in practice.
- Not-a-knot spline
- Third derivative continuous across the second and penultimate knots. Default in MATLAB and
scipy.interpolate.CubicSpline. Usually the safe choice.
- Monotone spline (PCHIP)
- Preserves monotonicity of the data. If the data is monotone increasing, the interpolant is too — no spurious overshoots. Less smooth (only \(C^1\)).
Python: Cubic Spline
from scipy.interpolate import CubicSpline
import numpy as np
x = np.array([0., 1., 2., 3., 4.])
y = np.array([0., 2., 1., 3., 2.])
cs = CubicSpline(x, y, bc_type='not-a-knot')
# bc_type='natural' for natural spline
# bc_type='clamped' for derivative-specified
x_dense = np.linspace(0, 4, 200)
y_dense = cs(x_dense) # evaluate
dy_dense = cs(x_dense, 1) # first derivative
d2y_dense = cs(x_dense, 2) # second derivative
Python: PCHIP (monotone)
from scipy.interpolate import PchipInterpolator
pchip = PchipInterpolator(x, y)
y_dense = pchip(x_dense)
When to use splines
- Your default choice for smooth 1-D interpolation
- Data with curvature that linear can't capture
- You want \(C^2\) continuity and visually pleasing results
- PCHIP when you can't tolerate overshoots (e.g., physical quantities that must stay positive)
Nearest-Neighbor Interpolation
Trivial: pick the value at the nearest known point. Piecewise constant, \(C^{-1}\) (discontinuous).
from scipy.interpolate import interp1d
nn = interp1d(x, y, kind='nearest')
y_query = nn(x_query)
Useful for categorical data, color lookups, or when you explicitly want to avoid introducing new values.
Radial Basis Functions (RBF)
For scattered (non-gridded) data in multiple dimensions, RBF interpolation fits:
\[ s(\mathbf{x}) = \sum_i w_i \,\phi(\|\mathbf{x} - \mathbf{x}_i\|) \]
where \(\phi\) is a radial basis function (Gaussian, multiquadric, thin-plate spline, etc.).
from scipy.interpolate import RBFInterpolator
rbf = RBFInterpolator(xy_known, z_known, kernel='thin_plate_spline')
z_query = rbf(xy_query)
RBF scales poorly (\(O(n^2)\) memory, \(O(n^3)\) solve). For large scattered datasets, consider local methods or Gaussian process regression.
Extrapolation: Here Be Dragons
Extrapolation assumes the pattern observed in the data continues outside the measured range. This assumption is wildly unjustified in most real-world settings.
Why extrapolation is dangerous
- A polynomial that fits your data perfectly can diverge to \(\pm\infty\) immediately outside the range.
- A spline extrapolates linearly by default — the slope at the last segment is an artifact of your last few points.
- The real world has phase transitions, saturation, and regime changes you haven't observed.
The only defensible extrapolation
- You have a physical model (e.g., \(y = A e^{-Bx}\)) and you're fitting parameters, not blindly extending a curve.
- You're extrapolating a tiny distance relative to your point spacing and the function is known to be smooth.
- You explicitly propagate the uncertainty and it's enormous — which is honest.
Python: enabling extrapolation in SciPy
By default, CubicSpline and interp1d raise ValueError outside bounds (which is the right default). To override:
cs = CubicSpline(x, y, extrapolate=True) # linear extrapolation
# or
f = interp1d(x, y, kind='cubic', fill_value='extrapolate')
If you find yourself setting extrapolate=True, stop and ask why.
Opinionated Decision Flowchart
Do you have a physical model?
├─ Yes → Fit the model parameters. Don't interpolate arbitrarily.
└─ No → Are the x-values outside your known range?
├─ Yes → STOP. You're extrapolating. Reconsider.
└─ No → Is your data dense relative to feature scale?
├─ Yes → np.interp (linear). Done.
└─ No → Do you need C² smoothness?
├─ Yes → CubicSpline (not-a-knot)
└─ No → Is monotonicity critical?
├─ Yes → PchipInterpolator
└─ No → CubicSpline anyway; it's fine.
Summary Table
| Method | Smoothness | Speed | Numerical Stability | Recommendation |
|---|---|---|---|---|
np.interp (linear) |
\(C^0\) | Fastest | Excellent | Good for dense data |
| Polynomial (high degree) | \(C^\infty\) | Fast eval, slow fit | Terrible | Avoid |
CubicSpline |
\(C^2\) | Fast | Good | Default choice |
PchipInterpolator |
\(C^1\) | Fast | Good | When overshoots are unacceptable |
| RBF | \(C^\infty\) | Slow (\(O(n^3)\)) | OK for moderate n | Scattered multi-D data |
| Nearest-neighbor | \(C^{-1}\) | Fastest | Trivial | Categorical data only |
See Also
- Numerics/Integration of Functions — splines are used extensively in quadrature
- Numerics/Solving Linear Equations — Vandermonde and tridiagonal systems underpin interpolation
- Numerics/Fast Fourier Transform — bandlimited interpolation via sinc functions
- SciPy interpolation module