From charlesreid1

(Create Numerics/Interpolation Extrapolation page: opinionated Python-centric guide to interpolation and extrapolation methods (via create-page on MediaWiki MCP Server))
 
(Fix formatting: replace Markdown backtick code blocks with MediaWiki <syntaxhighlight>/<pre> tags; add border="1" to wikitable declarations (via update-page on MediaWiki MCP Server))
 
Line 7: Line 7:
== Interpolation vs. Extrapolation ==
== Interpolation vs. Extrapolation ==


{| class="wikitable"
{| class="wikitable" border="1"
! !! Interpolation !! Extrapolation
! !! Interpolation !! Extrapolation
|-
|-
Line 39: Line 39:
=== Python ===
=== Python ===


```python
<syntaxhighlight lang="python">
import numpy as np
import numpy as np


Line 51: Line 51:
     t = (x_query - x_left) / (x_right - x_left)
     t = (x_query - x_left) / (x_right - x_left)
     return y_left + t * (y_right - y_left)
     return y_left + t * (y_right - y_left)
```
</syntaxhighlight>


Or just use NumPy:
Or just use NumPy:


```python
<syntaxhighlight lang="python">
y_query = np.interp(x_query, x_known, y_known)
y_query = np.interp(x_query, x_known, y_known)
```
</syntaxhighlight>


<code>np.interp</code> is fast, vectorized, and does exactly what you think. It's the right tool about 40% of the time.
<code>np.interp</code> is fast, vectorized, and does exactly what you think. It's the right tool about 40% of the time.
Line 83: Line 83:
=== Python (for when you must) ===
=== Python (for when you must) ===


```python
<syntaxhighlight lang="python">
from numpy.polynomial import Polynomial
from numpy.polynomial import Polynomial


p = Polynomial.fit(x_known, y_known, deg=len(x_known)-1)
p = Polynomial.fit(x_known, y_known, deg=len(x_known)-1)
y_query = p(x_query)
y_query = p(x_query)
```
</syntaxhighlight>


<code>Polynomial.fit</code> uses a shifted-and-scaled basis internally, which helps with conditioning but does nothing to fix Runge's phenomenon.
<code>Polynomial.fit</code> uses a shifted-and-scaled basis internally, which helps with conditioning but does nothing to fix Runge's phenomenon.
Line 117: Line 117:
=== Python: Cubic Spline ===
=== Python: Cubic Spline ===


```python
<syntaxhighlight lang="python">
from scipy.interpolate import CubicSpline
from scipy.interpolate import CubicSpline
import numpy as np
import numpy as np
Line 132: Line 132:
dy_dense = cs(x_dense, 1)      # first derivative
dy_dense = cs(x_dense, 1)      # first derivative
d2y_dense = cs(x_dense, 2)    # second derivative
d2y_dense = cs(x_dense, 2)    # second derivative
```
</syntaxhighlight>


=== Python: PCHIP (monotone) ===
=== Python: PCHIP (monotone) ===


```python
<syntaxhighlight lang="python">
from scipy.interpolate import PchipInterpolator
from scipy.interpolate import PchipInterpolator


pchip = PchipInterpolator(x, y)
pchip = PchipInterpolator(x, y)
y_dense = pchip(x_dense)
y_dense = pchip(x_dense)
```
</syntaxhighlight>


=== When to use splines ===
=== When to use splines ===
Line 153: Line 153:
Trivial: pick the value at the nearest known point. Piecewise constant, \(C^{-1}\) (discontinuous).
Trivial: pick the value at the nearest known point. Piecewise constant, \(C^{-1}\) (discontinuous).


```python
<syntaxhighlight lang="python">
from scipy.interpolate import interp1d
from scipy.interpolate import interp1d


nn = interp1d(x, y, kind='nearest')
nn = interp1d(x, y, kind='nearest')
y_query = nn(x_query)
y_query = nn(x_query)
```
</syntaxhighlight>


Useful for categorical data, color lookups, or when you explicitly want to avoid introducing new values.
Useful for categorical data, color lookups, or when you explicitly want to avoid introducing new values.
Line 172: Line 172:
where \(\phi\) is a radial basis function (Gaussian, multiquadric, thin-plate spline, etc.).
where \(\phi\) is a radial basis function (Gaussian, multiquadric, thin-plate spline, etc.).


```python
<syntaxhighlight lang="python">
from scipy.interpolate import RBFInterpolator
from scipy.interpolate import RBFInterpolator


rbf = RBFInterpolator(xy_known, z_known, kernel='thin_plate_spline')
rbf = RBFInterpolator(xy_known, z_known, kernel='thin_plate_spline')
z_query = rbf(xy_query)
z_query = rbf(xy_query)
```
</syntaxhighlight>


RBF scales poorly (\(O(n^2)\) memory, \(O(n^3)\) solve). For large scattered datasets, consider local methods or Gaussian process regression.
RBF scales poorly (\(O(n^2)\) memory, \(O(n^3)\) solve). For large scattered datasets, consider local methods or Gaussian process regression.
Line 199: Line 199:
By default, <code>CubicSpline</code> and <code>interp1d</code> raise <code>ValueError</code> outside bounds (which is the right default). To override:
By default, <code>CubicSpline</code> and <code>interp1d</code> raise <code>ValueError</code> outside bounds (which is the right default). To override:


```python
<syntaxhighlight lang="python">
cs = CubicSpline(x, y, extrapolate=True)  # linear extrapolation
cs = CubicSpline(x, y, extrapolate=True)  # linear extrapolation
# or
# or
f = interp1d(x, y, kind='cubic', fill_value='extrapolate')
f = interp1d(x, y, kind='cubic', fill_value='extrapolate')
```
</syntaxhighlight>


If you find yourself setting <code>extrapolate=True</code>, stop and ask why.
If you find yourself setting <code>extrapolate=True</code>, stop and ask why.
Line 209: Line 209:
== Opinionated Decision Flowchart ==
== Opinionated Decision Flowchart ==


```
<pre>
Do you have a physical model?
Do you have a physical model?
  ├─ Yes → Fit the model parameters. Don't interpolate arbitrarily.
  ├─ Yes → Fit the model parameters. Don't interpolate arbitrarily.
Line 221: Line 221:
                                         ├─ Yes → PchipInterpolator
                                         ├─ Yes → PchipInterpolator
                                         └─ No  → CubicSpline anyway; it's fine.
                                         └─ No  → CubicSpline anyway; it's fine.
```
</pre>


== Summary Table ==
== Summary Table ==


{| class="wikitable"
{| class="wikitable" border="1"
! Method !! Smoothness !! Speed !! Numerical Stability !! Recommendation
! Method !! Smoothness !! Speed !! Numerical Stability !! Recommendation
|-
|-

Latest revision as of 06:12, 20 June 2026

Interpolation & Extrapolation is about constructing new data points from a discrete set of known data points. Interpolation fills gaps *within* the known range; extrapolation ventures *outside* it — which is almost always a terrible idea.

The One-Sentence Opinion

Use cubic splines for smooth data, linear interpolation when you're in a hurry or the data is dense, and never trust an extrapolation unless you have a physical model backing it up. Polynomial interpolation of high degree is a trap — Runge's phenomenon will ruin your day.

Interpolation vs. Extrapolation

Interpolation Extrapolation
What Estimate \(y\) for \(x\) inside the convex hull of known points Estimate \(y\) for \(x\) outside the known range
Reliability Generally trustworthy with a sensible method Fragile, unreliable, sometimes catastrophically wrong
Use case Upsampling, regridding, filling missing data Forecasting, extending a trend — but you probably shouldn't
Opinion Go ahead Don't. Or at least know exactly why you're doing it.

Linear Interpolation

The simplest method: connect the dots with straight lines. Between \((x_i, y_i)\) and \((x_{i+1}, y_{i+1})\):

\[ y(x) = y_i + \frac{(y_{i+1} - y_i)(x - x_i)}{x_{i+1} - x_i} \]

When to use it

  • Densely sampled data (point spacing much smaller than feature scale)
  • You don't care about derivative continuity
  • Quick visualization

When not to

  • Sparse data with curvature — you'll get visible kinks
  • You need a smooth derivative
  • Almost never for extrapolation (the slope at the last segment is arbitrary)

Python

import numpy as np

def linear_interpolate(x_known, y_known, x_query):
    """Linear interpolation. x_known must be sorted."""
    idx = np.searchsorted(x_known, x_query)
    # Clamp to valid range
    idx = np.clip(idx, 1, len(x_known) - 1)
    x_left, x_right = x_known[idx - 1], x_known[idx]
    y_left, y_right = y_known[idx - 1], y_known[idx]
    t = (x_query - x_left) / (x_right - x_left)
    return y_left + t * (y_right - y_left)

Or just use NumPy:

y_query = np.interp(x_query, x_known, y_known)

np.interp is fast, vectorized, and does exactly what you think. It's the right tool about 40% of the time.

Polynomial Interpolation

Fit a single polynomial \(P_n(x)\) of degree \(n-1\) through \(n\) points. The polynomial exists and is unique, but that doesn't mean it's a good idea.

Vandermonde (don't do this)

The obvious approach — solve \(V \mathbf{c} = \mathbf{y}\) where \(V_{ij} = x_i^{\,j}\) — is numerically unstable for anything beyond degree 5 or 6. The Vandermonde matrix is ill-conditioned.

Lagrange Form

\[ P(x) = \sum_{i=0}^{n-1} y_i \prod_{j \neq i} \frac{x - x_j}{x_i - x_j} \]

Mathematically elegant, computationally inefficient (\(O(n^2)\) per evaluation), and prone to massive oscillations between points — Runge's phenomenon.

Runge's Phenomenon

If you interpolate \(f(x) = 1/(1 + 25x^2)\) on \([-1,1]\) with equispaced points, a high-degree polynomial goes berserk near the endpoints. The cure is *not* more points — that makes it worse. The cure is don't use high-degree polynomials.

Python (for when you must)

from numpy.polynomial import Polynomial

p = Polynomial.fit(x_known, y_known, deg=len(x_known)-1)
y_query = p(x_query)

Polynomial.fit uses a shifted-and-scaled basis internally, which helps with conditioning but does nothing to fix Runge's phenomenon.

When to use polynomial interpolation

  • Degree ≤ 3, and you know what you're doing
  • You're deriving a numerical method (e.g., Gaussian quadrature nodes)
  • That's about it

Cubic Splines — The Workhorse

A cubic spline fits piecewise cubic polynomials that match in value, first derivative, and second derivative at each knot. The result is visually smooth, computationally cheap to evaluate, and free from Runge-style oscillations.

Varieties

Natural spline
\(f = 0\) at endpoints. Simple, but can wobble near boundaries.
Clamped / "complete" spline
You specify \(f'\) at endpoints if you know it. Uncommon in practice.
Not-a-knot spline
Third derivative continuous across the second and penultimate knots. Default in MATLAB and scipy.interpolate.CubicSpline. Usually the safe choice.
Monotone spline (PCHIP)
Preserves monotonicity of the data. If the data is monotone increasing, the interpolant is too — no spurious overshoots. Less smooth (only \(C^1\)).

Python: Cubic Spline

from scipy.interpolate import CubicSpline
import numpy as np

x = np.array([0., 1., 2., 3., 4.])
y = np.array([0., 2., 1., 3., 2.])

cs = CubicSpline(x, y, bc_type='not-a-knot')
# bc_type='natural' for natural spline
# bc_type='clamped' for derivative-specified

x_dense = np.linspace(0, 4, 200)
y_dense = cs(x_dense)          # evaluate
dy_dense = cs(x_dense, 1)      # first derivative
d2y_dense = cs(x_dense, 2)     # second derivative

Python: PCHIP (monotone)

from scipy.interpolate import PchipInterpolator

pchip = PchipInterpolator(x, y)
y_dense = pchip(x_dense)

When to use splines

  • Your default choice for smooth 1-D interpolation
  • Data with curvature that linear can't capture
  • You want \(C^2\) continuity and visually pleasing results
  • PCHIP when you can't tolerate overshoots (e.g., physical quantities that must stay positive)

Nearest-Neighbor Interpolation

Trivial: pick the value at the nearest known point. Piecewise constant, \(C^{-1}\) (discontinuous).

from scipy.interpolate import interp1d

nn = interp1d(x, y, kind='nearest')
y_query = nn(x_query)

Useful for categorical data, color lookups, or when you explicitly want to avoid introducing new values.

Radial Basis Functions (RBF)

For scattered (non-gridded) data in multiple dimensions, RBF interpolation fits:

\[ s(\mathbf{x}) = \sum_i w_i \,\phi(\|\mathbf{x} - \mathbf{x}_i\|) \]

where \(\phi\) is a radial basis function (Gaussian, multiquadric, thin-plate spline, etc.).

from scipy.interpolate import RBFInterpolator

rbf = RBFInterpolator(xy_known, z_known, kernel='thin_plate_spline')
z_query = rbf(xy_query)

RBF scales poorly (\(O(n^2)\) memory, \(O(n^3)\) solve). For large scattered datasets, consider local methods or Gaussian process regression.

Extrapolation: Here Be Dragons

Extrapolation assumes the pattern observed in the data continues outside the measured range. This assumption is wildly unjustified in most real-world settings.

Why extrapolation is dangerous

  • A polynomial that fits your data perfectly can diverge to \(\pm\infty\) immediately outside the range.
  • A spline extrapolates linearly by default — the slope at the last segment is an artifact of your last few points.
  • The real world has phase transitions, saturation, and regime changes you haven't observed.

The only defensible extrapolation

  1. You have a physical model (e.g., \(y = A e^{-Bx}\)) and you're fitting parameters, not blindly extending a curve.
  2. You're extrapolating a tiny distance relative to your point spacing and the function is known to be smooth.
  3. You explicitly propagate the uncertainty and it's enormous — which is honest.

Python: enabling extrapolation in SciPy

By default, CubicSpline and interp1d raise ValueError outside bounds (which is the right default). To override:

cs = CubicSpline(x, y, extrapolate=True)   # linear extrapolation
# or
f = interp1d(x, y, kind='cubic', fill_value='extrapolate')

If you find yourself setting extrapolate=True, stop and ask why.

Opinionated Decision Flowchart

Do you have a physical model?
 ├─ Yes → Fit the model parameters. Don't interpolate arbitrarily.
 └─ No  → Are the x-values outside your known range?
           ├─ Yes → STOP. You're extrapolating. Reconsider.
           └─ No  → Is your data dense relative to feature scale?
                     ├─ Yes → np.interp (linear). Done.
                     └─ No  → Do you need C² smoothness?
                               ├─ Yes → CubicSpline (not-a-knot)
                               └─ No  → Is monotonicity critical?
                                         ├─ Yes → PchipInterpolator
                                         └─ No  → CubicSpline anyway; it's fine.

Summary Table

Method Smoothness Speed Numerical Stability Recommendation
np.interp (linear) \(C^0\) Fastest Excellent Good for dense data
Polynomial (high degree) \(C^\infty\) Fast eval, slow fit Terrible Avoid
CubicSpline \(C^2\) Fast Good Default choice
PchipInterpolator \(C^1\) Fast Good When overshoots are unacceptable
RBF \(C^\infty\) Slow (\(O(n^3)\)) OK for moderate n Scattered multi-D data
Nearest-neighbor \(C^{-1}\) Fastest Trivial Categorical data only

See Also