Revision as of 21:09, 24 June 2026

Notes

Image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also called superpixels or regions) to simplify or change the representation of the image into something more meaningful and easier to analyze. Minimum spanning trees (MSTs) provide an elegant, graph-based approach to this problem.

Intuition

An image can be modeled as a graph where each pixel is a vertex and edges connect neighboring pixels (typically 4-connected or 8-connected neighborhoods). Edge weights represent the dissimilarity between the two pixels — for example, the absolute difference in intensity (grayscale) or the Euclidean distance in RGB color space. The intuition is that pixels within the same object tend to have similar colors/intensities (low edge weights), while pixels on object boundaries have large differences (high edge weights).

The MST of this pixel graph captures the "backbone" of low-dissimilarity connections. Because an MST minimizes the total sum of edge weights, it preferentially connects similar pixels. By cutting the MST at edges whose weight exceeds some threshold, the image can be segmented into homogeneous regions.

Graph Construction

Vertices

Each pixel $p \in P$ of the image becomes a vertex $v_p \in V$ . For an image of size $W \times H$ , the graph has $|V| = W \times H$ vertices.

Edges

Edges are added between neighboring pixels. Common neighborhood structures:

4-connected grid: each pixel connects to its top, bottom, left, and right neighbors. Maximum degree 4.
8-connected grid: each pixel connects to all eight surrounding pixels. Maximum degree 8.

For an $W \times H$ image with 4-connectivity, the graph has approximately $|E| \approx 2WH - W - H$ undirected edges.

Edge Weights

The weight of an edge $$ e = (v_p, v_q) $$ measures dissimilarity:

Grayscale images: $$ w(e) = |I(p) - I(q)| $$ where $$ I(p) $$ is the intensity of pixel $$ p $$ (0–255).

Color images (RGB): $w(e) = \sqrt{ (R_p - R_q)^2 + (G_p - G_q)^2 + (B_p - B_q)^2 }$

Additional feature vectors can incorporate texture, position, or higher-level descriptors. The weight function is domain-specific.

MST-Based Segmentation: Basic Approach

Single-Threshold Method

The simplest MST-based segmentation works as follows:

Build graph G = (V, E) from the image.
Compute the MST T of G.
Choose a threshold τ.
Remove all edges of T with weight > τ.
The remaining connected components are the segments.

This approach segments the image into regions where all intra-region pixel differences are below $\tau$ . The threshold $\tau$ controls the granularity: lower values produce more, smaller segments; higher values produce fewer, larger segments.

Limitations of Single Threshold

A global threshold fails when different regions of the image have different natural levels of contrast. A shadowed region may have small edge weights throughout (requiring a low threshold), while a high-contrast region may need a high threshold to avoid over-segmentation.

Felzenszwalb–Huttenlocher Algorithm

The seminal graph-based segmentation algorithm by Felzenszwalb and Huttenlocher (2004) addresses the global-threshold problem by introducing adaptive, component-aware thresholds.

Key Idea

Rather than a fixed threshold, each component $$ C $$ defines its internal difference:

$\text{Int}(C) = \max_{e \in \text{MST}(C)} w(e)$

That is, the maximum edge weight in the component's MST. This is the minimum threshold at which the component would be split.

The minimum internal difference between two components is:

$\text{MInt}(C_1, C_2) = \min\left( \text{Int}(C_1) + \frac{k}{|C_1|}, \quad \text{Int}(C_2) + \frac{k}{|C_2|} \right)$

where $$ k $$ is a scale parameter and $$ |C| $$ is the number of pixels in the component. The term $$ k/|C| $$ makes smaller components more tolerant of merging — without it, isolated pixels with small edge weights would never merge.

Two adjacent components are merged if the minimum-weight edge between them is less than or equal to $\text{MInt}(C_1, C_2)$ .

Algorithm

FelzenszwalbHuttenlocher(G, k):
    sort edges by increasing weight
    initialize each vertex as its own component
    threshold[c] = k  for each component c with size 1

    for each edge e = (u, v) in sorted order:
        find components C1 containing u, C2 containing v
        if C1 != C2:
            if w(e) <= min(threshold[C1], threshold[C2]):
                merge C1 into C2 (or vice versa)
                new_c = merged component
                Int[new_c] = w(e)  // largest edge in new MST
                threshold[new_c] = Int[new_c] + k / |new_c|

Complexity

Sorting edges: $O(|E| \log |E|)$
Union-find operations: $O(|E| \cdot \alpha(|V|))$ (nearly linear)
Overall: $O(N \log N)$ for an image with $$ N $$ pixels and constant neighborhood size.

Parameters

k (scale): governs preference for larger components. Larger $$ k $$ yields larger segments. Typical range: 50–500.
σ (sigma): pre-smoothing with a Gaussian of width σ before computing edge weights reduces noise sensitivity.
min_size: a post-processing step merges components smaller than min_size into their nearest neighbor.

Worked Example (Grayscale, 1D)

Consider a 1×8 grayscale strip with intensities:

Pixel:  0    1    2    3    4    5    6    7
Intensity: 10  12  13  50  52  51  90  92

Edge weights (absolute differences):

(0,1): 2    (1,2): 1    (2,3): 37   (3,4): 2
(4,5): 1    (5,6): 39   (6,7): 2

Sort edges by weight: (1,2)=1, (4,5)=1, (0,1)=2, (3,4)=2, (6,7)=2, (2,3)=37, (5,6)=39

Kruskal construction of MST: all edges except the largest edges that would complete a cycle. Here the graph is a line, so all edges are in the MST.

MST edges sorted: 1, 1, 2, 2, 2, 37, 39

Single-threshold τ = 10: remove edges > 10 → break at (2,3)=37 and (5,6)=39. Segments: {0,1,2}, {3,4,5}, {6,7} — three segments matching the natural intensity clusters.

Single-threshold τ = 3: remove edges > 3 → break at 37 and 39, plus (0,1)=2 and (3,4)=2 and (6,7)=2 stay. Segments: {0,1,2}, {3,4,5}, {6,7} — same result here because the internal edges are ≤3.

Single-threshold τ = 1: remove edges > 1. Segments: {0}, {1,2}, {3}, {4,5}, {6}, {7} — over-segmented.

2D Example

Consider a simple 4×4 grayscale image:

10  10  80  80
10  12  82  78
12  11  79  81
11  10  77  80

There is a clear boundary between the left half (dark, values ~10) and right half (bright, values ~80). Edge weights between columns 1–2 and 2–3 are small (~0–2) within each region, while edge weights across the vertical midline (between columns 2 and 3) are large (~65–70).

The MST connects all pixels within each half using low-weight edges and crosses the boundary at exactly one edge (the smallest cross-boundary difference). Cutting that single edge yields two perfect segments — the left and right halves of the image.

This property generalizes: in an image where objects are separated by high-contrast boundaries, the MST crosses each object boundary at a single edge (or a small number of edges), and removing those cuts segments the image.

Extensions and Variants

Iterative MST Segmentation

After segmenting with a threshold, treat each segment as a super-vertex and build a new graph. This hierarchical approach captures structure at multiple scales.

Interactive Segmentation

The user marks certain pixels as foreground or background. Edge weights to marked pixels are modified (boosted or penalized) before computing the MST, guiding the segmentation to respect user input. This is closely related to graph-cut methods (see Graphs/Max Flow).

Minimum Spanning Forest with Seeds

Given seed points for each region, compute a minimum spanning forest (MSF) — an MST where each tree is rooted at a seed. Every pixel is assigned to the nearest seed under the MST distance metric. This is equivalent to running Prim's algorithm simultaneously from all seeds.

Normalized Cuts via MST

The MST can approximate the Normalized Cut criterion by identifying the minimum-cost partition of the MST. While exact normalized cuts are NP-hard, the MST provides an efficient approximation.

Comparison with Other Methods

Comparison of Segmentation Approaches
Method	Graph-Based?	Complexity	Key Idea
MST (single threshold)	Yes	$O(N \log N)$	Global edge-weight threshold
Felzenszwalb–Huttenlocher	Yes	$O(N \log N)$	Adaptive per-component thresholds
K-means (color clustering)	No	$$ O(N K I) $$	Clusters pixels by color, ignores spatial layout
Mean shift	No	$$ O(N^2) $$ naive	Mode-seeking in feature space
Graph cuts (min-cut)	Yes	$O(N^{1.5})$ approx	Global energy minimization with source/sink
Watershed	No	$O(N \log N)$	Topographic flooding from local minima

MST-based methods are attractive because they are efficient, deterministic, produce hierarchical segmentations, and naturally respect both spatial and feature information.

Advantages and Disadvantages

Advantages

Efficiency: $O(N \log N)$ runtime for $$ N $$ pixels.
Global optimality: the MST minimizes total connection cost, ensuring internal homogeneity.
Spatial coherence: segments are guaranteed to be connected in the image grid.
Hierarchical: the MST naturally captures segmentations at all scales — cutting at different thresholds yields a dendrogram of regions.
No prior knowledge: unlike k-means, the number of segments does not need to be specified in advance.

Disadvantages

Sensitive to noise: isolated noisy pixels create spurious small segments. Gaussian pre-smoothing helps.
Boundary leakage: if an object boundary has even one low-weight pixel (e.g., a thin bridge), the MST crosses it and merges the regions. Post-processing with contour-based refinement can mitigate this.
Parameter tuning: the scale parameter $$ k $$ must be chosen appropriately for each image domain.
Global structure: the MST does not directly model high-level shape or texture.

Applications

Medical imaging: segmenting organs, tumors, or lesions in CT/MRI scans.
Object recognition: pre-processing step to generate superpixels for downstream classifiers.
Satellite imagery: partitioning land cover types in remote sensing.
Video segmentation: extending the graph into the temporal dimension with 3D neighborhoods.
Interactive photo editing: selecting foreground objects with minimal user strokes.

Related Pages

Graphs/Minimum Spanning Tree — parent page: MST definition, properties, and classic algorithms
Graphs/Cluster Finding — uses MSTs for clustering in general graphs
Graphs/Kruskal — Kruskal's algorithm (basis of Felzenszwalb–Huttenlocher)
Graphs/Prim Jarnik — Prim's algorithm (basis of seeded segmentation)
Graphs/Max Flow — min-cut / max-flow methods for segmentation
Graphs/Connectivity — graph connectivity fundamentals

References

Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). "Efficient Graph-Based Image Segmentation." International Journal of Computer Vision, 59(2), 167–181.
Zahn, C. T. (1971). "Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters." IEEE Transactions on Computers, C-20(1), 68–86. (Pioneering MST-based clustering.)
Shi, J., & Malik, J. (2000). "Normalized Cuts and Image Segmentation." IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.

Flags

@@ Line 47: / Line 47: @@
 The simplest MST-based segmentation works as follows:
-<pre>
+# Build graph G = (V, E) from the image.
-. Build graph G = (V, E) from the image.
+# Compute the MST T of G.
-. Compute the MST T of G.
+# Choose a threshold τ.
-. Choose a threshold τ.
+# Remove all edges of T with weight > τ.
-. Remove all edges of T with weight > τ.
+# The remaining connected components are the segments.
-. The remaining connected components are the segments.
-</pre>
 This approach segments the image into regions where all intra-region pixel differences are below <math>\tau</math>. The threshold <math>\tau</math> controls the granularity: lower values produce more, smaller segments; higher values produce fewer, larger segments.

Graphs/Minimum Spanning Tree/Image Segmentation: Difference between revisions

From charlesreid1