• What: The process of partitioning a digital image into multiple segments (sets of pixels, also known as image regions).
  • Goal: To simplify or change the representation of an image into something more meaningful and easier to analyze. Specifically, to cluster pixels into regions corresponding to individual surfaces, objects, or natural parts of objects.
  • Use/Application: Object recognition, occlusion boundary estimation, image compression, image editing, image database look-up.
  • Approach Discussed: Primarily bottom-up segmentation, meaning it relies on local pixel information (brightness, color, texture) without prior knowledge or recognition of specific objects (top-down information).
  • Input: Often image brightness (grayscale), but techniques can extend to color, motion, or stereo disparity.

Example Segmentations

  • Simple scenes show clear delineation of surfaces.
  • Complex scenes (horses, tiger) show varying results from different algorithms.
  • Key Property: Segmentation boundaries are closed, defining distinct regions, unlike edge detection results which can be fragmented.

Key Questions

  1. How well can bottom-up segmentation work without object recognition?
  2. How to mathematically define a “good” segment?
  3. How to solve the defined segmentation problem(s)?
  4. How to evaluate the results?

Observations on Example Algorithms

  • Algorithms mentioned: Local Variation (LV), Spectral Min-Cut (SMC), Human (H), Edge-augmented Mean-Shift (ED), Normalized Cut (NC).
  • Quality: Highly image-dependent. Works best on smoothly shaded surfaces with clear intensity steps between them.
  • Human vs. Machine: Humans likely use object recognition; these bottom-up methods don’t.
  • Complexity: Machine results are useful for simple images but less reliable for complex/cluttered ones.
  • Evaluation: Need standardized datasets because algorithms can fail on challenging images (clutter, camouflage).

Approaches to Segmentation

Clustering in Feature Space

  • What: Represent each pixel as a point in a multi-dimensional feature space. Segmentation becomes finding clusters in this space.
  • How: Define a feature vector $\mathbf{f}(x, y)$ for each pixel $(x, y)$ (see the sketch after this list). This can include:
    • Pixel coordinates: $(x, y)$
    • Intensity/Color: $I(x, y)$, or a color triple such as $(L^*, u^*, v^*)$
    • Local features (e.g., texture, filter responses): responses of a filter bank centered at $(x, y)$
  • Why: Pixels belonging to the same segment (e.g., same object part) are expected to have similar feature vectors and thus form a dense cluster in the feature space.
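
A minimal sketch of building such a feature matrix, assuming an RGB input and scikit-image for the color conversion; the function name, the L\*u\*v\* choice, and the spatial weight are illustrative, not prescribed by the notes above.

```python
# Sketch: per-pixel feature vectors combining position and color.
# spatial_weight trades off spatial proximity against color similarity.
import numpy as np
from skimage import color

def pixel_features(img_rgb, spatial_weight=0.01):
    h, w, _ = img_rgb.shape
    luv = color.rgb2luv(img_rgb)               # intensity/color part of the feature
    ys, xs = np.mgrid[0:h, 0:w]                # pixel coordinates
    feats = np.dstack([spatial_weight * xs, spatial_weight * ys, luv])
    return feats.reshape(-1, feats.shape[-1])  # (H*W, 5) matrix, one row per pixel
```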

Mixture of Gaussians (MoG) Model

  • What: A specific method for feature space clustering.
  • How: Model the probability distribution of feature vectors as a sum of Gaussian distributions: $p(\mathbf{f}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{f} \mid \boldsymbol{\mu}_k, \Sigma_k)$
    • $\pi_k$: mixing coefficient for cluster $k$ (with $\sum_k \pi_k = 1$)
    • $\boldsymbol{\mu}_k, \Sigma_k$: mean and covariance of the $k$-th Gaussian component.
    • Fit parameters using Maximum Likelihood on the image’s feature vectors.
    • Choose $K$ using criteria like Minimum Description Length (MDL).
  • Segmentation: Assign each pixel to the cluster that maximizes its “ownership” (posterior) probability: $k^{*}(\mathbf{f}) = \arg\max_k \; \pi_k \, \mathcal{N}(\mathbf{f} \mid \boldsymbol{\mu}_k, \Sigma_k) \big/ \sum_j \pi_j \, \mathcal{N}(\mathbf{f} \mid \boldsymbol{\mu}_j, \Sigma_j)$ (see the sketch below).
  • Variations: K-means (simpler clustering), restricted covariance matrices (e.g., diagonal or $\Sigma_k = \sigma_k^2 I$).
  • Limitations: Assumes segments correspond to well-separated, Gaussian-shaped clusters in feature space. Can fail when this assumption is violated (e.g., complex textures, gradual changes). (See Blobworld examples).
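
A minimal sketch of MoG clustering in feature space using scikit-learn’s `GaussianMixture`; here $K$ is fixed by hand rather than chosen by MDL, and `pixel_features` is the hypothetical helper sketched earlier.

```python
# Sketch: fit a K-component Gaussian mixture to the per-pixel features and
# assign each pixel to its most probable ("ownership"-maximizing) component.
import numpy as np
from sklearn.mixture import GaussianMixture

def mog_segment(img_rgb, n_components=5, seed=0):
    feats = pixel_features(img_rgb)                   # (H*W, d) features, as above
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full",     # "diag" = restricted covariances
                          random_state=seed).fit(feats)
    labels = gmm.predict(feats)                       # argmax_k of posterior p(k | f)
    return labels.reshape(img_rgb.shape[:2])
```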

Mean-Shift Segmentation

  • What: Another feature space clustering method, but uses a non-parametric density estimate.
  • How (Density Estimation): Estimate the density by kernel density estimation (e.g., with an Epanechnikov or Gaussian kernel $K$): $\hat{p}(\mathbf{f}) = \frac{1}{n h^{d}} \sum_{i=1}^{n} K\!\left(\frac{\mathbf{f} - \mathbf{f}_i}{h}\right)$
    • The kernel’s shape and bandwidth $h$ control the smoothness of the estimate.
  • How (Clustering): Find the modes (peaks) of the estimated density $\hat{p}(\mathbf{f})$.
    • For each feature point $\mathbf{f}_j$, perform mean-shift iterations: $\mathbf{f}_j \leftarrow \dfrac{\sum_i \mathbf{f}_i \, g\!\left(\left\|\frac{\mathbf{f}_j - \mathbf{f}_i}{h}\right\|^{2}\right)}{\sum_i g\!\left(\left\|\frac{\mathbf{f}_j - \mathbf{f}_i}{h}\right\|^{2}\right)}$ with $g = -K'$. (This iteratively shifts the point towards the local mean, weighted by the kernel derivative, converging to a mode; see the sketch below.)
  • Segmentation: Assign pixel to the segment corresponding to the mode its feature vector converges to. Segments are the domains of convergence (watersheds) of the mean-shift procedure.
  • Properties: Converges; tends to avoid edges (anti-edge detection); can fragment regions with constant intensity gradient (requires post-processing).
  • Enhancement (EDISON): Incorporate edge information to modify the weights in the mean-shift iteration, improving results by preserving weak boundaries without excessive oversegmentation.
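
A toy sketch of the mean-shift iteration itself with a Gaussian kernel. It is $O(n^2)$ per iteration, so it is only meant for a small or subsampled set of feature vectors, and it implements neither the mode-grouping post-processing nor the EDISON edge weighting.

```python
# Sketch: mean-shift mode seeking on a set of feature vectors.
# Each point is repeatedly moved to the kernel-weighted mean of the data,
# so it climbs the kernel density estimate towards a mode.
import numpy as np

def mean_shift_modes(features, h=0.2, n_iters=30):
    pts = features.astype(float)
    for _ in range(n_iters):
        # squared distances from the current points to the (fixed) data points
        d2 = ((pts[:, None, :] - features[None, :, :]) ** 2).sum(-1) / h ** 2
        w = np.exp(-0.5 * d2)                          # Gaussian kernel weights g(.)
        pts = (w @ features) / w.sum(axis=1, keepdims=True)
    return pts    # converged points cluster around the density modes

# Pixels whose feature vectors converge to (numerically) the same mode are then
# grouped into one segment, e.g. by rounding or quantizing the converged points.
```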

Similarity Graph Based Methods

  • What: Represent the image as a graph where nodes are pixels and edge weights represent similarity (or dissimilarity) between connected pixels. Segmentation becomes partitioning the graph.
  • How (Graph Construction):
    • Vertices $V$: the image pixels.
    • Edges $E$: connect nearby pixels $p$ and $q$ (e.g., 4- or 8-connected neighbors).
    • Weights or Affinities $a(p, q)$: based on similarity of the feature vectors $\mathbf{f}_p$ and $\mathbf{f}_q$. Example affinity: $a(p, q) = \exp\!\left(-\|\mathbf{f}_p - \mathbf{f}_q\|^{2} / 2\sigma^{2}\right)$, or $\exp\!\left(-\tfrac{1}{2}(\mathbf{f}_p - \mathbf{f}_q)^{T} \Sigma^{-1} (\mathbf{f}_p - \mathbf{f}_q)\right)$ using a covariance $\Sigma$ for a Mahalanobis distance (see the sketch after this list).
  • Goal: Find a partition of the graph nodes (pixels) into segments.
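
A minimal sketch of constructing such a graph for a grayscale image, as a sparse affinity matrix over a 4-connected pixel grid with the Gaussian intensity affinity above; the function name and the scale $\sigma$ are illustrative.

```python
# Sketch: sparse affinity matrix W for a 4-connected pixel grid, with
# a(p, q) = exp(-(I_p - I_q)^2 / 2 sigma^2) for neighboring pixels p, q.
import numpy as np
import scipy.sparse as sp

def grid_affinity(img, sigma=0.1):
    h, w = img.shape
    idx = np.arange(h * w).reshape(h, w)
    rows, cols, vals = [], [], []
    for dr, dc in ((0, 1), (1, 0)):                    # right and down neighbors
        p = idx[:h - dr, :w - dc].ravel()
        q = idx[dr:, dc:].ravel()
        diff = img[:h - dr, :w - dc].ravel() - img[dr:, dc:].ravel()
        a = np.exp(-diff ** 2 / (2 * sigma ** 2))
        rows += [p, q]; cols += [q, p]; vals += [a, a]  # symmetric entries
    W = sp.coo_matrix((np.concatenate(vals),
                       (np.concatenate(rows), np.concatenate(cols))),
                      shape=(h * w, h * w)).tocsr()
    return W
```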

Connected Components (CC)

  • How: Simple approach: remove edges whose dissimilarity is above a threshold $\tau$ (equivalently, whose affinity is below a threshold), then find the connected components of the remaining graph (see the sketch below).
  • Limitation: Not robust. A single “weak link” (a low-dissimilarity edge crossing a true boundary) can incorrectly merge two distinct regions, and finding a good $\tau$ is hard.
  • Kruskal’s Algorithm: Can find the connected components efficiently by building a Minimum Spanning Tree (MST) first, then removing edges above $\tau$.
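
A sketch of the thresholded connected-components baseline, reusing the hypothetical `grid_affinity` matrix from the previous section and SciPy’s `connected_components`; here the threshold is applied to affinities, so edges below `tau` are dropped.

```python
# Sketch: drop weak (low-affinity) edges, then label the connected components.
import numpy as np
from scipy.sparse.csgraph import connected_components

def cc_segment(img, sigma=0.1, tau=0.9):
    W = grid_affinity(img, sigma)               # sparse affinity graph, as above
    W.data[W.data < tau] = 0.0                  # remove edges that are too dissimilar
    W.eliminate_zeros()
    n_labels, labels = connected_components(W, directed=False)
    return labels.reshape(img.shape)
```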

Local Variation Method (Felzenszwalb & Huttenlocher)

  • What: An efficient graph-based method using a modified Kruskal’s algorithm.
  • How: Builds segments (trees in a forest) by adding edges in increasing order of weight, as in Kruskal’s algorithm. Merges two components $C_1$ and $C_2$ joined by edge $e$ only if the edge weight is small compared to the internal variation within both components: $w(e) \le \min\!\big(\mathrm{Int}(C_1) + \tfrac{k}{|C_1|},\; \mathrm{Int}(C_2) + \tfrac{k}{|C_2|}\big)$, where:
    • $\mathrm{Int}(C)$: maximum edge weight within component $C$ (in its minimum spanning tree); the internal variation.
    • $|C|$: number of pixels in $C$.
    • $k$: merging threshold parameter.
  • Why: Adapts the merge criterion to the existing variation within components. The tolerance term $k/|C|$ shrinks as a component grows, so a large component merges only across edges whose weight barely exceeds its internal variation; this encourages growth of small regions.
  • Pros: Fast ($O(|E| \log |E|)$ in the number of edges $|E|$); tends to follow ‘true’ boundaries well initially (see the sketch below).
  • Cons: The parameter $k$ controls region size; can create narrow regions along boundaries.
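
The algorithm is available off the shelf; a minimal usage sketch with scikit-image’s `felzenszwalb`, where `scale` plays the role of the merging parameter $k$.

```python
# Sketch: Felzenszwalb-Huttenlocher segmentation via scikit-image.
from skimage import data, segmentation

img = data.astronaut()                                  # any RGB test image
labels = segmentation.felzenszwalb(img, scale=100,      # larger scale -> larger regions
                                   sigma=0.8,           # pre-smoothing
                                   min_size=50)         # merge tiny components
print(labels.max() + 1, "segments")
```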

Source-Sink Minimum Cut (S-T Min-Cut)

  • What: Uses graph cuts (min-cut/max-flow algorithms) for segmentation, often for foreground/background or between specific regions.
  • How:
    1. Define ‘source’ seeds $S$ and ‘sink’ seeds $T$ (sets of pixels believed to lie in different segments).
    2. Construct a directed graph from the similarity graph: for each neighboring pair $p, q$, include edges $(p, q)$ and $(q, p)$ with capacity $a(p, q)$.
    3. Add a source node $s$ connected to all pixels in $S$ with infinite-capacity edges.
    4. Add a sink node $t$ connected from all pixels in $T$ with infinite-capacity edges.
    5. Find the minimum cut in this graph (see the sketch after this list). The cut partitions the nodes (pixels) into two sets, one containing $s$ and the other containing $t$, and its cost is the total capacity of the severed edges: $\mathrm{cut}(A, B) = \sum_{p \in A,\, q \in B} a(p, q)$.
  • Why: Efficient max-flow/min-cut algorithms exist for the s-t problem. Finds the globally optimal (minimum-cost) boundary between the predefined $S$ and $T$ regions.
  • Use: Need a good method to generate seed sets $S$ and $T$, and to find multiple segments (e.g., using spectral properties [6]).
  • Cons: Computationally intensive if many cuts are needed. Results depend heavily on seed selection.
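
A toy sketch of seeded s-t min-cut segmentation using NetworkX’s max-flow/min-cut routines on a 4-connected grid; this is far slower than the specialized max-flow solvers used in practice, and the seed lists are assumed to be given.

```python
# Sketch: seeded two-region segmentation as an s-t minimum cut.
import networkx as nx
import numpy as np

def st_mincut_segment(img, fg_seeds, bg_seeds, sigma=0.1):
    """img: 2-D float array; fg_seeds / bg_seeds: lists of (row, col) seed pixels."""
    h, w = img.shape
    G = nx.DiGraph()
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):             # 4-connected neighbors
                r2, c2 = r + dr, c + dc
                if r2 < h and c2 < w:
                    a = float(np.exp(-(img[r, c] - img[r2, c2]) ** 2
                                     / (2 * sigma ** 2)))
                    G.add_edge((r, c), (r2, c2), capacity=a)   # edge p -> q
                    G.add_edge((r2, c2), (r, c), capacity=a)   # edge q -> p
    # Seed edges get no 'capacity' attribute: NetworkX treats them as infinite.
    for p in fg_seeds:
        G.add_edge("s", p)
    for p in bg_seeds:
        G.add_edge(p, "t")
    cut_value, (A, B) = nx.minimum_cut(G, "s", "t")
    mask = np.zeros((h, w), dtype=bool)
    for node in A:
        if node != "s":
            mask[node] = True                            # pixels on the source side
    return mask
```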

Normalized Cut (Ncut) (Shi & Malik)

  • What: A graph partitioning criterion that tries to avoid cutting off very small, isolated segments.
  • How (Criterion): Partition the graph into $A$ and $B$ by minimizing the normalized cut: $\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)}$, where $\mathrm{cut}(A, B) = \sum_{p \in A,\, q \in B} a(p, q)$ is the total connection weight between sets $A$ and $B$, and $\mathrm{assoc}(A, V) = \sum_{p \in A,\, q \in V} a(p, q)$ measures the total connection from $A$ to the whole graph.
  • Why: Minimizing just the cut weight tends to produce small outlier segments. Ncut balances the cut cost with the “volume” or total connection of the segments.
  • Problem: Minimizing Ncut directly is NP-hard (computationally intractable).
  • How (Approximation - Spectral Method):
    1. Write the Ncut criterion in matrix form using an indicator vector over the pixels.
    2. Relax the discrete problem to a continuous one, which becomes a generalized eigenvector problem involving the graph Laplacian.
    3. Solve for the eigenvector corresponding to the second smallest eigenvalue of the generalized system $(D - W)\,\mathbf{y} = \lambda D \mathbf{y}$, or equivalently, the eigenvector for the second smallest eigenvalue of $D^{-1/2}(D - W)D^{-1/2}$. ($W$ is the affinity matrix; $D$ is the diagonal degree matrix with $D_{ii} = \sum_j W_{ij}$.)
    4. Threshold this eigenvector (e.g., at 0, or search for the best threshold) to get a discrete partition $(A, B)$ (see the sketch below).
  • Use: Can be applied recursively to get multiple segments. Can also use multiple eigenvectors for multi-way Ncut [14].
  • Limitation: The step of thresholding the eigenvector is an approximation, works best when there’s a clear bipartition in the data.
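
A sketch of the spectral approximation, reusing the hypothetical `grid_affinity` matrix: solve the generalized eigenproblem $(D - W)\mathbf{y} = \lambda D \mathbf{y}$ for the second-smallest eigenvector and threshold it at zero. Using `which="SM"` is simple but slow; a real implementation would use shift-invert or a dedicated spectral clustering routine.

```python
# Sketch: two-way Normalized Cut via the second generalized eigenvector.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def ncut_bipartition(img, sigma=0.1):
    W = grid_affinity(img, sigma)                   # sparse affinity matrix, as above
    d = np.asarray(W.sum(axis=1)).ravel()
    D = sp.diags(d)                                 # diagonal degree matrix
    L = D - W                                       # graph Laplacian
    # Two smallest eigenpairs of the generalized problem L y = lambda D y.
    vals, vecs = eigsh(L, k=2, M=D, which="SM")
    y = vecs[:, 1]                                  # second-smallest eigenvector
    return (y > 0).reshape(img.shape)               # threshold at zero -> partition (A, B)
```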

Natural Image Boundaries

  • What: Contours separating distinct objects or surfaces (semantically meaningful).
  • Difference: Distinct from image edges, which are just rapid changes in brightness/color and might occur within an object (texture).
  • Relation to Segmentation: Boundary detection is complementary. Can be integrated into segmentation (e.g., EDISON uses edge saliency, Ncut can use intervening contour information in affinities [8]).

Evaluation

  • Berkeley Segmentation Database: A standard dataset with human-provided segmentations for benchmarking algorithms. Shows human variability.
  • Benchmarking: Use datasets (e.g., Berkeley, synthetic fractal images) to compare algorithms.
  • Metrics: Precision-Recall curves are often used for evaluating the quality of detected segment boundaries against ground truth; human performance serves as an upper-bound reference. (SE-MinCut and Mean-Shift often perform well in the examples shown; a toy boundary precision/recall computation is sketched below.)
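
A toy sketch of boundary precision/recall with a small pixel tolerance; the actual Berkeley benchmark matches detected and ground-truth boundary pixels via a bipartite correspondence, so this is only an approximation.

```python
# Sketch: precision/recall of a detected boundary map against a ground-truth map.
import numpy as np
from scipy.ndimage import binary_dilation

def boundary_pr(detected, ground_truth, tol=2):
    """detected, ground_truth: boolean boundary maps of the same shape."""
    gt_zone = binary_dilation(ground_truth, iterations=tol)   # tolerance around GT
    det_zone = binary_dilation(detected, iterations=tol)      # tolerance around detections
    precision = (detected & gt_zone).sum() / max(detected.sum(), 1)
    recall = (ground_truth & det_zone).sum() / max(ground_truth.sum(), 1)
    return precision, recall
```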