Canny Edge Detection

Canny edge detection, proposed by John Canny in 1986, is considered a highly effective and widely used edge detection algorithm. It aims to find optimal edges in images corrupted by white noise, based on a set of specific criteria.

Scale-Space Concept (Context)

(Relevant context mentioned in Sonka, p. 144)

Edges can exist at multiple scales. Analyzing features at different scales can be done using scale-space representation.
An image $f(x)$ is smoothed by convolution with a Gaussian kernel $G(x, σ)$ for varying standard deviations $σ$: $G(x, σ) = e^{-x^{2}/(2σ^{2})}$ (5.51, 1D Gaussian)
The resulting function $F(x, σ) = f(x) * G(x, σ)$ (5.52) forms a scale-space image over the $(x, σ)$ plane.
Inflexion points of $F(x, σ_{0})$ for a fixed $σ_{0}$ describe the curve qualitatively. They occur where $\frac{\partial^{2} F(x, σ_{0})}{\partial x^{2}} = 0$ and $\frac{\partial^{3} F(x, σ_{0})}{\partial x^{3}} \neq 0$ (5.53)
Tracking these inflexion points across scales ($σ$) helps localize events from coarse to fine scales. Canny's detector uses $σ$ as an explicit scale parameter, both in choosing the smoothing kernel and in its optional multi-scale feature synthesis.
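A minimal pure-Python sketch of Eqs. 5.51 to 5.53. It assumes a sampled 1D signal, a Gaussian truncated at $3σ$, and clamped borders. All function names are hypothetical. Inflexion points are found as sign changes of the discrete second difference. The example shows that their number drops as $σ$ grows, because fine-scale structure is smoothed away and only coarse events survive.

```python
import math

def gaussian_kernel_1d(sigma):
    """Sampled G(x, sigma) (Eq. 5.51), truncated at 3*sigma and normalised to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def smooth_1d(signal, sigma):
    """F(x, sigma) = f(x) * G(x, sigma) (Eq. 5.52); indices are clamped at the borders."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    n = len(signal)
    return [sum(k[j + r] * signal[min(max(i - j, 0), n - 1)] for j in range(-r, r + 1))
            for i in range(n)]

def inflexion_points(F):
    """Positions where the discrete second derivative changes sign (Eq. 5.53).
    A strict sign change stands in for the third-derivative condition."""
    d2 = [F[i - 1] - 2 * F[i] + F[i + 1] for i in range(1, len(F) - 1)]
    # d2[i] belongs to sample i + 1, so a change between d2[i] and d2[i + 1]
    # lies halfway between samples i + 1 and i + 2.
    return [i + 1.5 for i in range(len(d2) - 1) if d2[i] * d2[i + 1] < 0]

# A slow wave plus a fast ripple: the ripple's inflexions vanish at coarse scale.
signal = [math.sin(2 * math.pi * i / 40) + 0.5 * math.sin(2 * math.pi * i / 7)
          for i in range(120)]
for sigma in (1.0, 2.0, 4.0, 6.0):
    pts = inflexion_points(smooth_1d(signal, sigma))
    print(f"sigma={sigma}: {len(pts)} inflexion points")
```

Tracking coarse-to-fine would then match each surviving point at one $σ$ to the nearest point at the next smaller $σ$.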

Optimality Criteria (Canny’s Design Goals)

Canny sought an edge detector that optimizes three main criteria for step edges corrupted by white noise:

Good Detection:
- Maximize the probability of detecting true edges (low false negatives).
- Minimize the probability of detecting non-edges (low false positives).
- Related to maximizing the Signal-to-Noise Ratio (SNR).
- SNR Definition (CMU Slide 3): $SNR = \frac{\left| \int_{-W}^{W} G(-x) f(x) \, dx \right|}{n_{0} \sqrt{\int_{-W}^{W} f^{2}(x) \, dx}}$ where $f$ is the filter, $G$ is the edge signal (e.g., a step edge), and $n_{0}^{2}$ is the mean-squared noise amplitude per unit length. The denominator is the root-mean-squared response of the filter to the noise $n(x)$ alone.
Good Localization:
- The detected edge position should be as close as possible to the actual edge center.
- Related to minimizing the distance between the detected and true edge. It is measured by the reciprocal of the root-mean-square (rms) distance.
- Localization Definition (CMU Slide 3): $\text{Localization} = \frac{1}{\sqrt{E[x_{0}^{2}]}} = \frac{\left| \int_{-W}^{W} G'(-x) f'(x) \, dx \right|}{n_{0} \sqrt{\int_{-W}^{W} f'^{2}(x) \, dx}}$ where $E[x_{0}^{2}]$ is the expected squared distance of the marked edge from the true center. The numerator equals the second derivative of the filtered response at the edge center, i.e., the slope of the response's first derivative where it crosses zero. A steeper slope means better localization.
Single Response (One Response Criterion):
- Minimize the number of local maxima around a single edge. Ideally, only one response per true edge.
- This criterion prevents multiple detections of a single edge, especially in the presence of noise. It also penalizes non-smooth operators, whose responses to noise contain many closely spaced local maxima.
- Related to maximizing the distance between peaks in the noise response.
- Inter-maximum Spacing (CMU Slide 5): The mean distance $x_{zc}$ between zero-crossings of the filtered noise response derivative is related to the filter characteristics: $x_{zc} (f) = π (\frac{\int _{- \infty}^{\infty} f ^{'2} ( x ) d x}{\int _{- \infty}^{\infty} f ^{''2} ( x ) d x})^{1/2}$
- This distance is constrained to be a fraction $k$ of the operator width $W$ : $x_{zc} (f) = kW$ .
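As a numerical check of the criteria, here is a pure-Python sketch that evaluates SNR, localization, and $x_{zc}$ for a derivative-of-Gaussian filter against an ideal step of amplitude $A$. It assumes $A = n_{0} = 1$, $W = 8σ$, and Simpson integration; all of these are illustrative choices. Because $G'$ of a step is a delta function, the localization numerator reduces to $A \lvert f'(0) \rvert$. SNR grows and localization shrinks with $σ$, while their product stays constant. This is the detection/localization trade-off that makes $σ$ a choice of scale.

```python
import math

def f(x, s):    # candidate filter: derivative of a Gaussian (sign and scale arbitrary)
    return -x * math.exp(-x * x / (2 * s * s))

def f1(x, s):   # f'(x), computed analytically
    return (x * x / (s * s) - 1) * math.exp(-x * x / (2 * s * s))

def f2(x, s):   # f''(x), computed analytically
    return (3 * x / (s * s) - x ** 3 / s ** 4) * math.exp(-x * x / (2 * s * s))

def simpson(g, a, b, n=4000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return total * h / 3

def canny_criteria(s, A=1.0, n0=1.0):
    W = 8 * s
    # Step edge G(x) = A for x >= 0, so G(-x) = A on [-W, 0] and 0 elsewhere.
    snr = abs(A * simpson(lambda x: f(x, s), -W, 0)) / (
        n0 * math.sqrt(simpson(lambda x: f(x, s) ** 2, -W, W)))
    # G'(x) = A * delta(x), so the localization numerator is A * |f'(0)|.
    loc = A * abs(f1(0.0, s)) / (n0 * math.sqrt(simpson(lambda x: f1(x, s) ** 2, -W, W)))
    # Mean spacing of zero-crossings of the noise-response derivative (CMU Slide 5).
    x_zc = math.pi * math.sqrt(simpson(lambda x: f1(x, s) ** 2, -W, W) /
                               simpson(lambda x: f2(x, s) ** 2, -W, W))
    return snr, loc, x_zc

for s in (1.0, 2.0, 4.0):
    snr, loc, x_zc = canny_criteria(s)
    print(f"sigma={s}: SNR={snr:.4f}  Loc={loc:.4f}  SNR*Loc={snr * loc:.4f}  x_zc={x_zc:.4f}")
```

For this filter the integrals have closed forms, giving $x_{zc} = π\sqrt{0.4}\,σ$ and $SNR \cdot \text{Localization} = 1/\sqrt{3π/8}$ for $A = n_{0} = 1$.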

Derivation and Filter Approximation

Canny initially derived a filter for 1D signals optimizing the first two criteria using the calculus of variations.
Adding the third criterion (single response) requires numerical optimization.
Result: The optimal filter can be effectively approximated (error < 20%) by the first derivative of a Gaussian smoothing filter.
- This approximation provides an efficient implementation.
- There is a similarity to the LoG (Laplacian of Gaussian) / Marr-Hildreth detector, but Canny uses the first derivative for detection and localization, which provides directional information, unlike the non-directional Laplacian.
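The approximation can be tried directly in a pure-Python sketch. The helper names are hypothetical, and $σ$, the step position, the noise level, and the random seed are arbitrary assumptions. The sketch samples the derivative of a unit-area Gaussian, convolves it with a noisy step, and reports where the absolute response peaks. Smoothing and differentiation happen in a single convolution, which is where the efficiency of the approximation comes from.

```python
import math
import random

def dog_kernel(sigma):
    """Samples of G'(x) for a unit-area Gaussian, truncated at 4*sigma."""
    r = int(math.ceil(4 * sigma))
    norm = 1.0 / (sigma ** 3 * math.sqrt(2 * math.pi))
    return [-x * norm * math.exp(-x * x / (2 * sigma * sigma)) for x in range(-r, r + 1)]

def convolve(signal, k):
    """Discrete convolution with clamped borders (output has the input's length)."""
    r = len(k) // 2
    n = len(signal)
    return [sum(k[j + r] * signal[min(max(i - j, 0), n - 1)] for j in range(-r, r + 1))
            for i in range(n)]

random.seed(0)
edge = 50                                   # step rises between samples 49 and 50
signal = [(1.0 if i >= edge else 0.0) + random.gauss(0.0, 0.1) for i in range(100)]
response = convolve(signal, dog_kernel(2.0))
peak = max(range(len(response)), key=lambda i: abs(response[i]))
print("strongest response at sample", peak)
```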

Generalization to 2D

A 2D step edge is defined by position, orientation, and magnitude (strength).
The 1D approach is generalized by:
1. Convolving the image $f$ with a symmetric 2D Gaussian $G$ .
2. Differentiating the smoothed image in the direction n perpendicular (normal) to the edge.
Let $G_{n}$ be the operator representing the first derivative of $G$ in direction n: $G_{n} = \frac{\partial G}{\partial n} = n \cdot \nabla G$ (5.54)
The edge normal direction n is not known a priori but can be estimated from the gradient direction of the smoothed image: $n = \frac{\nabla ( G * f )}{∣\nabla ( G * f ) ∣}$ (5.55)
Edge Location: Edges are located at points where the gradient magnitude is maximal in the direction n. This corresponds to zero-crossings of the second derivative in the direction n: $\frac{\partial}{\partial n} (G_{n} * f) = 0$ (5.56)
Using the associativity of convolution and differentiation, this condition becomes: $\frac{\partial ^{2}}{\partial n ^{2}} (G * f) = 0$ (5.57)
This equation forms the basis for non-maximal suppression.
Edge Strength: The magnitude (strength) of the edge is measured by the gradient magnitude of the smoothed image: $∣ G_{n} * f ∣ = ∣\nabla (G * f) ∣$ (5.58)
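A pure-Python sketch of Eqs. 5.55, 5.57 and 5.58; the names are hypothetical. To avoid writing a 2D convolution, it assumes a straight unit step edge through the origin with a known normal at 30°. For that edge, $G * f$ has a closed form: the Gaussian CDF of the signed distance to the edge. Derivatives are taken by central finite differences. The estimated normal recovers the true one. The second directional derivative $\partial^{2}(G * f)/\partial n^{2}$, computed as $n^{\top} H n$ with $H$ the Hessian, is zero on the edge and changes sign across it.

```python
import math

SIGMA = 1.5                     # Gaussian scale (assumed)
ANGLE = math.radians(30.0)      # true edge normal direction (assumed)
H = 1e-3                        # finite-difference step

def smoothed(x, y):
    """(G * f)(x, y) for a unit step edge through the origin with normal at ANGLE."""
    d = x * math.cos(ANGLE) + y * math.sin(ANGLE)   # signed distance to the edge
    return 0.5 * (1.0 + math.erf(d / (SIGMA * math.sqrt(2.0))))

def gradient(x, y):
    gx = (smoothed(x + H, y) - smoothed(x - H, y)) / (2 * H)
    gy = (smoothed(x, y + H) - smoothed(x, y - H)) / (2 * H)
    return gx, gy

def normal(x, y):
    """Eq. 5.55: n = grad(G*f) / |grad(G*f)|."""
    gx, gy = gradient(x, y)
    m = math.hypot(gx, gy)
    return gx / m, gy / m

def second_directional(x, y):
    """Eq. 5.57: d^2/dn^2 (G*f) = n^T Hess(G*f) n."""
    nx, ny = normal(x, y)
    c = smoothed(x, y)
    fxx = (smoothed(x + H, y) - 2 * c + smoothed(x - H, y)) / H ** 2
    fyy = (smoothed(x, y + H) - 2 * c + smoothed(x, y - H)) / H ** 2
    fxy = (smoothed(x + H, y + H) - smoothed(x + H, y - H)
           - smoothed(x - H, y + H) + smoothed(x - H, y - H)) / (4 * H * H)
    return nx * nx * fxx + 2 * nx * ny * fxy + ny * ny * fyy

for d in (-1.0, 0.0, 1.0):      # points on a line crossing the edge along its normal
    x, y = d * math.cos(ANGLE), d * math.sin(ANGLE)
    print(f"d={d:+.1f}  n={normal(x, y)}  strength={math.hypot(*gradient(x, y)):.4f}"
          f"  d2/dn2={second_directional(x, y):+.5f}")
```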

Algorithm Steps (Algorithm 5.4 / Combined View)

Noise Reduction (Gaussian Smoothing):
- Convolve the input image $f$ with a Gaussian kernel $G_{σ}$ of scale $σ$ .
- $I_{\text{smoothed}} = f * G_{σ}$
- The choice of $σ$ affects the scale of edges detected. Larger $σ$ reduces noise more but blurs edges.
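A pure-Python sketch of the smoothing step. It assumes images stored as lists of rows, a kernel truncated at $3σ$, and clamped borders. Because the 2D Gaussian is separable, the sketch filters the rows and then the columns with the same 1D kernel. For a kernel of radius $r$, this costs $O(r)$ rather than $O(r^{2})$ operations per pixel. The function names are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """1D Gaussian samples on [-3*sigma, 3*sigma], normalised to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def filter_rows(img, k):
    """Convolve every row with k, clamping column indices at the borders."""
    r = len(k) // 2
    w = len(img[0])
    return [[sum(k[j + r] * row[min(max(x - j, 0), w - 1)] for j in range(-r, r + 1))
             for x in range(w)]
            for row in img]

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_smooth(img, sigma):
    """I_smoothed = f * G_sigma, computed as a row pass followed by a column pass."""
    k = gaussian_kernel(sigma)
    return transpose(filter_rows(transpose(filter_rows(img, k)), k))

# A checkerboard is pure high-frequency "noise"; smoothing pulls it towards its mean 0.5.
board = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
smooth = gaussian_smooth(board, 1.0)
print([round(v, 3) for v in smooth[3]])
```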
Gradient Calculation:
- Compute the gradient components ($G_{x}$, $G_{y}$) of the smoothed image $I_{\text{smoothed}}$ (e.g., using Sobel operators).
- Calculate the Gradient Magnitude: $M = \sqrt{G_{x}^{2} + G_{y}^{2}}$ (approximates the edge strength of Eq. 5.58).
- Calculate the Gradient Direction: $θ = arctan2 (G_{y}, G_{x})$ (estimates direction perpendicular to the edge, related to n in Eq 5.55).
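A pure-Python sketch of the gradient step. The function and kernel names are hypothetical. It assumes a list-of-rows image with $y$ increasing downwards, and it leaves the border pixels at zero. The kernels are written so that `gx` responds to changes along $x$ (columns). Note that sources differ on which Sobel kernel is called $G_{x}$. The example shows why non-maximal suppression is needed: an ideal step produces a two-pixel-wide ridge of equal magnitude.

```python
import math

# gx responds to intensity changes along x (columns), gy along y (rows, downwards).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def correlate3(img, kern, x, y):
    """3x3 correlation of kern with img, centred on pixel (x, y)."""
    return sum(kern[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))

def sobel_gradient(img):
    """Return (M, theta): M = sqrt(Gx^2 + Gy^2), theta = atan2(Gy, Gx); borders stay 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    theta = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = correlate3(img, SOBEL_X, x, y)
            gy = correlate3(img, SOBEL_Y, x, y)
            mag[y][x] = math.hypot(gx, gy)
            theta[y][x] = math.atan2(gy, gx)
    return mag, theta

step = [[0, 0, 0, 10, 10, 10] for _ in range(5)]   # vertical step edge
mag, theta = sobel_gradient(step)
print(mag[2])        # [0.0, 0.0, 40.0, 40.0, 0.0, 0.0]: a two-pixel-wide ridge
print(theta[2][2])   # 0.0: gradient points along +x, across the edge
```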
Non-Maximal Suppression (NMS):
- Goal: Thin wide ridges around edges down to single-pixel width (improves localization).
- Process: For each pixel, check if its gradient magnitude $M$ is a local maximum along the gradient direction $θ$ .
- In the simplest variant, quantize $θ$ into four discrete directions (horizontal, vertical, +45°, -45°).
- Compare the pixel’s magnitude $M (x, y)$ with the magnitudes of its two neighbors along the gradient direction.
- Interpolation: Because the gradient direction is continuous, a more accurate variant interpolates the neighbors' magnitudes instead of quantizing $θ$. For example (CMU Slide 12):
  - If the gradient direction points between pixels $(x, y + 1)$ and $(x + 1, y + 1)$ (i.e., $u_{y} \geq u_{x} \geq 0$), interpolate the gradient magnitude at point A, where the gradient line crosses row $y + 1$, from $G(x + 1, y + 1)$ and $G(x, y + 1)$. Call this $G_{A}$.
  - Similarly, interpolate the magnitude at point B on the opposite side from $G(x - 1, y - 1)$ and $G(x, y - 1)$. Call this $G_{B}$.
  - Interpolation formula (for $G_{A}$): $G_{A} = \frac{u_{x}}{u_{y}} G(x + 1, y + 1) + \frac{u_{y} - u_{x}}{u_{y}} G(x, y + 1)$, where $(u_{x}, u_{y})$ is the gradient vector and $u_{y} \neq 0$. Note: simpler variants that interpolate based on the angle $θ$ are also common.
- Suppression Rule: If $M (x, y)$ is less than either interpolated neighbor magnitude ( $G_{A}$ or $G_{B}$ ), suppress the pixel (set its magnitude to 0). Otherwise, keep $M (x, y)$ .
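A pure-Python sketch of non-maximal suppression that generalizes the interpolation scheme above to all eight octants. The names are hypothetical, and border pixels are simply zeroed. Take a gradient $(u_{x}, u_{y})$ with $|u_{y}| \geq |u_{x}|$. The gradient line leaves the $3 \times 3$ neighborhood through row $y \pm 1$ at horizontal offset $u_{x}/|u_{y}|$, and the magnitude there is interpolated between the two pixels on that row. The other case swaps the roles of rows and columns. Following the suppression rule as stated, ties are kept.

```python
def interpolated_neighbour(mag, x, y, ux, uy):
    """Magnitude where the line from (x, y) along (ux, uy) leaves the 3x3 neighbourhood.
    mag is indexed mag[y][x]; (ux, uy) must not be (0, 0)."""
    if abs(uy) >= abs(ux):
        sy = 1 if uy > 0 else -1
        t = ux / abs(uy)                      # horizontal offset on row y + sy, in [-1, 1]
        sx = 1 if t > 0 else -1
        return (1 - abs(t)) * mag[y + sy][x] + abs(t) * mag[y + sy][x + sx]
    sx = 1 if ux > 0 else -1
    t = uy / abs(ux)                          # vertical offset on column x + sx, in (-1, 1)
    sy = 1 if t > 0 else -1
    return (1 - abs(t)) * mag[y][x + sx] + abs(t) * mag[y + sy][x + sx]

def non_max_suppression(mag, gx, gy):
    """Keep M(x, y) only if it is not smaller than both interpolated neighbours G_A and G_B."""
    h, w = len(mag), len(mag[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ux, uy = gx[y][x], gy[y][x]
            if ux == 0 and uy == 0:
                continue                      # no gradient direction: not an edge
            g_a = interpolated_neighbour(mag, x, y, ux, uy)     # ahead along the gradient
            g_b = interpolated_neighbour(mag, x, y, -ux, -uy)   # behind, on the opposite side
            if mag[y][x] >= g_a and mag[y][x] >= g_b:
                out[y][x] = mag[y][x]
    return out

# A three-pixel-wide ridge with horizontal gradients thins to its centre column.
ridge = [[0, 1, 3, 1, 0] for _ in range(5)]
ones = [[1] * 5 for _ in range(5)]
zeros = [[0] * 5 for _ in range(5)]
print(non_max_suppression(ridge, ones, zeros)[2])
```

With $u_{y} \geq u_{x} \geq 0$, the first branch reduces to the slide's formula for $G_{A}$.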
Hysteresis Thresholding:
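- Goal: Separate significant edges from spurious responses (improves detection) and keep edge contours from breaking into dashed segments. This breakage is the ‘streaking’ problem, caused by the operator output fluctuating above and below a single threshold along a contour.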
- Uses two thresholds: $T_{L}$ (low threshold) and $T_{H}$ (high threshold), with $T_{L} < T_{H}$ .
- Threshold Determination (CMU Slide 7): Thresholds can be set adaptively based on image statistics, e.g., using percentiles (like the median) of the gradient magnitude histogram. Canny suggested they could be slowly varying functions across the image.
- Procedure:
  a. Pixels with magnitude $M > T_{H}$ are marked as strong (‘seed’) edge pixels.
  b. Pixels with magnitude $T_{L} < M \leq T_{H}$ are marked as weak edge pixels.
  c. Pixels with magnitude $M \leq T_{L}$ are suppressed (non-edges).
  d. Keep all strong edge pixels.
  e. Keep a weak edge pixel only if it is connected (via 8-connectivity) to a strong edge pixel, either directly or through a path of other connected weak pixels. Discard unconnected weak pixels.
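A pure-Python sketch of hysteresis thresholding; the names are hypothetical. Strong pixels seed a breadth-first search that grows through 8-connected weak pixels. A weak pixel therefore survives exactly when a path of weak pixels links it to a seed. The `percentile` helper is one simple, assumed way to derive $T_{L}$ and $T_{H}$ from the magnitude distribution. Its nearest-rank rule is an illustrative choice, not Canny's prescription.

```python
from collections import deque

def percentile(values, q):
    """Nearest-rank style q-th percentile (0-100) of a list; a simple assumed rule."""
    s = sorted(values)
    idx = min(len(s) - 1, max(0, int(round(q / 100 * (len(s) - 1)))))
    return s[idx]

def hysteresis(mag, t_low, t_high):
    """Boolean edge map: strong pixels (> t_high) plus weak pixels (> t_low) linked to them."""
    h, w = len(mag), len(mag[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if mag[y][x] > t_high:            # strong (seed) pixel
                edges[y][x] = True
                queue.append((y, x))
    while queue:                               # grow seeds through 8-connected weak pixels
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not edges[ny][nx]
                        and mag[ny][nx] > t_low):
                    edges[ny][nx] = True
                    queue.append((ny, nx))
    return edges

mag = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 5, 5, 0, 0],
    [0, 0, 0, 0, 5, 0],
    [0, 0, 0, 0, 0, 0],
    [5, 5, 0, 0, 0, 0],
]
edges = hysteresis(mag, t_low=3, t_high=8)
for row in edges:
    print("".join("#" if e else "." for e in row))
```

The weak pixels in the top rows survive because they chain back to the seed with value 9. The weak pair in the bottom row is discarded. With image-derived thresholds, one might use `t_high = percentile(nonzero_magnitudes, 90)` and an assumed ratio such as `t_low = 0.5 * t_high`.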
Multi-Scale Aggregation / Feature Synthesis (Optional - Sonka p. 146):
- The standard Canny algorithm (steps 1-4) is often run at a single scale $σ$ .
- Canny also proposed a ‘feature synthesis’ approach to handle multiple scales:
  1. Run the detector (steps 1-4) for a sequence of increasing $σ$.
  2. Start with the smallest scale $σ_{min}$ . Mark significant edges.
  3. For the next scale $σ_{i + 1}$ , predict the edge response based on the edges found at $σ_{i}$ .
  4. Compare the predicted response with the actual response at $σ_{i + 1}$ . Mark additional edges only if they are significantly stronger than predicted.
  5. Build a cumulative edge map. Edges are typically localized using the smallest scale at which they were reliably detected.
- Note: Full feature synthesis is less commonly implemented than the single-scale version.

Figure 5.23: Shows Canny edge detection results at two different scales ( $σ = 1.0$ and $σ = 2.8$ , without feature synthesis). Larger $σ$ results in smoother, less detailed edges, removing weaker features and noise.

Parametric Edge Models (Facet Model - Alternative/Related Concept)

(Sonka p. 147, CMU Slides 13, 14)

An alternative approach to edge detection models the underlying continuous image intensity function in local neighborhoods using parametric surfaces, called facets.
The discrete image is seen as a sampled, noisy version of this underlying function.
Model Complexity:
- Flat Facet: Piecewise constant intensity.
- Sloped Facet: Piecewise linear function (plane).
- Quadratic/Bi-cubic Facet: Higher-order polynomials.
Example Polynomial Fit: $z = f (x, y) = k_{1} + k_{2} x + k_{3} y + k_{4} x^{2} + k_{5} x y + k_{6} y^{2} + k_{7} x^{3} + k_{8} x^{2} y + k_{9} x y^{2} + k_{10} y^{3}$
Parameter Estimation: Coefficients ($k_{1}$ to $k_{10}$) are fitted to the pixel intensities $I(u, v)$ in a neighborhood by least squares, i.e., by minimizing the sum of squared residuals (equivalently, the mean squared error): $E = \sum_{u} \sum_{v} (I(u, v) - f(u, v))^{2}$
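A pure-Python sketch of the least-squares facet fit; the names are hypothetical. It assumes coordinates centred on the middle pixel of a $5 \times 5$ window. A $3 \times 3$ window is too small for the cubic model, because on $\{-1, 0, 1\}$ the basis functions $x$ and $x^{3}$ take identical values and the normal equations become singular. The sketch solves $(A^{\top}A)\,k = A^{\top}I$ by Gaussian elimination. With exact cubic data it recovers the coefficients, and $(k_{2}, k_{3})$ is then the gradient at the window centre.

```python
def cubic_basis(x, y):
    """Monomials of the cubic facet, in the order k1..k10 of the polynomial above."""
    return [1, x, y, x * x, x * y, y * y, x ** 3, x * x * y, x * y * y, y ** 3]

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a z = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def fit_facet(patch):
    """Least-squares cubic facet over an odd-sized square patch (5x5 or larger)."""
    r = len(patch) // 2
    rows, vals = [], []
    for v in range(-r, r + 1):
        for u in range(-r, r + 1):
            rows.append(cubic_basis(u, v))
            vals.append(patch[v + r][u + r])
    n = len(rows[0])
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * val for row, val in zip(rows, vals)) for i in range(n)]
    return solve(ata, atb)

true_k = [2.0, 1.5, -0.5, 0.25, 0.1, -0.2, 0.05, 0.0, -0.03, 0.02]
patch = [[sum(c * b for c, b in zip(true_k, cubic_basis(u, v))) for u in range(-2, 3)]
         for v in range(-2, 3)]
k = fit_facet(patch)
print("gradient at centre:", k[1], k[2])
```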
Edge Detection: Once the continuous facet model $f (x, y)$ is estimated for each neighborhood, edges can be detected with subpixel precision as:
- Extrema of the first directional derivative.
- Zero-crossings of the second directional derivative of the facet function $f (x, y)$ .
This approach allows for robust edge detection and characterization directly from the estimated continuous surface.
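Given fitted coefficients $k_{1}, \dots, k_{10}$, here is a pure-Python sketch of a subpixel facet edge test in the spirit of Haralick's facet edge detector. The function name, the 0.5-pixel acceptance radius, and the coefficients in the example are assumptions. Restrict the cubic to the line through the centre along the unit gradient direction $(c, s) = (k_{2}, k_{3})/\sqrt{k_{2}^{2} + k_{3}^{2}}$. This gives $f(ρ c, ρ s) = C_{0} + C_{1} ρ + C_{2} ρ^{2} + C_{3} ρ^{3}$, whose second derivative vanishes at $ρ = -C_{2}/(3 C_{3})$. Requiring $C_{3} < 0$ makes that point a maximum of the first directional derivative rather than a minimum.

```python
import math

def facet_edge_offset(k, rho_max=0.5):
    """Sub-pixel edge position along the gradient direction for a cubic facet.
    k = [k1, ..., k10] in the order of the polynomial above; returns rho or None."""
    _, k2, k3, k4, k5, k6, k7, k8, k9, k10 = k
    g = math.hypot(k2, k3)
    if g == 0:
        return None                      # flat facet: no gradient direction
    c, s = k2 / g, k3 / g
    c2 = k4 * c * c + k5 * c * s + k6 * s * s
    c3 = k7 * c ** 3 + k8 * c * c * s + k9 * c * s * s + k10 * s ** 3
    if c3 >= 0:
        return None                      # no inflexion, or one at a slope minimum
    rho = -c2 / (3 * c3)                 # zero of d2f/drho2 = 2*C2 + 6*C3*rho
    return rho if abs(rho) <= rho_max else None

# f(x, y) = 3(x - 0.2) - (x - 0.2)^3 expanded: its inflexion (edge) lies at x = 0.2.
k = [-0.592, 2.88, 0.0, 0.6, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0]
print(facet_edge_offset(k))
```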

Comparison with Simpler Operators

(CMU Slides 15, 16)

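Roberts Cross Operator: Simpler $2 \times 2$ diagonal gradient operator, derived from fitting a least-squares planar surface over a $2 \times 2$ window. $G_{x} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$, $G_{y} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, or, in axis-aligned form, $\begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}$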
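Prewitt Operator: $3 \times 3$ operator that approximates the gradient, derived by fitting a quadratic surface over a $3 \times 3$ window and differentiating. $G_{x} = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}$, $G_{y} = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}$ (Note: the slide's labels appear swapped relative to the common convention, in which $G_{x}$ responds to horizontal intensity changes.)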
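Sobel Operator: Similar to Prewitt but gives more weight to the center row or column. Each kernel combines a $[1, 2, 1]$ smoothing profile with a $[-1, 0, 1]$ central difference. $G_{x} = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$, $G_{y} = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$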
Canny vs Simpler Operators: Canny provides superior noise suppression, localization, and single-edge response due to its multi-stage approach (Gaussian smoothing, NMS, hysteresis) based on optimizing specific criteria, whereas Roberts, Prewitt, and Sobel are simpler approximations of the gradient.
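A pure-Python sketch that applies the three kernel pairs, as listed above, to the same noiseless vertical step of height 10. The image, helper names, and pixel positions are assumptions. Wider and more heavily weighted kernels give larger raw responses. Magnitudes from different operators are therefore only comparable after normalisation, e.g., dividing Prewitt by 6 and Sobel by 8 to get per-pixel slope estimates. None of these operators includes Canny's separate smoothing scale, thinning, or hysteresis.

```python
import math

# Kernel pairs as listed above (correlation form, rows top to bottom).
ROBERTS = ([[1, 0], [0, -1]], [[0, -1], [1, 0]])
PREWITT = ([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])
SOBEL = ([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])

def response(img, kern, top, left):
    """Correlate kern with the window of img whose top-left pixel is (top, left)."""
    return sum(kern[j][i] * img[top + j][left + i]
               for j in range(len(kern)) for i in range(len(kern[0])))

def magnitude(img, pair, top, left):
    return math.hypot(response(img, pair[0], top, left), response(img, pair[1], top, left))

step = [[0, 0, 0, 10, 10, 10] for _ in range(6)]   # vertical step between columns 2 and 3
print("Roberts:", magnitude(step, ROBERTS, 2, 2))   # 2x2 window straddling the step
print("Prewitt:", magnitude(step, PREWITT, 1, 1))   # 3x3 window centred on column 2
print("Sobel:  ", magnitude(step, SOBEL, 1, 1))
```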
