Answer:
1. Harris Corner Detector - Working Principle (6 Marks)
The Harris corner detector aims to identify corners in an image, which are points where image intensity varies significantly in multiple directions within a local neighborhood. Unlike edges (intensity change in one direction) or flat regions (little change), corners provide stable, localized features. The principle involves analyzing the local intensity structure using the second-moment autocorrelation matrix (also known as the Harris matrix or structure tensor).
Steps:
- Compute Image Gradients: Calculate the spatial derivatives of the image intensity, $I_x = \partial I / \partial x$ and $I_y = \partial I / \partial y$, typically using filters like Sobel or simple differences.
- Compute Products of Derivatives: For each pixel, calculate the products of these gradients: $I_x^2$, $I_y^2$, and $I_x I_y$.
- Windowed Summation (Autocorrelation Matrix): Define a window $W$ (e.g., $3 \times 3$ or $5 \times 5$, often weighted by a Gaussian $w(x, y)$) around each pixel $(x, y)$. Compute the sums of the derivative products within this window to form the Harris matrix $M$ (or structure tensor): $$M = \sum_{(x, y) \in W} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$ This symmetric matrix summarizes the gradient distribution within the local window.
- Corner Response Calculation: Analyzing the eigenvalues ($\lambda_1$, $\lambda_2$) of $M$ reveals the nature of the region:
- Flat: $\lambda_1$ and $\lambda_2$ are both small.
- Edge: $\lambda_1 \gg \lambda_2$ or $\lambda_2 \gg \lambda_1$.
- Corner: $\lambda_1$ and $\lambda_2$ are both large and similar in magnitude. To avoid explicit eigenvalue computation, Harris and Stephens proposed a corner response function $R$: $$R = \det(M) - k \, (\operatorname{trace}(M))^2$$ where $\det(M) = \lambda_1 \lambda_2$ and $\operatorname{trace}(M) = \lambda_1 + \lambda_2$. $k$ is an empirical sensitivity factor (typically $0.04$ to $0.06$).
- $R$ is large and positive for corners.
- $R$ is negative with large magnitude for edges.
- $|R|$ is small for flat regions.
- Thresholding and Non-Maximum Suppression (NMS):
- Apply a threshold $T$ to the response map $R$: only pixels with $R > T$ are considered potential corners.
- Apply Non-Maximum Suppression: in a local neighborhood around each potential corner point, keep only the point with the maximum $R$ value, suppressing the others. This yields well-localized corner points.
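For concreteness, here is a minimal NumPy/SciPy sketch of the steps above; the Gaussian window width, relative threshold, and NMS neighborhood size are illustrative choices, not canonical values:

```python
import numpy as np
from scipy import ndimage

def harris_corners(img, k=0.04, sigma=1.0, rel_thresh=0.01, nms_size=5):
    """Minimal Harris detector sketch; all parameter values are illustrative."""
    img = img.astype(np.float64)
    # Step 1: image gradients via Sobel filters
    Ix = ndimage.sobel(img, axis=1)   # derivative along x (columns)
    Iy = ndimage.sobel(img, axis=0)   # derivative along y (rows)
    # Step 2: products of derivatives
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    # Step 3: Gaussian-weighted window sums -> entries of M at every pixel
    Sxx = ndimage.gaussian_filter(Ixx, sigma)
    Syy = ndimage.gaussian_filter(Iyy, sigma)
    Sxy = ndimage.gaussian_filter(Ixy, sigma)
    # Step 4: corner response R = det(M) - k * trace(M)^2
    R = (Sxx * Syy - Sxy * Sxy) - k * (Sxx + Syy) ** 2
    # Step 5: threshold, then keep only local maxima of R (NMS)
    T = rel_thresh * R.max()
    local_max = ndimage.maximum_filter(R, size=nms_size)
    return np.argwhere((R > T) & (R == local_max))  # (row, col) pairs
```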
2. Salient Properties of Harris Corner Detector (3 Marks)
- Rotation Invariance: The detector response depends on the eigenvalues (via det and trace), which are invariant to rotation of the coordinate system. Thus, the detector finds the same corners regardless of image rotation.
- Partial Illumination Invariance: It is relatively robust to changes in image brightness (additive illumination changes) because it relies on gradients. However, it is sensitive to contrast changes (multiplicative changes).
- Not Scale Invariant: This is a major limitation. The detector operates at a single scale defined by the window size used for gradient calculation and summation. A corner might not be detected if the image scale changes significantly.
3. Working Principle of SIFT (Scale-Invariant Feature Transform) Operator (6 Marks)
SIFT is a comprehensive algorithm designed to detect and describe local image features in a way that is robust to changes in scale, rotation, illumination, and moderate viewpoint changes. It consists of four main stages:
- Scale-Space Extrema Detection:
- Goal: Find potential keypoint locations that are stable across different scales.
- Method: The image is progressively blurred using Gaussian filters with increasing standard deviation ($\sigma$). These blurred images form a scale space $L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$. To efficiently detect stable blob-like structures, the Difference-of-Gaussian (DoG) function is computed by subtracting adjacent Gaussian-blurred images: $D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)$. DoG is a close approximation of the scale-normalized Laplacian of Gaussian ($\sigma^2 \nabla^2 G$).
- Detection: Local extrema (maxima and minima) of the DoG images are detected by comparing each pixel to its 26 neighbors in 3D (8 in the same scale, 9 in the scale above, 9 in the scale below). These extrema locations are candidate keypoints.
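A vectorized NumPy sketch of this stage, assuming a simplified single-octave scale space (a real implementation works octave by octave with downsampling; `thresh` and the scale sampling are illustrative):

```python
import numpy as np
from scipy import ndimage

def dog_extrema(img, sigma0=1.6, k=2 ** 0.5, n_levels=6, thresh=0.03):
    """Sketch of DoG extrema detection over one simplified octave."""
    img = img.astype(np.float64) / 255.0
    # Scale space: progressively Gaussian-blurred copies L(x, y, sigma)
    sigmas = [sigma0 * k ** i for i in range(n_levels)]
    L = np.stack([ndimage.gaussian_filter(img, s) for s in sigmas])
    # Difference-of-Gaussian between adjacent blur levels
    D = L[1:] - L[:-1]
    # Extremum test against all 26 neighbours in (scale, y, x): a 3x3x3
    # max/min filter compares each sample to exactly that neighbourhood.
    is_max = D == ndimage.maximum_filter(D, size=3)
    is_min = D == ndimage.minimum_filter(D, size=3)
    cand = (is_max | is_min) & (np.abs(D) > thresh)  # low contrast folded in
    cand[0] = cand[-1] = False        # need a scale both above and below
    s, y, x = np.nonzero(cand)
    # Report (x, y, approximate sigma) for each candidate keypoint
    return [(xi, yi, sigmas[si]) for si, yi, xi in zip(s, y, x)]
```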
- Keypoint Localization:
- Goal: Refine the location of candidate keypoints and eliminate unstable ones (low contrast or edge responses).
- Refinement: A 3D quadratic function is fitted to the DoG scale-space around the candidate keypoint to determine its sub-pixel and sub-scale location more accurately.
- Low Contrast Rejection: Keypoints with a low DoG magnitude at the refined extremum (e.g., $|D(\hat{\mathbf{x}})| < 0.03$, for image intensities normalized to $[0, 1]$) are discarded, as they are sensitive to noise.
- Edge Response Elimination: Keypoints lying on edges are unstable. The ratio of principal curvatures (derived from the $2 \times 2$ Hessian matrix $H$ of $D$ computed at the keypoint location and scale) is used. If the ratio exceeds a threshold $r$ (e.g., $r = 10$), indicating one curvature is much larger than the other, the keypoint is rejected; the test keeps keypoints satisfying $\frac{\operatorname{trace}(H)^2}{\det(H)} < \frac{(r + 1)^2}{r}$.
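The edge test follows directly from a finite-difference Hessian; a sketch, assuming `D` is a single 2-D DoG image and the function name is illustrative:

```python
def passes_edge_test(D, y, x, r=10.0):
    """Principal-curvature ratio test: keep iff trace^2/det < (r+1)^2/r."""
    # 2x2 Hessian of D at (y, x) by central finite differences
    Dxx = D[y, x + 1] - 2.0 * D[y, x] + D[y, x - 1]
    Dyy = D[y + 1, x] - 2.0 * D[y, x] + D[y - 1, x]
    Dxy = (D[y + 1, x + 1] - D[y + 1, x - 1]
           - D[y - 1, x + 1] + D[y - 1, x - 1]) / 4.0
    tr = Dxx + Dyy
    det = Dxx * Dyy - Dxy * Dxy
    if det <= 0:
        return False          # curvatures of opposite sign: reject
    return tr * tr / det < (r + 1.0) ** 2 / r
```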
- Orientation Assignment:
- Goal: Assign one or more orientations to each keypoint based on local image gradient directions to achieve rotation invariance for the descriptor.
- Method: For each keypoint, consider a neighborhood around it at its detected scale ($\sigma$). Compute gradient magnitudes and orientations for all pixels in this neighborhood. Create an orientation histogram (typically 36 bins covering $0°$–$360°$). Each sample added to the histogram is weighted by its gradient magnitude and by a Gaussian window centered at the keypoint.
- Assignment: The highest peak in the histogram determines the dominant orientation. Any other peaks above 80% of the highest peak also generate a keypoint with that orientation, allowing for multiple orientations at a single location/scale.
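A sketch of the histogram construction for a single keypoint, assuming `L` is the Gaussian-blurred image at the keypoint's scale (the $1.5\sigma$ Gaussian weighting follows Lowe's paper; the radius and function name are illustrative):

```python
import numpy as np

def dominant_orientations(L, x, y, sigma, n_bins=36, peak_ratio=0.8):
    """Orientation assignment sketch for one keypoint at scale sigma."""
    radius = int(round(3.0 * 1.5 * sigma))   # neighbourhood grows with scale
    hist = np.zeros(n_bins)
    h, w = L.shape
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if not (0 < yy < h - 1 and 0 < xx < w - 1):
                continue
            gx = L[yy, xx + 1] - L[yy, xx - 1]      # central differences
            gy = L[yy + 1, xx] - L[yy - 1, xx]
            mag = np.hypot(gx, gy)
            ang = np.rad2deg(np.arctan2(gy, gx)) % 360.0
            # weight = gradient magnitude * Gaussian centred on the keypoint
            wgt = np.exp(-(dx * dx + dy * dy) / (2.0 * (1.5 * sigma) ** 2))
            hist[int(ang * n_bins / 360.0) % n_bins] += wgt * mag
    # Keep every local histogram peak within 80% of the global maximum
    peaks, vmax = [], hist.max()
    for i in range(n_bins):
        left, right = hist[i - 1], hist[(i + 1) % n_bins]
        if hist[i] > left and hist[i] > right and hist[i] >= peak_ratio * vmax:
            peaks.append(i * 360.0 / n_bins)
    return peaks
```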
- Keypoint Descriptor Generation:
- Goal: Create a highly distinctive and robust descriptor for the local region around the keypoint.
- Method: Take a $16 \times 16$ pixel neighborhood around the keypoint, rotated to align with the assigned orientation. Divide this neighborhood into a $4 \times 4$ grid of subregions. For each subregion, compute an 8-bin histogram of gradient orientations (relative to the keypoint orientation), weighted by gradient magnitudes and a Gaussian centered on the keypoint.
- Vector: Concatenate these 16 histograms ($4 \times 4 \times 8 = 128$) into a single 128-dimensional feature vector.
- Normalization: Normalize the vector to unit length, which gives invariance to affine (contrast) illumination changes. Clamp large components (e.g., values > 0.2) and re-normalize to reduce the effects of non-linear illumination changes. This final 128-D vector is the SIFT descriptor.
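In practice the full detector-plus-descriptor pipeline is rarely reimplemented by hand; OpenCV ships a reference implementation. A minimal usage example (assumes opencv-python >= 4.4, where SIFT lives in the main module; "image.png" is a placeholder path):

```python
import cv2

# Detect keypoints and compute 128-D SIFT descriptors in one call
img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# Each keypoint carries position, scale, and orientation; descriptors
# has shape (N, 128), one row per keypoint
print(len(keypoints), descriptors.shape)
```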