Motivation

Global features from the whole image are often not desirable.
Histograms of entire images can be significantly different even if they depict the same object, due to changes in viewpoint, lighting, etc.
Instead, we match local regions that are characteristic of the object or scene in the image. Local regions are more robust to changes that affect the image as a whole.
Application Areas:
- Object detection
- Image matching
- Image stitching

Requirements of a Local Feature

Repeatable: The detector should find the same physical points in the scene independently in each image, regardless of viewing conditions.
Invariant to translation, rotation, scale: The feature should be detectable regardless of the object’s position, orientation, or size in the image.
Invariant to affine transformation: The feature should be robust to distortions like shearing.
Invariant to the presence of noise, blur, etc.: The feature should be detectable even with image degradation.
Locality: The feature should cover a small image region, so that it is robust to occlusion (parts of the object being hidden), clutter (other objects in the scene), and illumination changes.
Distinctiveness: The region around the feature should contain an “interesting” structure, making it easily distinguishable from other features.
Quantity: There should be enough feature points detected to adequately represent the image.
Time efficient: The feature detection and description process should be computationally feasible.

General Approach

Find the interest points. These are locations in the image that are likely to be stable and distinctive, such as corners.
Consider the region around each keypoint. A patch of pixels surrounding the detected interest point is analyzed.
Compute a local descriptor from the region and normalize the feature. This descriptor is a numerical representation of the region’s appearance, designed to be invariant to various transformations. Normalization helps with robustness to lighting changes.
Match local descriptors. Descriptors from different images are compared (e.g., using Euclidean distance), and a pair is accepted as a match if $d(f_{A}, f_{B}) < T$, where $f_{A}$ and $f_{B}$ are feature vectors and $T$ is a threshold.
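A minimal sketch of the matching step, assuming descriptors are plain Python lists of floats and that each descriptor in image A is compared with its nearest neighbour in image B. The function names and toy vectors are hypothetical, not from any library; $d$ is the Euclidean distance and a pair is kept only if $d(f_A, f_B) < T$.

```python
import math


def euclidean(fa, fb):
    """Euclidean distance d(f_A, f_B) between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fa, fb)))


def match_descriptors(descs_a, descs_b, threshold):
    """For each descriptor in A, find its nearest neighbour in B and keep
    the pair (i, j, distance) only if the distance is below the threshold T."""
    matches = []
    for i, fa in enumerate(descs_a):
        best_j, best_d = None, float("inf")
        for j, fb in enumerate(descs_b):
            d = euclidean(fa, fb)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None and best_d < threshold:
            matches.append((i, best_j, best_d))
    return matches


# Toy 3-D "descriptors" (SIFT descriptors have 128 dimensions).
descs_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
descs_b = [[0.0, 0.9, 0.1], [5.0, 5.0, 5.0], [0.95, 0.05, 0.0]]
print(match_descriptors(descs_a, descs_b, threshold=0.5))
```

This brute-force search costs O(N·M) distance computations; larger problems usually switch to approximate nearest-neighbour search.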

Some Popular Detectors

Hessian/Harris corner detection
Laplacian of Gaussian (LOG) detector
Difference of Gaussian (DOG) detector
Hessian/Harris Laplacian detector
Hessian/Harris Affine detector
Maximally Stable Extremal Regions (MSER)
Many others…

These detectors typically look for locations where the image intensity changes significantly when a small window is shifted in two directions; such locations usually correspond to corners.

No change in any direction: Indicates a flat region.
Change in one direction only: Indicates an edge.
Change in both directions: Indicates a corner.

Hessian Corner Detector
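The Hessian detector responds where the determinant of the matrix of second derivatives, $\det(H) = I_{xx} I_{yy} - I_{xy}^{2}$, is large. The sketch below is a pure-Python illustration with hypothetical function names: it computes this response with finite differences on an image stored as a list of rows (img[y][x]). In practice the derivatives are taken on a Gaussian-smoothed image and followed by thresholding and non-maximum suppression; both are omitted here.

```python
import math


def hessian_response(img):
    """det(H) = I_xx * I_yy - I_xy**2 at each interior pixel (borders stay 0)."""
    h, w = len(img), len(img[0])
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Second derivatives by central finite differences.
            ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
            iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
            ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
                   - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
            resp[y][x] = ixx * iyy - ixy ** 2
    return resp


# Synthetic Gaussian blob centred at (7, 7).
size, s = 15, 3.0
img = [[math.exp(-((x - 7) ** 2 + (y - 7) ** 2) / (2 * s * s)) for x in range(size)]
       for y in range(size)]
resp = hessian_response(img)
best = max((resp[y][x], (x, y)) for y in range(size) for x in range(size))
print(best[1])  # (7, 7): strongest response at the blob centre
```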

Harris Corner Detector
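The Harris detector summarizes the gradients in a window with the second-moment matrix $M = \sum_{w} \begin{bmatrix} I_x^{2} & I_x I_y \\ I_x I_y & I_y^{2} \end{bmatrix}$ and scores it with $R = \det(M) - k \, \mathrm{Tr}(M)^{2}$. Large positive $R$ indicates a corner, negative $R$ an edge, and small $|R|$ a flat region, which matches the three cases listed earlier. This is a minimal sketch assuming a uniform 3x3 window (a Gaussian window is more common), $k = 0.04$, and a hypothetical classification threshold of 0.01.

```python
def harris_response(img, k=0.04, radius=1):
    """Harris R = det(M) - k * trace(M)**2 with a uniform (2*radius+1)^2 window."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(radius + 1, h - radius - 1):
        for x in range(radius + 1, w - radius - 1):
            sxx = syy = sxy = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - k * trace * trace
    return resp


def classify(r, t=0.01):
    """Hypothetical threshold t separating corner / edge / flat."""
    if r > t:
        return "corner"
    if r < -t:
        return "edge"
    return "flat"


# Bright 10x10 square on a dark 20x20 background (img[y][x]).
img = [[1.0 if 5 <= x <= 14 and 5 <= y <= 14 else 0.0 for x in range(20)]
       for y in range(20)]
resp = harris_response(img)
print(classify(resp[5][5]), classify(resp[10][5]), classify(resp[10][10]))
# corner edge flat
```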

Scale Invariant Region Detection

Hessian and Harris corner detectors are not scale-invariant. The response of the detector changes significantly as the image is scaled.

Scale-normalized Laplacian of Gaussian: $|\mathrm{LoG}(x, \sigma_{n})| = \sigma_{n}^{2} \, |L_{xx}(x, \sigma_{n}) + L_{yy}(x, \sigma_{n})|$

Solution: Use the concept of Scale Space.

Laplacian of Gaussian (LOG) Detector
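For a bright disc of radius $r$, the scale-normalized LoG response at the disc center is proportional to $U e^{-U}$ with $U = r^{2} / (2\sigma^{2})$, so its magnitude peaks at $\sigma = r / \sqrt{2}$; this is the characteristic scale that the LoG detector selects. The sketch below checks this numerically by evaluating the response only at the disc center for a range of $\sigma$ values. The synthetic disc, the $\sigma$ grid and the function names are illustrative assumptions.

```python
import math


def norm_log(x, y, sigma):
    """Scale-normalised LoG kernel value: sigma^2 * (G_xx + G_yy)."""
    r2 = x * x + y * y
    log = -(1.0 / (math.pi * sigma ** 4)) * (1 - r2 / (2 * sigma ** 2)) \
        * math.exp(-r2 / (2 * sigma ** 2))
    return sigma ** 2 * log


def disk_response(radius, sigma):
    """LoG response at the centre of a binary disc (1 inside, 0 outside)."""
    total = 0.0
    r = int(radius)
    for y in range(-r, r + 1):
        for x in range(-r, r + 1):
            if x * x + y * y <= radius * radius:
                total += norm_log(x, y, sigma)
    return total


def characteristic_scale(radius, sigmas):
    """Sigma whose normalised response has the largest magnitude."""
    return max(sigmas, key=lambda s: abs(disk_response(radius, s)))


sigmas = [1.0 + 0.05 * i for i in range(221)]  # 1.0 .. 12.0
best = characteristic_scale(8, sigmas)
print(round(best, 2), "vs r/sqrt(2) =", round(8 / math.sqrt(2), 2))
```

Without the $\sigma^{2}$ factor the response would decay with scale and the maximum would not track the blob size.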

Local Descriptors

We have detected interest points (keypoints) in an image.
How can these points be matched across different images of the same object?

Solution: Use Local Descriptors.

List of Local Feature Descriptors

Scale Invariant Feature Transform (SIFT)
Speeded-Up Robust Features (SURF)
Histogram of Oriented Gradients (HOG)
Gradient Location Orientation Histogram (GLOH)
PCA-SIFT
Pyramidal HOG (PHOG)
Pyramidal Histogram Of visual Words (PHOW)
Others (Shape Context, Steerable filters, Spin images).

Local descriptors should be robust to viewpoint and illumination changes.

SIFT

[Lowe, 2004]

Step 1: Scale-Space Extrema Detection

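Search over all scales and image locations for extrema of the Difference of Gaussian (DoG) function; these are candidate interest points that are invariant to scale and orientation.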
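In Lowe's method, a sample in the DoG stack is a candidate keypoint if it is larger than all, or smaller than all, of its 26 neighbours: 8 in the same DoG level and 9 in each of the two adjacent levels. The sketch below assumes the DoG stack has already been built and is stored as nested lists indexed dog[level][y][x]; the hand-made stack is a toy input, and the function names are hypothetical.

```python
def is_scale_space_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or strictly below all 26 neighbours."""
    v = dog[s][y][x]
    neighbours = [dog[s + ds][y + dy][x + dx]
                  for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (ds, dy, dx) != (0, 0, 0)]
    return all(v > n for n in neighbours) or all(v < n for n in neighbours)


def find_extrema(dog):
    """Scan every interior sample (excluding the first/last level and borders)."""
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[0]) - 1):
            for x in range(1, len(dog[0][0]) - 1):
                if is_scale_space_extremum(dog, s, y, x):
                    found.append((s, y, x))
    return found


# Toy DoG stack: 3 levels of 5x5 samples with a single peak in the middle level.
dog = [[[0.0] * 5 for _ in range(5)] for _ in range(3)]
dog[1][2][2] = 1.0
print(find_extrema(dog))  # [(1, 2, 2)]
```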

Step 2: Keypoint Localization

At each candidate location, fit a detailed model to determine the precise location and scale.
Select keypoints based on stability.
Aim: Reject low-contrast points and points that lie on edges.
- Low-contrast point elimination:
  - Fit a 3D quadratic function to the nearby data using the Taylor expansion of $D$ around the candidate keypoint, where $\underline{x} = (x, y, \sigma)^{T}$ is the offset from that point:
  $D(\underline{x}) = D + \frac{\partial D}{\partial \underline{x}}^{T} \underline{x} + \frac{1}{2} \underline{x}^{T} \frac{\partial^{2} D}{\partial \underline{x}^{2}} \underline{x}$
  
  Where $D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y)$.
  - Find the extremum of the fitted function by setting its derivative to zero: $\frac{\partial D}{\partial \underline{x}} + \frac{\partial^{2} D}{\partial \underline{x}^{2}} \underline{x} = 0$, which gives $\hat{\underline{x}} = -\left(\frac{\partial^{2} D}{\partial \underline{x}^{2}}\right)^{-1} \frac{\partial D}{\partial \underline{x}}$.
  - Substituting back gives the value at the extremum, $D(\hat{\underline{x}}) = D + \frac{1}{2} \frac{\partial D}{\partial \underline{x}}^{T} \hat{\underline{x}}$. Discard low-contrast extrema with $|D(\hat{\underline{x}})| < 0.03$ (for pixel values in $[0, 1]$).
- Eliminating edge response:
  - DOG gives a strong response along edges.
  - Solution: Check the “cornerness” of each keypoint.
  - On an edge, one principal curvature is much larger than the other.
  - High cornerness $⟺$ No dominant principal curvature component.
  - Consider the concept of Hessian and Harris corner detection.
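  Hessian matrix of $D$ at the keypoint's location and scale:
  
  $H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$
  
  Edge-rejection criterion (in the spirit of the Harris detector), where $r$ is the ratio between the larger and the smaller eigenvalue of $H$:
  
  $\frac{\mathrm{Tr}(H)^{2}}{\mathrm{Det}(H)} < \frac{(r + 1)^{2}}{r}$ Keep points that satisfy this inequality (Lowe uses $r = 10$); discard points that violate it, or that have $\mathrm{Det}(H) \le 0$, as edge-like.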
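The two rejection tests can be combined into one localization routine. This is a minimal sketch assuming a DoG stack stored as dog[level][y][x] with pixel values in [0, 1]; derivatives are central finite differences, and the re-centering Lowe applies when an offset component exceeds 0.5 is omitted. All function names are hypothetical.

```python
def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular Hessian")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def localize(dog, s, y, x, contrast_thr=0.03, r=10.0):
    """Refine candidate dog[s][y][x]; return (x, y, s, D(x_hat)) or None if rejected."""
    def d(ds, dy, dx):
        return dog[s + ds][y + dy][x + dx]

    v = d(0, 0, 0)
    # Gradient, ordered (x, y, sigma).
    g = [(d(0, 0, 1) - d(0, 0, -1)) / 2.0,
         (d(0, 1, 0) - d(0, -1, 0)) / 2.0,
         (d(1, 0, 0) - d(-1, 0, 0)) / 2.0]
    dxx = d(0, 0, 1) - 2 * v + d(0, 0, -1)
    dyy = d(0, 1, 0) - 2 * v + d(0, -1, 0)
    dss = d(1, 0, 0) - 2 * v + d(-1, 0, 0)
    dxy = (d(0, 1, 1) - d(0, 1, -1) - d(0, -1, 1) + d(0, -1, -1)) / 4.0
    dxs = (d(1, 0, 1) - d(1, 0, -1) - d(-1, 0, 1) + d(-1, 0, -1)) / 4.0
    dys = (d(1, 1, 0) - d(1, -1, 0) - d(-1, 1, 0) + d(-1, -1, 0)) / 4.0
    hess = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    offset = solve3(hess, [-gi for gi in g])                    # x_hat = -H^-1 g
    value = v + 0.5 * sum(gi * oi for gi, oi in zip(g, offset))  # D(x_hat)
    if abs(value) < contrast_thr:
        return None  # low contrast
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy                  # 2x2 spatial Hessian
    if det <= 0 or tr * tr / det >= (r + 1) ** 2 / r:
        return None  # edge-like, or curvatures of opposite sign
    return (x + offset[0], y + offset[1], s + offset[2], value)


def make_stack(f):
    """Toy 3-level 5x5 DoG stack sampled from an analytic function f(x, y, s)."""
    return [[[f(x, y, s) for x in range(5)] for y in range(5)] for s in range(3)]


def peak(x, y, s):
    return 0.5 - 0.1 * ((x - 2.3) ** 2 + (y - 1.8) ** 2) - 0.05 * (s - 1.2) ** 2


print(localize(make_stack(peak), 1, 2, 2))  # approx. (2.3, 1.8, 1.2, 0.5)
```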

Step 3: Orientation Assignment

Aim: Assign a consistent orientation to each keypoint based on local image properties to obtain rotational invariance.
- The descriptor is then computed relative to this orientation, so it does not change when the image is rotated.
- The magnitude and orientation of the gradient of an image patch I(x, y) at a particular scale are:
  
  $m(x, y) = \sqrt{(I(x + 1, y) - I(x - 1, y))^{2} + (I(x, y + 1) - I(x, y - 1))^{2}}$
  
  $\theta(x, y) = \tan^{-1} \frac{I(x, y + 1) - I(x, y - 1)}{I(x + 1, y) - I(x - 1, y)}$
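Create a histogram of local gradient directions computed at the selected scale (36 bins covering 360° in Lowe's paper), with each sample weighted by its gradient magnitude and by a Gaussian window centered on the keypoint.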
Assign the dominant orientation of the region as that of the peak of the smoothed histogram.
For multiple peaks, create multiple keypoints: in Lowe's paper, every local peak within 80% of the highest peak produces a keypoint with that orientation.
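A sketch of orientation assignment, assuming a square patch already sampled at the keypoint's scale and stored as patch[y][x]. It uses the central differences from the formulas above (with `math.atan2` so the full 360° range is recovered), 36 bins of 10° each, and keeps every local peak within 80% of the highest one. Histogram smoothing and the parabolic peak interpolation used in practice are left out; the function names and the Gaussian width passed in are assumptions.

```python
import math


def orientation_histogram(patch, sigma, nbins=36):
    """Gradient-orientation histogram weighted by magnitude and a centred Gaussian."""
    n = len(patch)
    c = (n - 1) / 2.0
    width = 360.0 / nbins
    hist = [0.0] * nbins
    for y in range(1, n - 1):
        for x in range(1, n - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            m = math.hypot(dx, dy)
            theta = math.degrees(math.atan2(dy, dx)) % 360.0
            w = math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
            hist[int(theta / width + 0.5) % nbins] += w * m  # bins centred on 0, 10, ...
    return hist


def dominant_orientations(hist, peak_ratio=0.8):
    """Orientations (degrees) of local peaks within peak_ratio of the highest peak."""
    nbins = len(hist)
    width = 360.0 / nbins
    top = max(hist)
    peaks = []
    for i, h in enumerate(hist):
        left, right = hist[i - 1], hist[(i + 1) % nbins]  # circular neighbours
        if h > left and h > right and h >= peak_ratio * top:
            peaks.append(i * width)
    return peaks


n = 9
vertical = [[float(y) for x in range(n)] for y in range(n)]  # intensity grows with y
print(dominant_orientations(orientation_histogram(vertical, sigma=2.0)))  # [90.0]
```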

Step 4: Keypoint Descriptor

Aim: Obtain a local descriptor that is highly distinctive yet invariant to variations like illumination and affine change.
Consider a 16x16 sample grid around the keypoint, aligned with the dominant orientation of the region.
Divide the region into 4x4 sub-regions.
Weight each sample with a Gaussian window centered on the keypoint, with $σ$ equal to half the descriptor window width, so that pixels closer to the center of the descriptor receive higher weights.
Create an 8-bin gradient-orientation histogram for each sub-region, with each sample weighted by its gradient magnitude and its Gaussian weight.
This results in a feature vector of dimension 128 (4 x 4 sub-regions x 8 bins = 128).
Finally, normalize the 128-dimensional vector to unit length to reduce the effect of illumination changes; Lowe additionally clamps entries above 0.2 and renormalizes.
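A simplified sketch of the descriptor computation. It assumes the 16x16 patch (patch[y][x]) has already been rotated to the dominant orientation, assigns each sample to a single sub-region and orientation bin (Lowe's trilinear interpolation is omitted), and uses one-sided differences at the patch border. The function names are hypothetical.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def sift_like_descriptor(patch, nbins=8, cells=4, clamp=0.2):
    """128-D vector: 4x4 sub-regions x 8 orientation bins, normalised and clamped."""
    n = len(patch)           # 16 samples per side
    cell = n // cells        # 4 samples per sub-region
    c = (n - 1) / 2.0
    sigma = n / 2.0          # Gaussian sigma = half the window width
    desc = [0.0] * (cells * cells * nbins)
    for y in range(n):
        for x in range(n):
            dx = patch[y][min(x + 1, n - 1)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, n - 1)][x] - patch[max(y - 1, 0)][x]
            m = math.hypot(dx, dy)
            theta = math.atan2(dy, dx) % (2 * math.pi)
            b = int(theta / (2 * math.pi / nbins)) % nbins
            w = math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
            idx = ((y // cell) * cells + (x // cell)) * nbins + b
            desc[idx] += w * m
    desc = l2_normalize(desc)                # unit length
    desc = [min(a, clamp) for a in desc]     # limit large gradient magnitudes
    return l2_normalize(desc)                # renormalise


patch = [[0.1 * x + 0.03 * y for x in range(16)] for y in range(16)]
d = sift_like_descriptor(patch)
brighter = sift_like_descriptor([[2 * v + 10 for v in row] for row in patch])
print(len(d), max(abs(a - b) for a, b in zip(d, brighter)) < 1e-9)
```

Because the descriptor is built from intensity differences, adding a constant to the patch does not change it, and the unit-length normalization cancels a global contrast scaling.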

SIFT: Some Results

Object detection: SIFT features can be used to detect objects in images.
Panorama stitching: SIFT can be used to find corresponding points in overlapping images to create panoramas.

GLOH

Gradient Location and Orientation Histogram (GLOH)

First 3 steps – same as SIFT.
Step 4 – Local image descriptor:
- Use a log-polar location grid with 3 bins in the radial direction; the outer two rings are divided into 8 angular sectors and the central disc is not, giving 1 + 8 + 8 = 17 location bins.
- Quantize gradient orientations into 16 bins.
- Form a feature vector of 272 dimensions (17 x 16).
- Perform dimensionality reduction (PCA) and project the features to a 128-dimensional space.
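A sketch of GLOH's log-polar location binning. The ring radii 6, 11 and 15 are an assumption (the values commonly quoted for GLOH), and the function name is hypothetical. Each gradient sample would then be added to one of 16 orientation bins within its location bin, giving 17 x 16 = 272 values before the PCA projection to 128 dimensions, which needs training data and is not shown.

```python
import math

RADII = (6.0, 11.0, 15.0)  # assumption: commonly quoted GLOH ring radii


def gloh_location_bin(dx, dy, radii=RADII, n_angles=8):
    """Map an offset (dx, dy) from the keypoint to one of 17 location bins.

    Bin 0 is the undivided central disc; bins 1-8 and 9-16 are the angular
    sectors of the middle and outer rings. Returns None outside the grid."""
    r = math.hypot(dx, dy)
    if r > radii[2]:
        return None
    if r <= radii[0]:
        return 0
    ring = 1 if r <= radii[1] else 2
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = int(angle / (2 * math.pi / n_angles)) % n_angles
    return 1 + (ring - 1) * n_angles + sector


bins = {gloh_location_bin(dx, dy) for dx in range(-15, 16) for dy in range(-15, 16)}
bins.discard(None)
print(len(bins), len(bins) * 16)  # 17 272
```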

Some Other Examples

SURF, PHOW, HOG
