Motivation
- Global features from the whole image are often not desirable.
- Histograms of entire images can be significantly different even if they depict the same object, due to changes in viewpoint, lighting, etc.
- Instead, we match local regions which are prominent to the object or scene in the image. These local regions offer more robustness to changes in the overall image.
- Application Areas:
- Object detection
- Image matching
- Image stitching
Requirements of a Local Feature
- Repeatable: The detector should find the same physical points in the scene independently in each image, regardless of viewing conditions.
- Invariant to translation, rotation, scale: The feature should be detectable regardless of the object’s position, orientation, or size in the image.
- Invariant to affine transformation: The feature should be robust to distortions like shearing.
- Invariant to the presence of noise, blur, etc.: The feature should be detectable even with image degradation.
- Locality: The feature should depend only on a small image region, which makes it robust to occlusion (parts of the object being hidden), clutter (other objects in the scene), and illumination changes.
- Distinctiveness: The region around the feature should contain an “interesting” structure, making it easily distinguishable from other features.
- Quantity: There should be enough feature points detected to adequately represent the image.
- Time efficient: The feature detection and description process should be computationally feasible.
General Approach
- Find the interest points. These are locations in the image that are likely to be stable and distinctive, such as corners.
- Consider the region around each keypoint. A patch of pixels surrounding the detected interest point is analyzed.
- Compute a local descriptor from the region and normalize the feature. This descriptor is a numerical representation of the region’s appearance, designed to be invariant to various transformations. Normalization helps with robustness to lighting changes.
- Match local descriptors. Descriptors from different images are compared (e.g., using Euclidean distance), and matches are identified based on a similarity threshold: $\lVert f_1 - f_2 \rVert < T$, where $f_1$ and $f_2$ are feature vectors and $T$ is a threshold.
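The matching step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `euclidean` and `match_descriptors` are hypothetical, descriptors are assumed to be plain lists of floats, and each descriptor in the first image is paired with its nearest neighbor in the second image before the threshold test is applied.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(desc_a, desc_b, threshold):
    """Return (i, j, distance) for each descriptor i in desc_a whose nearest
    neighbor j in desc_b lies closer than the threshold T."""
    matches = []
    for i, da in enumerate(desc_a):
        best_j, best_d = None, float("inf")
        for j, db in enumerate(desc_b):
            d = euclidean(da, db)
            if d < best_d:
                best_j, best_d = j, d
        # Accept the nearest neighbor only if it passes the distance test.
        if best_j is not None and best_d < threshold:
            matches.append((i, best_j, best_d))
    return matches
```

The brute-force search costs one distance computation per descriptor pair, which is acceptable for small sets; larger sets usually call for an approximate nearest-neighbor index.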
Some Popular Detectors
- Hessian/Harris corner detection
- Laplacian of Gaussian (LOG) detector
- Difference of Gaussian (DOG) detector
- Hessian/Harris Laplacian detector
- Hessian/Harris Affine detector
- Maximally Stable Extremal Regions (MSER)
- Many others…
Many of these detectors look for significant image change in two directions within a local window, which typically indicates a corner:
- No change in any direction: Indicates a flat region.
- Change in one direction only: Indicates an edge.
- Change in both directions: Indicates a corner.
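The three cases above can be illustrated with a small sketch based on the structure tensor (the matrix of summed gradient products used by the Harris detector). Everything here is an assumption made for illustration: the function names, the eigenvalue threshold of 0.5, and the constant `k = 0.04` are not taken from these notes, and the patch is a plain list of rows.

```python
import math

def structure_tensor(patch):
    """Sum gradient products over the patch interior (central differences)."""
    sxx = syy = sxy = 0.0
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            ix = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            iy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    return sxx, syy, sxy

def classify_patch(patch, threshold=0.5, k=0.04):
    """Label a patch as flat, edge, or corner and return a Harris-style response."""
    sxx, syy, sxy = structure_tensor(patch)
    # Eigenvalues of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2.0
    spread = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_max, lam_min = mean + spread, mean - spread
    # Harris-style response: det - k * trace^2 (avoids computing eigenvalues).
    response = (sxx * syy - sxy ** 2) - k * (sxx + syy) ** 2
    if lam_max < threshold:
        label = "flat"      # no change in any direction
    elif lam_min < threshold:
        label = "edge"      # change in one direction only
    else:
        label = "corner"    # change in both directions
    return label, response
```

In this sketch the response is positive for corners, negative for edges, and near zero for flat regions, which is why a single threshold on the response is often enough in practice.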
Hessian Corner Detector
Harris Corner Detector
Scale Invariant Region Detection
Hessian and Harris corner detectors are not scale-invariant. The response of the detector changes significantly as the image is scaled.
Solution: Use the concept of Scale Space.
Laplacian of Gaussian (LOG) Detector
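As a rough illustration of scale selection with the LoG, the sketch below works on a 1D signal to stay short. It evaluates the scale-normalized second derivative of a Gaussian at the center of a box-shaped "blob" for several values of sigma and picks the sigma with the strongest response. The helper names and the 1D setting are assumptions made for this example; the notes themselves concern 2D images.

```python
import math

def normalized_log_response(signal, center, sigma):
    """Scale-normalized second-derivative-of-Gaussian response at one sample."""
    radius = int(4 * sigma)
    total = 0.0
    for k in range(-radius, radius + 1):
        idx = center + k
        if 0 <= idx < len(signal):
            g = math.exp(-k * k / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
            g2 = (k * k - sigma ** 2) / sigma ** 4 * g  # second derivative of g
            total += signal[idx] * g2
    # Multiplying by sigma^2 makes responses comparable across scales.
    return sigma ** 2 * total

def characteristic_scale(signal, center, sigmas):
    """Sigma with the largest absolute normalized response at the given sample."""
    return max(sigmas, key=lambda s: abs(normalized_log_response(signal, center, s)))

def make_blob(length, half_width):
    """1D box signal of the given half-width, centered in the array."""
    c = length // 2
    return [1.0 if abs(i - c) <= half_width else 0.0 for i in range(length)]
```

For example, `characteristic_scale(make_blob(401, 10), 200, range(1, 31))` should select a sigma close to the blob's half-width, and doubling the blob size roughly doubles the selected sigma. That tracking of structure size is what makes the detector scale-invariant.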
Local Descriptors
- We have detected interest points (keypoints) in an image.
- How to match the points across different images of the same object?
Solution: Use Local Descriptors.
List of Local Feature Descriptors
- Scale Invariant Feature Transform (SIFT)
- Speeded-Up Robust Features (SURF)
- Histogram of Oriented Gradients (HOG)
- Gradient Location and Orientation Histogram (GLOH)
- PCA-SIFT
- Pyramid Histogram of Oriented Gradients (PHOG)
- Pyramid Histogram of Visual Words (PHOW)
- Others (Shape Context, Steerable filters, Spin images).
Local descriptors should be robust to viewpoint change or illumination change.
SIFT
[Lowe, 2004]
Step 1: Scale-Space Extrema Detection
- Detect candidate interest points (invariant to scale and orientation) as local extrema of the DOG function across location and scale.
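For reference, the scale space and the DOG function behind this step can be written as follows, using the notation of [Lowe, 2004]. The symbols $G$, $L$, $D$ and the constant scale factor $k$ are not defined in these notes and are introduced here as assumptions.

```latex
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad
G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \, e^{-(x^{2} + y^{2}) / (2\sigma^{2})}

D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)
```

Here $*$ denotes convolution. A sample is kept as a candidate keypoint if its value of $D$ is larger or smaller than all 26 of its neighbors: 8 in the same scale and 9 in each of the two adjacent scales.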
Step 2: Keypoint Localization
- Determine the location and scale at each candidate location.
- Select keypoints based on stability.
- Aim: Reject low-contrast points and points that lie on edges.
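- Low-contrast point elimination:
  - Fit a quadratic (second-order Taylor) approximation of the DOG function $D$ to the data around each candidate keypoint:
    $D(\mathbf{x}) = D + \left(\frac{\partial D}{\partial \mathbf{x}}\right)^{T} \mathbf{x} + \frac{1}{2}\, \mathbf{x}^{T} \frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\, \mathbf{x}$
    where $\mathbf{x} = (x, y, \sigma)^{T}$ is the offset from the candidate location, and $D$ and its derivatives are evaluated at the candidate.
  - Locate the extremum of the fitted function: $\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$
  - Discard low-contrast extrema: $|D(\hat{\mathbf{x}})| < 0.03$ (the threshold used in [Lowe, 2004], assuming pixel values in $[0, 1]$).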
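- Eliminating edge responses:
  - DOG gives a strong response along edges.
  - Solution: Check the “cornerness” of each keypoint.
  - On an edge, one principal curvature is much larger than the other.
  - High cornerness $\Rightarrow$ no dominant principal curvature component.
  - Consider the concept of Hessian and Harris corner detection.
  - Hessian matrix, computed at the keypoint's location and scale: $H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$
  - Harris-style corner criterion, as used in [Lowe, 2004]: $\frac{\mathrm{Tr}(H)^{2}}{\mathrm{Det}(H)} < \frac{(r + 1)^{2}}{r}$, where $r$ is the largest allowed ratio between the two principal curvatures ($r = 10$ in [Lowe, 2004]).
  - Discard points that fail this test, i.e. points whose cornerness falls below the threshold.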
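The two rejection tests of Step 2 can be sketched as follows. This is a simplified illustration under stated assumptions: the quadratic fit is done in one dimension (three neighboring DOG samples) instead of the full $(x, y, \sigma)$ fit, the function names are hypothetical, and the constants 0.03 and $r = 10$ are the values reported in [Lowe, 2004] for pixel values in $[0, 1]$.

```python
def refine_extremum_1d(d_prev, d_curr, d_next):
    """Fit a parabola through three samples at offsets -1, 0, +1.
    Returns (offset of the extremum, interpolated value at the extremum)."""
    gradient = (d_next - d_prev) / 2.0          # first derivative at offset 0
    curvature = d_next - 2.0 * d_curr + d_prev  # second derivative at offset 0
    if curvature == 0:
        return 0.0, d_curr                      # flat fit: keep the sample itself
    offset = -gradient / curvature
    value = d_curr + 0.5 * gradient * offset
    return offset, value

def passes_contrast_test(value, contrast_threshold=0.03):
    """Keep only extrema whose interpolated |D| reaches the threshold."""
    return abs(value) >= contrast_threshold

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep keypoints whose principal-curvature ratio is below r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False  # curvatures of opposite sign: not a stable extremum
    return trace * trace / det < (r + 1.0) ** 2 / r
```

A keypoint survives Step 2 only if it passes both tests. The edge test needs no eigenvalue computation because the ratio of trace squared to determinant depends only on the ratio of the two principal curvatures.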
Step 3: Orientation Assignment
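- Aim: Assign a consistent orientation to each keypoint based on local image properties to obtain rotational invariance.
- The keypoint descriptor is then represented relative to this orientation.
- The magnitude and orientation of the gradient of an image patch $I(x, y)$ at a particular scale are:
  $m(x, y) = \sqrt{\left(I(x+1, y) - I(x-1, y)\right)^{2} + \left(I(x, y+1) - I(x, y-1)\right)^{2}}$
  $\theta(x, y) = \tan^{-1}\!\left(\frac{I(x, y+1) - I(x, y-1)}{I(x+1, y) - I(x-1, y)}\right)$
- Create a histogram of local gradient directions computed at the selected scale, weighting each sample by its gradient magnitude and by a Gaussian window centered on the keypoint.
- Assign the dominant orientation of the region as that of the peak of the smoothed histogram.
- For multiple peaks, create multiple keypoints (one for each significant peak).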
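The orientation assignment can be sketched as below. Several details are assumptions for the sake of a short example: the function name is hypothetical, the 36 bins and the 80% peak ratio follow [Lowe, 2004] and are not stated in these notes, the Gaussian width defaults to half the patch size, and histogram smoothing and peak interpolation are omitted.

```python
import math

def dominant_orientations(patch, num_bins=36, peak_ratio=0.8, sigma=None):
    """Return the histogram bin indices of the significant gradient orientations."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    if sigma is None:
        sigma = 0.5 * max(h, w)  # assumed default width of the Gaussian window
    hist = [0.0] * num_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            theta = math.atan2(dy, dx) % (2 * math.pi)
            # Each sample votes with its gradient magnitude times a Gaussian weight.
            weight = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            b = int(theta / (2 * math.pi) * num_bins) % num_bins
            hist[b] += magnitude * weight
    peak = max(hist)
    if peak == 0:
        return []  # no gradient anywhere: orientation is undefined
    # Every bin close enough to the maximum yields its own keypoint orientation.
    return [b for b, v in enumerate(hist) if v >= peak_ratio * peak]
```

With 36 bins each bin covers 10 degrees, so a returned index `b` corresponds to orientations in the range from `10 * b` to `10 * (b + 1)` degrees.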
Step 4: Keypoint Descriptor
- Aim: Obtain a local descriptor that is highly distinctive yet invariant to variations like illumination and affine change.
- Consider a square grid of samples (e.g., 16x16) around the keypoint, aligned with the dominant orientation of the region.
- Divide the region into 4x4 sub-regions (each covering 4x4 samples for a 16x16 grid).
- Apply a Gaussian weighting window over the region, which gives higher weights to pixels closer to the center of the descriptor.
- Create an 8-bin gradient orientation histogram for each sub-region. Each sample's contribution is weighted by its gradient magnitude and by the Gaussian window (where $\sigma$ is half the window size).
- This results in a feature vector of dimension 128 (8 bins * 4 * 4 sub-regions = 128).
- Finally, normalize the 128-dimensional vector to unit length to reduce the effect of illumination changes.
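The descriptor construction can be sketched as follows. This is a simplified version under stated assumptions: the function name is hypothetical, the input is an 18x18 intensity patch already rotated to the dominant orientation (the extra border allows central differences on the inner 16x16 samples), each sample votes into a single bin without interpolation between neighboring bins, and the clamping step of [Lowe, 2004] is left out.

```python
import math

def sift_like_descriptor(patch, cells=4, cell_size=4, bins=8):
    """Build a cells*cells*bins vector of gradient histograms, unit-normalized."""
    size = cells * cell_size  # 16 samples per side by default
    if len(patch) != size + 2 or any(len(row) != size + 2 for row in patch):
        raise ValueError("patch must be (size + 2) x (size + 2)")
    sigma = size / 2.0          # Gaussian window: sigma is half the window size
    center = (size - 1) / 2.0
    desc = [0.0] * (cells * cells * bins)
    for y in range(size):
        for x in range(size):
            py, px = y + 1, x + 1  # skip the one-pixel border
            dx = patch[py][px + 1] - patch[py][px - 1]
            dy = patch[py + 1][px] - patch[py - 1][px]
            magnitude = math.hypot(dx, dy)
            theta = math.atan2(dy, dx) % (2 * math.pi)
            weight = math.exp(-((x - center) ** 2 + (y - center) ** 2) / (2 * sigma ** 2))
            b = int(theta / (2 * math.pi) * bins) % bins
            cell = (y // cell_size) * cells + (x // cell_size)
            desc[cell * bins + b] += magnitude * weight
    # Unit-length normalization cancels a global contrast scaling.
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc
```

Because the descriptor is built from intensity differences, adding a constant brightness offset leaves it unchanged, and the final normalization removes a constant contrast factor. These are the two illumination effects the normalization step is meant to handle.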
SIFT: Some Results
- Object detection: SIFT features can be used to detect objects in images.
- Panorama stitching: SIFT can be used to find corresponding points in overlapping images to create panoramas.
GLOH
Gradient Location and Orientation Histogram (GLOH)
- First 3 steps – same as SIFT.
- Step 4 – Local image descriptor:
- Consider a log-polar location grid with 3 radial bins. The two outer rings are each split into 8 angular sectors and the central bin is left undivided, resulting in a total of 1 + 2 * 8 = 17 location bins.
- For each location bin, form a histogram of gradient orientations having 16 bins.
- Form a feature vector of 272 dimensions (17 * 16).
- Perform dimensionality reduction (PCA) and project the features to a 128-dimensional space.
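The bin count of the log-polar grid can be checked with a short sketch. The function name is hypothetical, and the radii 6, 11, and 15 are an assumption taken from the original GLOH publication; they do not appear in these notes.

```python
import math

def gloh_location_bin(dx, dy, radii=(6.0, 11.0, 15.0), angular_bins=8):
    """Map an offset from the keypoint to a log-polar location bin.
    Returns 0 for the undivided center, 1..16 for the two outer rings,
    or None if the offset lies outside the grid."""
    r = math.hypot(dx, dy)
    if r >= radii[-1]:
        return None
    if r < radii[0]:
        return 0
    ring = 1 if r < radii[1] else 2
    theta = math.atan2(dy, dx) % (2 * math.pi)
    sector = int(theta / (2 * math.pi) * angular_bins) % angular_bins
    return 1 + (ring - 1) * angular_bins + sector

# Enumerate all bins reached by integer offsets inside the grid.
location_bins = {
    gloh_location_bin(dx, dy)
    for dx in range(-15, 16)
    for dy in range(-15, 16)
} - {None}
orientation_bins = 16
descriptor_length = len(location_bins) * orientation_bins
```

Under these assumptions `location_bins` contains 17 distinct indices, so `descriptor_length` is 272 before the reduction to 128 dimensions.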
Some Other Examples
SURF, PHOW, HOG