Problem: Massive numbers of digital images online are useless without organization.
Image Classification:
What: Organizing images into different classes based on image features. Typically assigns a single main label (e.g., ‘mountain’).
Why: To enable searching and retrieval.
How (Basic): Extract features (e.g., color histogram) from an unknown image and compare them to features of known, labeled images. Assign the label of the best match.
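A minimal sketch of this nearest-example matching, assuming images arrive as (H, W, 3) uint8 NumPy arrays; the 8-bins-per-channel histogram and L1 distance are illustrative choices, not prescribed by the notes:

```python
import numpy as np

def color_histogram(image, bins=8):
    """Quantize each RGB channel into `bins` levels, count how often each
    (r, g, b) bin combination occurs, and normalize to a distribution."""
    quantized = (image // (256 // bins)).reshape(-1, 3)
    hist = np.zeros((bins, bins, bins))
    for r, g, b in quantized:
        hist[r, g, b] += 1
    return hist.ravel() / hist.sum()

def classify_nearest(unknown, labeled_examples):
    """Assign the label of the known (image, label) pair whose histogram
    is closest (L1 distance) to the unknown image's histogram."""
    h = color_histogram(unknown)
    best = min(labeled_examples,
               key=lambda ex: np.abs(h - color_histogram(ex[0])).sum())
    return best[1]  # label of the best match
```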
Image Annotation:
What: Labeling an image with multiple relevant semantic keywords (e.g., ‘mountain’, ‘sky’, ‘snow’, ‘trees’). Classifying an image into multiple classes.
How: Often uses Multiple Instance Learning (MIL) where an image is a ‘bag’ of features/instances. If any instance is positive for a label, the bag (image) gets the label.
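The MIL bag rule fits in a few lines; `instance_classifiers` below is a hypothetical mapping from each keyword to a per-instance predicate, used only to illustrate the any-instance-positive rule:

```python
def label_bag(instances, instance_classifiers):
    """MIL labeling rule: the image (a 'bag' of region/feature instances)
    receives a keyword if ANY of its instances is positive for it."""
    return {word for word, is_positive in instance_classifiers.items()
            if any(is_positive(x) for x in instances)}
```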
Relation: Closely related tasks; good classification helps annotation, and vice versa.
Challenge: Simple matching to one example image is unreliable as one image doesn’t represent the whole class well. Need robust classifiers trained on many examples.
III.1.1 Generative Approach
What: Builds a model representing how data for each class is generated. Often a probabilistic distribution P(features∣class).
Why (Analogy): Like having an abstract concept/model for each object type (e.g., an “ideal apple”).
How: Learn the probability distribution from many sample images of the class (e.g., using Gaussian Mixture Models, Bayesian methods).
Use: Classify a new image by finding which class model most likely generated its features (using Bayes’ theorem to find P(class∣features)).
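A sketch of this pipeline using Gaussian Mixture Models, assuming per-class training features as NumPy arrays and scikit-learn for the GMM fit; the component count is an arbitrary placeholder:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_generative(features_by_class, n_components=5):
    """Fit one GMM per class as a model of P(features | class); use the
    class frequency in the training set as the prior P(class)."""
    total = sum(len(f) for f in features_by_class.values())
    models = {}
    for c, feats in features_by_class.items():
        gmm = GaussianMixture(n_components=n_components).fit(feats)
        models[c] = (gmm, len(feats) / total)
    return models

def classify(x, models):
    """Bayes' theorem in log space: pick the class maximizing
    log P(x | C) + log P(C), i.e. the most probable generator of x."""
    scores = {c: gmm.score_samples(x.reshape(1, -1))[0] + np.log(prior)
              for c, (gmm, prior) in models.items()}
    return max(scores, key=scores.get)
```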
III.1.2 Discriminative Approach
What: Learns a decision boundary directly between different classes in the feature space. Doesn’t explicitly model each class.
Why: Focuses on separating classes rather than describing them individually.
How: Collect training data from different classes. Find an optimal separator (e.g., hyperplane) that distinguishes between the classes based on feature similarity/difference.
Use: Classify a new image based on which side of the decision boundary its features fall.
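For contrast, a discriminative sketch with a linear SVM; the random arrays stand in for real image feature vectors and labels:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder training data: rows are image feature vectors, y their labels.
X_train = np.random.rand(200, 64)
y_train = np.random.randint(0, 2, 200)

# Learn a separating hyperplane directly; no per-class density is modeled.
clf = LinearSVC().fit(X_train, y_train)

# A new image is classified by which side of the hyperplane it lands on.
x_new = np.random.rand(1, 64)
print(clf.predict(x_new))
```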
Chapter 7: Bayesian Classification
7.1 Introduction (Bayes’ Theorem)
What: A mathematical formula relating conditional probabilities. Foundation for Bayesian classifiers.
Formula: P(A∣B) = P(B∣A) P(A) / P(B)
P(A∣B): Posterior probability (probability of hypothesis A given evidence B).
P(B∣A): Likelihood (probability of evidence B given hypothesis A).
P(A): Prior probability (initial belief in hypothesis A).
P(B): Evidence (probability of observing B).
Why: Allows updating beliefs (prior → posterior) based on new evidence. Often easier to estimate P(B∣A) than P(A∣B) directly. Incorporates prior knowledge.
Use: Predict events based on related information (e.g., disease from symptoms, image class from features).
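A worked instance of the disease-from-symptoms example, with made-up numbers:

```python
# Hypothetical numbers: 1% prevalence, a test that detects 90% of cases
# but also fires on 5% of healthy people.
p_disease = 0.01            # prior P(A)
p_pos_given_disease = 0.90  # likelihood P(B|A)
p_pos_given_healthy = 0.05  # P(B|not A)

# Evidence P(B) via the law of total probability.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior P(A|B) = P(B|A) P(A) / P(B)  ->  about 0.154
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)
```

Even with a positive test, the posterior stays low because the prior is small; this is the prior → posterior update the theorem formalizes.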
7.2 Naïve Bayesian (NB) Image Classification
What: A simple classification method applying Bayes’ theorem to image features.
Goal: Given image features x, find the most probable class Ci. Maximize the posterior P(Ci∣x).
MAP Criterion: Choose the class Ĉ that maximizes P(x∣Ci)P(Ci) (since P(x) is constant for all classes).
Ĉ = argmax_i {P(x∣Ci) P(Ci)}
How (Likelihood P(x∣Ci) Estimation):
Discretization/VQ: Cluster features from all training images (Vector Quantization). x is mapped to its nearest cluster centroid xj. The likelihood P(x∣Ci) is then approximated by P(xj∣Ci), the proportion of class-Ci samples within cluster j. (Fig 7.1)
Naïve Assumption (Independent Features): Assumes the feature vector components x = (x1, ..., xm) are independent given the class, which simplifies the likelihood: P(x∣Ci) = ∏_{j=1}^{m} P(xj∣Ci).
Bag of Features (BoF): Image I is represented as a set of independent region features {x1, ..., xk}. Likelihood: P(I∣Ci) = ∏_{j=1}^{k} P(xj∣Ci).
Use: Basic image classification.
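A minimal NB sketch combining the VQ and bag-of-features ideas above, assuming each training image arrives as an array of region feature vectors; KMeans as the quantizer and add-one smoothing are illustrative choices:

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def train_nb(region_features, labels, n_words=100):
    """Vector-quantize all training region features into visual words,
    then estimate P(word | class) by counting, with add-one smoothing."""
    kmeans = KMeans(n_clusters=n_words).fit(np.vstack(region_features))
    counts = {c: np.ones(n_words) for c in set(labels)}  # Laplace smoothing
    for regions, c in zip(region_features, labels):
        for w in kmeans.predict(regions):
            counts[c][w] += 1
    likelihoods = {c: v / v.sum() for c, v in counts.items()}
    priors = {c: n / len(labels) for c, n in Counter(labels).items()}
    return kmeans, likelihoods, priors

def classify_nb(regions, kmeans, likelihoods, priors):
    """Bag-of-features NB: maximize log P(Ci) + sum_j log P(xj | Ci)."""
    words = kmeans.predict(regions)
    scores = {c: np.log(priors[c]) + np.log(likelihoods[c][words]).sum()
              for c in priors}
    return max(scores, key=scores.get)
```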
7.3 Image Annotation with Word Co-occurrence (WCC)
What: An early method for explicit image annotation (assigning multiple keywords).
Goal: Link visual features (from image blocks) to semantic words (labels).
How:
Divide labeled training images into blocks.
Cluster blocks using VQ to create visual words (VWs).
For each VW cluster ci, build a histogram of co-occurring text words P(wj∣ci).
Annotate a new image: find nearest VWs for its blocks, sum their word histograms, select words from top histogram bins.
Key Idea: Learns the probability of a semantic word given a visual word/cluster (P(w∣c)). Allows multi-label assignment.
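A sketch of WCC training and annotation under comparable assumptions (captions as lists of words, image blocks as feature arrays); names like `train_wcc` are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_wcc(block_features, captions, n_vw=200):
    """For each visual-word cluster, accumulate a histogram of the text
    words that co-occur with it across the training images."""
    vocab = sorted({w for cap in captions for w in cap})
    word_idx = {w: i for i, w in enumerate(vocab)}
    kmeans = KMeans(n_clusters=n_vw).fit(np.vstack(block_features))
    hist = np.zeros((n_vw, len(vocab)))
    for blocks, cap in zip(block_features, captions):
        for vw in kmeans.predict(blocks):
            for w in cap:
                hist[vw, word_idx[w]] += 1
    return kmeans, hist, vocab

def annotate(blocks, kmeans, hist, vocab, top=4):
    """Sum the word histograms of the image's nearest visual words and
    return the words in the top bins."""
    summed = hist[kmeans.predict(blocks)].sum(axis=0)
    return [vocab[i] for i in np.argsort(summed)[::-1][:top]]
```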
7.7 Image Classification with Gaussian Process (GP)
What: Treats a feature vector (e.g., histogram) as a function and uses Gaussian Processes to model distributions over these functions for classification.
How:
Assume feature vectors from a class are sample functions drawn from a GP.
A GP is defined by a mean function and a covariance (kernel) function k(di,dj) describing similarity between dimensions/points of the function.
Learn the GP parameters from training data X.
Predict the probability of a new feature vector X∗ belonging to the class using the conditional GP distribution P(X∗∣X).
Use: Classification method that models the underlying function generating the features, capturing dependencies between feature dimensions.
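One way to make this concrete: score a new histogram under a GP whose index set is the histogram's dimensions. The empirical mean function and the RBF kernel over dimension indices below are simplifying assumptions; the chapter's actual method learns the GP parameters from the training data:

```python
import numpy as np
from scipy.stats import multivariate_normal

def rbf_kernel(d, length_scale=2.0):
    """Covariance k(i, j) between histogram dimensions i and j: nearby
    bins are assumed to co-vary more strongly than distant ones."""
    idx = np.arange(d)
    return np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / length_scale) ** 2)

def gp_class_score(x_new, X_class, noise=1e-3):
    """Log density of a new feature vector under a GP whose mean function
    is the class's average histogram and whose covariance is the RBF
    kernel over dimension indices plus observation noise."""
    d = X_class.shape[1]
    mean = X_class.mean(axis=0)
    cov = rbf_kernel(d) + noise * np.eye(d)
    return multivariate_normal.logpdf(x_new, mean=mean, cov=cov)
```

A higher score means the class's GP is a more plausible generator of X∗; comparing scores across classes yields the classification.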
7.8 Summary of Bayesian Classifiers
Generative: Models how the data of each class is generated (P(x∣C)).
Intuitive: Incorporates prior knowledge; results are often interpretable.