III.1 Introduction

  • Problem: The massive number of digital images online is useless without organization.
  • Image Classification:
    • What: Organizing images into different classes based on image features. Typically assigns a single main label (e.g., ‘mountain’).
    • Why: To enable searching and retrieval.
    • How (Basic): Extract features (e.g., a color histogram) from an unknown image and compare them to the features of known, labeled images. Assign the label of the best match (see the sketch after this list).
  • Image Annotation:
    • What: Labeling an image with multiple relevant semantic keywords (e.g., ‘mountain’, ‘sky’, ‘snow’, ‘trees’). Classifying an image into multiple classes.
    • How: Often uses Multiple Instance Learning (MIL) where an image is a ‘bag’ of features/instances. If any instance is positive for a label, the bag (image) gets the label.
  • Relation: Closely related. Good classification helps annotation, and vice-versa.
  • Challenge: Simple matching to one example image is unreliable as one image doesn’t represent the whole class well. Need robust classifiers trained on many examples.
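
A minimal sketch of the basic matching idea above, assuming images arrive as NumPy RGB arrays; the histogram size and distance metric are illustrative choices, not prescribed by the text:

```python
import numpy as np

def color_histogram(image, bins=8):
    """Quantize each RGB channel into `bins` levels and build a normalized joint histogram.
    `image` is an HxWx3 uint8 array."""
    quantized = (image // (256 // bins)).reshape(-1, 3)
    hist, _ = np.histogramdd(quantized, bins=(bins, bins, bins),
                             range=((0, bins), (0, bins), (0, bins)))
    hist = hist.flatten()
    return hist / hist.sum()              # normalize so differently sized images compare

def classify_by_nearest_example(query, labeled_examples):
    """labeled_examples: list of (image, label) pairs. Return the label of the closest histogram."""
    q = color_histogram(query)
    distances = [(np.linalg.norm(q - color_histogram(img)), label)
                 for img, label in labeled_examples]
    return min(distances, key=lambda d: d[0])[1]
```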

III.1.1 Generative Approach

  • What: Builds a model representing how data for each class is generated, typically a probability distribution P(x | c_i) over features x for class c_i.
  • Why (Analogy): Like having an abstract concept/model for each object type (e.g., an “ideal apple”).
  • How: Learn the probability distribution P(x | c_i) from many sample images of the class (e.g., using Gaussian Mixture Models, Bayesian methods); see the sketch after this list.
  • Use: Classify a new image by finding which class model most likely generated its features (using Bayes’ theorem to find the posterior P(c_i | x)).
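
A minimal sketch of this generative recipe, assuming scikit-learn’s GaussianMixture is available; features_by_class (a dict mapping each class label to an array of per-image feature vectors) and the component count are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_generative_models(features_by_class, n_components=3):
    """Fit one GMM per class as the class-conditional density P(x | c)."""
    models, priors = {}, {}
    total = sum(len(X) for X in features_by_class.values())
    for c, X in features_by_class.items():
        models[c] = GaussianMixture(n_components=n_components).fit(X)
        priors[c] = len(X) / total          # P(c) estimated from class frequency
    return models, priors

def classify(x, models, priors):
    """Pick the class whose model most likely generated x (MAP via Bayes' theorem)."""
    # log P(c | x) is proportional to log P(x | c) + log P(c); P(x) is the same for every class
    scores = {c: m.score_samples(x.reshape(1, -1))[0] + np.log(priors[c])
              for c, m in models.items()}
    return max(scores, key=scores.get)
```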

III.1.2 Discriminative Approach

  • What: Learns a decision boundary directly between different classes in the feature space. Doesn’t explicitly model each class.
  • Why: Focuses on separating classes rather than describing them individually.
    • How: Collect training data from different classes. Find an optimal separator (e.g., a hyperplane) that distinguishes between the classes based on feature similarity/difference (see the sketch after this list).
  • Use: Classify a new image based on which side of the decision boundary its features fall.
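
A minimal discriminative counterpart, assuming scikit-learn and pre-extracted feature arrays X_train/y_train (hypothetical names):

```python
from sklearn.svm import LinearSVC

# Hypothetical inputs: X_train is an (N, d) array of image feature vectors
# (e.g., histograms), y_train the corresponding class labels.
def train_discriminative_classifier(X_train, y_train):
    # Learns a separating hyperplane directly; no per-class density is modeled.
    return LinearSVC().fit(X_train, y_train)

# A new image is classified by which side of the learned boundary its features fall on:
# predicted = train_discriminative_classifier(X_train, y_train).predict(x_new.reshape(1, -1))
```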

Chapter 7: Bayesian Classification

7.1 Introduction (Bayes’ Theorem)

  • What: A mathematical formula relating conditional probabilities. Foundation for Bayesian classifiers.
  • Formula: P(A|B) = P(B|A) · P(A) / P(B)
    • P(A|B): Posterior probability (probability of hypothesis A given evidence B).
    • P(B|A): Likelihood (probability of evidence B given hypothesis A).
    • P(A): Prior probability (initial belief in hypothesis A).
    • P(B): Evidence (probability of observing B).
  • Why: Allows updating beliefs (prior → posterior) based on new evidence. P(B|A) is often easier to estimate than P(A|B) directly. Incorporates prior knowledge.
  • Use: Predict events based on related information (e.g., disease from symptoms, image class from features); a small worked example follows this list.
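
A small worked example of the update with made-up numbers (disease from a positive test), just to show the mechanics:

```python
# Hypothetical numbers, chosen only to illustrate Bayes' theorem.
p_disease = 0.01                    # prior P(A)
p_pos_given_disease = 0.95          # likelihood P(B | A)
p_pos_given_healthy = 0.05          # P(B | not A)

# Evidence P(B) via the law of total probability
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # ~0.161: the posterior is still small despite a positive test
```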

7.2 Naïve Bayesian (NB) Image Classification

  • What: A simple classification method applying Bayes’ theorem to image features.
  • Goal: Given image features x, find the most probable class c_i. Maximize the posterior P(c_i | x).
  • MAP Criterion: Choose the class c_i that maximizes P(x | c_i) P(c_i) (since P(x) is constant for all classes).
  • How (Likelihood Estimation):
    • Discretization/VQ: Cluster the features from all training images (Vector Quantization). A feature x is assigned to its nearest cluster centroid w_k. The likelihood P(x | c_i) is approximated by P(w_k | c_i), calculated as the proportion of class c_i samples within cluster k. (Fig 7.1)
    • Naïve Assumption (Independent Features): Assumes the feature vector components x = (x_1, …, x_d) are independent given the class. Simplifies the likelihood: P(x | c_i) = ∏_j P(x_j | c_i).
    • Bag of Features (BoF): The image is represented as a set of independent region features {x_1, …, x_n}. Likelihood: P(x_1, …, x_n | c_i) = ∏_j P(x_j | c_i).
  • Use: Basic image classification (see the sketch after this list).
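
A rough sketch of NB classification with VQ and the bag-of-features likelihood, assuming scikit-learn’s KMeans for the vector quantization step; region_features_by_class, the codebook size, and the Laplace smoothing are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_nb_bof(region_features_by_class, n_words=100):
    """region_features_by_class[c]: (N_c, d) array of region descriptors from class c images."""
    all_feats = np.vstack(list(region_features_by_class.values()))
    codebook = KMeans(n_clusters=n_words).fit(all_feats)        # VQ: learn cluster centroids w_k
    likelihoods, priors = {}, {}
    total = sum(len(X) for X in region_features_by_class.values())
    for c, X in region_features_by_class.items():
        words = codebook.predict(X)                              # assign each region to nearest w_k
        counts = np.bincount(words, minlength=n_words) + 1       # +1: Laplace smoothing
        likelihoods[c] = counts / counts.sum()                   # P(w_k | c_i)
        priors[c] = len(X) / total                               # class prior from its share of regions (proxy)
    return codebook, likelihoods, priors

def classify_bag(region_features, codebook, likelihoods, priors):
    """Bag-of-features NB: sum log P(w_k | c_i) over regions, add log prior, take argmax."""
    words = codebook.predict(region_features)
    scores = {c: np.log(priors[c]) + np.log(lik[words]).sum()
              for c, lik in likelihoods.items()}
    return max(scores, key=scores.get)
```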

7.3 Image Annotation with Word Co-occurrence (WCC)

  • What: An early method for explicit image annotation (assigning multiple keywords).
  • Goal: Link visual features (from image blocks) to semantic words (labels).
  • How:
    1. Divide labeled training images into blocks.
    2. Cluster blocks using VQ to create visual words (VWs).
    3. For each VW cluster w_k, build a histogram of the text words that co-occur with it in the training annotations.
    4. Annotate a new image: find nearest VWs for its blocks, sum their word histograms, select words from top histogram bins.
  • Key Idea: Learns the probability of a semantic word given a visual word/cluster, P(word_j | w_k). Allows multi-label assignment (see the sketch after this list).
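
A rough sketch of the WCC pipeline under the same VQ assumption (scikit-learn KMeans); training_blocks, block_keywords, and vocabulary are hypothetical inputs:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_wcc(training_blocks, block_keywords, vocabulary, n_visual_words=200):
    """training_blocks: (N, d) block features pooled from all labeled images;
    block_keywords[i]: keywords of the image that block i came from."""
    word_index = {w: j for j, w in enumerate(vocabulary)}
    codebook = KMeans(n_clusters=n_visual_words).fit(training_blocks)   # visual words (VWs)
    cooc = np.zeros((n_visual_words, len(vocabulary)))                  # word histogram per VW cluster
    for vw, keywords in zip(codebook.predict(training_blocks), block_keywords):
        for w in keywords:
            cooc[vw, word_index[w]] += 1
    return codebook, cooc

def annotate(image_blocks, codebook, cooc, vocabulary, top_k=4):
    """Sum the word histograms of the nearest VWs and return the words in the top bins."""
    summed = cooc[codebook.predict(image_blocks)].sum(axis=0)
    best = np.argsort(summed)[::-1][:top_k]
    return [vocabulary[j] for j in best]
```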

7.7 Image Classification with Gaussian Process (GP)

  • What: Treats a feature vector (e.g., histogram) as a function and uses Gaussian Processes to model distributions over these functions for classification.
  • How:
    1. Assume feature vectors from a class are sample functions drawn from a GP.
    2. A GP is defined by a mean function m(t) and a covariance (kernel) function k(t, t′) describing the similarity between points/dimensions t, t′ of the function.
    3. Learn the GP parameters (mean and kernel hyperparameters) from the training feature vectors of each class.
    4. Predict the probability of a new feature vector belonging to the class using the conditional (predictive) GP distribution given the training data.
  • Use: A classification method that models the underlying function generating the features, capturing dependencies between feature dimensions (a rough sketch follows this list).
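
A very loose sketch of the idea, not the chapter’s exact formulation: each class’s histograms are scored under a Gaussian whose covariance over bin positions comes from a fixed RBF kernel (hyperparameters assumed rather than learned):

```python
import numpy as np
from scipy.stats import multivariate_normal

def rbf_kernel(d, length_scale=2.0, variance=1.0, noise=1e-3):
    """Covariance k(t, t') over the d bin positions of the histogram (illustrative values)."""
    t = np.arange(d)
    K = variance * np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / length_scale ** 2)
    return K + noise * np.eye(d)          # small noise term keeps the matrix well conditioned

def fit_gp_class_models(features_by_class):
    """features_by_class[c]: (N_c, d) array of histograms from class c; use empirical mean per class."""
    return {c: (X.mean(axis=0), rbf_kernel(X.shape[1])) for c, X in features_by_class.items()}

def classify_gp(x, models):
    # Score each class by the Gaussian (GP marginal) log-density of the new vector x.
    scores = {c: multivariate_normal(mean=m, cov=K).logpdf(x) for c, (m, K) in models.items()}
    return max(scores, key=scores.get)
```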

7.8 Summary of Bayesian Classifiers

  • Generative: Model how data is generated for each class (P(x | c_i)).
  • Intuitive: Incorporates prior knowledge, results often interpretable.
  • Robust: Produces probabilistic outputs, handling uncertainty.
  • Nonlinear: Decision boundaries adapt to data distributions.
  • Downside: Require sufficient data to estimate distributions accurately.