8.1.1 A Theoretical Solution

  • Limitations of Bayesian Methods: Bayesian methods (covered in Chapter 7) rely on accurate probability models. However, in real-world scenarios, data distributions are often unknown. Accurate model estimation requires a large number of training samples, which may not always be available, especially with high-dimensional data (like multimedia).

  • Alternative Approach: Discriminant Functions: Instead of modeling the entire data distribution, we can assume a functional form for the decision boundary between classes. We then estimate the parameters of this function using training data. Linear classifiers are one type of such an approach.

  • Linear Discriminant Function Formulation:

    • Let $\mathbf{x} = (x_1, x_2, \dots, x_n)^T$ be an $n$-dimensional feature vector representing the data.

    • A linear discriminant function is defined as:

      $d(\mathbf{x}) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + w_{n+1}$ (8.1)

      where:

      • $x_1, \dots, x_n$ are the variables (features).
      • $w_1, \dots, w_{n+1}$ are the coefficients or weights.
    • Letting $x_{n+1} = 1$, the function can be rewritten as (a short worked example follows this list):

      $d(\mathbf{x}) = \sum_{i=1}^{n+1} w_i x_i = \mathbf{w}^T \mathbf{x}$ (8.2)

      where $\mathbf{w} = (w_1, \dots, w_{n+1})^T$ and $\mathbf{x}$ is now the augmented vector $(x_1, \dots, x_n, 1)^T$.
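
  A brief worked case of the augmentation step, with $n = 2$ chosen for illustration (my example, not from the text):

  ```latex
  % Illustrative 2-D case: d(x) = w_1 x_1 + w_2 x_2 + w_3.
  % Appending the constant x_3 = 1 turns the affine function into
  % a single inner product, matching equation (8.2):
  \[
  d(\mathbf{x}) = w_1 x_1 + w_2 x_2 + w_3 \cdot 1
                = (w_1,\ w_2,\ w_3)
                  \begin{pmatrix} x_1 \\ x_2 \\ 1 \end{pmatrix}
                = \mathbf{w}^{T}\mathbf{x}.
  \]
  ```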

  • Geometric Interpretation:

    • The equation $d(\mathbf{x}) = \mathbf{w}^T \mathbf{x} = 0$ represents a hyperplane in $n$-dimensional space. This hyperplane acts as the decision boundary.

    • Classification Criterion: A sample data point with feature vector $\mathbf{x}$ is classified as follows: assign it to class 1 if $d(\mathbf{x}) > 0$, and to class 2 if $d(\mathbf{x}) < 0$. A sketch of this rule in code follows.
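
  A minimal sketch of this decision rule, assuming NumPy; the weight values and sample points are made up for illustration:

  ```python
  import numpy as np

  def classify(w, x):
      """Assign x to class 1 if d(x) = w^T [x, 1] > 0, else to class 2.

      w -- weight vector of length n + 1 (last entry is the offset w_{n+1})
      x -- raw n-dimensional feature vector
      """
      x_aug = np.append(x, 1.0)          # augmented vector with x_{n+1} = 1
      return 1 if w @ x_aug > 0 else 2   # the sign of d(x) picks the side

  # Hypothetical hyperplane x1 + x2 - 1 = 0, i.e. w = (1, 1, -1):
  w = np.array([1.0, 1.0, -1.0])
  print(classify(w, np.array([1.0, 1.0])))   # d = 1 > 0  -> class 1
  print(classify(w, np.array([0.0, 0.0])))   # d = -1 < 0 -> class 2
  ```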

  • Finding the Weights (Theoretical Solution):

    • Goal: Find the weights that minimize misclassifications in the training set.

    • Classical Approach: Solve a system of linear equations.

    • Requirement: For $n$-dimensional data, you need $n+1$ linear equations (i.e., $n+1$ training samples) to solve for the $n+1$ weights.

    • Let $d_k$ represent the class label of the $k$-th training sample:
      • $d_k = 1$ (or $d_k > 0$) for class 1.
      • $d_k = -1$ (or $d_k < 0$) for class 2.
    • Given training samples $\mathbf{x}_k = (x_{k1}, x_{k2}, \dots, x_{k,n+1})^T$, where $x_{k,n+1} = 1$ and $k = 1, \dots, n+1$.

    • System of Equations: Substituting each training sample into equation (8.2) and requiring $d(\mathbf{x}_k) = d_k$, we get $n+1$ equations:

      $\sum_{i=1}^{n+1} w_i x_{ki} = d_k, \quad k = 1, \dots, n+1$ (8.3)

    • Matrix Form: Equation (8.3) can be expressed in matrix form:

      $\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1,n+1} \\ x_{21} & x_{22} & \cdots & x_{2,n+1} \\ \vdots & \vdots & & \vdots \\ x_{n+1,1} & x_{n+1,2} & \cdots & x_{n+1,n+1} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_{n+1} \end{pmatrix} = \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_{n+1} \end{pmatrix}$ (8.4)

    • Compact Matrix Notation: $X \mathbf{w} = \mathbf{d}$, where $X$ is the $(n+1) \times (n+1)$ matrix whose $k$-th row is the sample $\mathbf{x}_k$.

      • $\mathbf{w} = (w_1, \dots, w_{n+1})^T$ and $\mathbf{d} = (d_1, \dots, d_{n+1})^T$ are the transposes of the row vectors $(w_1, \dots, w_{n+1})$ and $(d_1, \dots, d_{n+1})$, respectively.
    • Solution: Provided $X$ is invertible, the solution for $\mathbf{w}$ is:

      $\mathbf{w} = X^{-1} \mathbf{d}$ (8.5)
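
  A small numerical check of (8.4) and (8.5), assuming NumPy; the three samples and their labels are invented for illustration:

  ```python
  import numpy as np

  # Three augmented training samples (n = 2, so n + 1 = 3 weights),
  # one per row of X; the last column is the constant x_{k,n+1} = 1.
  X = np.array([[2.0, 1.0, 1.0],    # class 1 sample
                [1.0, 3.0, 1.0],    # class 1 sample
                [0.0, 0.0, 1.0]])   # class 2 sample
  d = np.array([1.0, 1.0, -1.0])    # class labels d_k

  # Equation (8.5): w = X^{-1} d.  np.linalg.solve computes the same
  # result more stably than forming the inverse explicitly.
  w = np.linalg.solve(X, d)
  print(w)                          # -> [ 0.8  0.4 -1. ]
  print(np.sign(X @ w))             # reproduces d: [ 1.  1. -1.]
  ```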

8.1.2 An Optimal Solution

  • Limitations of the Theoretical Solution: The solution in 8.1.1 is based on only $n+1$ samples and is therefore not optimal for the entire training set.

  • Optimal Solution Goal: Minimize the squared error between $d(\mathbf{x}_k)$ and $d_k$ over the entire training dataset of $N$ data points $\mathbf{x}_k$, where $\mathbf{x}_k = (x_{k1}, \dots, x_{k,n+1})^T$ and $k = 1, \dots, N$.

  • Total Squared Error:

    $E = \sum_{k=1}^{N} \left( \mathbf{w}^T \mathbf{x}_k - d_k \right)^2$ (8.6)

  • Minimization: Take the partial derivative of $E$ with respect to each $w_j$ ($j = 1, \dots, n+1$) and set it to zero: $\partial E / \partial w_j = 0$. This gives $n+1$ linear equations:

    $\sum_{k=1}^{N} \left( \sum_{i=1}^{n+1} w_i x_{ki} - d_k \right) x_{kj} = 0, \quad j = 1, \dots, n+1$ (8.7)

  • Equivalent Form:

    $\sum_{i=1}^{n+1} \left( \sum_{k=1}^{N} x_{ki} x_{kj} \right) w_i = \sum_{k=1}^{N} d_k x_{kj}, \quad j = 1, \dots, n+1$ (8.8)

  • Solving for Weights: Solve the $n+1$ linear equations in (8.8) using the same method as in (8.3). This results in an optimal hyperplane.

  • Key Differences from (8.3):

    • The matrix $X$ is replaced by $X^T X$.
    • The vector $\mathbf{d}$ is replaced by $X^T \mathbf{d}$.
  • Solution in Matrix Form:

    $\mathbf{w} = (X^T X)^{-1} X^T \mathbf{d}$ (8.8a)

    where $X$ is now the $N \times (n+1)$ matrix representing the data, and $\mathbf{d}$ is the $N$-dimensional vector of class values of the data.
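
  A sketch of (8.8a) on made-up data, again assuming NumPy; np.linalg.lstsq solves the same least-squares problem without explicitly forming $(X^T X)^{-1}$:

  ```python
  import numpy as np

  rng = np.random.default_rng(0)

  # N = 200 two-dimensional samples, 100 per class (synthetic data).
  class1 = rng.normal(loc=[2.0, 2.0], size=(100, 2))
  class2 = rng.normal(loc=[-2.0, -2.0], size=(100, 2))
  features = np.vstack([class1, class2])

  # Augment with the constant column x_{k,n+1} = 1 -> N x (n+1) matrix X.
  X = np.hstack([features, np.ones((200, 1))])
  d = np.concatenate([np.ones(100), -np.ones(100)])  # class labels d_k

  # Equation (8.8a): w = (X^T X)^{-1} X^T d, computed via least squares.
  w, *_ = np.linalg.lstsq(X, d, rcond=None)

  predictions = np.where(X @ w > 0, 1, -1)
  print("training accuracy:", np.mean(predictions == d))
  ```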

8.1.3 A Suboptimal Solution

  • Limitations of the Optimal Solution (8.8): While (8.8) gives a better solution than (8.5), it involves processing very large matrices, which is computationally expensive and undesirable, especially for high-dimensional multimedia data.

  • Iterative Optimization: An alternative is to use an iterative optimization algorithm to find a suboptimal set of weights for (8.2). This avoids large matrix operations.

  • Error-Driven Weight Adaptation: A common practice is a trial-and-error technique, adjusting the weights based on misclassifications.

  • Iterative Procedure (a runnable sketch follows this list):

    1. Initialization: Initialize the weights $w_1, \dots, w_{n+1}$ with small random values.
    2. Next Training Sample: Take the next training sample $\mathbf{x}_k$ with label $d_k$, where $d_k = 1$ or $d_k = -1$.
    3. Compute $d(\mathbf{x}_k)$: $d(\mathbf{x}_k) = \mathbf{w}^T \mathbf{x}_k = \sum_{i=1}^{n+1} w_i x_{ki}$.
    4. Update Weights (if misclassified):
      • If $d_k \cdot d(\mathbf{x}_k) \le 0$ (a misclassification), update the weights:
        • $w_i \leftarrow w_i + c \, d_k \, x_{ki}$, for $i = 1, \dots, n+1$.
        • $c$ is a positive constant, and the feature values $x_{ki}$ are assumed positive.
    5. Repeat: Repeat steps 2-4 for all remaining training samples, cycling through the training set until all samples are correctly classified or the weights stop changing.
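
  A runnable sketch of steps 1-5, assuming NumPy; the function name, the constant `c = 0.1`, and the toy data are my own choices for illustration:

  ```python
  import numpy as np

  def train_iterative(X, d, c=0.1, max_epochs=100, seed=0):
      """Error-driven weight adaptation over augmented samples.

      X -- N x (n+1) matrix of augmented samples (last column = 1)
      d -- length-N vector of class labels, +1 or -1
      c -- positive correction constant
      """
      rng = np.random.default_rng(seed)
      w = rng.normal(scale=0.01, size=X.shape[1])   # step 1: small random weights
      for _ in range(max_epochs):
          changed = False
          for x_k, d_k in zip(X, d):                # step 2: next training sample
              if d_k * (w @ x_k) <= 0:              # steps 3-4: misclassified?
                  w += c * d_k * x_k                # w_i <- w_i + c * d_k * x_ki
                  changed = True
          if not changed:                           # step 5: stop when no errors
              break
      return w

  # Tiny linearly separable example (already augmented with the constant 1).
  X = np.array([[2.0, 2.0, 1.0], [3.0, 1.0, 1.0],
                [-2.0, -1.0, 1.0], [-1.0, -3.0, 1.0]])
  d = np.array([1.0, 1.0, -1.0, -1.0])
  w = train_iterative(X, d)
  print(np.sign(X @ w))   # -> [ 1.  1. -1. -1.], matching d
  ```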
  • Explanation of Weight Update:

    • Let $w_i'$ be the updated value and $w_i$ be the old value of each weight.

    • Since $c$ and the $x_{ki}$ are all positive:

      • If $d_k = 1$, all $w_i' = w_i + c \, x_{ki}$ become larger.
      • If $d_k = -1$, all $w_i' = w_i - c \, x_{ki}$ become smaller.
    • This means the decision function is updated as follows:

      $d'(\mathbf{x}_k) = \sum_{i=1}^{n+1} w_i' x_{ki} = d(\mathbf{x}_k) + c \, d_k \sum_{i=1}^{n+1} x_{ki}^2$

      so $d'(\mathbf{x}_k)$ increases when $d_k = 1$ and decreases when $d_k = -1$.

    • Therefore, the hyperplane moves in the correct direction with the updated weights, and the updates continue until the misclassified sample is on the correct side.