8.1.1 A Theoretical Solution
- Limitations of Bayesian Methods: Bayesian methods (covered in Chapter 7) rely on accurate probability models. However, in real-world scenarios data distributions are often unknown, and accurate model estimation requires a large number of training samples, which may not always be available, especially with high-dimensional data (such as multimedia).
- Alternative Approach: Discriminant Functions: Instead of modeling the entire data distribution, we can assume a functional form for the decision boundary between classes and estimate the parameters of this function from training data. Linear classifiers are one such approach.
- Linear Discriminant Function Formulation:
  - Let $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ be an $n$-dimensional feature vector representing the data.
  - A linear discriminant function is defined as:
    $y = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + w_{n+1}$  (8.1)
    where:
    - $x_1, \ldots, x_n$ are the variables (features).
    - $w_1, \ldots, w_{n+1}$ are the coefficients or weights.
  - Letting $x_{n+1} = 1$, the function can be rewritten as:
    $y = \sum_{i=1}^{n+1} w_i x_i$  (8.2)
- Geometric Interpretation:
  - The equation $y = 0$, i.e., $\sum_{i=1}^{n+1} w_i x_i = 0$, represents a hyperplane in $n$-dimensional space. This hyperplane acts as the decision boundary.
- Classification Criterion: A sample data point with feature vector $\mathbf{x}$ is classified as follows: if $y > 0$, assign it to class 1; if $y < 0$, assign it to class 2.
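To make the rule concrete, here is a minimal Python/NumPy sketch (the weight vector and the sample point are invented for illustration): it evaluates $y$ as in (8.2), with the constant feature $x_{n+1} = 1$ appended, and assigns the class from the sign of $y$.

```python
import numpy as np

# Hypothetical weights for a 2-D problem: w = (w1, w2, w3), with w3 playing
# the role of w_{n+1} (the bias term).
w = np.array([2.0, -1.0, 0.5])

def classify(x, w):
    """Evaluate y = sum_i w_i x_i (Eq. 8.2) after appending x_{n+1} = 1,
    and assign class 1 if y > 0, class 2 (label -1) otherwise."""
    x_aug = np.append(x, 1.0)
    y = float(np.dot(w, x_aug))
    return (1 if y > 0 else -1), y

label, y = classify(np.array([1.0, 3.0]), w)
print(label, y)   # y = 2*1 - 1*3 + 0.5 = -0.5, so this sample falls on the class-2 side
```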
Finding the Weights (Theoretical Solution):
-
Goal: Find the weights that minimize misclassifications in the training set.
-
Classical Approach: Solve a system of linear equations.
-
Requirement: For -dimensional data, you need linear equations (or samples) to solve for the weights.
-
Let represent the class label: * (or ) for class 1. * (or ) for class 2. *Given training sample. . Where, ,
- System of Equations: Substituting each training sample into equation (8.2), we get $n+1$ equations:
  $\sum_{i=1}^{n+1} w_i x_{ji} = d_j, \quad j = 1, 2, \ldots, n+1$  (8.3)
- Matrix Form: Equation (8.3) can be expressed in matrix form:
  $\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1,n+1} \\ x_{21} & x_{22} & \cdots & x_{2,n+1} \\ \vdots & \vdots & & \vdots \\ x_{n+1,1} & x_{n+1,2} & \cdots & x_{n+1,n+1} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_{n+1} \end{pmatrix} = \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_{n+1} \end{pmatrix}$  (8.4)
- Compact Matrix Notation: $X\mathbf{w} = \mathbf{d}$, where $X$ is the $(n+1) \times (n+1)$ sample matrix above, and $\mathbf{w} = (w_1, \ldots, w_{n+1})^T$ and $\mathbf{d} = (d_1, \ldots, d_{n+1})^T$ are the transposes of the weight and label row vectors, respectively.
- Solution: The solution for $\mathbf{w}$ is:
  $\mathbf{w} = X^{-1}\mathbf{d}$  (8.5)
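As a concrete numerical sketch of (8.4) and (8.5) (the three samples and their labels below are made up for illustration, not taken from the text): with $n = 2$, exactly $n + 1 = 3$ augmented samples form a square matrix $X$, and the weights follow from $\mathbf{w} = X^{-1}\mathbf{d}$.

```python
import numpy as np

# Three augmented training samples (x_j1, x_j2, 1), one per row of X.
X = np.array([[1.0, 2.0, 1.0],
              [2.0, 0.5, 1.0],
              [0.5, 3.0, 1.0]])
d = np.array([1.0, 1.0, -1.0])   # class labels of the three samples

# Eq. (8.5): w = X^{-1} d. np.linalg.solve is preferable to forming X^{-1} explicitly.
w = np.linalg.solve(X, d)

print(w)
print(X @ w)   # reproduces d exactly: the hyperplane fits these n+1 samples perfectly
```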
8.1.2 An Optimal Solution
- Limitations of the Theoretical Solution: The solution in 8.1.1 is based on only $n+1$ samples and is therefore not optimal for the entire training set.
- Optimal Solution Goal: Minimize the squared error between $y_j$ and $d_j$ over the entire training dataset of $N$ data points $(\mathbf{x}_j, d_j)$, $j = 1, 2, \ldots, N$, where $\mathbf{x}_j = (x_{j1}, \ldots, x_{jn}, 1)$ and $d_j \in \{1, -1\}$.
- Total Squared Error:
  $E = \sum_{j=1}^{N} \left( \sum_{i=1}^{n+1} w_i x_{ji} - d_j \right)^2$  (8.6)
- Minimization: Take the partial derivative of $E$ with respect to each $w_k$ ($k = 1, \ldots, n+1$) and set it to zero: $\partial E / \partial w_k = 0$. This gives $n+1$ linear equations:
  $\sum_{j=1}^{N} 2 \left( \sum_{i=1}^{n+1} w_i x_{ji} - d_j \right) x_{jk} = 0, \quad k = 1, \ldots, n+1$  (8.7)
- Equivalent Form: Dividing by 2 and rearranging gives
  $\sum_{i=1}^{n+1} w_i \sum_{j=1}^{N} x_{ji} x_{jk} = \sum_{j=1}^{N} d_j x_{jk}, \quad k = 1, \ldots, n+1$  (8.8)
- Solving for Weights: Solve the $n+1$ linear equations in (8.8) using the same method as for (8.3). This results in an optimal hyperplane.
- Key Differences from (8.3):
  - The coefficient $x_{ji}$ is replaced by $\sum_{j=1}^{N} x_{ji} x_{jk}$.
  - The right-hand side $d_j$ is replaced by $\sum_{j=1}^{N} d_j x_{jk}$.
- Solution in Matrix Form:
  $\mathbf{w} = (X^T X)^{-1} X^T \mathbf{d}$  (8.8a)
  where $X$ is the $N \times (n+1)$ matrix representing the data (one augmented sample per row), and $\mathbf{d}$ is the $N$-dimensional vector representing the class values of the data.
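The following sketch applies (8.8a) to a made-up two-class data set (the class centers, sizes, and random seed are arbitrary choices for illustration); it also shows the numerically safer `np.linalg.lstsq` call, which solves the same least-squares problem without explicitly inverting $X^T X$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up training set: N = 100 two-dimensional points from two classes.
N = 100
class1 = rng.normal(loc=[2.0, 2.0], scale=0.8, size=(N // 2, 2))
class2 = rng.normal(loc=[-1.0, -1.0], scale=0.8, size=(N // 2, 2))
X = np.vstack([class1, class2])
X = np.hstack([X, np.ones((N, 1))])               # append x_{n+1} = 1 to every sample
d = np.concatenate([np.ones(N // 2), -np.ones(N // 2)])

# Eq. (8.8a): w = (X^T X)^{-1} X^T d
w = np.linalg.inv(X.T @ X) @ X.T @ d

# Equivalent least-squares solution without the explicit inverse.
w_lstsq, *_ = np.linalg.lstsq(X, d, rcond=None)

print(w, w_lstsq)
print("training accuracy:", np.mean(np.sign(X @ w) == d))
```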
8.1.3 A Suboptimal Solution
- Limitations of the Optimal Solution (8.8): While (8.8) gives a better solution than (8.5), it involves processing very large matrices, which is computationally expensive and undesirable, especially for high-dimensional multimedia data.
- Iterative Optimization: An alternative is to use an iterative optimization algorithm to find a suboptimal solution to (8.2). This avoids large matrix operations.
- Error-Driven Weight Adaptation: A common practice is a trial-and-error technique that adjusts the weights based on misclassifications.
- Iterative Procedure (a code sketch follows this list):
  1. Initialization: Initialize the weights $w_i$ with small random values.
  2. Next Training Sample: Take the next training sample $(\mathbf{x}, d)$, where $d = 1$ or $d = -1$.
  3. Compute $y$: $y = \sum_{i=1}^{n+1} w_i x_i$.
  4. Update Weights (if misclassified):
     - If $d = 1$ but $y \le 0$ (a misclassification), update the weights: $w_i \leftarrow w_i + c\,x_i$, for $i = 1, \ldots, n+1$, where $c$ is a positive constant (the correction increment).
     - If $d = -1$ but $y \ge 0$ (a misclassification), update the weights: $w_i \leftarrow w_i - c\,x_i$, for $i = 1, \ldots, n+1$.
  5. Repeat: Repeat steps 2-4 over all remaining training samples, until all samples are correctly classified or the weights stop changing.
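Below is a Python/NumPy sketch of this procedure. It implements the fixed-increment correction described above ($w_i \leftarrow w_i \pm c\,x_i$ on misclassified samples); the constant $c$, the epoch limit, and the toy data in the usage example are illustrative choices, not values from the text.

```python
import numpy as np

def train_iterative(X, d, c=0.1, max_epochs=1000, seed=0):
    """Error-driven weight adaptation for the linear classifier y = w^T x.

    X : (N, n+1) array of augmented samples (last column is 1).
    d : array of class labels in {1, -1}.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])     # step 1: small random weights
    for _ in range(max_epochs):
        changed = False
        for x, label in zip(X, d):                  # step 2: next training sample
            y = w @ x                               # step 3: compute y
            if label == 1 and y <= 0:               # step 4: misclassified class-1 sample
                w = w + c * x
                changed = True
            elif label == -1 and y >= 0:            # misclassified class-2 sample
                w = w - c * x
                changed = True
        if not changed:                             # step 5: stop once weights settle
            break
    return w

# Toy usage: four augmented 2-D samples with positive feature values.
X = np.array([[3.0, 3.0, 1.0], [4.0, 2.0, 1.0], [1.0, 1.0, 1.0], [0.5, 1.5, 1.0]])
d = np.array([1, 1, -1, -1])
w = train_iterative(X, d)
print(np.sign(X @ w))   # matches d once the loop has converged
```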
- Explanation of Weight Update:
  - Let $w_i'$ be the updated value and $w_i$ be the old value of the $i$-th weight.
  - Since $c$, the feature values $x_i$, and $x_{n+1} = 1$ are all positive:
    - If $d = 1$ and $y \le 0$, all $w_i$ become larger.
    - If $d = -1$ and $y \ge 0$, all $w_i$ become smaller.
  - This means the decision function on the misclassified sample is updated as follows:
    $y' = \sum_{i=1}^{n+1} w_i' x_i = \sum_{i=1}^{n+1} (w_i \pm c\,x_i)\,x_i = y \pm c \sum_{i=1}^{n+1} x_i^2$
    so $y' > y$ when $d = 1$ and $y' < y$ when $d = -1$.
  - Therefore, the hyperplane moves in the correct direction with the updated weights, until the misclassified sample ends up on the correct side.
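A quick numeric check of this argument (all values are arbitrary): one correction on a misclassified class-1 sample raises $y$ by exactly $c\sum_i x_i^2$.

```python
import numpy as np

w = np.array([0.2, -0.5, 0.1])    # current weights (arbitrary)
x = np.array([1.0, 2.0, 1.0])     # augmented class-1 sample (d = 1)
c = 0.1

y_old = w @ x                     # 0.2 - 1.0 + 0.1 = -0.7  (y <= 0: misclassified)
w_new = w + c * x                 # fixed-increment correction
y_new = w_new @ x                 # -0.7 + 0.1 * (1 + 4 + 1) = -0.1

print(y_old, y_new, y_old + c * np.sum(x**2))   # y has moved toward the positive side
```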