8.1.1 A Theoretical Solution
- Limitations of Bayesian Methods: Bayesian methods (covered in Chapter 7) rely on accurate probability models. However, in real-world scenarios data distributions are often unknown, and accurate model estimation requires a large number of training samples, which may not always be available, especially with high-dimensional data (such as multimedia).
- Alternative Approach: Discriminant Functions: Instead of modeling the entire data distribution, we can assume a functional form for the decision boundary between classes and estimate the parameters of this function from training data. Linear classifiers are one such approach.
- Linear Discriminant Function Formulation:
  - Let $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ be an $n$-dimensional feature vector representing the data.
  - A linear discriminant function is defined as:
    $y = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + w_{n+1}$  (8.1)
    where:
    - $x_1, \ldots, x_n$ are the variables (features).
    - $w_1, \ldots, w_{n+1}$ are the coefficients or weights.
  - Letting $x_{n+1} = 1$, the function can be rewritten as:
    $y = \sum_{i=1}^{n+1} w_i x_i$  (8.2)
- Geometric Interpretation:
  - The equation $y = 0$, i.e., $\sum_{i=1}^{n+1} w_i x_i = 0$, represents a hyperplane in $n$-dimensional space. This hyperplane acts as the decision boundary.
- Classification Criterion: A sample data point with feature vector $\mathbf{x}$ is classified as follows: if $y > 0$, assign it to class 1; if $y < 0$, assign it to class 2.
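To make the rule concrete, here is a minimal Python/NumPy sketch (the weight vector and the sample point are invented for illustration): it evaluates $y$ as in (8.2), with the constant feature $x_{n+1} = 1$ appended, and assigns the class from the sign of $y$.

```python
import numpy as np

# Hypothetical weights for a 2-D problem: w = (w1, w2, w3), with w3 playing
# the role of w_{n+1} (the bias term).
w = np.array([2.0, -1.0, 0.5])

def classify(x, w):
    """Evaluate y = sum_i w_i x_i (Eq. 8.2) after appending x_{n+1} = 1,
    and assign class 1 if y > 0, class 2 (label -1) otherwise."""
    x_aug = np.append(x, 1.0)
    y = float(np.dot(w, x_aug))
    return (1 if y > 0 else -1), y

label, y = classify(np.array([1.0, 3.0]), w)
print(label, y)   # y = 2*1 - 1*3 + 0.5 = -0.5, so this sample falls on the class-2 side
```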
Finding the Weights (Theoretical Solution):
-
Goal: Find the weights that minimize misclassifications in the training set.
-
Classical Approach: Solve a system of linear equations.
-
Requirement: For -dimensional data, you need linear equations (or samples) to solve for the weights.
-
Let represent the class label: * (or ) for class 1. * (or ) for class 2. *Given training sample. . Where, ,
- System of Equations: Substituting each training sample into equation (8.2), we get $n+1$ equations:
  $\sum_{i=1}^{n+1} w_i x_{ji} = d_j, \quad j = 1, 2, \ldots, n+1$  (8.3)
- Matrix Form: Equation (8.3) can be expressed in matrix form:
  $\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1,n+1} \\ x_{21} & x_{22} & \cdots & x_{2,n+1} \\ \vdots & \vdots & & \vdots \\ x_{n+1,1} & x_{n+1,2} & \cdots & x_{n+1,n+1} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_{n+1} \end{pmatrix} = \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_{n+1} \end{pmatrix}$  (8.4)
- Compact Matrix Notation: $X\mathbf{w} = \mathbf{d}$, where $X$ is the $(n+1) \times (n+1)$ sample matrix above, and $\mathbf{w} = (w_1, \ldots, w_{n+1})^T$ and $\mathbf{d} = (d_1, \ldots, d_{n+1})^T$ are the transposes of the weight and label row vectors, respectively.
- Solution: The solution for $\mathbf{w}$ is:
  $\mathbf{w} = X^{-1}\mathbf{d}$  (8.5)
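As a concrete numerical sketch of (8.4) and (8.5) (the three samples and their labels below are made up for illustration, not taken from the text): with $n = 2$, exactly $n + 1 = 3$ augmented samples form a square matrix $X$, and the weights follow from $\mathbf{w} = X^{-1}\mathbf{d}$.

```python
import numpy as np

# Three augmented training samples (x_j1, x_j2, 1), one per row of X.
X = np.array([[1.0, 2.0, 1.0],
              [2.0, 0.5, 1.0],
              [0.5, 3.0, 1.0]])
d = np.array([1.0, 1.0, -1.0])   # class labels of the three samples

# Eq. (8.5): w = X^{-1} d. np.linalg.solve is preferable to forming X^{-1} explicitly.
w = np.linalg.solve(X, d)

print(w)
print(X @ w)   # reproduces d exactly: the hyperplane fits these n+1 samples perfectly
```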
8.1.2 An Optimal Solution
- Limitations of the Theoretical Solution: The solution in 8.1.1 is based on only $n+1$ samples and is therefore not optimal for the entire training set.
- Optimal Solution Goal: Minimize the squared error between $y_j$ and $d_j$ over the entire training dataset of $N$ data points $(\mathbf{x}_j, d_j)$, $j = 1, 2, \ldots, N$, where $\mathbf{x}_j = (x_{j1}, \ldots, x_{jn}, 1)$ and $d_j \in \{1, -1\}$.
- Total Squared Error:
  $E = \sum_{j=1}^{N} \left( \sum_{i=1}^{n+1} w_i x_{ji} - d_j \right)^2$  (8.6)
- Minimization: Take the partial derivative of $E$ with respect to each $w_k$ ($k = 1, \ldots, n+1$) and set it to zero: $\partial E / \partial w_k = 0$. This gives $n+1$ linear equations:
  $\sum_{j=1}^{N} 2 \left( \sum_{i=1}^{n+1} w_i x_{ji} - d_j \right) x_{jk} = 0, \quad k = 1, \ldots, n+1$  (8.7)
- Equivalent Form: Dividing by 2 and rearranging gives
  $\sum_{i=1}^{n+1} w_i \sum_{j=1}^{N} x_{ji} x_{jk} = \sum_{j=1}^{N} d_j x_{jk}, \quad k = 1, \ldots, n+1$  (8.8)
- Solving for Weights: Solve the $n+1$ linear equations in (8.8) using the same method as for (8.3). This results in an optimal hyperplane.
- Key Differences from (8.3):
  - The coefficient $x_{ji}$ is replaced by $\sum_{j=1}^{N} x_{ji} x_{jk}$.
  - The right-hand side $d_j$ is replaced by $\sum_{j=1}^{N} d_j x_{jk}$.
- Solution in Matrix Form:
  $\mathbf{w} = (X^T X)^{-1} X^T \mathbf{d}$  (8.8a)
  where $X$ is the $N \times (n+1)$ matrix representing the data (one augmented sample per row), and $\mathbf{d}$ is the $N$-dimensional vector representing the class values of the data.
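The following sketch applies (8.8a) to a made-up two-class data set (the class centers, sizes, and random seed are arbitrary choices for illustration); it also shows the numerically safer `np.linalg.lstsq` call, which solves the same least-squares problem without explicitly inverting $X^T X$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up training set: N = 100 two-dimensional points from two classes.
N = 100
class1 = rng.normal(loc=[2.0, 2.0], scale=0.8, size=(N // 2, 2))
class2 = rng.normal(loc=[-1.0, -1.0], scale=0.8, size=(N // 2, 2))
X = np.vstack([class1, class2])
X = np.hstack([X, np.ones((N, 1))])               # append x_{n+1} = 1 to every sample
d = np.concatenate([np.ones(N // 2), -np.ones(N // 2)])

# Eq. (8.8a): w = (X^T X)^{-1} X^T d
w = np.linalg.inv(X.T @ X) @ X.T @ d

# Equivalent least-squares solution without the explicit inverse.
w_lstsq, *_ = np.linalg.lstsq(X, d, rcond=None)

print(w, w_lstsq)
print("training accuracy:", np.mean(np.sign(X @ w) == d))
```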
8.1.3 A Suboptimal Solution
- Limitations of the Optimal Solution (8.8): While (8.8) gives a better solution than (8.5), it involves processing very large matrices, which is computationally expensive and undesirable, especially for high-dimensional multimedia data.
- Iterative Optimization: An alternative is to use an iterative optimization algorithm to find a suboptimal solution to (8.2). This avoids large matrix operations.
- Error-Driven Weight Adaptation: A common practice is a trial-and-error technique that adjusts the weights based on misclassifications.
- Iterative Procedure (a code sketch follows this list):
  1. Initialization: Initialize the weights $w_i$ with small random values.
  2. Next Training Sample: Take the next training sample $(\mathbf{x}, d)$, where $d = 1$ or $d = -1$.
  3. Compute $y$: $y = \sum_{i=1}^{n+1} w_i x_i$.
  4. Update Weights (if misclassified):
     - If $d = 1$ but $y \le 0$ (a misclassification), update the weights: $w_i \leftarrow w_i + c\,x_i$, for $i = 1, \ldots, n+1$, where $c$ is a positive constant (the correction increment).
     - If $d = -1$ but $y \ge 0$ (a misclassification), update the weights: $w_i \leftarrow w_i - c\,x_i$, for $i = 1, \ldots, n+1$.
  5. Repeat: Repeat steps 2-4 over all remaining training samples, until all samples are correctly classified or the weights stop changing.
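Below is a Python/NumPy sketch of this procedure. It implements the fixed-increment correction described above ($w_i \leftarrow w_i \pm c\,x_i$ on misclassified samples); the constant $c$, the epoch limit, and the toy data in the usage example are illustrative choices, not values from the text.

```python
import numpy as np

def train_iterative(X, d, c=0.1, max_epochs=1000, seed=0):
    """Error-driven weight adaptation for the linear classifier y = w^T x.

    X : (N, n+1) array of augmented samples (last column is 1).
    d : array of class labels in {1, -1}.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])     # step 1: small random weights
    for _ in range(max_epochs):
        changed = False
        for x, label in zip(X, d):                  # step 2: next training sample
            y = w @ x                               # step 3: compute y
            if label == 1 and y <= 0:               # step 4: misclassified class-1 sample
                w = w + c * x
                changed = True
            elif label == -1 and y >= 0:            # misclassified class-2 sample
                w = w - c * x
                changed = True
        if not changed:                             # step 5: stop once weights settle
            break
    return w

# Toy usage: four augmented 2-D samples with positive feature values.
X = np.array([[3.0, 3.0, 1.0], [4.0, 2.0, 1.0], [1.0, 1.0, 1.0], [0.5, 1.5, 1.0]])
d = np.array([1, 1, -1, -1])
w = train_iterative(X, d)
print(np.sign(X @ w))   # matches d once the loop has converged
```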
- Explanation of Weight Update:
  - Let $w_i'$ be the updated value and $w_i$ be the old value of the $i$-th weight.
  - Since $c$, the feature values $x_i$, and $x_{n+1} = 1$ are all positive:
    - If $d = 1$ and $y \le 0$, all $w_i$ become larger.
    - If $d = -1$ and $y \ge 0$, all $w_i$ become smaller.
  - This means the decision function on the misclassified sample is updated as follows:
    $y' = \sum_{i=1}^{n+1} w_i' x_i = \sum_{i=1}^{n+1} (w_i \pm c\,x_i)\,x_i = y \pm c \sum_{i=1}^{n+1} x_i^2$
    so $y' > y$ when $d = 1$ and $y' < y$ when $d = -1$.
  - Therefore, the hyperplane moves in the correct direction with the updated weights, until the misclassified sample ends up on the correct side.
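A quick numeric check of this argument (all values are arbitrary): one correction on a misclassified class-1 sample raises $y$ by exactly $c\sum_i x_i^2$.

```python
import numpy as np

w = np.array([0.2, -0.5, 0.1])    # current weights (arbitrary)
x = np.array([1.0, 2.0, 1.0])     # augmented class-1 sample (d = 1)
c = 0.1

y_old = w @ x                     # 0.2 - 1.0 + 0.1 = -0.7  (y <= 0: misclassified)
w_new = w + c * x                 # fixed-increment correction
y_new = w_new @ x                 # -0.7 + 0.1 * (1 + 4 + 1) = -0.1

print(y_old, y_new, y_old + c * np.sum(x**2))   # y has moved toward the positive side
```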