(Part 1) Importance of Dimensionality Reduction in Computer Vision (2 Marks)
Digital images, especially high-resolution ones, contain a vast amount of data. When an image of size $M \times N$ pixels is treated as a data point (e.g., by flattening it into a vector), its dimensionality is $D = M \cdot N$. This high dimensionality poses several challenges in computer vision tasks:
- The Curse of Dimensionality: As dimensionality increases, the volume of the data space grows exponentially. This makes the available data points sparse, requiring significantly more data to train models effectively and avoid overfitting. Many algorithms struggle to generalize well in very high-dimensional spaces.
- Computational Complexity: Processing, storing, and analyzing high-dimensional data is computationally expensive. Algorithms operating on raw pixel data (like classification, clustering, or retrieval) become slow and memory-intensive.
- Redundancy and Noise: Raw pixel data often contains significant redundancy (e.g., strong correlations between adjacent pixels) and noise. Dimensionality reduction techniques aim to capture the essential structure (variance) while discarding redundancy and some noise, leading to more robust representations.
Therefore, dimensionality reduction is crucial in computer vision to:
- Improve computational efficiency (faster training and inference, lower memory usage).
- Mitigate the curse of dimensionality, potentially leading to better model generalization.
- Extract meaningful features by removing redundancy and noise.
(Part 2) Block-Based PCA for Dimensionality Reduction on Grayscale Images (10 Marks)
Principal Component Analysis (PCA) is a standard technique for dimensionality reduction. It identifies the directions (principal components) along which the variance in the data is maximal. Applying PCA directly to an entire image (flattened into a vector) is often computationally infeasible due to the large size of the covariance matrix ($D \times D$, where $D = M \cdot N$). Block-based PCA offers a more practical approach by operating on smaller image patches (blocks).
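As an illustrative comparison (the specific image size is chosen only for the sake of the arithmetic): a modest $256 \times 256$ grayscale image flattened to a vector has $D = 65{,}536$, so the full covariance matrix would contain $65{,}536^2 \approx 4.3 \times 10^9$ entries (on the order of 16 GB in single precision), whereas $8 \times 8$ blocks give $d = 64$ and a covariance matrix of only $64 \times 64 = 4096$ entries.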
Here’s how block-based PCA serves the purpose of dimensionality reduction for a digital grayscale image $I$:
1. Image Representation and Block Extraction:
- Let the grayscale image be $I \in \mathbb{R}^{M \times N}$.
- Divide the image into small, possibly overlapping or non-overlapping, blocks of size $b \times b$ pixels (e.g., $8 \times 8$).
- Each block $B_i$ is flattened (e.g., row by row) into a vector $x_i \in \mathbb{R}^{d}$, where $d = b^2$.
- Collect a large set of these block vectors from the image (or a representative set of training images) to form a dataset $\{x_1, x_2, \ldots, x_K\}$, where $K$ is the total number of blocks (a code sketch of this step follows the list).
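A minimal NumPy sketch of the extraction step, assuming non-overlapping blocks and an image whose sides are multiples of the block size; `extract_blocks` and all variable names are illustrative, not taken from any particular library:

```python
import numpy as np

def extract_blocks(image, b=8):
    """Tile a grayscale image (M x N) into non-overlapping b x b blocks and
    flatten each block row by row into a vector of dimension d = b*b."""
    M, N = image.shape
    blocks = []
    # Assumes M and N are multiples of b; otherwise crop or pad the image first.
    for r in range(0, M - b + 1, b):
        for c in range(0, N - b + 1, b):
            blocks.append(image[r:r + b, c:c + b].reshape(-1))
    return np.asarray(blocks, dtype=np.float64)   # shape (K, d), one block per row

# Illustration: a random 64 x 64 "image" yields K = 64 blocks of dimension d = 64.
image = np.random.rand(64, 64)
X = extract_blocks(image, b=8)
print(X.shape)   # (64, 64)
```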
2. Apply PCA to the Set of Blocks:
- Mean Subtraction: Calculate the mean block vector $\bar{x} = \frac{1}{K} \sum_{i=1}^{K} x_i$. Subtract the mean from each block vector: $\tilde{x}_i = x_i - \bar{x}$. Let $X = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_K]$ be the $d \times K$ matrix of mean-centered block vectors.
- Covariance Matrix: Compute the sample covariance matrix of the block vectors: $C = \frac{1}{K} \sum_{i=1}^{K} \tilde{x}_i \tilde{x}_i^{\top} = \frac{1}{K} X X^{\top}$. $C$ is a $d \times d$ matrix.
- Eigendecomposition: Find the eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_d$ and the corresponding orthonormal eigenvectors $u_1, u_2, \ldots, u_d$ of the covariance matrix $C$: $C u_j = \lambda_j u_j$. The eigenvectors $u_j$ are the principal components of the block data. They represent the primary modes of variation within the image blocks and, when reshaped to $b \times b$, can be visualized as “eigenblocks” (see the sketch below).
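Continuing the sketch, the mean subtraction, covariance matrix, and eigendecomposition can be computed as follows; here `X` holds one flattened block per row, as produced above, so the $d \times d$ covariance is $\frac{1}{K} X_c^{\top} X_c$ with $X_c$ the mean-centered rows:

```python
# X: (K, d) matrix with one flattened block per row.
K, d = X.shape
mean_block = X.mean(axis=0)              # mean block vector, shape (d,)
X_centered = X - mean_block              # mean-centered blocks, shape (K, d)

# d x d covariance matrix of the block vectors.
C = (X_centered.T @ X_centered) / K

# Eigendecomposition of the symmetric matrix C; np.linalg.eigh returns
# eigenvalues in ascending order, so re-sort into descending order.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]              # column j is the "eigenblock" u_j
```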
3. Dimensionality Reduction:
- Select Principal Components: Choose the top $k$ eigenvectors (where $k \ll d$) corresponding to the $k$ largest eigenvalues. These eigenvectors capture the most significant variance in the block data. Form a projection matrix $U_k = [u_1, u_2, \ldots, u_k]$. $U_k$ is a $d \times k$ matrix.
- Projection (Encoding): For any given block $x$ from the image, first subtract the mean: $\tilde{x} = x - \bar{x}$. Then, project this mean-centered block onto the $k$-dimensional subspace spanned by the selected eigenvectors: $y = U_k^{\top} \tilde{x}$. The vector $y \in \mathbb{R}^{k}$ is the low-dimensional representation of the original $d$-dimensional block $x$ (a sketch of this step follows).
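A sketch of the selection and encoding step, reusing the arrays from the previous snippet; the choice $k = 8$ is purely illustrative:

```python
k = 8                           # number of retained principal components (k << d)
U_k = eigvecs[:, :k]            # projection matrix, shape (d, k)

# Encode every block: y_i = U_k^T (x_i - mean_block), one row of coefficients per block.
Y = X_centered @ U_k            # shape (K, k)
print(Y.shape)                  # (64, 8) for the toy example above
```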
4. How Dimensionality Reduction is Achieved:
- Instead of storing or processing the original $d$ pixel values for each block, we now store or process only the $k$ coefficients in the vector $y$.
- Since we choose $k \ll d$ (e.g., reducing a $64$-dimensional block to around $8$–$16$ dimensions), we achieve significant dimensionality reduction for each block.
- The entire image can then be represented by the collection of these low-dimensional vectors $y$ for all its blocks, along with the projection matrix $U_k$ and the mean block $\bar{x}$ (which are computed once from the training set); a rough storage comparison follows below.
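As a purely illustrative storage comparison under assumed figures: a $512 \times 512$ image tiled into $8 \times 8$ blocks gives $K = 4096$ and $d = 64$, i.e. $K \cdot d = 262{,}144$ raw values; keeping $k = 8$ coefficients per block requires only $K \cdot k + d \cdot k + d = 32{,}768 + 512 + 64 \approx 33{,}300$ values (the coefficients plus $U_k$ and $\bar{x}$), roughly an $8\times$ reduction.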
5. Reconstruction (Optional but Illustrative):
- An approximation $\hat{x}$ of the original block can be reconstructed from its low-dimensional representation $y$ by projecting back to the original $d$-dimensional space: $\hat{x} = U_k y + \bar{x}$.
- The quality of the reconstruction depends on the number of principal components retained. Using more components leads to better reconstruction but less dimensionality reduction.
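A minimal sketch of the decoding step, reusing the arrays from the previous snippets; reassembling the image simply reverses the tiling used during extraction (the helper `assemble_blocks` is illustrative):

```python
# Decode each block: x_hat_i = U_k y_i + mean_block.
X_hat = Y @ U_k.T + mean_block           # shape (K, d), approximate block vectors

def assemble_blocks(blocks, M, N, b=8):
    """Place flattened b x b blocks back into an M x N image in raster order."""
    out = np.zeros((M, N))
    idx = 0
    for r in range(0, M - b + 1, b):
        for c in range(0, N - b + 1, b):
            out[r:r + b, c:c + b] = blocks[idx].reshape(b, b)
            idx += 1
    return out

reconstructed = assemble_blocks(X_hat, 64, 64, b=8)
mse = np.mean((image - reconstructed) ** 2)   # error shrinks as k approaches d
print(mse)
```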