Geometric Camera Calibration

Plan

Review Perspective Projection: Understand the mathematical model of how cameras form images.
Geometric Camera Calibration: Determine the camera’s parameters.
- Indirect Calibration: First solve for the overall projection matrix, then extract individual parameters.
- Direct Calibration: Solve for parameters more directly (often involving non-linear methods or specific assumptions).
- Multi-planes Method (Zhang’s Method): A popular technique using a planar calibration target.
- Example: Using the MATLAB Camera Calibration Toolbox.
Catadioptric Sensing: (Briefly mentioned, covered in a different slide deck - e.g., using mirrors).
Other Methods (Not Covered in Detail):
- Vanishing points-based method.
- Self-calibration (using only image correspondences without a known target).

Review: Perspective Projection Model

Camera Parameters

A camera’s imaging process is modeled by several parameters:

Extrinsic Parameters: Define the camera’s position and orientation in the 3D world.
- Rotation ( $R$ ): A 3x3 rotation matrix describing the camera’s orientation relative to the world coordinate system.
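- Translation ( $T$ ): A 3x1 vector giving the world origin expressed in camera coordinates, so that a world point maps to camera coordinates as $X_{c} = R X_{w} + T$ . The optical center in world coordinates is then $C = - R^{T} T$ .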
Intrinsic Parameters: Define the camera’s internal optical and geometric properties. Encoded in the Camera Matrix ( $K$ ).
- Focal Length ( $f$ ): Distance from the optical center to the image plane. Often expressed in pixel units ( $α = f / s_{x}$ , $β = f / s_{y}$ , where $s_{x}, s_{y}$ are the physical pixel sizes).
- Principal Point ( $c_{x}, c_{y}$ or $u_{0}, v_{0}$ ): The point where the optical axis intersects the image plane, usually near the image center. Coordinates are in pixels.
- Pixel Size ( $s_{x}, s_{y}$ ): Physical size of a pixel (e.g., mm/pixel). Incorporated into $α, β$ .
- Skew Angle ( $θ$ ): Angle between the image sensor’s x and y axes. Usually close to 90 degrees, making the skew factor ( $s = - α cot θ$ ) close to zero for modern cameras.

Important Term: Intrinsics vs. Extrinsics: Intrinsics describe the camera itself; Extrinsics describe its pose in the world.

(Note: Parameter definitions, especially intrinsics, can vary slightly between texts/toolboxes.)

Projection Equation

The relationship between a 3D world point $X_{w} = [X_{w}, Y_{w}, Z_{w}]^{T}$ and its corresponding 2D image point $x = [u, v]^{T}$ is described by the perspective projection equation. Using homogeneous coordinates:

3D world point: $P_{w} = [X_{w}, Y_{w}, Z_{w}, 1]^{T}$
2D image point: $p = [u, v, 1]^{T}$

The projection is given by:

$λ p = K [R ∣ T] P_{w}$

Where:

$λ$ is a scale factor (depth).
$K$ is the $3 \times 3$ intrinsic matrix.
$[R ∣ T]$ is the $3 \times 4$ extrinsic matrix (combining rotation $R$ and translation $T$ ).
The combined $3 \times 4$ matrix $M = K [R ∣ T]$ is called the Projection Matrix.
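The projection equation can be checked numerically. The following is a minimal sketch in Python with NumPy; the function name `project` and all numeric values are illustrative assumptions, not values from the slides.

```python
import numpy as np

def project(K, R, T, X_w):
    """Project a 3D world point to pixels via lambda * p = K [R | T] P_w."""
    M = K @ np.hstack([R, T.reshape(3, 1)])   # 3x4 projection matrix
    P_w = np.append(X_w, 1.0)                 # homogeneous world point
    lam_p = M @ P_w                           # equals lambda * [u, v, 1]
    lam = lam_p[2]                            # scale factor (depth in the camera frame)
    return lam_p[:2] / lam, lam

# Illustrative camera (assumed values): 800 px focal length, 640x480 image center
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                      # camera axes aligned with the world axes
T = np.array([0.0, 0.0, 5.0])      # world origin sits 5 units in front of the camera

uv, depth = project(K, R, T, np.array([1.0, 0.5, 0.0]))
print(uv, depth)                   # pixel (480, 320) at depth 5
```

Dividing by the third component is what removes $λ$ ; the same value is the point's depth along the optical axis.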

The Intrinsic Matrix (K)

A common form for the intrinsic matrix $K$ is:

$K = \begin{bmatrix} α & s & u_{0} \\ 0 & β & v_{0} \\ 0 & 0 & 1 \end{bmatrix}$

Where:

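$α = f \cdot k$ (focal length in x-pixel units, where $k$ is the number of pixels per unit length along x, i.e. the inverse of the pixel width)
$β = f \cdot l$ (focal length in y-pixel units, where $l$ is the number of pixels per unit length along y)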
$s = - α cot θ$ (skew parameter, often 0)
$(u_{0}, v_{0})$ is the principal point.

$K$ has 5 degrees of freedom (DoF): $α, β, u_{0}, v_{0}, s$ . If skew is zero and pixels are square ( $α = β$ ), only 3 DoF remain.
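As a small illustration of how the five parameters fill the matrix, here is a sketch in Python with NumPy. The helper name `intrinsic_matrix` and the sample numbers (a 4 mm lens with 200 pixels per mm) are assumptions chosen for the example.

```python
import numpy as np

def intrinsic_matrix(f, k, l, u0, v0, theta=np.pi / 2):
    """Build K from focal length f, pixel densities k and l (pixels per unit
    length), principal point (u0, v0), and the angle theta between sensor axes."""
    alpha = f * k                    # focal length in x-pixel units
    beta = f * l                     # focal length in y-pixel units
    s = -alpha / np.tan(theta)       # s = -alpha * cot(theta), ~0 for theta = 90 deg
    return np.array([[alpha, s, u0],
                     [0.0, beta, v0],
                     [0.0, 0.0, 1.0]])

# Assumed example: f = 4 mm, 200 px/mm in both directions, square pixels, no skew
K = intrinsic_matrix(f=4.0, k=200.0, l=200.0, u0=320.0, v0=240.0)
print(np.round(K, 6))
```

With square pixels and a 90 degree sensor angle, only $α$ , $u_{0}$ and $v_{0}$ remain free, which matches the 3 DoF count above.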

Normalized Image Plane

Conceptually, the projection can be seen as first projecting the 3D point, expressed in camera coordinates $(X, Y, Z)$ , onto a “normalized” image plane located at unit distance from the optical center (equivalently, $f = 1$ ). The coordinates on this plane are $(\hat{X}, \hat{Y}) = (X / Z, Y / Z)$ . The intrinsic matrix $K$ then maps these normalized coordinates to pixel coordinates, accounting for focal length, principal point, pixel size, and skew.

Properties of Perspective Projection

Distant objects appear smaller.
Points project to points.
Lines project to lines.
Vanishing Points: Projections of points at infinity. Parallel lines in 3D appear to converge at a vanishing point in the image.
Angles are not preserved.
Parallel lines in 3D are not generally parallel in the image (they meet at a vanishing point).
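The vanishing-point property can be illustrated numerically. The sketch below (Python with NumPy, all values assumed for illustration) uses the standard fact that the point at infinity in direction $d$ projects to $K R d$ , and checks that far-away points on two parallel lines approach that image point.

```python
import numpy as np

# Assumed illustrative camera
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([0.0, 0.0, 5.0])

def project(X_w):
    """Pinhole projection of a 3D world point to pixel coordinates."""
    x = K @ (R @ X_w + T)
    return x[:2] / x[2]

d = np.array([1.0, 0.0, 1.0])            # shared 3D direction of the parallel lines

# Image of the point at infinity [d, 0]: translation drops out
v_h = K @ (R @ d)
vanishing_point = v_h[:2] / v_h[2]       # (1120, 240) for these values

# Two parallel lines with different base points, sampled very far along d
far_a = project(np.array([0.0, 0.0, 0.0]) + 1e6 * d)
far_b = project(np.array([2.0, -1.0, 0.0]) + 1e6 * d)
print(vanishing_point, far_a, far_b)
```

Both far points land within a fraction of a pixel of the vanishing point even though the lines never meet in 3D.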

Geometric Camera Calibration

Goal: To estimate the intrinsic ( $K$ ) and extrinsic ( $[R ∣ T]$ ) parameters of a camera. Equivalently, estimate the $3 \times 4$ projection matrix $M$ and decompose it.

The Calibration Problem

Input: A set of $n$ known 3D points $P_{w, i}$ (e.g., corners on a calibration rig/checkerboard) and their corresponding measured 2D image projections $p_{i}$ .
Output: The intrinsic matrix $K$ and the extrinsic matrix $[R ∣ T]$ .

How many points are needed?

The projection matrix $M$ has 12 entries but is defined only up to an overall scale factor, meaning it has 11 degrees of freedom (DoF).
Each 3D-to-2D point correspondence provides 2 linear constraints on the entries of $M$ .
Therefore, we need at least $11/2 = 5.5$ points. Since we need a whole number of points, we need at least 6 points (providing 12 constraints) to solve linearly for $M$ .
Exam Tip: Remember that $M$ has 11 DoF and requires at least 6 point correspondences for a unique solution (assuming points are in general position).

Indirect Calibration (DLT Approach)

This approach first solves for the projection matrix $M$ and then decomposes it into $K$ , $R$ , and $T$ .

Formulate Linear Equations: Start with the projection equation $λ p_{i} = M P_{w, i}$ . Let $p_{i} = [u_{i}, v_{i}, 1]^{T}$ and $M = \begin{bmatrix} m_{1}^{T} \\ m_{2}^{T} \\ m_{3}^{T} \end{bmatrix}$ , where $m_{j}^{T}$ are the rows of $M$ . Writing out the components:
- $λ u_{i} = m_{1}^{T} P_{w, i}$
- $λ v_{i} = m_{2}^{T} P_{w, i}$
- $λ = m_{3}^{T} P_{w, i}$
Eliminate $λ$ to get two linear equations in the elements of $M$ for each point $i$ :
- $u_{i} (m_{3}^{T} P_{w, i}) - (m_{1}^{T} P_{w, i}) = 0$
- $v_{i} (m_{3}^{T} P_{w, i}) - (m_{2}^{T} P_{w, i}) = 0$
Set up Homogeneous Linear System: Stack the equations from $n \geq 6$ points into a single matrix equation:
$P m = 0$
Where:
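- $P$ is a $2 n \times 12$ matrix constructed from the known $P_{w, i}$ and measured $u_{i}, v_{i}$ .
- Structure of $P$ for one point $i$ (two rows), with $P_{i} = [X_{i}, Y_{i}, Z_{i}, 1]^{T}$ : $\text{row}_{1} = [- P_{i}^{T}, 0^{T}, u_{i} P_{i}^{T}]$ and $\text{row}_{2} = [0^{T}, - P_{i}^{T}, v_{i} P_{i}^{T}]$ , where $0^{T}$ is a $1 \times 4$ row of zeros. These are the two equations $u_{i} (m_{3}^{T} P_{i}) - m_{1}^{T} P_{i} = 0$ and $v_{i} (m_{3}^{T} P_{i}) - m_{2}^{T} P_{i} = 0$ written row-wise; multiplying a row by $-1$ does not change the solution. $P$ stacks $\text{row}_{1}, \text{row}_{2}$ for all $i$ .
- $m$ is a $12 \times 1$ vector containing the elements of $M$ stacked row by row, $m = [m_{1}^{T}, m_{2}^{T}, m_{3}^{T}]^{T}$ , which is the ordering the row structure above assumes.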
Solve using SVD: This is a homogeneous linear system (of the form $A x = 0$ ). The solution $m$ that minimizes $\| P m \|^{2}$ subject to $\| m \|^{2} = 1$ is found using Singular Value Decomposition (SVD). Compute the SVD of $P$ : $P = U D V^{T}$ . The solution $m$ is the last column of $V$ , which corresponds to the smallest singular value of $P$ .
Important Term: DLT (Direct Linear Transformation): This method of solving for transformations (like $M$ ) by setting up and solving $P m = 0$ using SVD is known as DLT.
Extract Parameters from M: Once $m$ is found, reshape it into the $3 \times 4$ matrix $M$ . We know $M = [A ∣ b] = ρ K [R ∣ T]$ , where $A$ is the left $3 \times 3$ block of $M$ , $b$ is its last column, and $ρ$ is an unknown scale factor.
- Decompose $A = ρ KR$ . This can be done using RQ decomposition (similar to QR). Since $K$ is upper triangular and $R$ is orthogonal, RQ decomposition finds these factors.
- The scale factor $ρ$ can be determined (e.g., by requiring $K_{33} = 1$ ).
- The translation is then $T = \frac{1}{ρ} K^{- 1} b$ .
- Intrinsic Parameters: Extracted from $K$ .
- Extrinsic Parameters: $R$ and $T$ .
- Faugeras’ Theorem (1993): Provides conditions on $M$ (specifically on its $3 \times 3$ left part $A$ ) for it to be a valid perspective projection matrix (e.g., $det (A) \neq 0$ ) and for specific constraints like zero-skew.
Degenerate Cases: The basic DLT method requires the 3D points $P_{w, i}$ to be non-coplanar. If they are coplanar, the solution for $M$ is not unique. Special methods (like Zhang’s) handle planar targets.
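The DLT steps above can be sketched end to end in Python with NumPy. This is a minimal illustration on noise-free synthetic data, not the toolbox implementation: the function names, the RQ routine built from NumPy's QR, the sign conventions (positive diagonal for $K$ , $det(R) = +1$ ), and every numeric value are assumptions made for the example.

```python
import numpy as np

def dlt_projection_matrix(X_w, uv):
    """Estimate the 3x4 matrix M from n >= 6 non-coplanar points and their pixels."""
    rows = []
    for X, (u, v) in zip(X_w, uv):
        P = np.append(X, 1.0)
        zero = np.zeros(4)
        rows.append(np.hstack([-P, zero, u * P]))   # u (m3.P) - m1.P = 0
        rows.append(np.hstack([zero, -P, v * P]))   # v (m3.P) - m2.P = 0
    _, _, Vt = np.linalg.svd(np.array(rows))        # 2n x 12 system
    return Vt[-1].reshape(3, 4)                     # unit-norm m, rows of M stacked

def rq(A):
    """RQ decomposition A = U Q (U upper triangular, Q orthogonal) via NumPy's QR."""
    flip = np.eye(3)[::-1]
    Q_, R_ = np.linalg.qr((flip @ A).T)
    U = flip @ R_.T @ flip
    Q = flip @ Q_.T
    signs = np.sign(np.diag(U))                     # make the diagonal of U positive
    return U * signs, signs[:, None] * Q

def decompose(M):
    """Split M = rho * K [R | T] into K (with K[2, 2] = 1), R and T."""
    A, b = M[:, :3], M[:, 3]
    if np.linalg.det(A) < 0:        # the SVD sign is arbitrary; pick det(R) = +1
        A, b = -A, -b
    U, R = rq(A)                    # A = rho K R, so U = rho K
    rho = U[2, 2]
    K = U / rho
    T = np.linalg.solve(K, b) / rho # T = (1 / rho) K^{-1} b
    return K, R, T

# Synthetic ground truth (assumed values)
K_true = np.array([[800.0, 2.0, 320.0], [0.0, 820.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
T_true = np.array([0.1, -0.2, 6.0])

rng = np.random.default_rng(0)
X_w = rng.uniform(-1.0, 1.0, size=(8, 3))           # 8 non-coplanar points
pix = (X_w @ R_true.T + T_true) @ K_true.T
uv = pix[:, :2] / pix[:, 2:]

K_est, R_est, T_est = decompose(dlt_projection_matrix(X_w, uv))
print(np.round(K_est, 3))
```

With real, noisy measurements the recovered values are only a starting point, and it is common to normalize the point coordinates before building the system to improve conditioning.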

Handling Lens Distortion

Real lenses introduce distortions not modeled by the ideal pinhole model.

Types:
- Radial Distortion: Most significant, causes straight lines to appear curved, especially near image edges. (Barrel distortion: magnification decreases with distance from center; Pincushion distortion: magnification increases).
- Spherical Aberration, Chromatic Aberration (less commonly modeled in basic calibration).
Modeling: Radial distortion is often modeled by adjusting the image coordinates $(u, v)$ based on their distance $r = \sqrt{(u - u_{0})^{2} + (v - v_{0})^{2}}$ from the principal point. Let $(u_{d}, v_{d})$ be the distorted (measured) coordinates and $(u_{u}, v_{u})$ be the undistorted (ideal pinhole) coordinates:
- $u_{u} = u_{d} + (u_{d} - u_{0}) (k_{1} r_{d}^{2} + k_{2} r_{d}^{4} + ...)$
- $v_{u} = v_{d} + (v_{d} - v_{0}) (k_{1} r_{d}^{2} + k_{2} r_{d}^{4} + ...)$
Where $r_{d}$ is the radius computed from the distorted coordinates, and $k_{1}, k_{2}, ...$ are the radial distortion coefficients. Tangential distortion parameters ( $p_{1}, p_{2}$ ) can also be added.
Impact: Including distortion parameters makes the calibration problem non-linear.
Solution Strategy:
1. Initial Guess: Estimate $M$ (and thus $K, R, T$ ) using the linear DLT method, ignoring distortion.
2. Non-linear Refinement: Minimize the reprojection error over all parameters (intrinsics $K$ , extrinsics $[R ∣ T]$ for each view, and distortion coefficients $k_{1}, k_{2}, ...$ ) using an iterative optimization algorithm like Levenberg-Marquardt: $\min_{K, R, T, k_{1}, k_{2}, ...} \sum_{i} \| p_{i} - \hat{p}_{i} (K, R, T, k_{1}, k_{2}, ..., P_{w, i}) \|^{2}$
Important Term: Reprojection Error: The geometric distance (e.g., Euclidean distance in pixels) between the measured 2D image point $p_{i}$ and the projected 2D point $\hat{p}_{i}$ obtained by projecting the corresponding 3D point $P_{w, i}$ using the current estimate of the camera parameters ( $K, R, T$ , distortion).
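To make the radial model and the reprojection error concrete, here is a minimal Python/NumPy sketch. It applies the correction formula in pixel units exactly as written above; the function names and the sample points are assumptions, and calibration toolboxes often parameterize distortion differently (for example on normalized coordinates).

```python
import numpy as np

def undistort_points(uv_d, principal_point, k1, k2=0.0):
    """Apply u_u = u_d + (u_d - u0)(k1 r_d^2 + k2 r_d^4) to an (n, 2) pixel array."""
    offset = uv_d - principal_point
    r2 = np.sum(offset**2, axis=1, keepdims=True)    # r_d^2 for each point
    return uv_d + offset * (k1 * r2 + k2 * r2**2)

def rms_reprojection_error(uv_measured, uv_predicted):
    """Root-mean-square Euclidean distance in pixels between two point sets."""
    d = uv_measured - uv_predicted
    return np.sqrt(np.mean(np.sum(d**2, axis=1)))

# Assumed example: two measured points 100 px from the principal point
principal_point = np.array([320.0, 240.0])
uv_d = np.array([[420.0, 240.0], [320.0, 340.0]])
uv_pinhole = np.array([[421.0, 240.0], [320.0, 341.0]])   # ideal pinhole predictions

err_before = rms_reprojection_error(uv_d, uv_pinhole)
uv_u = undistort_points(uv_d, principal_point, k1=1e-6)
err_after = rms_reprojection_error(uv_u, uv_pinhole)
print(err_before, err_after)
```

In a full refinement, a function like `rms_reprojection_error` (or the unsquared residual vector) is what the Levenberg-Marquardt iterations drive down while varying $K$ , the extrinsics, and the distortion coefficients together.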

Direct Calibration (Alternative Approach, e.g., Tsai)

These methods attempt to solve for parameters more directly, sometimes making simplifying assumptions (like zero distortion, known aspect ratio, known principal point) to derive linear solutions for some parameters, followed by non-linear optimization for others. The slides outline one such method (Slides 52-59) involving solving for parts of the extrinsic parameters linearly using SVD, determining scale factors, orthogonalizing the rotation matrix, and finally solving for focal length and $T_{z}$ . These often require careful handling of scale and signs.
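The notes do not say how the rotation matrix is orthogonalized. One common choice, shown here as an assumption rather than as the method from the slides, is to replace the estimate by the closest rotation in the Frobenius norm using SVD. A minimal Python/NumPy sketch with made-up input:

```python
import numpy as np

def nearest_rotation(A):
    """Closest rotation matrix to A in the Frobenius norm, computed via SVD."""
    U, _, Vt = np.linalg.svd(A)
    # Flip the last axis if needed so the result has determinant +1 (no reflection)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

# Assumed example: a rotation about z, perturbed by small estimation noise
c, s = np.cos(0.2), np.sin(0.2)
R_exact = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
rng = np.random.default_rng(1)
R_noisy = R_exact + 0.01 * rng.standard_normal((3, 3))

R_fixed = nearest_rotation(R_noisy)
print(np.round(R_fixed.T @ R_fixed, 6))   # identity up to rounding
```

The singular values are discarded on purpose: setting them all to one is what enforces orthonormal columns.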

Multi-Plane Calibration (Zhang’s Method)

A very popular and practical method that uses a planar calibration target (e.g., a checkerboard) shown at several different orientations.

Homography: For a planar target (assume $Z_{w} = 0$ ), the relationship between 3D world points $P_{w} = [X_{w}, Y_{w}, 0, 1]^{T}$ and image points $p$ simplifies. Let $\tilde{P}_{w} = [X_{w}, Y_{w}, 1]^{T}$ . The projection is given by a homography $H$ : $λ p = K [r_{1} \; r_{2} \; r_{3} ∣ T] \begin{bmatrix} X_{w} \\ Y_{w} \\ 0 \\ 1 \end{bmatrix} = K [r_{1} \; r_{2} ∣ T] \begin{bmatrix} X_{w} \\ Y_{w} \\ 1 \end{bmatrix} = H \tilde{P}_{w}$ Where $H = K [r_{1} \; r_{2} ∣ T]$ is a $3 \times 3$ matrix, and $r_{1}, r_{2}$ are the first two columns of the rotation matrix $R$ . The third column $r_{3}$ drops out because it is multiplied by $Z_{w} = 0$ .
Estimate Homographies: For each image ( $k$ ) of the planar target, estimate the homography $H_{k}$ relating the known 2D points on the target plane to the measured 2D image points using DLT.
Constraints on Intrinsics: Let $H = [h_{1} h_{2} h_{3}]$ . Because an estimated homography is defined only up to scale, we have $[h_{1} h_{2} h_{3}] = ρ K [r_{1} r_{2} T]$ for some unknown scale $ρ$ . Using the properties that $r_{1}, r_{2}$ are orthonormal columns of $R$ :
- $r_{1}^{T} r_{2} = 0$
- $r_{1}^{T} r_{1} = r_{2}^{T} r_{2} = 1$ (same length)
Substitute $r_{1} = \frac{1}{ρ} K^{- 1} h_{1}$ and $r_{2} = \frac{1}{ρ} K^{- 1} h_{2}$ . The scale $ρ$ cancels, which yields two constraints per homography that are linear in the entries of the symmetric matrix $B = K^{- T} K^{- 1}$ :
- $h_{1}^{T} K^{- T} K^{- 1} h_{2} = 0$
- $h_{1}^{T} K^{- T} K^{- 1} h_{1} = h_{2}^{T} K^{- T} K^{- 1} h_{2}$
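Solve for Intrinsics (K): Since $B$ is symmetric ( $3 \times 3$ ), it has 6 unique elements, and the homogeneous constraints determine it only up to scale. Each homography gives 2 constraints. Stacking constraints from $n \geq 3$ views allows solving linearly for the vector of elements in $B$ . Once $B$ is known, $K$ can be recovered using Cholesky decomposition: since $B = K^{- T} K^{- 1}$ up to scale, the lower-triangular Cholesky factor of $B$ is proportional to $K^{- T}$ , and the scale is fixed by requiring $K_{33} = 1$ .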
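Solve for Extrinsics ([R|T]): Once $K$ is known, the extrinsics for each view $k$ follow from the columns of the corresponding homography $H_{k} = [h_{1} h_{2} h_{3}]$ : $r_{1} = \frac{1}{ρ} K^{- 1} h_{1}$ , $r_{2} = \frac{1}{ρ} K^{- 1} h_{2}$ , $r_{3} = r_{1} \times r_{2}$ , and $T = \frac{1}{ρ} K^{- 1} h_{3}$ , with $ρ = \| K^{- 1} h_{1} \|$ . With noisy data the resulting $[r_{1} r_{2} r_{3}]$ is only approximately a rotation, so it is usually re-orthogonalized.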
Refinement: Perform non-linear minimization of reprojection error (as described before) to refine $K$ , all $[R_{k} ∣ T_{k}]$ , and potentially lens distortion parameters.
Exam Tip: Zhang’s method is important. Key ideas: planar target → homography, constraints on the intrinsics from rotation matrix properties, linear solve for $B = K^{- T} K^{- 1}$ , non-linear refinement.
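The linear part of Zhang's method can be sketched in Python with NumPy. This is a minimal illustration on exact synthetic homographies; the function names, the ordering $b = [B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33}]$ , the sign choices, and all numeric values are assumptions for the example, and the homography estimation and non-linear refinement steps are left out.

```python
import numpy as np

def v_ij(H, i, j):
    """Vector with h_i^T B h_j = v_ij . b, for b = [B11, B12, B22, B13, B23, B33]."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

def intrinsics_from_homographies(Hs):
    """Solve the stacked constraints for B, then recover K by Cholesky."""
    rows = []
    for H in Hs:
        rows.append(v_ij(H, 0, 1))                   # h1^T B h2 = 0
        rows.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))   # h1^T B h1 = h2^T B h2
    _, _, Vt = np.linalg.svd(np.array(rows))
    b = Vt[-1]
    if b[5] < 0:                 # B is known up to scale; choose the positive-definite sign
        b = -b
    B = np.array([[b[0], b[1], b[3]],
                  [b[1], b[2], b[4]],
                  [b[3], b[4], b[5]]])
    L = np.linalg.cholesky(B)    # B = L L^T with L proportional to K^{-T}
    K = np.linalg.inv(L).T
    return K / K[2, 2]

def extrinsics_from_homography(K, H):
    """Recover R and T for one view from H = rho K [r1 r2 T]."""
    A = np.linalg.solve(K, H)
    if A[2, 2] < 0:              # choose the sign that puts the target in front (T_z > 0)
        A = -A
    rho = np.linalg.norm(A[:, 0])
    r1, r2, T = A[:, 0] / rho, A[:, 1] / rho, A[:, 2] / rho
    return np.column_stack([r1, r2, np.cross(r1, r2)]), T

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Synthetic ground truth (assumed values): three views of the plane Z_w = 0
K_true = np.array([[800.0, 0.0, 320.0], [0.0, 820.0, 240.0], [0.0, 0.0, 1.0]])
poses = [(rot_x(0.4), np.array([0.1, 0.0, 5.0])),
         (rot_y(-0.5), np.array([-0.2, 0.1, 6.0])),
         (rot_x(-0.3) @ rot_y(0.6), np.array([0.0, 0.2, 5.5]))]
Hs = [K_true @ np.column_stack([R[:, 0], R[:, 1], T]) for R, T in poses]

K_est = intrinsics_from_homographies(Hs)
R0, T0 = extrinsics_from_homography(K_est, Hs[0])
print(np.round(K_est, 3))
```

Three views with distinct plane orientations give six equations for the five unknown ratios in $B$ ; views whose target planes are parallel to each other add no new constraints.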

Self-Calibration

Goal: Calibrate the camera using only point correspondences between multiple images, without a known calibration object.
Assumptions: Typically assumes a static scene and fixed intrinsic parameters across views.
Method: Relies on constraints from projective geometry, often involving the “image of the absolute conic.”
Characteristics: Highly flexible (no special setup needed) but generally less accurate and robust than methods using calibration targets. More complex mathematically.
