Lecture 13
Announcements
- Anyone not paired up yet for P3?
Goals
- Know what is meant by a homogeneous "point at infinity"
- Understand the derivation and significance of:
  - The epipolar plane, epipolar lines, and epipoles
  - The essential matrix and the fundamental matrix
- Be able to set up and solve the reprojection error equations for the Direct Linear Transform (DLT) to find:
  - (pose estimation) a camera matrix, given some 3D points and their 2D observations, or
  - (triangulation) a 3D point, given some camera matrices and its 2D observation in each.
- Get a general sense of how structure-from-motion can bootstrap both camera geometry and 3D point locations, starting only from 2D point correspondences among multiple cameras.
```python
# boilerplate setup
%load_ext autoreload
%autoreload 2

%matplotlib inline

import os
import sys
src_path = os.path.abspath("../src")
if (src_path not in sys.path):
    sys.path.insert(0, src_path)

# Library imports
import numpy as np
import imageio.v3 as imageio
import matplotlib.pyplot as plt
import skimage as skim
import cv2

# codebase imports
import util
import filtering
import features
import geometry
```
Plan
- Points at infinity: intuition
  - The intersection point of parallel lines
  - Can also be viewed as "direction vectors"
    - This is nice because their last coordinate is 0, so when you transform them, the translation is ignored, as it should be when transforming a direction.
- Epipolar geometry (via notes): essential and fundamental matrices
- Finding the fundamental matrix from 2D correspondences
  - 8-point algorithm
  - Find $R$, $t$ from $E$
- Multi-view geometry: problem taxonomy
- Multi-view geometry: simplest approach via DLT
  - Write down "reprojection error" residuals
  - Formulate pose estimation problem
  - Formulate triangulation problem
- Structure from motion: you don't know nothin'!
  - We can solve for $F$ using 2D correspondences.
  - Separating $E$ into $[R|t]$
  - What if you know $F$ but not $E$?
    - Trickier
  - In practice: structure from motion
Points at infinity: notes
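A small NumPy illustration of the two intuitions from the plan. The particular lines (x = 1 and x = 3) and the rigid transform are arbitrary choices made for this sketch: intersecting two parallel lines with a cross product gives a homogeneous point whose last coordinate is 0, and applying a transform that includes a translation to that point only rotates it, exactly as you would want for a direction vector.

```python
import numpy as np

# Two parallel vertical lines, x = 1 and x = 3, as homogeneous line vectors
# l = [a, b, c] satisfying a*x + b*y + c = 0.
l1 = np.array([1.0, 0.0, -1.0])
l2 = np.array([1.0, 0.0, -3.0])

# The intersection of two lines is their cross product.
p_inf = np.cross(l1, l2)
print(p_inf)  # last coordinate is 0: a point at infinity in the lines' direction (along y)

# A 2D rigid transform: rotate by theta, then translate by (tx, ty).
theta, tx, ty = np.pi / 2, 5.0, -7.0
T = np.array([
    [np.cos(theta), -np.sin(theta), tx],
    [np.sin(theta),  np.cos(theta), ty],
    [0.0,            0.0,           1.0],
])

# A finite point (w = 1) picks up the translation...
p_finite = np.array([1.0, 0.0, 1.0])
print(T @ p_finite)
# ...but a point at infinity (w = 0) is only rotated, like a direction vector.
print(T @ p_inf)
```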
Epipolar geometry: notes (see also mostly-typed notes linked from course webpage)
HW #1-2: Epipolar geometry of a rectified stereo pair
Given a rectified stereo pair, what constraint can you put on the homogeneous coordinates of all epipolar lines $\ell_p = [a, b, c]$?
In a rectified stereo pair, give the homogeneous coordinates of the right epipole (expressed in the right camera's image coordinates).
Finding the Fundamental Matrix from 2D Correspondences
via the 8-point algorithm
Let $p$ and $p'$ be a pair of corresponding points in homogeneous pixel coordinates, with $p$ in the first image and $p'$ in the second: $$p = (u, v, 1), \qquad p' = (u', v', 1)$$
and the fundamental matrix relating their two cameras:
$$ F = \begin{bmatrix} f_{11} & f_{12} & f_{13}\\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \\ \end{bmatrix} $$
The epipolar constraint says that:
$$ p'^T F p = 0 $$
Written out in scalar form, this is a single equation that is not linear in the point coordinates (it contains products such as $uu'$), but is linear in the entries of the fundamental matrix:
$$ uu' f_{11} + vu' f_{12} + u'f_{13} + uv' f_{21} + vv'f_{22} + v'f_{23} + uf_{31} + vf_{32} + f_{33} = 0 $$
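Stack eight of these (one per correspondence) into a homogeneous linear system $Af = 0$, where $f$ holds the nine entries of $F$, and solve for $f$ the same way we did for a homography: $F$ is only defined up to scale, so eight equations suffice, and the least-squares solution with $\|f\| = 1$ is the right singular vector of $A$ with the smallest singular value. A valid fundamental matrix has rank 2, so the estimate is usually projected to rank 2 afterward by zeroing its smallest singular value.
If you're actually going to do this, beware: with pixel coordinates in the hundreds or thousands, the product terms (e.g., $uv' f_{21}$) are orders of magnitude larger than the single terms (e.g., $vf_{32}$) and the constant term, which makes the system badly conditioned. Fix: normalize the observations first (e.g., translate each image's points so their centroid is at the origin and scale them to roughly unit range) so their products can't get crazy, solve for $F$ in normalized coordinates, then undo the normalization. This is the normalized 8-point algorithm.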
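A minimal NumPy sketch of the normalized 8-point algorithm as described above. Assumptions: the normalization here is Hartley's (centroid at the origin, mean distance $\sqrt{2}$), one common choice rather than the only one; the function names `normalize_points` and `eight_point`, and the synthetic cameras used to check the result, are made up for illustration. Each row of $A$ follows the column order of the scalar equation, the solution is the SVD null vector, rank 2 is enforced, and the normalization is undone via $F = T'^T \hat{F} T$.

```python
import numpy as np

def normalize_points(pts):
    """Translate (n, 2) points so their centroid is at the origin and scale them so
    the mean distance from the origin is sqrt(2). Returns (n, 3) homogeneous points and T."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return pts_h @ T.T, T

def eight_point(p, p_prime):
    """Estimate F with p'^T F p = 0 from n >= 8 correspondences given as (n, 2) arrays."""
    x, T = normalize_points(p)
    xp, Tp = normalize_points(p_prime)
    u, v = x[:, 0], x[:, 1]
    up, vp = xp[:, 0], xp[:, 1]
    # One row per correspondence; columns follow f11, f12, ..., f33 in the scalar equation.
    A = np.column_stack([u * up, v * up, up,
                         u * vp, v * vp, vp,
                         u, v, np.ones_like(u)])
    # Unit-norm least-squares solution of A f = 0: the last right singular vector.
    _, _, Vt = np.linalg.svd(A)
    F_norm = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F_norm)
    S[2] = 0.0
    F_norm = U @ np.diag(S) @ Vt
    # Undo the normalization: x = T p and x' = Tp p', so F = Tp^T F_norm T.
    F = Tp.T @ F_norm @ T
    return F / np.linalg.norm(F)

# Synthetic check with made-up cameras P1 = K[I | 0] and P2 = K[R | t].
rng = np.random.default_rng(0)
n = 20
X = np.column_stack([rng.uniform(-1, 1, (n, 2)), rng.uniform(4, 8, n), np.ones(n)])
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([-1.0, 0.0, 0.0])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t[:, None]])

def project(P, X):
    """Project (n, 4) homogeneous points with a 3x4 camera; return (n, 2) pixels."""
    x = X @ P.T
    return x[:, :2] / x[:, 2:3]

p, p_prime = project(P1, X), project(P2, X)
F = eight_point(p, p_prime)
p_h = np.column_stack([p, np.ones(n)])
pp_h = np.column_stack([p_prime, np.ones(n)])
print(np.abs(np.einsum("ij,jk,ik->i", pp_h, F, p_h)).max())  # near zero for noise-free data
```

Because these synthetic correspondences are noise-free, the epipolar residuals should sit at floating-point level; with real, noisy matches they will not be zero.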
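Given $F$, can we find $K$ and $[R|t]$?
Sort of. If the intrinsics are known, $E = K'^T F K$, and you can get $[R|t]$ from $E$ using some SVD tricks. This yields four candidate solutions, with $t$ known only up to scale; the correct one is the candidate that places triangulated points in front of both cameras.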
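A sketch of the SVD step in NumPy, using the standard factorization with $W = \begin{bmatrix}0&-1&0\\1&0&0\\0&0&1\end{bmatrix}$: writing $E = U\,\mathrm{diag}(1,1,0)\,V^T$, the rotation is $UWV^T$ or $UW^TV^T$ and $t = \pm u_3$ (the last column of $U$). The helper names `skew` and `decompose_essential` and the test pose are made up for illustration, and the cheirality test that picks among the four candidates is not implemented here.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def decompose_essential(E):
    """Return the four (R, t) candidates with E ~ [t]_x R; t is unit length
    because the scale of the translation cannot be recovered from E."""
    U, _, Vt = np.linalg.svd(E)
    # Flip signs if needed so U and V are proper rotations. This only flips the sign
    # of E, which doesn't matter because E is defined up to scale.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]  # null vector of E^T: the translation direction, up to sign
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# Round trip with a made-up pose.
a = 0.3
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([1.0, 0.2, -0.5])
t_true /= np.linalg.norm(t_true)
E = skew(t_true) @ R_true
candidates = decompose_essential(E)
```

In a full pipeline you would triangulate a few correspondences with each candidate and keep the one whose points have positive depth in both cameras.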
You can get $K$ from $F$ in specialized circumstances but not unambiguously and not in general.
In practice: estimate the focal length $f$ from camera metadata to get an initial estimate of $K$, then refine using nonlinear optimization.
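A rough sketch of that initial estimate, under assumptions the metadata does not guarantee: square pixels, zero skew, a principal point at the image center, and a known sensor width, so that $f_{px} = f_{mm} \cdot \text{width}_{px} / \text{sensor width}_{mm}$. With $K$ and $K'$ in hand, $E = K'^T F K$ follows from substituting $p = K\hat{p}$ and $p' = K'\hat{p}'$ into $\hat{p}'^T E \hat{p} = 0$. The function names below are hypothetical.

```python
import numpy as np

def intrinsics_from_metadata(focal_mm, sensor_width_mm, width_px, height_px):
    """Initial guess for K from metadata. Assumes square pixels, zero skew,
    and a principal point at the image center."""
    f_px = focal_mm * width_px / sensor_width_mm
    return np.array([[f_px, 0.0, width_px / 2.0],
                     [0.0, f_px, height_px / 2.0],
                     [0.0, 0.0, 1.0]])

def essential_from_fundamental(F, K, K_prime):
    """E = K'^T F K, for F satisfying p'^T F p = 0 (p seen by the camera with intrinsics K)."""
    return K_prime.T @ F @ K

# Example: a 35 mm lens on a 36 mm-wide sensor, 3600 x 2400 pixel image.
K = intrinsics_from_metadata(35.0, 36.0, 3600, 2400)
print(K[0, 0])  # 3500.0

# Round trip: build F from a made-up E ([t]_x with t = (1, 0, 0) and R = I), then recover E.
E_true = np.array([[0.0, 0.0, 0.0],
                   [0.0, 0.0, -1.0],
                   [0.0, 1.0, 0.0]])
K_inv = np.linalg.inv(K)
F = K_inv.T @ E_true @ K_inv
E = essential_from_fundamental(F, K, K)
```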
Multi-view geometry: problem taxonomy
Notes
Multi-view geometry - a taxonomy of problems: notes
Pose Estimation and Triangulation via the Direct Linear Transform (DLT)
HW Problems 3-7
The projection of a 3D point to its 2D coordinates can be written as: $$ \begin{bmatrix}x_i\\ y_i \\ 1\end{bmatrix} = \begin{bmatrix}x_p/w_p\\ y_p/w_p \\ 1\end{bmatrix} \sim \begin{bmatrix}x_p\\ y_p \\ w_p\end{bmatrix}= \begin{bmatrix} p_{00} & p_{01} & p_{02} & p_{03} \\ p_{10} & p_{11} & p_{12} & p_{13} \\ p_{20} & p_{21} & p_{22} & p_{23} \\ \end{bmatrix} \begin{bmatrix}X_i\\ Y_i \\ Z_i \\ W_i \end{bmatrix} $$
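The projection equation above can be checked numerically. A minimal NumPy sketch, with a made-up camera $P = K[I \mid 0]$ (focal length 100 px, principal point at the origin) and a hypothetical helper `project`: multiply by $P$, then divide by $w_p$. Because the result only depends on ratios, scaling a homogeneous 3D point leaves its projection unchanged.

```python
import numpy as np

def project(P, X):
    """Project homogeneous 3D points X (n, 4) with a 3x4 camera matrix P.
    Returns (n, 2) image coordinates (x_p / w_p, y_p / w_p)."""
    x = X @ P.T                   # each row is [x_p, y_p, w_p]
    return x[:, :2] / x[:, 2:3]   # divide out w_p

K = np.diag([100.0, 100.0, 1.0])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
# The second point has W = 0.5, i.e., it is the Euclidean point (2, 2, 2).
X = np.array([[2.0, 4.0, 2.0, 1.0],
              [1.0, 1.0, 1.0, 0.5]])
x = project(P, X)
print(x)  # rows: (100, 200) and (100, 100)

# Scaling a homogeneous 3D point doesn't change its projection.
assert np.allclose(project(P, 3.0 * X), x)
```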
Write down the residuals for the reprojection error given a camera matrix $P$, a 3D world point $p_{world}=(X_i, Y_i, Z_i, W_i)$, and its image-space coordinates $p_{img} = (x_i, y_i)$. Note that we'll need to use the same "multiply by the denominator" trick we used when solving for homographies, so these residuals don't exactly equal the reprojection error.
Give the first two rows of the $A$ matrix in the least squares system $Ax = 0$ that you'd solve to find the elements of the camera matrix $$ P = \begin{bmatrix} p_{00} & p_{01} & p_{02} & p_{03} \\ p_{10} & p_{11} & p_{12} & p_{13} \\ p_{20} & p_{21} & p_{22} & p_{23} \\ \end{bmatrix}, $$ given a (known) set of $n$ 3D points $\{(X_i, Y_i, Z_i, 1) : 0 \le i < n\}$ and their (known) corresponding 2D projections $\{(x_i, y_i) : 0 \le i < n\}$. Note that we'll assume here that the 3D points are normalized, i.e., $W_i=1$.
How many 3D-2D point correspondences do you need to compute the entries of $P$?
Give the first two rows of the $A$ matrix in the least squares system $Ax = 0$ that you'd solve to find the location of a 3D point $[X, Y, Z, W]$ given a set of $m$ camera matrices $P_{1\ldots m}$ and the corresponding observed 2D locations $\{(x_i, y_i) : 1 \le i \le m\}$, where $(x_i, y_i)$ is the point's observation in camera $P_i$. Note that here we're not assuming $W = 1$; this isn't required, but it should help with numerical stability.
How many cameras (and corresponding 2D point locations) do you need to compute the location of $X, Y, Z, W$?