Lecture 4¶
Announcements¶
Faculty candidate talks coming up this week!
- Michael Walker (VR/AR for Human Robot Interaction)
  - Today (Tuesday 1/23) 4pm CF 316: Teaching Demo: Motion Planning for Mobile Robots
- TszChiu Au (Multirobot Systems, from a seemingly mathy angle)
  - Thursday 1/25 4pm CF 105: Research Talk
  - Friday 1/26 4pm CF 316: Teaching Demo
Calendar adjustments have been made due to the snow day. Most things are just 1 class day later.
- The midterm exam has been pushed back from Tuesday 2/13 to Thursday 2/15.
- I may (no promises) try to compress the topic schedule a little for the few days after the midterm to give more time for the projects and advanced ML topics at the end.
I'm contemplating making the second half of Thursday a mini-lab. Who can bring a laptop (with the lecture notebook repo cloned, jupyter set up, etc., ahead of class)?
Goals¶
- Know how and why to construct a Gaussian Pyramid
- Know how and why to construct a Laplacian Pyramid
- Be prepared to complete Project 1
(as time allows)
- Understand the high-level steps in a panorama stitching system
- Understand the motivation and high-level steps in image matching: feature detection, description, and matching
- Be able to describe examples of global and local image features.
(if we have time?) Introduce our running long-term goal of building a panorama stitching system
- Overview
- Feature detection and matching
# boilerplate setup
%load_ext autoreload
%autoreload 2
%matplotlib inline
import os
import sys
src_path = os.path.abspath("../src")
if src_path not in sys.path:
    sys.path.insert(0, src_path)
# Library imports
import numpy as np
import imageio.v3 as imageio
import matplotlib.pyplot as plt
import skimage as skim
import cv2
# codebase imports
import util
import filtering
Edge Detection: Revisited¶
beans = imageio.imread("../data/beans.jpg")
beans_small = skim.transform.rescale(beans, 0.25, anti_aliasing=True) # smaller beans
bg = skim.color.rgb2gray(beans) # grayscale beans
bn = bg + np.random.randn(*bg.shape) * 0.05 # grayscale noisy beans
plt.imshow(bn, cmap="gray")
# take a look at filtering.grad_mag
beans_gradient = filtering.grad_mag(bn)
util.imshow_gray(beans_gradient)
If we fed this into Canny or somesuch, would it give us "good" edges?
Try zooming in a bit on the edges of her head. You'd think those would be important edges...
How strong is the gradient magnitude there?
util.imshow_gray(beans_gradient[200:350, 50:200])
Issue: what we conceptualize as edges exist at different spatial scales (or frequencies!)
bn_small = filtering.down_4x(bn)
plt.imshow(filtering.grad_mag(bn_small))
util.imshow_gray(cv2.Canny((beans_small * 255).astype(np.uint8), 150, 230))
So you want to take a derivative at a different scale... how should we go about it?
- Scale the image down
- Use a bigger filter
Homework Problem 1¶
Suppose you want to detect edges at a 2x reduced spatial scale. You have two choices: double the size of your gradient filters, or halve the size of your image (in both height and width). Calculate and compare the number of multiplications required per input pixel to perform just the filtering steps in each of these approaches. Assume that your downsampling prefilter is 5x5 and a "double size" Sobel filter would end up being 7x7. Which approach will be more efficient if you want to detect edges at multiple reduced spatial scales?
For now: which of these do you think will be more efficient?
Gaussian Pyramid¶
Idea: recursively downsample (recall: blur-then-subsample!) to create a multi-scale image pyramid.
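As a refresher, here's a minimal sketch of what a blur-then-subsample step could look like, reusing filtering.separable_filter and filtering.gauss1d5 from the course codebase (both appear later in these notes). The actual filtering.down_2x may use a different kernel or boundary handling.

def down_2x_sketch(img):
    # sketch only -- blur first so the subsampled image doesn't alias,
    # then keep every other pixel in each dimension
    blurred = filtering.separable_filter(img, filtering.gauss1d5)
    return blurred[::2, ::2]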
G = [bg]
levels = 6
for i in range(levels - 1):
    G.append(filtering.down_2x(G[-1]))
print(len(G))
6
# fig, axs = plt.subplots(1, len(G))
for i, lvl in enumerate(G):
    util.imshow_truesize(lvl)
Frequency Content in the Gaussian Pyramid¶
Each level contains a subset of the original image's frequencies, with the cutoff getting lower at each level.
Frequencyometer illustration:
for i, lvl in enumerate(G):
    fig = plt.figure(figsize=(3, 5))
    fig.gca().set_axis_off()
    util.imshow_gray(filtering.grad_mag(lvl))
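If you don't have the slides handy, here's a hedged numerical stand-in using numpy's FFT: the blur half of a pyramid step visibly attenuates the high frequencies. The show_spectrum helper is hypothetical, defined just for this illustration.

def show_spectrum(img, title=""):
    # log-magnitude spectrum with the zero frequency centered;
    # high frequencies live toward the edges of the plot
    F = np.fft.fftshift(np.fft.fft2(img))
    fig = plt.figure(figsize=(3, 3))
    fig.gca().set_axis_off()
    plt.title(title)
    plt.imshow(np.log1p(np.abs(F)), cmap="gray")

show_spectrum(G[0], "original")
show_spectrum(filtering.separable_filter(G[0], filtering.gauss1d5), "blurred")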
Homework Problem 2¶
Suppose you have a 128x128 image, and you compute a full Gaussian pyramid (i.e., every possible level down to a 1x1 image). What multiple of the original image storage is required to store the entire pyramid?
Laplacian Pyramids¶
Thinking back to sharpening: what is lost from one level of the pyramid to the next?
$G_{i+1} = \textrm{subsample}(\textrm{blur}(G_i))$
Each level of a Laplacian pyramid captures this:
$$ \begin{align*} L_i &= G_i - \textrm{blur}(G_i) \textrm{ if } i < k\\ L_k &= G_k \end{align*} $$
where $k$ is the index of the highest level (the number of levels minus one, since we've zero-indexed).
Ponder: why should we have the special case for the last (highest, lowest-resolution) level?
from filtering_future import construct_laplacian # you'll write this yourself in P1!
L, G = construct_laplacian(filtering.down_2x(bg))
# fig, axs = plt.subplots(1, len(G))
for i, lvl in enumerate(L):
    util.imshow_truesize(lvl)
Frequency Content in the Laplacian Pyramid¶
Frequencyometer illustration:
for i, lvl in enumerate(L):
    fig = plt.figure(figsize=(3, 5))
    fig.gca().set_axis_off()
    util.imshow_gray(lvl)
Homework Problems 3-5¶
Suppose you are given the Laplacian and Gaussian pyramids for an input image $I$. $G_{0 \ldots k}$ are the Gaussian pyramid levels starting at the original image ($G_0$), while the Laplacian levels $L_{0 \ldots k}$ are the "detail" layers, with each $L_i$ at the same resolution as $G_i$.
(3) When is it the case that $G_\ell = L_\ell$?
(4) Given all levels of both pyramids, give an expression that yields a result as close as possible to $G_j$ with a sharpening filter applied. You don't need to actually do any filtering.
(5) Give an algorithm to reconstruct $G_0$ using only the levels of $L$.
util.imshow_gray(G[-3])
util.imshow_gray(filtering.separable_filter(G[-3], filtering.gauss1d5))
Project 1 Demo¶
python hybrid_gui.py -t resources/sample-correspondence.json -c resources/sample-config.json
python laplacian_gui.py --image resources/beans.jpg --levels 4
python local_laplacian_gui.py --image resources/flower-square.png
Tips on numpy efficiency¶
- Loops are slow. Get rid of them wherever possible!
- Example: remove 2 for loops from filtering.filter (see the sketch below)
- Full efficiency points on P1 require removing all loops over the image pixels!
- General advice: batch as many operations that are happening anyway into a single operation.
- There is usually a numpy function to do that thing you're trying to do.
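As a concrete (and purely illustrative) example of the first tip: the naive 4-loop cross-correlation below can lose its two inner loops by multiplying a kernel-sized window against the whole kernel at once. Neither function is the actual filtering.filter, and boundary handling is ignored for brevity.

def filter_4_loops(img, kernel):
    # naive: four nested Python loops (very slow)
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += img[i + u, j + v] * kernel[u, v]
            out[i, j] = acc
    return out

def filter_2_loops(img, kernel):
    # same result: the two inner loops become a single vectorized
    # multiply-and-sum over a kernel-sized window
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

Removing the remaining two loops (as full efficiency points require) is also possible, e.g., by looping over the kernel's few entries and accumulating shifted copies of the whole image instead of looping over pixels.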
The high-level steps for panorama stitching are as follows; see the slides for more detail and visuals.
- Figure out the geometric relationship between images
- Align images to each other
- Blend them together
Starting with step 1, we have a few sub-steps:

Figure out the geometric relationship between images
- (a) Identify matching points in neighboring images
- (b) Model (and estimate) the geometric mapping from one point set to another

Step 1 (a) itself has some sub-steps (sketched in code below):
- (a) Identify matching points in neighboring images
  - Find points that would be good to match
  - Extract a "descriptor" that captures local image information
  - Find matching pairs of features using their descriptors.
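To preview how these sub-steps look in practice, here is a hedged sketch using OpenCV's off-the-shelf ORB features; we'll study (and build) our own versions of each piece, so treat this as orientation, not as the course's method. Inputs are assumed to be 8-bit grayscale images.

def match_points_sketch(img1_gray8, img2_gray8):
    orb = cv2.ORB_create()
    # find good points to match, and extract a descriptor for each
    kp1, des1 = orb.detectAndCompute(img1_gray8, None)
    kp2, des2 = orb.detectAndCompute(img2_gray8, None)
    # match descriptors; cross-checking keeps only mutual best matches
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    return pts1, pts2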
... and this brings us to our starting point for next class: what features should we try to match?