CSCI 497P/597P - Homework 2

Spring 2020

Complete the following problems and submit your solutions to the HW2 assignment on Canvas. For all questions, justify your answers either by showing your work or by giving a brief explanation. Please typeset your solutions using LaTeX or similar¹; you may include neatly hand-drawn figures so long as the scan quality is good. If you feel that typesetting your answers is a major burden, please email me and I will try to help or make alternative arrangements. You may work with your classmates on these problems, but you must write up your own solutions individually without using notes or photos made during your collaborative discussions. This is largely new material, so please let me know if you think anything is unclear or ambiguous and I’ll make corrections or clarifications as needed.

Let’s investigate the behavior of the Harris corner detector on the following three image patches: \[ \begin{bmatrix} 2 & 2 & 2\\ 2 & 2 & 2\\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 2 & 2\\ 0 & 2 & 2\\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 2 & 2 & 2\\ 0 & 2 & 2\\ 0 & 0 & 2 \end{bmatrix} \]
1. Compute the structure tensor for each of the above patches. I have it on good authority that these images are noise-free, so we can safely skip the Sobel filter and compute gradients using 3x1 and 1x3 centered finite difference filters and repeat padding.
2. Using software of your choice (np.linalg.eigvals is my choice), compute the smallest eigenvalue of each structure tensor.
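
To make the setup above concrete, here is a minimal sketch of the computation on a hypothetical patch that is not one of the three above. It assumes unnormalized centered differences (filter `[-1, 0, 1]`), rows increasing downward, and a structure tensor formed by summing gradient products over the whole 3x3 patch. The names `structure_tensor` and `smallest_eigenvalue` are illustrative only, and `np.linalg.eigvalsh` is swapped in for `np.linalg.eigvals` because the tensor is symmetric. If your conventions differ (for example, a factor of 1/2 in the filter), the numbers scale accordingly.

```python
import numpy as np

def structure_tensor(patch):
    """Sum of gradient outer products over a patch (assumed conventions, see text)."""
    patch = np.asarray(patch, dtype=float)
    # "Repeat" padding: replicate the border pixels by one on every side.
    padded = np.pad(patch, 1, mode="edge")
    # Centered differences, unnormalized: I(x+1) - I(x-1) and I(y+1) - I(y-1).
    ix = padded[1:-1, 2:] - padded[1:-1, :-2]
    iy = padded[2:, 1:-1] - padded[:-2, 1:-1]
    return np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                     [np.sum(ix * iy), np.sum(iy * iy)]])

def smallest_eigenvalue(tensor):
    """eigvalsh returns eigenvalues of a symmetric matrix in ascending order."""
    return np.linalg.eigvalsh(tensor)[0]

# Hypothetical example patch: a vertical edge, not one of the assigned patches.
example_patch = [[0, 0, 1],
                 [0, 0, 1],
                 [0, 0, 1]]
example_tensor = structure_tensor(example_patch)
example_min_eig = smallest_eigenvalue(example_tensor)
```

For this vertical-edge patch only the horizontal gradient is nonzero, so the tensor has a single nonzero entry and its smallest eigenvalue is zero, which is what the Harris detector would expect of an edge rather than a corner.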
A homography transformation matrix is usually written as \[ \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \\ \end{bmatrix} \] I stated without proof in class that because this matrix acts on homogeneous coordinates, the 9th entry of the matrix does not add a degree of freedom; in other words, for any matrix

\[ H = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & k \\ \end{bmatrix} \] there is a homography \[ H' = \begin{bmatrix} a' & b' & c' \\ d' & e' & f' \\ g' & h' & 1 \\ \end{bmatrix} \] that has the equivalent effect on 2D homogeneous points. Prove that this is the case.
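
Before writing the proof, it can help to check the claim numerically. The sketch below is a sanity check, not a proof: it applies two matrices that differ only by an overall scale to a few points and compares the results after dividing out the homogeneous coordinate. The helper name `apply_homography`, the matrix entries, the scale factor, and the test points are all arbitrary choices made for illustration.

```python
import numpy as np

def apply_homography(H, pts):
    """Map (N, 2) points through a 3x3 matrix H and return (N, 2) points."""
    pts = np.asarray(pts, dtype=float)
    # Lift to homogeneous coordinates by appending w = 1.
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ np.asarray(H, dtype=float).T
    # Divide by the third coordinate to return to 2D.
    return mapped[:, :2] / mapped[:, 2:3]

# Arbitrary example matrix with a bottom-right entry that is not 1.
H = np.array([[1.0, 0.2, 3.0],
              [0.1, 1.5, -2.0],
              [0.001, 0.002, 2.0]])
H_scaled = 7.0 * H  # same matrix up to an arbitrary nonzero scale

pts = np.array([[0.0, 0.0], [10.0, 5.0], [-3.0, 4.0]])
same_effect = np.allclose(apply_homography(H, pts),
                          apply_homography(H_scaled, pts))
```

Here `same_effect` comes out true for these points; your proof should explain why this holds in general and what it implies about the entry \(k\).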
In this problem, we’ll figure out how to extract a patch around a detected keypoint in order to compute a feature descriptor for the keypoint. The goal is as follows: fill an 8x8 square window of pixels with resampled image content from a 40x40 window centered around pixel coordinates \((x_f,y_f)\) and oriented at an angle \(\theta_f\) in an image.

We’ll do this by composing a series of transformation matrices that, together, will transform points from the image coordinate system to the feature descriptor’s coordinate system. Then, we’ll use inverse warping to look up the pixel values that belong in the descriptor. We’ll start by building the transformation matrix in four steps, then combine the individual transformations and use inverse warping to fill in the pixels. The sequence of transformations (parts 1–4) is depicted here:
1. Build a transformation \(T_{T1}\) that translates the image such that \((x_f,y_f)\) is at the origin.
2. Build a transformation \(T_{R}\) that rotates the image so that the direction pointing at the angle \(\theta_{f}\) will point at an angle of \(0\).
3. Build a transformation \(T_{S}\) that scales the image by a factor of \(\frac{1}{5}\) to map the 40x40 window to 8x8.
4. Build a transformation \(T_{T2}\) that translates the image so that the point that originated at \((x_f,y_f)\) appears in the center of the 8x8 window. Recall that pixel values “live” at the center of their pixel grid cells; notice that the specific position of the origin in the Step 4 image is such that \((0,0)\) is at the center of the bottom left pixel.
5. In terms of the above matrices (refer to them symbolically using the subscripted \(T\) names; do not use their individual entries), build a transformation \(T\) that maps image pixel coordinates to descriptor pixel coordinates.
6. Write pseudocode for the following procedure that fills in the 8x8 array of descriptor pixels, given the image and the transformation you derived. You can assume you have access to numpy functionality as well as a function bilerp(A, x, y) that uses bilinear interpolation to compute a value at floating-point coordinates \((x,y)\) in array A. To keep things simple, assume that img and desc can be indexed like img[x,y] with \((x,y)\) coordinates rather than standard numpy \(ij\) indexing. Hint: think of this as an image warping problem with a peculiarly small choice of output image size.
```
def extract_descriptor(T, img, desc):
   """ Fill in desc (8x8x3) with pixel values from img (HxWx3) given the
       transformation T (3x3) derived in the prior parts of this problem. """
```
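
The problem treats `bilerp(A, x, y)` as given. For anyone who wants to test their pseudocode, one possible stand-in is sketched below. It follows the `A[x, y]` indexing convention stated above and makes two assumptions that the assignment does not specify: coordinates outside the array are clamped to the border, and `A` has at least two entries along each of its first two axes.

```python
import numpy as np

def bilerp(A, x, y):
    """Bilinearly interpolate A at floating-point (x, y), with A indexed as A[x, y]."""
    A = np.asarray(A, dtype=float)
    # Assumption: clamp out-of-range coordinates to the array border.
    x = float(np.clip(x, 0, A.shape[0] - 1))
    y = float(np.clip(y, 0, A.shape[1] - 1))
    # Lower corner of the 2x2 neighborhood, kept one short of the last index
    # so that the upper corner (x0 + 1, y0 + 1) is always valid.
    x0 = min(int(np.floor(x)), A.shape[0] - 2)
    y0 = min(int(np.floor(y)), A.shape[1] - 2)
    x1, y1 = x0 + 1, y0 + 1
    fx, fy = x - x0, y - y0
    # Interpolate along x at both y levels, then along y between the results.
    at_y0 = (1 - fx) * A[x0, y0] + fx * A[x1, y0]
    at_y1 = (1 - fx) * A[x0, y1] + fx * A[x1, y1]
    return (1 - fy) * at_y0 + fy * at_y1
```

Because the weights are applied to `A[x0, y0]` and its neighbors directly, the same function works on an HxWx3 color array and returns a length-3 value per lookup.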

¹ As with HW1, you can download the markdown source for this document here.