CSCI 476/576: Project 2: Autostitch

Scott Wehrwein

Overview and Logistics

Dates

Assigned: Thursday, February 8th
Code Due: Wednesday, February 21st at 10pm (via Github)
Artifact Due: Thursday, February 22nd at 10pm (via Canvas)

Teamwork

You may work on this assignment solo or in groups of two. If you would like to work in a pair, you need to complete the following steps:

Find your partner.
The first member of the pair to accept the Github Classroom invite should create a new team (it doesn’t matter which group member does this). Your team name should be the WWU usernames of the two teammates, ordered lexicographically, and separated by an underscore. For example, if Filip Jagodzinski and I were working on this project together, our team would be named jagodzf_wehrwes.
The second member of the pair to accept the Github Classroom invite should find the team created by the first member and join it.

A couple other relevant policies:

You may only pair up with someone in your section (e.g., 476 students must work with 476 students).
To use slip days, both members must spend a slip day.
Code submissions are done jointly, but each group member will submit their own individual artifact.

Synopsis

In this project, you will implement a system to combine a series of horizontally overlapping photographs into a single panoramic image. We’ll use the built-in ORB feature detector and descriptor from the opencv library; 576 students will also implement an alternate homegrown feature matching pipeline. Given the feature correspnodences, you will automatically align the photographs (determine their overlap and relative positions) using RANSAC to find an outlier-robust motion model and then blend the resulting images into a single seamless panorama.

You are provided with a GUI that lets you test and visualize the functionality and intermediate results of the various statges of the pipeline that ultimately produces the final panorama output. We have also provided you with some test images and unit tests to help you debug.

The high-level steps required to create a panorama are listed below. 476 students will implement panorama stitching using translation and homography motion models, while 576 students will additionally implement a translational model with spherically-warped input images, which allows for full 360-degree panoramas. The steps in square brackets are only used with the spherical warping/translational approach:

Take a sequence of photos with horizontal overlap
[Warp each image to spherical coordinates]
Extract features from each image
Match features among neighboring pairs of images
Align neighboring pairs using RANSAC
Write out list of transformations that relate each image to a single coordinate system
[Correct for drift, if the panorama is 360 degrees)]
Warp the images into the output panorama and blend them together
Crop the panorama and admire the beautiful result

Getting Started

Skeleton code is provided in the repository created by Github Classroom. The invitation link is found in the Project 2 assignment on Canvas.

Test sets: See the resources subdirectory in your repo. You will find four datasets: yosemite, campus, melbourne, and melbourne_small.

Software environment: The lab machines should have the necessary packages installed to run the project code; the software listed for Project 1 should be all you need for this project as well (please let me know if you find this not to be the case!). The GUI for this project is written using TK, much like the prior project, so remote access should work similarly - please see the Project 1 handout for links to resources, and let me know if you’re having trouble running the project remotely.

Your Tasks

(576 only) Warp each image to spherical coordinates.
- File: warp.py
- Routine: computeSphericalWarpMappings
[TODO 1 - 576 only] Compute the inverse map to warp the image by filling in the skeleton code in the computeSphericalWarpMappings routine to:
1. Convert the given spherical image coordinates into the corresponding planar image coordinates
2. Apply radial distortion using the radial distortion model described in lecture
Align neighboring pairs.
- File: alignment.py
- Routines: alignPair, getInliers, computeHomography, leastSquaresFit
The computeHomography function takes two feature sets from image 1 and image 2 (f1 and f2) and a list of feature matches (containing pairs of indices into f1 and f2) and estimates a homography from image 1 to image 2.

[TODO 2] Set up the \(A\) matrix that defines to the system \(Ax\) that computes the residuals for a given homography unrolled into a vector \(h\).

[TODO 3a] Implement minimizeAx to find the unit-length vector \(\mathbf{x}\) that minimizes \(||A\mathbf{x}||\) for a given \(A\).

[TODO 3b] Call minimizeAx on the matrix you set up in TODO 2 and use its result to fill in the 3x3 homography matrix \(H\). Don’t forget to return the homography in its normalized form, with a 1 as the bottom right entry.

[TODO 4] alignPair is where you will implement RANSAC. It takes two feature sets, f1 and f2, the list of feature matches, and a motion model, m (described below) as parameters. For this project, we support two motion models, represented by the two possible values of the enum MotionModel: eTranslate and eHomography. alignPair estimates and returns the inter-image transform matrix \(M\) as follows:
1. Randomly choose a minimal set of feature matches (one match for the case of translations, four for homographies)
2. Estimate the corresponding motion model (alignment)
3. Invoke getInliers to get the indices of inlier feature matches (i.e., indices into matches) that agree with the current motion estimate.
After repeated trials, the entire inlier set from the \(M\) with the largest number of inliers is used to compute a final least squares estimate for the motion, which is returned as the matrix M.

[TODO 5] getInliers computes the indices of the matches that have a Euclidean distance below RANSACthresh given features f1 andf2 from image 1 and image 2 and an inter-image transformation matrix from image 1 to image 2.

[TODO 6, 7] leastSquaresFit computes a least squares estimate for the translation or homography using all of the matches previously estimated as inliers. It returns the resulting translation or homography output transform M. For translation estimation, I recommend simply averaging the translations rather than taking the heavy-handed linear algebra approach. For homographies, you’ve already implemented computeHomography to do the heavy lifting.
Warp and blend the aligned image pairs into a single output image to create the final panorama.
- File: blend.py
- Routines: imageBoundingBox, blendImages, accumulateBlend, normalizeBlend
[TODO 8] imageBoundingBox: Given an image and a homography, figure out the box bounding the image after applying the homography.

[TODO 9] getAccSize: Given the warped images and their relative displacements, figure out how large the final stitched image needs to be in order to fit all the warped image. This method also augments each per-image transformation with a translation that moves the output image coordinate system into a numpy-array-friendly world where (0, 0) is at the top left.

[TODO 10] blendImages: Warp each image into the output image’s coordinate system and add its pixel content into the accumulator. You will need to use inverse warping to calculate values at integer output pixel coordinates. To allow the images to blend smoothly, use the fourth channel to represent the weight of the contribution of a pixel. Using the linear blending scheme described in lecture, the weight varies linearly from 0 to 1 from the left side of the image over a distance of blendWidth pixels, then ramps down correspondingly on the right side of the image. Other, fancier blending schemes are possible - you may experiment with some for extra credit.

TODO 10 implementation notes:
1. When working with homogeneous coordinates, don’t forget to normalize when converting them back to Cartesian coordinates.
2. Watch out for black pixels in the source image when inverse warping, especially when dealing with spherically warped images. You don’t want to include these in the accumulation.
3. When doing inverse warping, use bilinear interpolation for the source image pixels. First try to work out the code by looping over each pixel. Later you can optimize your code using array instructions and numpy tricks. My approach does vectorized bilinear interpolation using array operations; another approach uses cv2.remap to warp the image. In either case, you may find numpy.meshgrid useful. Optimizing this function is worth only a couple points, so prioritize this lowest.
[TODO 11] normalizeBlend: Having accumulated weighted pixels from all the source images, this function normalizes the image so each pixel has unit weight by dividing by the weight at each pixel. Be careful not to divide by zero. Remember to make sure the alpha (fourth) channel of the resulting panorama is opaque (1)!

[TODO 12 - 576 only] blendImages: To make a 360 panorama, you need to do a couple extra things. First, you’ll want to include the first image again at the end so you can put the seam in the middle of that image. Second, you’ll need to correct for vertical drift to make the left and right edges line up perfectly. The getDriftParams function computes the position of the top left and top right corners of the un-corrected panorama, accounting for cutting out the left half of the left image and the right half of the right image. Given these two points, build a shearing transformation that maps these top two corners to the same \(y\) value.
(576 only) The base project uses built-in ORB feature detection and description functionality from OpenCV. The GUI (gui.py) accepts a --MOPS flag; if this is set, the program should use your own custom-written feature matching pipeline. Implement functionality to detect, desribe, and match features using Harris, MOPS, and SSD+ratio (methods for this likely fit best in alignment.py, but I haven’t given you any skeleton for this). Your pipeline should follow the code we wrote in class, but should be generalized to multiple scales by running on a Gaussian pyramid. Feel free to use OpenCV’s pyrDown or import relevant code from Project 1.

[TODO 13 - 576 only] Make appropriate calls to your own feature matching functionality in gui.py in the computeMapping function to replace ORB if the--MOPS flag is set.

Using the GUI

The skeleton code that we provide comes with a graphical interface, with the module gui.py, which makes it easy for you to do the following:

Visualize a Homography: The first tab in the UI provides you a way to load an image and apply an arbitrary homography to the image. This can be useful while debugging when, for example, you want to visualize the results of both manually and programmatically generated transformation matrices.
Visualize Spherical Warping: The second tab on the UI lets you spherically warp an image with a given focal length. This will only work if you’ve implemented TODO 1 (576 only).
Align Images: The third tab lets you select two images with overlap and uses RANSAC to compute a homography or translation that maps the right image onto the left image.
Generating a Panorama: The last tab in the UI lets you generate a panorama. You will need to specify a folder with images labelled in such an order that sorting them alphabetically gives you the order the images appear on the panorama from left to right (or from right to left). This ensures that the mappings between all neighboring pairs are computed. Our current code assumes that all images in the panorama have the same width.

Debugging Guidelines

You can use the GUI visualizations to check whether your program is running correctly.

Testing the warping routines:
- In the campus test set, the camera parameters used for these examples are:
  
  f = 595 k1 = -0.15 k2 = 0.00
- In the yosemite test set, a few example warped images are provided for testing purposes. The camera parameters used for these examples are:
  
  f = 678 k1 = -0.21 k2 = 0.26
  
  See if your program produces the same output. Note that if you use Yosemite with the translation motion model, you might get slightly blurry panoramas in the blending region (as you can also see from the example results). This is because the translation model isn’t flexible enough to describe the true transformation.
Testing the alignment routines:

Note that the campus images are only suitable for the translational motion model! The yosemite images are suitable for both motion models. To test alignPair, load two images in the alignment tab of the GUI. Clicking ‘Align Images’, displays a pair, the left and right images, with the right image transformed according to the inter-image transformation matrix and overlaid over the left image. This enables visually analyzing the accuracy of the transformation matrix. Note that blending is not performed at this stage.
Testing the blending routines:

When debugging your blending routines, you may find it helpful for the sake of efficiency to use the melbourne_small dataset, which is simply a downsampled version of the Melbourne dataset. Example panoramas are included in the yosemite and the campus directories. Compare the resulting panorama with these images. Note that it’s important to use the specified f, k1, k2 parameters to get the same image. 576 students should use the 360 degree checkbox to ensure you get the same result for campus dataset.
Additional notes: If you use very high resolution images when creating yoru own panorama on a laptop, you might run into memory problems. Try running on a machine with more memory; the lab machines have 16GB RAM which should be enough for panos captured by most consumer-oriented cameras.

Artifact

Each partner must submit their own artifact via Canvas: Take a series of images with a digital camera mounted on a tripod or a handheld camera, and stitch a panorama using your code. This panorama can be either translation-aligned (360 or not, if you implemented 360 features), or aligned with homographies (your choice). For best results, overlap each image by 50% with the previous one, and keep the camera level. In order to use your camera for a spherically warped translation-aligned panorama, you have to estimate the focal length. The simplest way to do this is through the EXIF tags of the images, as described here. You may also be able to find the focal length (in mm) and sensor width by searching for your camera or phone model. Alternatively, you can use a camera calibration toolkit to get more precise focal length and radial distortion coefficients.

For inspiration, check out some of the following links:

Super high resolution panoramas at GigaPan
Fun with panorama cloning
Matt Brown’s Autostitch page.

Submission

Code Submit your code by committing and pushing your changes to Github before the deadline. If you did any extra credit, describe what you did in readme.txt.

Artifact Every student must submit their own artifact. If you are working in a pair, each group member must submit an artifact. Submit your panorama artifact to Canvas in JPG format.

Survey Each student (again, both members of a pair if applicable) must fill out the P2 Survey assignment on Canvas, where you will provide an estimate of the number of hours you spent on this assignment, as well as any other comments you have about the assignment.

Extra Credit

Here is a list of suggestions for extending the program for extra credit. You are encouraged to come up with your own extensions. We’re always interested in seeing new, unanticipated ways to use this program! Please use the –extra-credit flag in gui.py. You will need to use the args parsed in the “main method” portion of gui.py and modify the rest of the code as necessary. If we run your program without the flag, it must implement the base project.

476 Students: complete TODO 1 and TODO 12 to make the 360 panorama mode work.
Sometimes, an exposure difference between images results in brightness fluctuation in the final mosaic. Devise a way to get rid of this artifact.
Try shooting a sequence with some objects moving and appearing in more than one image and implement a scheme to stitch a panorama without “ghosted” objects.
Implement a better blending technique, e.g., pyramid blending (you might be able to reuse some P1 code for this one), poisson image blending and graph cuts.

Rubric

Your project will be graded based on the quality of the panoramas generated. An approximate point breakdown is given below. Keep in mind that later code depends on earlier code, so partial credit may be hard to assign if something early on is broken. If you’re short on time, optimize for having working code for image alignment with homographies.

Correctness:

1. computeHomography
1. alignPair, getInliers, leastSquaresFit
1. imageBoundingBox
1. blendImages and normalizeBlend

576 only:

1. Spherical warping
1. Drift correction
1. Basic Harris/MOPS/SSD+Ratio feature detection, description, and matching.
1. Feature matching pipeline operates multiscale.

Efficiency:

1. each step except accumulateBlend runs in a few seconds or less
1. accumulateBlend runs in a few seconds or less

Survey:

1. P2 Survey is completed with an estimate of hours spent on the project.

Artifact:

1. Artifact

Clarity: Deductions for poor coding style may be made. Please see the syllabus for general coding guidelines. Up to two points may be deducted for each of the following:

Methods should be written as concisely and clearly as possible
Methods should not be too long - use helper methods to break code into sensible subroutines
Code should not be cryptic and terse - explain nontrivial blocks with comments
Methods you introduce should be accompanied by a precise specification
Variable and function names should be informative but not overly verbose

Acknowledgments

Many thanks are due to those who developed and refined prior versions of this assignment, including Steve Seitz, Kavita Bala, Noah Snavely, and many underappreciated TAs.