CS 497P/597P: Computer Vision, Winter 2019
Project 3: Autostitch

Brief

  • Assigned: TBA
  • Code (groups of 2) due: Friday, February 22 by 9:59pm (via Github)
  • Artifact (done individually) due: Saturday, February 23 by 9:59pm (via Canvas)
  • Teams: This assignment should be done in teams of 2 students.

Synopsis

In this project, you will implement a system to combine a series of horizontally overlapping photographs into a single panoramic image. We provide the ORB feature detector and descriptor. You will use ORB to first detect discriminating features in the images and find the best matching features in the other images. Then, using RANSAC, you will automatically align the photographs (determine their overlap and relative positions) and blend the resulting images into a single seamless panorama. We have provided you with a graphical interface that lets you view the results of the various intermediate steps of the process. We have also provided you with some test images and

The project consists of a pipeline of tabs, visualized through AutostichUI, that operate on images or intermediate results to produce the final panorama.

The steps required to create a panorama are listed below. You will implement two ways to stitch a panorama: using translations, for which you first warp the input images to spherical coordinates, and using homographies, which align the input images directly. Steps in square brackets apply only to the spherical-warping route:

 

  1. Take pictures on a tripod (or handheld)
  2. [Warp to spherical coordinates]
  3. Extract features
  4. Match features
  5. Align neighboring pairs using RANSAC
  6. Write out list of neighboring translations
  7. [Correct for drift (360 only)]
  8. Read in [warped] images and blend them
  9. Crop the result and import into a viewer

 

The lecture slides and notes from 1/29, 1/30, 2/1, 2/5, and 2/6 may be helpful in implementing various pieces of this project.

This project uses Python, Numpy, and Scipy heavily. The documentation is generally quite good, and we suggest functions that may be useful. If you feel the need to brush up on your numerical Python skills, see the crash course on Python and Numpy here.

Getting Started

Skeleton code: Github classroom link is available in the Project 3 assignment on Canvas.

Test sets: See the resources subdirectory in your repo. You will find four datasets: yosemite, campus, melbourne, and melbourne_small (a downsampled version of melbourne).

Software environment: The lab machines should have the necessary packages installed to run the project code. If you are using your own machine, you may need the following packages (these should be the same as or similar to those for previous projects):

sudo apt-get install python-numpy python-scipy python-matplotlib python-imaging-tk python-tk python-opencv

Your Tasks

Do not modify the code outside the TODO blocks.
  1. Warp each image into spherical coordinates. (file: warp.py, routine: computeSphericalWarpMappings)

[TODO 1] (597P only) Compute the inverse map to warp the image by filling in the skeleton code in the computeSphericalWarpMappings routine to:

    1. convert the given spherical image coordinate into the corresponding planar image coordinate using the coordinate transformation equation from the lecture notes
    2. apply radial distortion using the equation from the lecture notes
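
A minimal NumPy sketch of the inverse mapping for TODO 1 follows, assuming the common formulation: the pixel offset from the image center divided by f gives the angles theta and phi, the corresponding point on the unit sphere is projected onto the plane z = 1, and radial distortion is applied with k1 and k2. Check the exact equations against the lecture notes; the function name spherical_to_planar and its signature are hypothetical and are not the skeleton's computeSphericalWarpMappings interface.

```python
import numpy as np

def spherical_to_planar(xs, ys, f, xc, yc, k1=0.0, k2=0.0):
    """Map spherical-image pixel coords (xs, ys) to planar source-image coords.

    Hypothetical helper; assumes the standard sphere -> plane -> radial
    distortion formulation. (xc, yc) is the image center, f is in pixels.
    """
    theta = (xs - xc) / f          # horizontal angle (radians)
    phi = (ys - yc) / f            # vertical angle (radians)
    # Point on the unit sphere
    x_hat = np.sin(theta) * np.cos(phi)
    y_hat = np.sin(phi)
    z_hat = np.cos(theta) * np.cos(phi)
    # Project onto the plane z = 1 (normalized image coordinates)
    xn = x_hat / z_hat
    yn = y_hat / z_hat
    # Radial distortion: scale by 1 + k1*r^2 + k2*r^4
    r2 = xn ** 2 + yn ** 2
    scale = 1 + k1 * r2 + k2 * r2 ** 2
    # Back to pixel coordinates in the planar (source) image
    return f * xn * scale + xc, f * yn * scale + yc
```

Evaluating it on np.meshgrid grids of output pixel coordinates gives the source coordinates to sample; points that land outside the source image should be treated as invalid.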

(Note: When warping images in the GUI, you will have to use the focal length (f) estimates provided below for the test images. If you use a different image size, remember to scale f according to the image size.)
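
Because a focal length measured in pixels scales with the image width, rescaling it is a one-line computation. The helper name below is hypothetical, and it assumes the image is resized uniformly (same factor in x and y).

```python
def scale_focal_length(f, original_width, new_width):
    """Rescale a focal length given in pixels after resizing the image.

    f is proportional to image width, so halving the width halves f.
    """
    return f * new_width / original_width
```

For example, if an image with f = 678 were downsampled to half its width, f would become 339.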

(Note 2: This step is not used when estimating homographies between images, only translations.)

  2. Compute the alignment of image pairs. (file: alignment.py, routines: alignPair, getInliers, computeHomography, and leastSquaresFit)

[TODO 2, 3] computeHomography takes two feature sets, f1 and f2, from image 1 and image 2, along with a list of feature matches, matches, and estimates a homography from image 1 to image 2.

(Note 3: In computeHomography, you will compute the best-fit homography using the Singular Value Decomposition. Let A' denote the transpose of A. Recall from lecture that to minimize ||Ah||^2 = h'A'Ah subject to the constraint that h has unit length, the solution h is the eigenvector of A'A with the smallest eigenvalue. Equivalently, you can compute the SVD of A and take the right singular vector corresponding to the smallest singular value. See the lecture notes and slides on alignment using homographies for details. If you want to dig further into the math, the Wikipedia article on the SVD may be helpful.)
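
The sketch below builds the 2N x 9 matrix A from N >= 4 correspondences (two rows per match) and takes the right singular vector for the smallest singular value, as Note 3 describes. To stay self-contained it assumes features are given as (N, 2) NumPy arrays of (x, y) locations and matches as (i, j) index pairs; the skeleton's actual feature and match types (for example, OpenCV keypoint and match objects) may differ, and compute_homography is a hypothetical name.

```python
import numpy as np

def compute_homography(pts1, pts2, matches):
    """Estimate H mapping image-1 points to image-2 points (DLT + SVD).

    pts1, pts2: (N, 2) arrays of (x, y) locations (assumed layout).
    matches: iterable of (i, j) pairs meaning pts1[i] <-> pts2[j]; needs >= 4.
    """
    rows = []
    for i, j in matches:
        x, y = pts1[i]
        xp, yp = pts2[j]
        # Each correspondence gives two linear equations in the 9 entries of H
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    A = np.asarray(rows, dtype=float)
    # Default full_matrices=True keeps all 9 rows of Vt even when A has only 8
    # rows; singular values are sorted descending, so the last row of Vt is the
    # unit vector h minimizing ||Ah||.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale so that H[2, 2] == 1
```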

[TODO 4] alignPair is where you will implement RANSAC. It takes as parameters two feature sets, f1 and f2, the list of feature matches obtained from the feature detection and matching component (described in the first part of the project), and a motion model, m (described below). It estimates and returns the inter-image transform matrix M. For this project, the enum MotionModel has two possible values: eTranslate and eHomography. alignPair uses RANSAC (RANdom SAmple Consensus) to estimate M. First, it randomly selects a minimal set of feature matches (one match for translations, four for homographies), estimates the corresponding motion (alignment), and then invokes getInliers to get the indices of the feature matches (indexing into matches) that agree with the current motion estimate. After repeated trials, the motion estimate with the largest number of inliers is used to compute a least squares estimate of the motion, which is then returned in M.
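
Here is a sketch of the RANSAC loop, written for the translation model only (a one-match minimal sample) to keep it short; for homographies you would instead sample four matches and fit them with the SVD approach from Note 3. The point/match layout is an assumption (plain NumPy arrays rather than the skeleton's types), and the function name, round count, and threshold are illustrative, not the skeleton's values.

```python
import numpy as np

def align_pair_translation(pts1, pts2, matches, n_rounds=500, thresh=3.0, seed=0):
    """RANSAC for a pure-translation model (hypothetical helper).

    pts1, pts2: (N, 2) arrays of (x, y); matches: (M, 2) array of (i, j)
    pairs with pts1[i] <-> pts2[j]. Returns (3x3 matrix, inlier indices).
    """
    rng = np.random.default_rng(seed)
    matches = np.asarray(matches)
    p1 = pts1[matches[:, 0]]
    p2 = pts2[matches[:, 1]]
    best_inliers = np.array([], dtype=int)
    for _ in range(n_rounds):
        k = rng.integers(len(matches))          # minimal sample: one match
        t = p2[k] - p1[k]                       # candidate translation
        # Inliers: matches whose shifted image-1 point lands near its partner
        err = np.linalg.norm(p1 + t - p2, axis=1)
        inliers = np.flatnonzero(err < thresh)
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on all inliers of the best hypothesis (least squares = mean offset)
    t = np.mean(p2[best_inliers] - p1[best_inliers], axis=0)
    M = np.eye(3)
    M[:2, 2] = t
    return M, best_inliers
```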

[TODO 5] getInliers takes the features f1 and f2 from image 1 and image 2 and an inter-image transformation matrix from image 1 to image 2, and returns the indices of the matches for which the transformed image-1 feature lies within a Euclidean distance of RANSACthresh of its matched image-2 feature.
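
getInliers reduces to mapping every matched image-1 point through M, normalizing the homogeneous coordinate, and thresholding the Euclidean distance to its partner. A vectorized sketch, under the same assumed point/match layout and with a hypothetical function name:

```python
import numpy as np

def get_inliers(pts1, pts2, matches, M, thresh):
    """Indices (into matches) whose image-1 point, mapped by M, lies within
    thresh pixels (Euclidean distance) of its matched image-2 point."""
    matches = np.asarray(matches)
    p1 = pts1[matches[:, 0]]
    p2 = pts2[matches[:, 1]]
    # Append 1 to get homogeneous coordinates, transform, divide by w
    ph = np.hstack([p1, np.ones((len(p1), 1))]) @ M.T
    mapped = ph[:, :2] / ph[:, 2:3]
    dist = np.linalg.norm(mapped - p2, axis=1)
    return np.flatnonzero(dist < thresh)
```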

[TODO 6, 7] leastSquaresFit computes a least squares estimate of the translation or homography using all of the matches previously identified as inliers. It returns the resulting transform M.
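
For the translation model, the least squares problem has a closed form: minimizing the sum of ||p1 + t - p2||^2 over t gives t = mean(p2 - p1) over the inliers. The sketch covers only that case; for homographies, the analogous step is to run the SVD fit from Note 3 on all inliers at once. The function name and data layout are assumptions.

```python
import numpy as np

def least_squares_translation(pts1, pts2, matches, inlier_indices):
    """Least-squares translation over the inlier matches (hypothetical helper).

    Setting the gradient of sum ||p1 + t - p2||^2 to zero gives
    t = mean(p2 - p1).
    """
    m = np.asarray(matches)[inlier_indices]
    t = np.mean(pts2[m[:, 1]] - pts1[m[:, 0]], axis=0)
    M = np.eye(3)
    M[:2, 2] = t
    return M
```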

  3. Stitch and crop the resulting aligned images. (file: blend.py, routines: imageBoundingBox, blendImages, accumulateBlend, normalizeBlend)

[TODO 8] Given an image and a homography, compute the bounding box of the image after applying the homography (imageBoundingBox).
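
Since a homography maps the image rectangle to a quadrilateral whose vertices are the mapped corners, transforming the four corners and taking their min/max is enough, provided no corner maps to w <= 0 (that is, the image is not warped past the horizon). A sketch with a hypothetical signature; placing the far corner at width - 1 rather than width is a convention choice.

```python
import numpy as np

def image_bounding_box(width, height, M):
    """Return (min_x, min_y, max_x, max_y) of an image warped by homography M."""
    corners = np.array([[0, 0, 1],
                        [width - 1, 0, 1],
                        [0, height - 1, 1],
                        [width - 1, height - 1, 1]], dtype=float)
    warped = corners @ M.T
    warped = warped[:, :2] / warped[:, 2:3]   # normalize homogeneous coordinates
    min_x, min_y = warped.min(axis=0)
    max_x, max_y = warped.max(axis=0)
    return min_x, min_y, max_x, max_y
```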

[TODO 9] Given the warped images and their relative displacements, determine how large the final stitched image will be and each image's absolute displacement in the panorama (blendImages).
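
One way to size the canvas is to take the union of all per-image bounding boxes and build a translation T that shifts its top-left corner to (0, 0); each image's absolute placement is then T @ M_i. This sketch assumes the pairwise transforms have already been chained into a common reference frame (the chaining order depends on the skeleton's convention for which image each M maps onto), and its names are hypothetical.

```python
import numpy as np

def panorama_extent(shapes, transforms):
    """Canvas size and offset for a list of warped images.

    shapes: list of (height, width); transforms: list of 3x3 matrices mapping
    each image into a common reference frame. Returns (out_w, out_h, T) where
    T moves the top-left of the union bounding box to (0, 0).
    """
    min_x = min_y = np.inf
    max_x = max_y = -np.inf
    for (h, w), M in zip(shapes, transforms):
        corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                            [0, h - 1, 1], [w - 1, h - 1, 1]], dtype=float)
        p = corners @ M.T
        p = p[:, :2] / p[:, 2:3]               # normalize homogeneous coords
        min_x, min_y = min(min_x, p[:, 0].min()), min(min_y, p[:, 1].min())
        max_x, max_y = max(max_x, p[:, 0].max()), max(max_y, p[:, 1].max())
    # +1 because the extremes are inclusive pixel positions
    out_w = int(np.ceil(max_x) - np.floor(min_x)) + 1
    out_h = int(np.ceil(max_y) - np.floor(min_y)) + 1
    T = np.array([[1, 0, -np.floor(min_x)],
                  [0, 1, -np.floor(min_y)],
                  [0, 0, 1]], dtype=float)
    return out_w, out_h, T
```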

[TODO 10] Then, resample each image to its final location (you will need to use inverse warping here) and blend it with its neighbors. Try a simple feathering function as your weighting function (see the mosaics lecture slide on "feathering"); this is a simple 1-D version of the distance map described in [Szeliski & Shum]. For extra credit, you can try other blending functions or find a way to compensate for exposure differences (accumulateBlend).
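
A 1-D feathering weight that ramps linearly from 0 at the left and right image borders to 1 at blend_width pixels in, plus the accumulation step, might look like the following. The (H, W, 4) accumulator layout (weighted RGB plus a weight-sum channel), the linear ramp shape, and the function names are assumptions for illustration. The weight image would be warped with the same inverse map as the pixels and zeroed wherever the source is invalid.

```python
import numpy as np

def feather_weights(width, blend_width):
    """1-D ramp: 0 at the left/right borders, rising linearly to 1 over
    blend_width pixels, and 1 in the interior."""
    x = np.arange(width, dtype=float)
    dist_to_edge = np.minimum(x, width - 1 - x)
    # max(..., 1) avoids dividing by zero when blend_width is 0
    return np.clip(dist_to_edge / max(blend_width, 1), 0.0, 1.0)

def accumulate(acc, warped_rgb, warped_weight):
    """Add one warped image into an (H, W, 4) accumulator:
    weighted RGB in channels 0-2, running weight sum in channel 3."""
    acc[:, :, :3] += warped_rgb * warped_weight[:, :, None]
    acc[:, :, 3] += warped_weight
    return acc
```

A full-image weight map can be built with np.tile(feather_weights(w, blend_width), (h, 1)) before warping.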

[Additional hints] 1) When working with homogeneous coordinates, don't forget to normalize (divide by the third coordinate) when converting back to Cartesian coordinates. 2) Watch out for black pixels in the source image when inverse warping; you don't want to include them in the accumulation. 3) When inverse warping, use bilinear interpolation for the source image pixels. 4) First get the code working by looping over each pixel. Later you can optimize it using array operations and numpy tricks (numpy.meshgrid, cv2.remap). Optimizing this section is worth only a couple of points, so give it the lowest priority.
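
Hint 4's vectorized version can be sketched with np.meshgrid: generate every canvas pixel at once, map it back through the inverse homography, normalize, and bilinearly interpolate. The sketch does the interpolation in plain NumPy so it runs without OpenCV; cv2.remap with cv2.INTER_LINEAR performs the same sampling step faster once float32 coordinate maps are computed. The function name and the convention that M maps source pixels to canvas pixels are assumptions.

```python
import numpy as np

def inverse_warp_bilinear(src, M, out_h, out_w):
    """Inverse-warp src (H, W, C float array, W and H >= 2) onto an
    (out_h, out_w) canvas. M maps source pixels to canvas pixels.
    Returns the warped image and a validity mask."""
    h, w = src.shape[:2]
    xs, ys = np.meshgrid(np.arange(out_w), np.arange(out_h))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    sx, sy, sw = np.linalg.inv(M) @ pts
    sx, sy = sx / sw, sy / sw                  # back to Cartesian coordinates
    valid = (sx >= 0) & (sx <= w - 1) & (sy >= 0) & (sy <= h - 1)
    # Top-left neighbor, clamped so that x0 + 1 and y0 + 1 stay in range
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 2)
    fx = (np.clip(sx, 0, w - 1) - x0)[:, None]  # fractional offsets in [0, 1]
    fy = (np.clip(sy, 0, h - 1) - y0)[:, None]
    top = src[y0, x0] * (1 - fx) + src[y0, x0 + 1] * fx
    bot = src[y0 + 1, x0] * (1 - fx) + src[y0 + 1, x0 + 1] * fx
    out = top * (1 - fy) + bot * fy
    out[~valid] = 0                            # samples outside src become black
    return out.reshape(out_h, out_w, -1), valid.reshape(out_h, out_w)
```

To follow hint 2, combine the returned mask with a warped "non-black source pixel" mask before accumulating.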

[TODO 11] Normalize the image by the accumulated weight channel, taking care not to divide by zero. Remember to set the alpha channel of the resulting panorama to opaque! (normalizeBlend)
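
np.divide's out and where arguments handle the divide-by-zero case without warnings. The sketch assumes the same hypothetical (H, W, 4) float accumulator; if the skeleton stores 8-bit images, the opaque alpha value would be 255 rather than 1.

```python
import numpy as np

def normalize_blend(acc):
    """Divide accumulated RGB by the accumulated weight and set alpha opaque.

    acc: (H, W, 4) float array with weighted RGB sums in channels 0-2 and the
    weight sum in channel 3. Pixels with zero weight stay black.
    """
    weight = acc[:, :, 3:4]
    rgb = np.divide(acc[:, :, :3], weight,
                    out=np.zeros_like(acc[:, :, :3]), where=weight > 0)
    alpha = np.ones_like(weight)          # opaque (use 255 for uint8 images)
    return np.concatenate([rgb, alpha], axis=2)
```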

[TODO 12] (597P only) For 360-degree panoramas, make the left and right edges join seamlessly. The horizontal extent can be computed in the previous blending routine, since the first image appears at both the left and right ends of the stitched sequence (draw the "cut" line halfway through this image). Apply a linear warp to the mosaic to remove any vertical "drift" between the first and last image. This warp, of the form y' = y + ax, should transform the y-coordinates of the mosaic so that the first image has the same y-coordinate at both the left and right ends. Calculate the value of 'a' needed to perform this transformation (blendImages).
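
Requiring the two copies of the first image to end up at the same height gives y_init + a*x_init = y_final + a*x_final, so a = -(y_final - y_init) / (x_final - x_init). As a 3x3 matrix, the warp is the identity with a in row 1, column 0. The sketch assumes (x_init, y_init) and (x_final, y_final) are corresponding points (for example, the centers) of the first image at the left and right ends of the mosaic; the function name is hypothetical.

```python
import numpy as np

def drift_correction(x_init, y_init, x_final, y_final):
    """3x3 shear A implementing y' = y + a*x so the first image's two
    copies share the same y-coordinate."""
    a = -(y_final - y_init) / (x_final - x_init)
    A = np.eye(3)
    A[1, 0] = a          # y' = a*x + y; x is unchanged
    return A
```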

Summary of potentially useful functions (you do not have to use any of these):
  • np.divide, np.eye, np.array, np.dot

Using the GUI

You can run the skeleton program by running,
>> python gui.py

The skeleton code comes with a graphical interface, implemented in gui.py, that makes it easy to do the following:

  1. Visualize a Homography: The first tab in the UI provides you a way to load an image and apply an arbitrary homography to the image. This can be useful while debugging when, for example, you want to visualize the results of both manually and programmatically generated transformation matrices.
  2. Visualize Spherical Warping: The second tab on the UI lets you spherically warp an image with a given focal length.
  3. Align Images: The third tab lets you select two images with overlap and uses RANSAC to compute a homography or translation (selectable) that maps the right image onto the left image.
  4. Generating a Panorama: The last tab in the UI lets you generate a panorama. To create a panorama, you need a folder of images named so that sorting them alphabetically gives the order in which they appear in the panorama, from left to right (or from right to left). This ensures that the mappings between all neighboring pairs are computed. Our current code assumes that all images in the panorama have the same width!
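
The ordering rule for the Panorama tab amounts to a plain string sort followed by pairing consecutive names. A string sort places pano10.jpg before pano2.jpg, so zero-padded names (pano02.jpg) are safer. The helper below is hypothetical and assumes the GUI uses an ordinary lexicographic sort; check gui.py to confirm.

```python
def neighbor_pairs(filenames):
    """Sort filenames lexicographically and return consecutive (a, b) pairs."""
    ordered = sorted(filenames)
    return list(zip(ordered[:-1], ordered[1:]))
```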

Debugging Guidelines

You can use the GUI visualizations to check whether your program is running correctly.

  1. Testing the warping routines:
    • In the campus test set, the camera parameters used for these examples are
      • f = 595
      • k1 = -0.15
      • k2 = 0.00
    • In the yosemite test set, a few example warped images are provided for test purposes. The camera parameters used for these examples are
      • f = 678
      • k1 = -0.21
      • k2 = 0.26
      See if your program produces the same output. Note that if you use these images with the translation motion model, your panoramas may be slightly blurry in the blending region (as you can also see in the panorama we provide).
  2. Testing the alignment routines:
    • Note that the campus images are only suitable for the translational motion model! The yosemite images are suitable for both motion models. To test alignPair, load two images in the alignment tab of the GUI. Clicking 'Align Images' displays the left and right images, with the right image transformed according to the inter-image transformation matrix and overlaid on the left image. This lets you visually assess the accuracy of the transformation matrix. Note that blending is not performed at this stage.
  3. Testing the blending routines:
    • For faster iteration when debugging your blending routines, you may find it helpful to use the melbourne_small dataset, which is simply a downsampled version of the melbourne dataset. Example panoramas are included in the yosemite and campus directories; compare your resulting panorama with these images. Note that you must use the specified f, k1, and k2 parameters to get the same image. 597P students: use the 360 degree checkbox to ensure you get the same result for the campus dataset.
  4. Additional notes:
    • If you use high-resolution images when creating your own panorama on a laptop, you may run into memory problems. Try running on a machine with more memory; the lab machines have 16GB of RAM, which should be enough for panoramas captured by most consumer cameras.

Artifact

Each partner must submit their own artifact via Canvas: take a series of images with a digital camera mounted on a tripod or held by hand, and stitch a panorama using your code. The panorama can be either translation-aligned (360 or not, if you implemented the 360 features) or aligned with homographies (your choice). For best results, overlap each image by 50% with the previous one, and keep the camera level. To use your camera for a spherically warped, translation-aligned panorama, you have to estimate its focal length. The simplest way to do this is through the EXIF tags of the images, as described here. You may also be able to find the focal length (in mm) and sensor width by searching for your camera or phone model. Alternatively, you can use a camera calibration toolkit to get more precise focal length and radial distortion coefficients.
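
Converting a focal length in millimeters (for example, from EXIF) into the pixel units the spherical warp expects is a ratio: f_pixels = f_mm * image_width_pixels / sensor_width_mm, with the image width measured along the same axis as the sensor width. If only the 35 mm-equivalent focal length is available, using a 36 mm sensor width gives the same conversion. The helper name is hypothetical; reading the tags themselves is left to whatever EXIF tool you use.

```python
def focal_length_pixels(f_mm, sensor_width_mm, image_width_px):
    """Convert a focal length in mm to pixels for an image of the given width.

    Pass sensor_width_mm=36.0 together with a 35 mm-equivalent focal length.
    """
    return f_mm * image_width_px / sensor_width_mm
```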

What to Turn In

Code: Submit your code by committing and pushing your changes to Github before the deadline. Be sure to include an estimate of the hours you spent in hours.txt, and if you did any extra credit, describe what you did in readme.txt.

Artifact: Each group member should submit their own panorama artifact to Canvas in JPG format.

Extra Credit

Here is a list of suggestions for extending the program for extra credit. You are encouraged to come up with your own extensions. We're always interested in seeing new, unanticipated ways to use this program! Please gate your extensions behind the --extra-credit flag of gui.py: use the args parsed in the "main method" portion of gui.py and modify the rest of the code as necessary. If we run your program without the flag, it must implement the base project.
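
If gui.py does not already define the flag, a minimal argparse setup might look like this; argparse converts --extra-credit into the attribute args.extra_credit. This is only a sketch: the actual parsing in gui.py's main method may differ, and parse_args here is a hypothetical helper.

```python
import argparse

def parse_args(argv=None):
    """Parse command-line flags; extra-credit features stay off by default."""
    parser = argparse.ArgumentParser(description="Autostitch GUI (sketch)")
    parser.add_argument("--extra-credit", action="store_true",
                        help="enable extra-credit features instead of the base project")
    return parser.parse_args(argv)
```

The rest of the code can then branch on args.extra_credit, keeping the base behavior whenever it is False.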

If you complete any extra credit, describe what you did and any design decisions you made in your repository's readme.txt file.

Panorama Links

Rubric

Your project will be graded based on the quality of the panoramas generated. An approximate point breakdown is given below. Keep in mind that later code depends on earlier code, so partial credit may be hard to assign if something early on is broken. If you're short on time, optimize for having working code for image alignment with homographies.

Correctness:

597P only:

Efficiency:

Clarity: Deductions for poor coding style may be made. Please see the syllabus for general coding guidelines. Up to two points may be deducted for each of the following:

Submission Mechanics: Up to 10 points may be deducted for problems with submission mechanics that require manual handling: for example, problems with your git repository, code that exhibits runtime errors when used via the GUI, failure to notify me of late submission, etc.

Acknowledgments

Many thanks are due to those who developed and refined prior versions of this assignment, including Steve Seitz, Kavita Bala, Noah Snavely, and many underappreciated TAs.