CSCI 476/576 Project 3: Stereo

Scott Wehrwein

Winter 2024

Overview

In this assignment, you’ll implement the two-view plane sweep stereo algorithm. Given two calibrated images of the same scene, but taken from different viewpoints, your task is to recover a rough depth map.

Dates

Assigned: Tuesday, Feb 27th, 2024

Deadline: Tuesday, March 5th, 2024

Teamwork

You are required to complete this assignment in groups of two. To ensure everyone is paired up in a timely fashion, please complete the following steps by the start of class on Thursday, 2/29:

Find your partner. The break during Tuesday’s class would be a good time to pair up!
The first member of the pair to accept the Github Classroom invite should create a new team (it doesn’t matter which group member does this). Your team name should be the WWU usernames of the two teammates, ordered lexicographically, and separated by an underscore. For example, if Shri Mare and I were working on this project together, our team would be named mares_wehrwes.
The second member of the pair to accept the Github Classroom invite should find the team created by the first member and join it.

A couple other relevant policies:

You may only pair up with someone in your section (e.g., 476 students must work with 476 students).
To use slip days, both members must spend a slip day.

Setup

Skeleton Code

In the Project 3 assignment on Canvas, you will find a GitHub Classroom invitation link. Click this link to accept the Project 3 assignment invitation and create your personal repository for this project. Your repository already contains skeleton code

Software

This project has similar software requirements to the prior projects. If you don’t already have it, this project uses the imageio package, which you can install to your virtual environment with pip install imageio.

To view the results of turning your depth maps into 3D meshes, you can use a mesh viewer such as Meshlab. This is installed on the lab machines; if you’re developing remotely on lab machines, I’d recommend installing this locally and copying meshes to your local machine since Meshlab will have a lot of latency if running it the network.

Data

The input data is large enough that it was inadvisable to include it in the github repository. Go into the data directory and run download.sh to download the required datasets (tentacle is hosted locally, while the remaining datasets are downloaded from the Middlebury Stereo page). You can also uncomment other lines to download and try out additional datasets if you’d like.

Alternatively, you can also download these datasets in a web browser and extract them into the input directory (for tentacle) or data (for all others). Here’s the direct link to the listing of Middlebury dataset zip files: https://vision.middlebury.edu/stereo/data/scenes2014/zip/.

Preview

When finished, you’ll be able to run

python plane_sweep_stereo.py <dataset>

where dataset is one of ('tentacle', 'Adirondack', 'Backpack', 'Bicycle1', 'Cable', 'Classroom1', 'Couch', 'Flowers', 'Jadeplant', 'Mask', 'Motorcycle', 'Piano', 'Pipes', 'Playroom', 'Playtable', 'Recycle', 'Shelves', 'Shopvac', 'Sticks', 'Storage', 'Sword1', 'Sword2', 'Umbrella', 'Vintage'). Keep in mind except for tentacle and Flowers, you’ll need to modify data/download.sh to download any other datasets before running your code on them.

For example, if you use the tentacle dataset

python plane_sweep_stereo.py tentacle

the output will be in output/tentacle_{ncc.png,ncc.gif,depth.npy,projected.gif}.

The following illustrates the two input views for the tentacle dataset:

The outputs of the plane-sweep stereo for the tentacle dataset should look like this:

The first animated gif is tentacle_projected.gif, which shows each rendering of the scene as a planar proxy is swept away from the camera.

For this project, we use Normalized Cross Correlation (NCC) measure for matching scores. The second animated gif is tentacle_ncc.gif, which shows slices of the NCC cost volume where each frame corresponds to a single depth. White is high NCC and black is low NCC.

The last image shows the correct depth output tentacle_ncc.png for the tentacle dataset, which is computed from the argmax depth according to the NCC cost volume. White is near and black is far.

Tasks

Most of the code you will implement is in student.py, with the exception of the last task, which is to complete the main body of the planesweep loop in plane_sweep_stereo.py. It’s recommeded that you start by taking a look through the well-commented plane_sweep_stereo.py to get an idea of where these functions fit in. The functions to be implemented have detailed specifications - see those for details of what you need to do.

Implement project_impl. This projects 3D points into a camera given its extrinsic and intrinsic calibration matrices.
Implement unproject_corners_impl. This un-projects the corners of an image out into the scene to a distance depth from the camera and returns their world coordinates.
Complete the implementation of preprocess_ncc_impl. This prepares an image for NCC by building an array of size h x w x c * ncc_size * ncc_size, where the final dimension contains the normalized RGB values of all pixels in a c * ncc_size * ncc_size patch, unrolled into a vector. In other words, if the input is I and the output is A, then A[i,j,:] contains the normalized pixel values of the patch centered at I[i,j]. See the method spec for more details.

You have been given vectorized code that extracts the raw pixel values and builds an array of size (h, w, c, ncc_size, ncc_size). This uses a similar approach to what you did in Project 1 to accelerate cross-correlation, where it loops over the patch dimensions and fills in, e.g., the top-left pixel in all patches in one sliced assignment. Your job is to subtract the per-channel patch mean and divide each patch by the whole patch’s (not-per-channel) vector norm.

Potentially helpful features:
- np.mean, and particularly its axis argument
- The reshape method of array objects
- np.linalg.norm (can take an axis)
- Boolean array indexing, as in A[A>4] = A[A>4] + 1
Implement compute_ncc_impl. This takes two images that have been preprocessed as above and returns an h x w image of NCC scores at each pixel.
Fill in each (TODO) line in plane_sweep_stereo.py to complete the overall plane sweep stereo algorithm. This is mostly a matter of making calls to the functions you’ve already implemented.

Testing

You are provided with some test cases in tests.py. Feel free to run these with python tests.py to help you with debugging. There are unit tests for all the functions you write, but not for the main program. You can, however, check that your output on tentacle matches the results shown above.

If the code is running slowly while you’re debugging, you can speed things up by downsampling the datasets further, or computing fewer depth layers. In dataset.py, modify:

 self.depth_layers = 128

to change the number of depth hypotheses, or

 self.stereo_downscale_factor = 4

to change the downsampling factor applied to the Middlebury datasets. The output image will be of dimensions (height / 2^stereo_downscale_factor, width / 2^stereo_downscale_factor).

Efficiency

We’ve configured the tentacle dataset such that it takes about 0.5-100 seconds to compute depending on your implementation. Because we’re using opencv to warp compute homographies and warp images, the main bottleneck will likely be preprocess_ncc. Some tips:

The tricky part - collecting patches into a single array - is done for you; the approach is similar to that used in Project 1.
Avoid loops: do the mean subtraction and normalization using array operations.
Try to make your life as easy as possible by strategically reshaping your array in ways that make the current step as easy as possible to write and comprehend.
Make use of the axis keyword argument of numpy functions. For example, you can sum an array along only one axis, or along more than one axis if you provide a tuple.
You can introduce singleton dimensions by slicing and adding np.newaxis: if A.shape == (3, 4) then A[np.newaxis, :, :].shape = (1, 3, 4).

If two array operands have dimensions that match except for a singleton, the singleton dimension will be broadcast to match the non-singleton:

A = np.zeros((5,4,3))
B = np.sum(A, axis=2) # has shape (5, 4)
C = A + B # error, dimension mismatch
C = A + B[:,:,np.newaxis] # B is auto-replicated across the channel dimension

Mesh Reconstruction

There are no tasks for you to complete for this part, but the machinery is there and you’re welcome to try it out. Once you’ve computed depth for a scene, you can create a 3D model out of the depth map as follows:

python combine.py <dataset> depth

You can open up output/<dataset>_depth.ply in Meshlab to see your own result for any of the datasets. Here’s the mesh from my tentacle result:

You may need to fiddle with settings to get the colors to show - try un-toggling the triangley-cylinder button two buttons right from the wireframe cube at the top of the screen.

Extra Credit Extensions

The following extensions can be completed for modest (up to 5 points) amount of extra credit. Each extra credit point is exponentially harder to earn.

Direct and incremental homography construction: We’re using corner correspondences to fit homographies, but you can also build them analytically from the camera parameters. On top of this, you can find one initial homography for the first depth and then augment it with a sequence of incremental homographies to sweep through depths; these are called “dilation” homographies. Read up on this in the original planesweep paper and/or Section 12.1.2 of Szeliski and implement this approach, eliminating the need for OpenCV’s computeHomography.
Stereo evaluation: the Middlebury datasets come with ground truth disparity maps - see the webpage for details on the file formats, etc (pfm images should be readable with imageio). You’ll need to handle the conversion between depth and disparity, handle the different image sizes, and decide what metrics to use to measure accuracy. See Section 5 of Scharstein and Szeliski’s 2001 IJCV paper for some ideas on metrics.
Rectified stereo: Implement stereo rectification and compute the cost volume in the traditional order (for each pixel, for each desparity). The datasets come with calibration information that is loaded in by the code in datasets.py - feel free to use this.
Better stereo: find ways to make your stereo algorithm perform better (combining this with (1), you can measure quantiative improvement; otherwise, you can evaluate qualitatively by looking at the results side-by-side). Some ideas include processing the cost volume in some way before doing the argmax, or using a different similarity metric from NCC. Try out some ideas and see if you can get better results. Feel free to look in the literature for inspiration - once again, A Taxonomy and Evaluation of Dense Two-FrameStereo Correspondence Algorithms is a great place to start for a review of what people have done in the (now somewhat distant) past. Trying out ideas is the goal here - you don’t need to show a big improvement to get full credit.
8-point algorithm: Implement the 8-point algorithm to estimate the fundamental matrix (see wikipedia). You don’t need to get into estimating \(R\) and \(t\) (though if you’re feeling ambitious, go ahead!). Find some way to validate your algorithm (careful - this may be the hard part!). Compare this matrix to the “true” fundamental matrix computed from the camera calibration parameters. You may need to give some thought to the best way to compare these matrices - something like SSD on the matrix elements is likely not linearly related to any meaningful notion of geometric similarity. You can use OpenCV’s feature matching pipeline as we did in Project 2.

You can also propose your own extensions - feel free to run your ideas by me.

To get credit for your extension(s), you must:

Write your code in a separate file(s) (but feel free to import functions from the base assignment). The base assignment code should run as specified without modification.
Include a readme.txt, readme.pdf, or readme.html file in your repository’s base directory containing:
a brief description of what enhancement(s) you completed
a brief description of your code layout/architecture any design decisions or assumptions you made
outputs and/or analysis resulting from your extensions. For example, these might include:
- a table with at least a handful of performance metrics calculated for at least a handful of datasets
- depth maps produced by rectified stereo (or your improved stereo algorithm) compared side-by-side with planesweep results
- analysis comparing your estimated fundamental matrices to the “ground truth” ones
instructions for running your code to generate the results in your writeup

Submission

Generate results for the tentacle dataset and the Flowers dataset, and commit them to your repository. You don’t need to submit .ply files.
If you implemented any extensions, make sure your readme is committed to the repository and contains all the materials outlined above.
Push your final changes to your repository to github before the deadline.
Fill out the P3 Survey on Canvas. The submission time of your survey will be considered the submission time of your code when determining slip days. Each partner must fill out the survey separately, and the later submission will be the submission time.

Rubric

Points are awarded for correctness and efficiency, and deducted for issues with clarity or submission mechanics.

Correctness (45 points)
Unit tests (35 points)	Correctness as determined by `tests.py` (`score = ceil(n_passed*1.5`))
Stereo output (10 points)	Output on `tentacle` and `Flowers`
Efficiency (5 points)
5 points	`python plane_sweep_stereo.py tentacle` runs in under 30 seconds

Clarity Deductions for poor coding style may be made. Please see the syllabus for general coding guidelines. Points may be deducted for any of the following:

Methods should be written as concisely and clearly as possible
Methods should not be too long - use helper methods to break code into sensible subroutines
Code should not be cryptic and terse - explain nontrivial blocks with comments
Methods you introduce should be accompanied by a precise specification
Variable and function names should be informative but not overly verbose

Acknowledgements

This assignment is based on versions developed and refined by Kavita Bala, Noah Snavely, and numerous underappreciated TAs.