CSCI 497P/597P Project 3: Stereo

Scott Wehrwein

Spring 2020

Assigned: Monday, May 18, 2020

Deadline: Tuesday, May 26, 2020

This assignment is to be done individually.

Overview

In this assignment, you’ll implement the two-view plane sweep stereo algorithm. Given two calibrated images of the same scene, but taken from different viewpoints, your task is to recover a rough depth map.

Setup

Skeleton Code

In the Project 3 assignment on Canvas, you will find a GitHub Classroom invitation link. Click this link to accept the Project 3 assignment and create your personal repository for this project. Your repository already contains the skeleton code.

Software

This project has similar software requirements to the prior projects. Two additional Python packages are used: nose and imageio.

Because these are pure python packages, you can install these in a virtual environment. Here’s my suggested approach:

# create a virtual environment in the 497_p3_env dir; reuse system-installed packages:
wehrwes@linux-07:~/497/venvtest$ python3 -m venv 497_p3_env --system-site-packages

# activate the virtual environment:
wehrwes@linux-07:~/497/venvtest$ source 497_p3_env/bin/activate

# notice that my shell prompt now has (497_p3_env) prepended to it; install packages:
(497_p3_env) wehrwes@linux-07:~/497/venvtest$ pip install nose imageio
Collecting nose
  Using cached https://files.pythonhosted.org/packages/15/d8/dd071918c040f50fa1cf80da16423af51ff8ce4a0f2399b7bf8de45ac3d9/nose-1.3.7-py3-none-any.whl
Collecting imageio
  Using cached https://files.pythonhosted.org/packages/4c/2b/9dd19644f871b10f7e32eb2dbd6b45149c350b4d5f2893e091b882e03ab7/imageio-2.8.0-py3-none-any.whl
Requirement already satisfied: numpy in /usr/lib/python3/dist-packages (from imageio)
Requirement already satisfied: pillow in /usr/lib/python3/dist-packages (from imageio)
Installing collected packages: nose, imageio
Successfully installed imageio-2.8.0 nose-1.3.7
# all set up! go ahead and work on your project

# when done working, deactivate the environment to go back to real life:
(497_p3_env) wehrwes@linux-07:~/497/venvtest$ deactivate
wehrwes@linux-07:~/497/venvtest$

To view the results of turning your depth maps into 3D meshes, you can install a mesh viewer such as Meshlab. I recommend installing this locally and copying meshes to your home machine even if you’re developing and running your code in the lab environment.

Data

The input data is large enough that it was inadvisable to include it in the GitHub repository. Go into the data directory and run download.sh to download the required datasets (tentacle is hosted locally, while the remaining datasets are downloaded from the Middlebury Stereo page). Alternatively, you can download the datasets in a web browser and extract them into the input directory (for tentacle) or data (for all others). You can also uncomment other lines in download.sh to download and try out additional datasets if you’d like.

Preview

When finished, you’ll be able to run

python plane_sweep_stereo.py <dataset>

where dataset is one of ('tentacle', 'Adirondack', 'Backpack', 'Bicycle1', 'Cable', 'Classroom1', 'Couch', 'Flowers', 'Jadeplant', 'Mask', 'Motorcycle', 'Piano', 'Pipes', 'Playroom', 'Playtable', 'Recycle', 'Shelves', 'Shopvac', 'Sticks', 'Storage', 'Sword1', 'Sword2', 'Umbrella', 'Vintage'). Keep in mind that, except for tentacle, you’ll need to use data/download.sh to download the other datasets before running your code on them.

For example, if you use the tentacle dataset

python plane_sweep_stereo.py tentacle

the output will be in output/tentacle_{ncc.png,ncc.gif,depth.npy,projected.gif}.

The following illustrates the two input views for the tentacle dataset:

The outputs of the plane-sweep stereo for the tentacle dataset should look like this:

The first animated gif is tentacle_projected.gif, which shows each rendering of the scene as a planar proxy is swept away from the camera.

For this project, we use the Normalized Cross Correlation (NCC) measure for matching scores. The second animated gif is tentacle_ncc.gif, which shows slices of the NCC cost volume, where each frame corresponds to a single depth. White is high NCC and black is low NCC.

The last image shows the correct depth output tentacle_ncc.png for the tentacle dataset, which is computed by taking, at each pixel, the argmax depth according to the NCC cost volume. White is near and black is far.
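As a concrete sketch of that final step, assuming the cost volume is stored as a (num_depths, h, w) array with a matching 1D array of depth values (the actual layout in the skeleton code may differ), the argmax depth map is a few lines of numpy:

```python
import numpy as np

# Toy stand-ins: cost_volume[d, i, j] is the NCC score of pixel (i, j) at
# depth hypothesis d, and depths[d] is the corresponding depth value.
depths = np.linspace(0.5, 4.0, 8)        # 8 depth hypotheses
cost_volume = np.random.rand(8, 20, 30)  # (num_depths, h, w) scores

best = np.argmax(cost_volume, axis=0)    # best-scoring depth index per pixel
depth_map = depths[best]                 # (h, w) map of depth values
```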

Tasks

Most of the code you will implement is in student.py, with the exception of the last task, which is to complete the main body of the plane sweep loop in plane_sweep_stereo.py. It’s recommended that you start by looking through the well-commented plane_sweep_stereo.py to get an idea of where these functions fit in. The functions to be implemented have detailed specifications; see those for the details of what you need to do.

  1. Implement project_impl. This projects 3D points into a camera given its extrinsic and intrinsic calibration matrices.

  2. Implement unproject_corners_impl. This un-projects the corners of an image out into the scene to a distance depth from the camera and returns their world coordinates.

  3. Complete the implementation of preprocess_ncc_impl. This prepares an image for NCC by building an array of size h x w x c * ncc_size * ncc_size, where the final dimension contains the normalized RGB values of all pixels in a c * ncc_size * ncc_size patch, unrolled into a vector. In other words, if the input is I and the output is A, then A[i,j,:] contains the normalized pixel values of the patch centered at I[i,j]. See the method spec for more details.

    You have been given vectorized code that extracts the raw pixel values and builds an array of size (h, w, c, ncc_size, ncc_size). This uses a similar approach to what you did in Project 1 to accelerate cross-correlation: it loops over the patch dimensions and fills in, e.g., the top-left pixel of all patches in one sliced assignment. Your job is to subtract the per-channel patch mean and divide each patch by the whole patch’s (not per-channel) vector norm.

    Potentially helpful features:

  4. Implement compute_ncc_impl. This takes two images that have been preprocessed as above and returns an h x w image of NCC scores at each pixel.

  5. Fill in each (TODO) line in plane_sweep_stereo.py to complete the overall plane sweep stereo algorithm. This is mostly a matter of making calls to the functions you’ve already implemented.
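To make the geometry and scoring steps concrete, here is a minimal numpy sketch of tasks 1 and 4. The function names and argument shapes below are illustrative, not the skeleton’s exact signatures — follow the specs in student.py for the real interfaces.

```python
import numpy as np

def project_sketch(K, Rt, points):
    """Illustrative projection: map (N, 3) world points into pixel
    coordinates given intrinsics K (3x3) and extrinsics Rt (3x4, [R|t])."""
    n = points.shape[0]
    homog = np.hstack([points, np.ones((n, 1))])  # (N, 4) homogeneous points
    proj = (K @ Rt @ homog.T).T                   # (N, 3) homogeneous pixels
    return proj[:, :2] / proj[:, 2:3]             # perspective divide

def ncc_sketch(a, b):
    """Illustrative NCC: once patches are mean-subtracted and scaled to
    unit length, the NCC score at each pixel is just a dot product along
    the last axis of two (h, w, n) arrays of patch vectors."""
    return np.sum(a * b, axis=2)
```

For example, with identity intrinsics and a camera at the world origin, a point at (1, 2, 2) projects to pixel (0.5, 1.0); and the NCC of any unit patch vector with itself is 1.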

Testing

You are provided with some test cases in tests.py. Feel free to run these with python tests.py to help you with debugging. There are unit tests for all the functions you write, but not for the main program. You can, however, check that your output on tentacle matches the results shown above.

If the code is running slowly while you’re debugging, you can speed things up by downsampling the datasets further, or computing fewer depth layers. In dataset.py, modify:

 self.depth_layers = 128 

to change the number of depth hypotheses, or

 self.stereo_downscale_factor = 4 

to change the downsampling factor applied to the Middlebury datasets. The output image will be of dimensions (height / 2^stereo_downscale_factor, width / 2^stereo_downscale_factor).

Efficiency

We’ve configured the tentacle dataset such that it takes roughly 0.5 to 100 seconds to compute, depending on your implementation. Because we’re using OpenCV to compute homographies and warp images, the main bottleneck will likely be preprocess_ncc. Some tips:
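For example, the normalization in preprocess_ncc_impl can be done with whole-array numpy operations rather than per-pixel loops. A sketch, assuming the raw patches have already been gathered into an (h, w, c, ncc_size, ncc_size) array as the provided code does (the toy sizes here are arbitrary):

```python
import numpy as np

h, w, c, ncc_size = 6, 7, 3, 5
patches = np.random.rand(h, w, c, ncc_size, ncc_size)  # toy raw patch array

# Subtract each patch's per-channel mean (mean over the patch window).
patches -= patches.mean(axis=(3, 4), keepdims=True)

# Unroll each patch into a vector and divide by its whole-patch norm.
vectors = patches.reshape(h, w, c * ncc_size * ncc_size)
norms = np.linalg.norm(vectors, axis=2, keepdims=True)
vectors /= np.maximum(norms, 1e-10)  # guard against all-zero patches
```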

Mesh Reconstruction

There are no tasks for you to complete for this part, but the machinery is there and you’re welcome to try it out. Once you’ve computed depth for a scene, you can create a 3D model out of the depth map as follows:

python combine.py <dataset> depth

You can open up output/<dataset>_depth.ply in Meshlab to see your own result for any of the datasets. Here’s the mesh from my tentacle result:

You may need to fiddle with settings to get the colors to show - try un-toggling the triangley-cylinder button, two buttons to the right of the wireframe cube at the top of the screen.

597P / Extra Credit

597P students must complete at least one of the following extensions. 497P students may complete one or more of these for extra credit.

Disclaimer: I haven’t done these myself (at least not in this codebase and/or not in a long time). If you run into problems, don’t assume it’s because you’re missing something - I have not anticipated every obstacle you might face. Talk to me and we’ll come up with a plan.

  1. Stereo evaluation: the Middlebury datasets come with ground truth disparity maps - see the webpage for details on the file formats, etc. (PFM images should be readable with imageio). You’ll need to handle the conversion between depth and disparity, handle the different image sizes, and decide what metrics to use to measure accuracy. See Section 5 of Scharstein and Szeliski’s 2001 IJCV paper for some ideas on metrics.
  2. Rectified stereo: Implement stereo rectification and compute the cost volume in the traditional order (for each pixel, for each disparity). The datasets come with calibration information that is loaded by the code in dataset.py - feel free to use this.
  3. Better stereo: find ways to make your stereo algorithm perform better (combining this with (1), you can measure quantitative improvement; otherwise, you can evaluate qualitatively by looking at the results side-by-side). Some ideas include processing the cost volume in some way before taking the argmax, or using a similarity metric other than NCC. Try out some ideas and see if you can get better results. Feel free to look to the literature for inspiration - once again, A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms is a great place to start for a review of what people have done in the (now somewhat distant) past. Trying out ideas is the goal here - you don’t need to show a big improvement to get full credit.
  4. 8-point algorithm: Implement the 8-point algorithm to estimate the fundamental matrix (see Wikipedia). You don’t need to get into estimating \(R\) and \(t\) (though if you’re feeling ambitious, go ahead!). Find some way to validate your algorithm (careful - this may be the hard part!). Compare this matrix to the “true” fundamental matrix computed from the camera calibration parameters. You may need to give some thought to the best way to compare these matrices - something like SSD on the matrix elements is likely not linearly related to any meaningful notion of geometric similarity. You can use OpenCV’s feature matching pipeline as we did in Project 2.
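For extension (1), the conversion between depth and disparity in a rectified pair follows depth = focal · baseline / disparity. A hedged sketch of that conversion - the names here are illustrative, and the focal length and baseline should come from the dataset’s calibration data:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline):
    """Convert a disparity map (in pixels) to depth, assuming a rectified
    stereo pair with focal length focal_px (in pixels) and camera baseline
    (in world units). Non-positive disparity maps to infinite depth."""
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline / disparity[valid]
    return depth
```

For instance, a 10-pixel disparity with a 100-pixel focal length and a 0.5-unit baseline gives a depth of 5.0 units.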

You can also propose your own extensions - feel free to run your ideas by me.

To get credit for your extension(s), you must:

Submission

  1. Generate results for the tentacle dataset and the Flowers dataset, and commit them to your repository. You don’t need to submit .ply files.
  2. If you implemented any extensions, make sure your readme is committed to the repository and contains all the materials outlined above.
  3. Push your final changes to your repository to github before the deadline.
  4. Fill out the P3 Survey on Canvas. The submission time of your survey will be considered the submission time of your code when determining slip days. If working in groups, each partner must fill out the survey separately, and the later submission time will be used.

Rubric

Points are awarded for correctness and efficiency, and deducted for issues with clarity or submission mechanics.

Correctness (23 points)
  Unit tests (23 points): correctness as determined by tests.py
  Stereo output (2 points): output on tentacle and Flowers

Efficiency (4 points)
  python plane_sweep_stereo.py tentacle runs in under 30 seconds

P3 Survey (1 point)
  The P3 Survey is filled out (by both team members, if applicable)

Extensions (597P) (10 points)
  At least one extension is implemented, analyzed, and documented thoroughly.

Clarity

Deductions for poor coding style may be made. Please see the syllabus for general coding guidelines. Up to two points may be deducted for each of the following:

Acknowledgements

This assignment is based on versions developed and refined by Kavita Bala, Noah Snavely, and countless underappreciated TAs.