Fall 2020
In this assignment, you’ll implement the two-view plane sweep stereo algorithm. Given two calibrated images of the same scene, but taken from different viewpoints, your task is to recover a rough depth map.
Assigned: Wednesday, Nov 4, 2020
Deadline: Monday, Nov 16, 2020
You may work on this assignment individually or in groups of two. If you would like to work in a pair, you need to complete the following steps by the end of the day on Friday, November 6th:
myersdj_wehrwes
A couple other relevant policies:
In the Project 3 assignment on Canvas, you will find a GitHub Classroom invitation link. Click this link to accept the Project 3 assignment invitation and create your personal repository for this project. Your repository already contains skeleton code.
This project has similar software requirements to the prior projects. Two additional Python packages are used:
nose
imageio
Because these are pure Python packages, you can install them in a virtual environment. Here’s my suggested approach:
# create a virtual environment in the 497_p3_env dir; reuse system-installed packages:
wehrwes@linux-07:~/497/venvtest$ python3 -m venv 497_p3_env --system-site-packages
# activate the virtual environment:
wehrwes@linux-07:~/497/venvtest$ source 497_p3_env/bin/activate
# notice that my shell prompt now has (497_p3_env) prepended to it; install packages:
(497_p3_env) wehrwes@linux-07:~/497/venvtest$ pip install nose imageio
Collecting nose
Using cached https://files.pythonhosted.org/packages/15/d8/dd071918c040f50fa1cf80da16423af51ff8ce4a0f2399b7bf8de45ac3d9/nose-1.3.7-py3-none-any.whl
Collecting imageio
Using cached https://files.pythonhosted.org/packages/4c/2b/9dd19644f871b10f7e32eb2dbd6b45149c350b4d5f2893e091b882e03ab7/imageio-2.8.0-py3-none-any.whl
Requirement already satisfied: numpy in /usr/lib/python3/dist-packages (from imageio)
Requirement already satisfied: pillow in /usr/lib/python3/dist-packages (from imageio)
Installing collected packages: nose, imageio
Successfully installed imageio-2.8.0 nose-1.3.7
# all set up! go ahead and work on your project
# when done working, deactivate the environment to go back to real life:
(497_p3_env) wehrwes@linux-07:~/497/venvtest$ deactivate
wehrwes@linux-07:~/497/venvtest$
To view the results of turning your depth maps into 3D meshes, you can install a mesh viewer such as Meshlab. I recommend installing this locally and copying meshes to your home machine even if you’re developing and running your code in the lab environment.
The input data is too large to sensibly include in the GitHub repository. Go into the `data` directory and run `download.sh` to download the required datasets (`tentacle` is hosted locally, while the remaining datasets are downloaded from the Middlebury Stereo page). You can also uncomment other lines in the script to download and try out additional datasets if you’d like.
Alternatively, you can download these datasets in a web browser and extract them into the `input` directory (for `tentacle`) or `data` (for all others). Here’s the direct link to the listing of Middlebury dataset zip files: https://vision.middlebury.edu/stereo/data/scenes2014/zip/.
When finished, you’ll be able to run
python plane_sweep_stereo.py <dataset>
where `<dataset>` is one of ('tentacle', 'Adirondack', 'Backpack', 'Bicycle1', 'Cable', 'Classroom1', 'Couch', 'Flowers', 'Jadeplant', 'Mask', 'Motorcycle', 'Piano', 'Pipes', 'Playroom', 'Playtable', 'Recycle', 'Shelves', 'Shopvac', 'Sticks', 'Storage', 'Sword1', 'Sword2', 'Umbrella', 'Vintage'). Keep in mind that except for `tentacle` and `Flowers`, you’ll need to modify `data/download.sh` to download any other datasets before running your code on them.
For example, if you use the tentacle dataset:
python plane_sweep_stereo.py tentacle
the output will be in `output/tentacle_{ncc.png,ncc.gif,depth.npy,projected.gif}`.
The following illustrates the two input views for the tentacle dataset:
The outputs of the plane-sweep stereo for the tentacle dataset should look like this:
The first animated GIF is `tentacle_projected.gif`, which shows each rendering of the scene as a planar proxy is swept away from the camera.
For this project, we use the Normalized Cross-Correlation (NCC) measure for matching scores. The second animated GIF is `tentacle_ncc.gif`, which shows slices of the NCC cost volume, where each frame corresponds to a single depth. White is high NCC and black is low NCC.
The last image shows the correct depth output `tentacle_ncc.png` for the tentacle dataset, which is computed from the argmax depth according to the NCC cost volume. White is near and black is far.
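To make the "argmax depth" idea concrete, here is a minimal sketch (the names `ncc_volume` and `depths` are my own for illustration; the skeleton code may organize this differently). Given a cost volume of shape `(num_depths, h, w)` and the list of candidate depths, the depth map falls out of a single `argmax`:

```python
import numpy as np

# Hypothetical toy cost volume: 2 depth layers over a 1x2 image
ncc_volume = np.array([[[0.2, 0.9]],
                       [[0.8, 0.1]]])
depths = np.array([1.0, 2.0])

# For each pixel, pick the depth whose plane scored the highest NCC
best_layer = np.argmax(ncc_volume, axis=0)  # (h, w) layer indices
depth_map = depths[best_layer]              # (h, w) depth values
```

Here the left pixel scores highest at the second layer and the right pixel at the first, so `depth_map` comes out as `[[2.0, 1.0]]`.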
Most of the code you will implement is in `student.py`, with the exception of the last task, which is to complete the main body of the plane sweep loop in `plane_sweep_stereo.py`. It’s recommended that you start by taking a look through the well-commented `plane_sweep_stereo.py` to get an idea of where these functions fit in. The functions to be implemented have detailed specifications; see those for details of what you need to do.
Implement `project_impl`. This projects 3D points into a camera given its extrinsic and intrinsic calibration matrices.
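As a rough sketch of the math involved (the function name, argument order, and array shapes here are my assumptions; follow the docstring in `student.py` for the real spec), projection is: homogenize, apply `K @ Rt`, then divide by the third coordinate:

```python
import numpy as np

def project(K, Rt, points):
    """Project 3D world points into a camera (illustrative sketch).

    K: 3x3 intrinsic matrix; Rt: 3x4 extrinsic matrix [R | t];
    points: (N, 3) array of world coordinates.
    Returns an (N, 2) array of pixel coordinates.
    """
    pts = np.asarray(points, dtype=np.float64)
    # Homogeneous world coordinates, shape (N, 4)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    # World -> camera -> image plane, shape (N, 3)
    proj = (K @ Rt @ homog.T).T
    # Perspective divide by the depth coordinate
    return proj[:, :2] / proj[:, 2:3]
```

For instance, with `K = I` and `Rt = [I | 0]`, the world point `(2, 4, 2)` projects to `(1, 2)`.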
Implement `unproject_corners_impl`. This un-projects the corners of an image out into the scene to a distance `depth` from the camera and returns their world coordinates.
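Un-projection runs the projection pipeline in reverse. The sketch below is an assumption about the shapes involved (check the docstring for the actual signature and corner ordering): back-project the corner pixels through `K`, scale the rays so they sit at the requested depth, then map from camera to world coordinates.

```python
import numpy as np

def unproject_corners(K, width, height, depth, Rt):
    """Un-project the four image corners to a given depth (illustrative sketch).

    Returns a (2, 2, 3) array of world-coordinate corner positions.
    """
    # Corner pixel coordinates in homogeneous form
    corners = np.array([[0.0,   0.0,    1.0],
                        [width, 0.0,    1.0],
                        [0.0,   height, 1.0],
                        [width, height, 1.0]])
    # Back-project through the intrinsics to get camera-frame rays
    rays = (np.linalg.inv(K) @ corners.T).T
    # Scale so each ray's z-coordinate equals the requested depth
    cam_pts = rays * depth
    # Camera -> world: X_world = R^T (X_cam - t)
    R, t = Rt[:, :3], Rt[:, 3]
    world = (R.T @ (cam_pts - t).T).T
    return world.reshape(2, 2, 3)
```

With an identity camera and `depth = 3`, the corner at pixel `(0, 0)` lands at world point `(0, 0, 3)`, as expected.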
Complete the implementation of `preprocess_ncc_impl`. This prepares an image for NCC by building an array of size `h x w x c*ncc_size*ncc_size`, where the final dimension contains the normalized RGB values of all pixels in a `c * ncc_size * ncc_size` patch, unrolled into a vector. In other words, if the input is `I` and the output is `A`, then `A[i,j,:]` contains the normalized pixel values of the patch centered at `I[i,j]`. See the method spec for more details.
You have been given vectorized code that extracts the raw pixel values and builds an array of size (h, w, c, ncc_size, ncc_size)
. This uses a similar approach to what you did in Project 1 to accelerate cross-correlation, where it loops over the patch dimensions and fills in, e.g., the top-left pixel in all patches in one sliced assignment. Your job is to subtract the per-channel patch mean and divide each patch by the whole patch’s (not-per-channel) vector norm.
Potentially helpful features:
- `np.mean`, and particularly its `axis` argument
- the `reshape` method of array objects
- `np.linalg.norm` (can take an `axis`)
- boolean-mask assignment, e.g. `A[A>4] = A[A>4] + 1`
Implement `compute_ncc_impl`. This takes two images that have been preprocessed as above and returns an `h x w` image of NCC scores at each pixel.
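Since preprocessing already leaves each patch vector zero-mean and unit-norm, the correlation itself collapses to a per-pixel dot product. A sketch (function name is mine; match the skeleton's spec):

```python
import numpy as np

def compute_ncc(image1, image2):
    """NCC scores for two (h, w, c*p*p) preprocessed images (sketch).

    With each patch vector already zero-mean and unit-norm, NCC is just
    the dot product of corresponding patch vectors at each pixel.
    """
    return np.sum(image1 * image2, axis=2)
```

Identical patch vectors score 1, and negated ones score -1, matching the usual NCC range.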
Fill in each `(TODO)` line in `plane_sweep_stereo.py` to complete the overall plane sweep stereo algorithm. This is mostly a matter of making calls to the functions you’ve already implemented.
You are provided with some test cases in `tests.py`. Feel free to run these with `python tests.py` to help you with debugging. There are unit tests for all the functions you write, but not for the main program. You can, however, check that your output on `tentacle` matches the results shown above.
If the code is running slowly while you’re debugging, you can speed things up by downsampling the datasets further or computing fewer depth layers. In `dataset.py`, modify:
self.depth_layers = 128
to change the number of depth hypotheses, or
self.stereo_downscale_factor = 4
to change the downsampling factor applied to the Middlebury datasets. The output image will have dimensions `(height / 2^stereo_downscale_factor, width / 2^stereo_downscale_factor)`.
We’ve configured the tentacle dataset such that it takes about 0.5-100 seconds to compute, depending on your implementation. Because we’re using OpenCV to compute homographies and warp images, the main bottleneck will likely be `preprocess_ncc`. Some tips:
The tricky part - collecting patches into a single array - is done for you; the approach is similar to that used in Project 1.
Avoid loops: do the mean subtraction and normalization using array operations.
Make your life easier by strategically reshaping your array in ways that make the current step straightforward to write and comprehend.
Make use of the `axis` keyword argument of numpy functions. For example, you can sum an array along only one axis, or along more than one axis if you provide a tuple.
You can introduce singleton dimensions by slicing and adding `np.newaxis`: if `A.shape == (3, 4)`, then `A[np.newaxis, :, :].shape == (1, 3, 4)`.
If two array operands have dimensions that match except for a singleton, the singleton dimension will be broadcast to match the non-singleton:
A = np.zeros((5,4,3))
B = np.sum(A, axis=2) # has shape (5, 4)
C = A + B # error, dimension mismatch
C = A + B[:,:,np.newaxis] # B is auto-replicated across the channel dimension
There are no tasks for you to complete for this part, but the machinery is there and you’re welcome to try it out. Once you’ve computed depth for a scene, you can create a 3D model out of the depth map as follows:
python combine.py <dataset> depth
You can open up `output/<dataset>_depth.ply` in Meshlab to see your own result for any of the datasets. Here’s the mesh from my `tentacle` result:
You may need to fiddle with settings to get the colors to show: try un-toggling the triangle-and-cylinder button two buttons to the right of the wireframe cube at the top of the screen.
597P students must complete at least one of the following extensions. 497P students may complete one or more of these for extra credit.
- Evaluate your results against the Middlebury ground-truth disparity maps (which can be loaded with `imageio`). You’ll need to handle the conversion between depth and disparity, handle the different image sizes, and decide what metrics to use to measure accuracy. See Section 5 of Scharstein and Szeliski’s 2001 IJCV paper for some ideas on metrics. Some supporting code is provided in `datasets.py` - feel free to use it.

You can also propose your own extensions - feel free to run your ideas by me.
To get credit for your extension(s), you must include a `readme.txt`, `readme.pdf`, or `readme.html` file in your repository’s base directory describing your extension(s).
Generate outputs for the `tentacle` dataset and the `Flowers` dataset, and commit them to your repository. You don’t need to submit .ply files.
Points are awarded for correctness and efficiency, and deducted for issues with clarity or submission mechanics.
| Category | Criteria |
|---|---|
| **Correctness (35 points)** | |
| Unit tests (35 points) | Correctness as determined by `tests.py` (score = `ceil(n_passed*1.5)`) |
| Stereo output (10 points) | Output on `tentacle` and `Flowers` |
| **Efficiency (4 points)** | |
| 4 points | `python plane_sweep_stereo.py tentacle` runs in under 30 seconds |
| **P3 Survey (1 point)** | P3 Survey is filled out (by both team members, if applicable) |
| **Extensions (597P) (10 points)** | At least one extension is implemented, analyzed, and documented thoroughly. |
Clarity: Deductions may be made for poor coding style. Please see the syllabus for general coding guidelines. Up to two points may be deducted for each of the following:
This assignment is based on versions developed and refined by Kavita Bala, Noah Snavely, and countless underappreciated TAs.