Winter 2024
In this assignment, you’ll implement the two-view plane sweep stereo algorithm. Given two calibrated images of the same scene, but taken from different viewpoints, your task is to recover a rough depth map.
Assigned: Tuesday, Feb 27th, 2024
Deadline: Tuesday, March 5th, 2024
You are required to complete this assignment in groups of two. To ensure everyone is paired up in a timely fashion, please complete the following steps by the start of class on Thursday, 2/29:
A couple of other relevant policies:
In the Project 3 assignment on Canvas, you will find a GitHub Classroom invitation link. Click this link to accept the Project 3 assignment invitation and create your personal repository for this project. Your repository already contains skeleton code.
This project has similar software requirements to the prior projects.
This project uses the imageio package; if you don’t already have it, you can install it into your virtual environment with
pip install imageio
To view the results of turning your depth maps into 3D meshes, you can use a mesh viewer such as Meshlab. This is installed on the lab machines; if you’re developing remotely on the lab machines, I’d recommend installing it locally and copying meshes to your local machine, since Meshlab will have a lot of latency if run over the network.
The input data is large enough that it was inadvisable to include it in the GitHub repository. Go into the data directory and run download.sh to download the required datasets (tentacle is hosted locally, while the remaining datasets are downloaded from the Middlebury Stereo page). You can also uncomment other lines in the script to download and try out additional datasets if you’d like.
Alternatively, you can download these datasets in a web browser and extract them into the input directory (for tentacle) or data (for all others). Here’s the direct link to the listing of Middlebury dataset zip files: https://vision.middlebury.edu/stereo/data/scenes2014/zip/.
When finished, you’ll be able to run
python plane_sweep_stereo.py <dataset>
where dataset is one of
('tentacle', 'Adirondack', 'Backpack', 'Bicycle1', 'Cable', 'Classroom1', 'Couch', 'Flowers', 'Jadeplant', 'Mask', 'Motorcycle', 'Piano', 'Pipes', 'Playroom', 'Playtable', 'Recycle', 'Shelves', 'Shopvac', 'Sticks', 'Storage', 'Sword1', 'Sword2', 'Umbrella', 'Vintage').
Keep in mind that except for tentacle and Flowers, you’ll need to modify data/download.sh to download any other datasets before running your code on them. For example, if you run on the tentacle dataset,
python plane_sweep_stereo.py tentacle
the output will be in output/tentacle_{ncc.png,ncc.gif,depth.npy,projected.gif}.
The following illustrates the two input views for the tentacle dataset:
The outputs of plane-sweep stereo for the tentacle dataset should look like this:
The first animated gif is tentacle_projected.gif, which shows each rendering of the scene as a planar proxy is swept away from the camera.
For this project, we use the Normalized Cross-Correlation (NCC) measure for matching scores. The second animated gif is tentacle_ncc.gif, which shows slices of the NCC cost volume, where each frame corresponds to a single depth. White is high NCC and black is low NCC.
The last image shows the correct depth output tentacle_ncc.png for the tentacle dataset, which is computed from the argmax depth according to the NCC cost volume. White is near and black is far.
Most of the code you will implement is in student.py, with the exception of the last task, which is to complete the main body of the plane sweep loop in plane_sweep_stereo.py. It’s recommended that you start by taking a look through the well-commented plane_sweep_stereo.py to get an idea of where these functions fit in. The functions to be implemented have detailed specifications - see those for details of what you need to do.
Implement project_impl
. This projects 3D points into
a camera given its extrinsic and intrinsic calibration
matrices.
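As a starting point, here’s a minimal numpy sketch of the projection math. The shapes and conventions here (a 3x3 K, a 3x4 world-to-camera [R | t], points as an (..., 3) array) are assumptions on my part; the spec in student.py is authoritative.

import numpy as np

def project_sketch(K, Rt, points):
    # Assumed shapes: K is 3x3, Rt is 3x4 [R | t], points is (..., 3).
    # Append a homogeneous 1 to each point: (..., 4)
    ones = np.ones(points.shape[:-1] + (1,))
    points_h = np.concatenate([points, ones], axis=-1)
    # World -> camera -> homogeneous pixel coordinates: (..., 3)
    proj = points_h @ (K @ Rt).T
    # Perspective divide by the depth coordinate
    return proj[..., :2] / proj[..., 2:3]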
Implement unproject_corners_impl
. This un-projects
the corners of an image out into the scene to a distance
depth
from the camera and returns their world
coordinates.
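Here’s a sketch of the corresponding un-projection, again assuming Rt = [R | t] maps world to camera coordinates; the exact corner positions and output shape are defined by the spec and may differ from what’s below.

import numpy as np

def unproject_corners_sketch(K, width, height, depth, Rt):
    # Image corners in homogeneous pixel coordinates (spec may differ)
    corners = np.array([[0, 0, 1],
                        [width, 0, 1],
                        [0, height, 1],
                        [width, height, 1]], dtype=np.float64)
    # Back-project through the intrinsics; K^-1 [u, v, 1] has z = 1,
    # so scaling by depth puts each point at the requested distance
    cam_pts = (corners @ np.linalg.inv(K).T) * depth
    # Camera -> world: invert X_cam = R X_world + t
    R, t = Rt[:, :3], Rt[:, 3]
    return (cam_pts - t) @ R  # row-wise R^T (X_cam - t)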
Complete the implementation of preprocess_ncc_impl
.
This prepares an image for NCC by building an array of size
h x w x c * ncc_size * ncc_size
, where the final dimension
contains the normalized RGB values of all pixels in a
c * ncc_size * ncc_size
patch, unrolled into a vector. In
other words, if the input is I
and the output is
A
, then A[i,j,:]
contains the normalized pixel
values of the patch centered at I[i,j]
. See the method spec
for more details.
You have been given vectorized code that extracts the raw pixel
values and builds an array of size
(h, w, c, ncc_size, ncc_size)
. This uses a similar approach
to what you did in Project 1 to accelerate cross-correlation, where it
loops over the patch dimensions and fills in, e.g., the top-left pixel
in all patches in one sliced assignment. Your job is to subtract the
per-channel patch mean and divide each patch by the whole patch’s
(not-per-channel) vector norm.
Potentially helpful features:
- np.mean, and particularly its axis argument
- the reshape method of array objects
- np.linalg.norm (which can take an axis)
- boolean-mask assignment, e.g. A[A>4] = A[A>4] + 1
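To make this concrete, here’s a sketch of just the normalization step, assuming the provided code has already filled a patches array of shape (h, w, c, ncc_size, ncc_size). Check the spec for the required ordering of the unrolled vector, which this reshape may not match.

import numpy as np

def normalize_patches_sketch(patches):
    h, w = patches.shape[:2]
    # Subtract each channel's mean within its patch (patch axes are 3, 4)
    patches = patches - patches.mean(axis=(3, 4), keepdims=True)
    # Unroll each patch into a vector of length c * ncc_size * ncc_size
    vecs = patches.reshape(h, w, -1)
    # Divide by the whole patch's (not per-channel) vector norm,
    # leaving all-zero patches at zero instead of dividing by zero
    norms = np.linalg.norm(vecs, axis=2, keepdims=True)
    norms[norms == 0] = 1.0
    return vecs / norms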
Implement compute_ncc_impl
. This takes two images
that have been preprocessed as above and returns an h x w
image of NCC scores at each pixel.
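Since each preprocessed patch vector is already zero-mean and unit-norm, the NCC score reduces to a dot product along the last axis. A minimal sketch, assuming the (h, w, c * ncc_size * ncc_size) layout described above:

import numpy as np

def compute_ncc_sketch(image1, image2):
    # Dot product of corresponding normalized patch vectors
    return np.sum(image1 * image2, axis=2)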
Fill in each (TODO)
line in
plane_sweep_stereo.py
to complete the overall plane sweep
stereo algorithm. This is mostly a matter of making calls to the
functions you’ve already implemented.
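To give a feel for the overall structure, here is a rough sketch of the sweep loop. The variable names, argument orders, and choice of reference view below are all assumptions - the actual TODOs and their surrounding code in plane_sweep_stereo.py are what you should follow.

import numpy as np
import cv2
from student import (project_impl, unproject_corners_impl,
                     preprocess_ncc_impl, compute_ncc_impl)

def sweep_sketch(left_img, right_img, K_left, Rt_left, K_right, Rt_right,
                 depths, ncc_size):
    h, w = right_img.shape[:2]
    # Pixel corners of the right (reference) image, fixed across depths
    right_corners = np.array([[0, 0], [w, 0], [0, h], [w, h]], np.float32)
    right_pre = preprocess_ncc_impl(right_img, ncc_size)  # done once
    volume = []
    for depth in depths:
        # Sweep a plane to this depth: where do the reference image's
        # corners land in the other view?
        world = unproject_corners_impl(K_right, w, h, depth, Rt_right)
        left_pts = project_impl(K_left, Rt_left, world)
        # Homography induced by the plane; warp the left image into the
        # right camera's frame and score the agreement with NCC
        H, _ = cv2.findHomography(left_pts.reshape(-1, 2).astype(np.float32),
                                  right_corners)
        warped = cv2.warpPerspective(left_img, H, (w, h))
        volume.append(compute_ncc_impl(preprocess_ncc_impl(warped, ncc_size),
                                       right_pre))
    volume = np.stack(volume, axis=0)           # (num_depths, h, w)
    return np.asarray(depths)[np.argmax(volume, axis=0)]  # argmax depth map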
You are provided with some test cases in tests.py
. Feel
free to run these with python tests.py
to help you with
debugging. There are unit tests for all the functions you write, but not
for the main program. You can, however, check that your output on
tentacle
matches the results shown above.
If the code is running slowly while you’re debugging, you can speed
things up by downsampling the datasets further, or computing fewer depth
layers. In dataset.py
, modify:
self.depth_layers = 128
to change the number of depth hypotheses, or
self.stereo_downscale_factor = 4
to change the downsampling factor applied to the Middlebury datasets.
The output image will have dimensions (height / 2^stereo_downscale_factor, width / 2^stereo_downscale_factor); with the default factor of 4, for example, each image dimension shrinks by a factor of 16.
We’ve configured the tentacle dataset such that it takes about 0.5-100 seconds to compute depending on your implementation. Because we’re using OpenCV to compute homographies and warp images, the main bottleneck will likely be preprocess_ncc. Some tips:
The tricky part - collecting patches into a single array - is done for you; the approach is similar to that used in Project 1.
Avoid loops: do the mean subtraction and normalization using array operations.
Try to make your life as easy as possible by strategically reshaping your array in ways that make the current step as easy as possible to write and comprehend.
Make use of the axis keyword argument of numpy functions. For example, you can sum an array along only one axis, or along more than one axis if you provide a tuple (see the example after this list).
You can introduce singleton dimensions by slicing and adding np.newaxis: if A.shape == (3, 4), then A[np.newaxis, :, :].shape == (1, 3, 4).
If two array operands have dimensions that match except for a singleton, the singleton dimension will be broadcast to match the non-singleton:
A = np.zeros((5,4,3))
B = np.sum(A, axis=2) # has shape (5, 4)
C = A + B # error, dimension mismatch
C = A + B[:,:,np.newaxis] # B is auto-replicated across the channel dimension
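And here’s the tuple-of-axes behavior mentioned a couple of tips back:

A = np.ones((5, 4, 3))
np.sum(A, axis=1).shape       # (5, 3): summed over axis 1 only
np.sum(A, axis=(1, 2)).shape  # (5,): summed over axes 1 and 2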
There are no tasks for you to complete for this part, but the machinery is there and you’re welcome to try it out. Once you’ve computed depth for a scene, you can create a 3D model out of the depth map as follows:
python combine.py <dataset> depth
You can open up output/<dataset>_depth.ply
in
Meshlab to see your own result for any of the datasets. Here’s the mesh
from my tentacle
result:
You may need to fiddle with settings to get the colors to show - try un-toggling the triangley-cylinder button, two buttons to the right of the wireframe cube at the top of the screen.
The following extensions can be completed for a modest amount of extra credit (up to 5 points). Each extra credit point is exponentially harder to earn.
- Implement computeHomography yourself, rather than relying on OpenCV.
- Evaluate your results quantitatively against the Middlebury ground-truth disparity maps (these can be read with imageio). You’ll need to handle the conversion between depth and disparity, handle the different image sizes, and decide what metrics to use to measure accuracy. See Section 5 of Scharstein and Szeliski’s 2001 IJCV paper for some ideas on metrics. There is ground-truth loading code in datasets.py - feel free to use this.
You can also propose your own extensions - feel free to run your ideas by me.
To get credit for your extension(s), you must include a readme.txt, readme.pdf, or readme.html file in your repository’s base directory containing a description of your extension(s).
Generate outputs for the tentacle dataset and the Flowers dataset and commit them to your repository. You don’t need to submit .ply files.
Points are awarded for correctness and efficiency, and deducted for issues with clarity or submission mechanics.
| Criterion | Details |
|---|---|
| Correctness (45 points) | |
| Unit tests (35 points) | Correctness as determined by tests.py (score = ceil(n_passed * 1.5)) |
| Stereo output (10 points) | Output on tentacle and Flowers |
| Efficiency (5 points) | |
| 5 points | python plane_sweep_stereo.py tentacle runs in under 30 seconds |
Clarity: Deductions may be made for poor coding style. Please see the syllabus for general coding guidelines.
This assignment is based on versions developed and refined by Kavita Bala, Noah Snavely, and numerous underappreciated TAs.