Winter 2024
Look at the image from very close and then very far. What do you see?
Assigned: Tuesday, January 16th, 2024
Code Deadline: Friday, February 2nd, 2024 at 10pm
Artifact Deadline: Saturday, February 3rd, 2024 at 10pm
The objective of this project is to efficiently implement image filtering (i.e., convolution) and demonstrate its use in a couple of visually interesting image processing applications. Completion of this project satisfies part of the first course-level learning objective:
Demonstrate a thorough understanding of photometric […] image transformations, including convolution […]
Along the way, this project will help you gain the following skills and knowledge, which will be useful throughout the remainder of the course:
Ability to work with image data represented as multidimensional arrays in Python
Ability to write efficient vectorized code for image processing operations
Understanding of spatial frequency and how it can be manipulated using filters
This assignment explores two applications of image filtering using convolution.
The first application is to create hybrid images like the one shown above using a simplified version of this SIGGRAPH paper by Oliva, Torralba, and Schyns from 2006. The basic idea is that high frequency tends to dominate perception when it is available, but, at a distance, only the low-frequency (smooth) part of the signal can be seen. By blending the high-frequency portion of one image with the low-frequency portion of another, you get a hybrid image that leads to different interpretations at different distances. You will implement filtering routines in numpy, then use them to create your own hybrid images.
The second application is detail enhancement using a Laplacian pyramid. Making use of the same filtering routines from the first part, you’ll write code to build a Laplacian pyramid for an image, then reconstruct the image with each layer scaled up or down, allowing you to accentuate or attenuate different frequency content in the image.
In the Project 1 assignment on Canvas, you will find a GitHub Classroom invitation link. Click this link to accept the Project 1 assignment invitation and create your personal repository for this project. Your repository already contains skeleton code, including a user interface for creating hybrid images (hybrid_gui.py) and a UI for Laplacian detail enhancement (laplacian_gui.py). You will complete several functions in filtering.py that implement the functionality used by the UI programs. The next section walks you through each function. Please keep track of the approximate number of hours you spend on this assignment, as you will be asked to report this in hours.txt when you submit.
The CS lab computers have all the necessary dependencies installed for you to run this project. If you wish to work on it on your own computer, it is up to you to install the following dependencies. This list may be overcomplete; I don’t think it’s missing anything, but please send me email if I’ve missed something. The parenthesized versions are what is currently installed in the lab; other versions may well suffice, but it’s recommended that you stick with the same, or at least newer, major version numbers (e.g., Python >=3.11, OpenCV >=4.6.0) to minimize compatibility problems.
General help with accessing the lab systems remotely can be found here: https://gitlab.cs.wwu.edu/cs-support/public/-/wikis/Remotely_Accessing_Resources.
To run the UI remotely, you’ll need to set up X forwarding. Some instructions on this can be found here: https://gitlab.cs.wwu.edu/cs-support/public/-/wikis/SSH. If you’re coming from a non-unixy platform, you likely won’t have an X server installed, so you’ll need to install XQuartz if you’re on a Mac, or VcXsrv if you’re on Windows.
For just this assignment, you are forbidden from using any built-in functions from Numpy, Scipy, OpenCV, or other libraries that pertain to filtering and resizing. This limitation will be lifted in future assignments, but for now, you should use for loops, vectorized numpy operations, and slicing/indexing. Basic math operations like np.sum are fine. If you’re not sure whether something is permitted, just ask.
Your first step is to implement the basic filtering routines to perform 2D discrete cross-correlation and convolution. You will implement the following five functions:
cross_correlation_2d
This will be the workhorse for much of the remaining functionality. Take a look back at the lecture slides if you need a reminder of the definition of cross-correlation; some basic pseudocode was also provided. In this implementation, we’re using the “same” output size and zero padding to fill in values outside the input image.
The cross_correlation_2d function is computationally intensive: filtering an image of size M x N with a kernel of size K x K is an \(O(MNK^2)\) operation. For arbitrary kernels, this is unavoidable without using Fourier domain tricks that we haven’t covered. However, numpy’s array processing routines are highly optimized and allow for huge speedups of array operations relative to Python for loops, which must be executed line by line by the Python interpreter.
As usual, you should focus on getting a correct solution first. I strongly encourage you to write a slow version with as many nested for loops as you need. Then, see if you can eliminate some of the nested loops by batching computations with numpy array operations. Because the rest of the assignment depends heavily on this function, it’s worth some effort to optimize it. One way to go about this is to look in the code for computations that could be batched together as array operations. Another would be to play around with the equation for calculating cross correlation and try rearranging terms to minimize repetition.
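To make the starting point concrete, here is a minimal, deliberately slow sketch of the naive approach. It assumes the grayscale case (img and kernel are 2D numpy arrays with odd dimensions); the skeleton’s actual signature and any color-image handling may differ.

```python
import numpy as np

def cross_correlation_2d(img, kernel):
    # Naive O(M*N*K^2) cross-correlation: "same" output size,
    # zero padding outside the image bounds.
    m, n = img.shape
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2  # padding needed on each side

    padded = np.zeros((m + 2 * ph, n + 2 * pw))
    padded[ph:ph + m, pw:pw + n] = img

    out = np.zeros((m, n))
    for y in range(m):
        for x in range(n):
            for i in range(kh):
                for j in range(kw):
                    out[y, x] += kernel[i, j] * padded[y + i, x + j]
    return out
```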
A full-credit solution will use only two for loops to loop over the values in the kernel (not the image), for a total of only 9 Python for loop iterations given a 3x3 kernel. That said, most of the efficiency points are awarded for an asymptotically efficient approach (see the rubric for details on the efficiency points). Try not to sacrifice readability: make sure your approach is well-commented if you’re making any nontrivial optimizations.
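As a hint at what batching computations can look like, here is a sketch of the shift-and-add idea under the same grayscale assumptions as above: each kernel entry scales one shifted window of the padded image, so the only Python loops run over the kernel.

```python
import numpy as np

def cross_correlation_2d_fast(img, kernel):
    m, n = img.shape
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2

    padded = np.zeros((m + 2 * ph, n + 2 * pw))
    padded[ph:ph + m, pw:pw + n] = img

    out = np.zeros((m, n))
    for i in range(kh):
        for j in range(kw):
            # An (M x N) window of the padded image, offset by (i, j).
            out += kernel[i, j] * padded[i:i + m, j:j + n]
    return out
```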
convolve_2d
You should make use of your cross-correlation function to implement this in just a few lines.
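For example, assuming the cross_correlation_2d sketched above, the relationship is just a kernel flip:

```python
def convolve_2d(img, kernel):
    # Convolution is cross-correlation with the kernel flipped
    # both horizontally and vertically.
    return cross_correlation_2d(img, kernel[::-1, ::-1])
```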
gaussian_blur_kernel_2d
This function generates a Gaussian blur filter of a given size. The coordinate system of a filter places (0,0) at the middle of its center pixel, and pixel centers are assumed to be spaced 1 unit apart. Evaluate the Gaussian function (given in the lecture slides) with the given \(\sigma\) at the position of each pixel in a filter of the given dimensions.
We’d like our filter values to sum to one; meanwhile, one property of a Gaussian function is that its integral over the entire domain is one. This means if our filter has finite size, its values won’t (quite) sum to 1. Because we want to preserve overall image brightness, you should re-normalize the values in your Gaussian kernel so that they do sum to exactly 1.
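A sketch of one way to do this, assuming a (sigma, height, width) parameter order that may not match the skeleton’s; note that the Gaussian’s normalizing constant cancels when you re-normalize, so it can be omitted:

```python
import numpy as np

def gaussian_blur_kernel_2d(sigma, height, width):
    # Coordinates of pixel centers, with (0, 0) at the filter's center.
    ys = np.arange(height) - (height - 1) / 2.0
    xs = np.arange(width) - (width - 1) / 2.0
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    # The 1/(2*pi*sigma^2) constant would cancel in normalization.
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()  # values now sum to exactly 1
```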
low_pass
Recall that a low-pass filter leaves lower frequencies alone while attenuating high frequencies. This is exactly what blurring does, so using the functions you’ve already implemented makes this one pretty short.
high_pass
A high-pass filter does the opposite of a low-pass filter: it preserves high frequencies while eliminating low frequencies. We could achieve this with a single filter, but it’s easier and barely less efficient to simply subtract the low-pass image from the original to get the high-pass result.
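With the pieces above, both filters are one-liners. A sketch, assuming the convolve_2d and gaussian_blur_kernel_2d sketches above; the parameter names and order here are illustrative:

```python
def low_pass(img, sigma, size):
    # Gaussian blur keeps low frequencies and attenuates high ones.
    return convolve_2d(img, gaussian_blur_kernel_2d(sigma, size, size))

def high_pass(img, sigma, size):
    # Subtracting the low frequencies leaves the high frequencies.
    return img - low_pass(img, sigma, size)
```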
test.py provides you with a non-exhaustive set of unit tests that you may find helpful for debugging.
Hybrid images like the one at the top of this page are made by combining two different images, one low-pass filtered and one high-pass filtered. For example, the image at the top of the page is generated from these two images:
Adding the two together gives you the image at the top of this page. One easy way to visualize the effect is to view progressively downsampled versions of the image (i.e., a Gaussian pyramid):
You can also use your browser’s zoom functionality to resize images when viewing them in a web browser.
The function create_hybrid_images has been implemented for you. Its parameters permit the caller to choose whether each image is high- or low-pass filtered, as well as the sigma and kernel size for each filter.
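Conceptually, the combination it performs boils down to something like the following sketch (illustrative only; variable names are hypothetical, and the provided implementation handles more details, such as which image gets which filter):

```python
# Low frequencies of one image plus high frequencies of the other.
hybrid = low_pass(img1, sigma1, size1) + high_pass(img2, sigma2, size2)
```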
The file hybrid_gui.py is a program that lets you see the results of creating hybrid images and play with parameters interactively. This is where it becomes really nice to have your cross-correlation running blazing fast.
The GUI allows you to load two images, align them, then combine different frequency content from each image into a single hybrid image. Start by clicking “Load First Image” and loading resources/dog.jpg, then “Load Second Image” and load resources/cat.jpg. Now, you’ll click three points on each image to help the program align them. Click the dog’s left eye, right eye, and tongue (in that order). Then click the cat’s left eye, right eye, and nose. Now click “View Hybrid”, and you’ll see the combined image in the middle. You can now play around with the filter size and blur width (sigma) of each filter.
You can also save and load correspondences and filter configurations. A preset that gets you something somewhat close to the image at the top of this page can be loaded with the following command:
python3 hybrid_gui.py -t resources/sample-correspondence.json -c resources/sample-config.json
Note: The result from this preset will not match the result on this webpage (which comes directly from the SIGGRAPH paper) because our implementation is a simplified version of theirs.
In class we talked about how Laplacian pyramids can be used to separate out different slices of frequency content in an image. One straightforward application is to apply weights that independently boost or attenuate specific levels of the pyramid when reconstructing. For example, I took this image of an undisclosed beach in the vicinity of Bellingham:
I built a 7-level Laplacian pyramid, then chose weights that were less than one for the two lowest-frequency levels, greater than one for two middle-frequency levels, and less than one for the highest three. Reconstructing with the levels weighted this way results in an image with the low and high frequencies muted, and the mid-frequency contrasts enhanced:
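In list form, such an edit might look like the following hypothetical weights (assuming the first weight applies to the highest-frequency level):

```python
# Hypothetical 7-level weights: mute the three highest-frequency
# levels, boost two mid-frequency levels, mute the two lowest.
weights = [0.7, 0.7, 0.7, 1.5, 1.5, 0.5, 0.5]
```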
Your task in this section is to implement the two functions construct_laplacian and reconstruct_laplacian. For an overview of how this is done, take a look back at the lecture slides. The sections below specify the differences from the basic version presented in lecture. To keep things simple, these functions assume that the dimensions of your image are divisible by 2 enough times to do simple 2x down- and up-sampling for each level of the pyramid without any integer division roundoff error in dimensions.
construct_laplacian
In lecture, we computed the high-pass image as L_i = f - blur(f). The problem with this approach is that when we go to reconstruct, the upsampled image that gets added to the current high-pass image doesn’t exactly equal blur(f), and so L_i + upsample(rec) won’t exactly equal the original f (here, rec refers to the thus-far reconstructed image from the next smaller level of the pyramid). We can solve this by tweaking the algorithm a little to compute the high-pass image based on exactly what we’ll have when reconstructing, namely upsample(rec). Instead of L_i = f - blur(f), we’ll do the down- and upsampling up front so we can save the precise difference: L_i = f - upsample(subsample(blur(f))).
You will probably find it helpful to make some helper methods here for down- and up-sampling. For best results, I recommend using the blur filter proposed in the original paper, which is a separable filter built from the following approximation of a 1D Gaussian: [0.0625, 0.25, 0.375, 0.25, 0.0625]. Whatever filter you use, you will need to use the same one for downsampling and upsampling (reconstruction) in order to achieve an accurate reconstruction.
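Putting the pieces together, here is a sketch of one possible construction, assuming grayscale images and that levels counts all pyramid levels including the low-frequency residual. The downsample/upsample helper names are mine, and convolve_2d is the function from Part 1.

```python
import numpy as np

# 1D Gaussian approximation from the original paper; the separable
# 2D filter is its outer product with itself.
GAUSS_1D = np.array([0.0625, 0.25, 0.375, 0.25, 0.0625])
GAUSS_2D = np.outer(GAUSS_1D, GAUSS_1D)

def downsample(img):
    # blur(f), then keep every other pixel in each dimension.
    return convolve_2d(img, GAUSS_2D)[::2, ::2]

def upsample(img):
    # Zero-stuff to double each dimension, then blur. The factor of 4
    # compensates for only 1 in 4 pixels being nonzero before blurring.
    up = np.zeros((2 * img.shape[0], 2 * img.shape[1]))
    up[::2, ::2] = img
    return 4.0 * convolve_2d(up, GAUSS_2D)

def construct_laplacian(img, levels):
    pyramid = []
    f = img.astype(np.float64)
    for _ in range(levels - 1):
        smaller = downsample(f)
        # L_i = f - upsample(subsample(blur(f))), as described above.
        pyramid.append(f - upsample(smaller))
        f = smaller
    pyramid.append(f)  # final low-frequency residual
    return pyramid
```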
reconstruct_laplacian
The reconstruction procedure follows the one presented in lecture with one modification: each level of the pyramid can be multiplied by a scalar weight before being added back into the result image. This allows frequency slices to be manipulated independently. The function takes a weights parameter that can be None, in which case the weights are assumed to be all 1, or a list containing one weight per level of the pyramid.
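A sketch of the corresponding reconstruction, assuming (as in the construction sketch above) that index 0 is the full-resolution level and the last entry is the low-frequency residual:

```python
def reconstruct_laplacian(pyramid, weights=None):
    if weights is None:
        weights = [1.0] * len(pyramid)
    # Start from the low-frequency residual and work back up,
    # scaling each level by its weight before adding it in.
    rec = weights[-1] * pyramid[-1]
    for level, w in zip(reversed(pyramid[:-1]), reversed(weights[:-1])):
        rec = upsample(rec) + w * level
    return rec
```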
When both methods are implemented, you should be able to call reconstruct_laplacian on the output of construct_laplacian with weights=None and get a visually identical image back (there may be imperceptible differences due to compression and quantization).
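With the sketches above, the round trip can be sanity-checked directly, given some image array img:

```python
import numpy as np

pyr = construct_laplacian(img, levels=5)
out = reconstruct_laplacian(pyr, weights=None)
# Exact up to floating-point roundoff.
assert np.allclose(out, img.astype(np.float64))
```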
A barebones GUI program is provided in laplacian_gui.py that allows you to interactively edit images using your Laplacian pyramid functions. You can run the GUI with no arguments and load an image using the button. You can also specify an input image to load immediately with the --image (-i) flag, and a number of pyramid levels to compute (the default is 5) with the --levels (-l) flag. For example, running:
python3 laplacian_gui.py -i resources/beach.jpg -l 7
loads the beach image and allows you to edit it using a 7-level pyramid. As with the hybrid GUI, you can save out the edited image with the Save Image button.
Now that you’ve made some nifty image editing tools, use them to make some cool images. Find your own source images and create your own hybrid image. I suggest reading Section 2.2 of the paper and taking a look at their hybrid images gallery for guidance and inspiration on what kinds of image pairs make good hybrids.
Also pick a photo and use the Laplacian pyramid editor to edit different frequency bands to create an interesting result.
Optionally, post your artifacts on Instagram and use a hashtag such as #laplacianfilter or #highpassfilter to make sure people know how clever you are.
I will collect the artifacts into a showcase webpage and the class will get the opportunity to vote on their favorites. The top three artifacts will receive a nominal amount of extra credit.
Submit hybrid.jpg, your hybrid image artifact, and laplacian.jpg, your Laplacian edited artifact, to the P1 assignment on Canvas by the artifact deadline. Note that the artifact deadline is one day later than the code deadline.
Points are awarded for correctness and efficiency, and deducted for issues with clarity or submission mechanics.
| Correctness (30 points) | |
| --- | --- |
| Filtering (20 points) | Correctness as determined by automated tests. |
| Laplacian construction (4 points) | Construction produces a correct Laplacian pyramid. |
| Laplacian reconstruction (4 points) | Unweighted reconstructed image is visually identical to the original. |
| Laplacian editing (2 points) | Reconstruction correctly applies weights to individual levels. |
| Efficiency (15 points) | |
| 10 points | Filtering routines are asymptotically efficient. |
| 3 points | cross_correlation_2d uses vectorization to avoid quadruply-nested for loops. |
| 3 points | cross_correlation_2d uses no more than 2 nested Python loops, which traverse the kernel, not the image. |
| Artifacts (2 points each) | Artifacts are submitted to Canvas. |
Clarity: Deductions for poor coding style may be made. Please see the syllabus for general coding guidelines. Up to two points may be deducted for each of the following:
Part 1 of this assignment is based on versions developed and refined by Noah Snavely, Kavita Bala, James Hays, Derek Hoiem, and numerous underappreciated TAs.