CSCI 497P/597P Project 1: Filtering

Scott Wehrwein

Fall 2020

Look at the image from very close and then very far. What do you see?

Assigned: Friday, October 2nd, 2020

Code Deadline: Monday, October 12, 2020 at 10pm

Artifact Deadline: Tuesday, October 13, 2020 10pm

Overview

This assignment explores two applications of image filtering using convolution.

The first application is to create hybrid images like the one shown above using a simplified version of this SIGGRAPH paper by Oliva, Torralba, and Schyns from 2006. The basic idea is that high frequency tends to dominate perception when it is available, but, at a distance, only the low frequency (smooth) part of the signal can be seen. By blending the high frequency portion of one image with the low-frequency portion of another, you get a hybrid image that leads to different interpretations at different distances. You will implement filtering routines in numpy, then use them to create your own hybrid images.

The second application is detail enhancement using a Laplacian pyramid. Making use of the same filtering routines from the first part, you’ll write code to build a Laplacian pyramid for an image, then reconstruct the image with each layer scaled up or down to allow you to accentuate or attenuate different frequency content in the image.

Setup

Skeleton Code

In the Project 1 assignment on Canvas, you will find a GitHub Classroom invitation link. Click this link to accept the Project 1 assignment invitation and create your personal repository for this project. Your repository already contains skeleton code, including a user interface for creating hybrid images (hybrid_gui.py) and a UI for Laplacian detail enhancement (laplacian_gui.py). You will complete several functions in filtering.py that implement the functionality used by the UI programs. The next section walks you through each function. Please keep track of the approximate number of hours you spend on this assignment, as you will be asked to report this in hours.txt when you submit.

Software

The CS lab computers have all the necessary dependencies installed for you to run this project. If you wish to work on your own computer, it is up to you to install the following dependencies. This list is probably overcomplete; I don’t think it’s incomplete, but please send me an email if I’ve missed something. The parenthesized versions are what is currently installed in the lab; other versions may well suffice, but it’s recommended that you stick with the same, or at least newer, major version numbers (i.e., Python >=3.6, OpenCV >=3.2.0) to minimize compatibility problems.

Remote access issues for Fall 2020

General help with accessing the lab systems remotely can be found here: https://gitlab.cs.wwu.edu/cs-support/public/-/wikis/Remotely_Accessing_Resources.

To run the UI remotely, you’ll need to set up X forwarding. Some instructions on this can be found here: https://gitlab.cs.wwu.edu/cs-support/public/-/wikis/SSH. If you’re coming from a non-unixy platform, you likely won’t have an X server installed, so you’ll need to install XQuartz if you’re on a Mac, or VcXsrv if you’re on Windows.

Forbidden Functions

For just this assignment, you are forbidden from using any built-in functions from Numpy, Scipy, OpenCV, or other libraries that pertain to filtering and resizing. This limitation will be lifted in future assignments, but for now, you should use for loops, vectorized numpy operations and indexing. Basic math operations like np.sum are fine. If you’re not sure whether something is permitted, just ask.

Part 1: Filtering Functions

Your first step is to implement the basic filtering routines to perform 2D discrete cross-correlation and convolution. You will implement the following five functions:

1. cross_correlation_2d

This will be the workhorse for much of the remaining functionality. Take a look back at the lecture slides if you need a reminder of the definition of cross-correlation; some basic pseudocode was also provided. In this implementation, we’re using the “same” output size and zero padding to fill in values outside the input image.

Efficiency

The cross_correlation_2d function is computationally intensive: filtering an image of size M x N with a kernel of size K x K is an \(O(MNK^2)\) operation. For arbitrary kernels, this is unavoidable without using Fourier domain tricks that we haven’t covered. However, numpy’s array processing routines are highly optimized and allow for huge speedups of array operations relative to Python for loops, which must be executed line by line by the Python interpreter.

As usual, you should focus on getting a correct solution first. I strongly encourage you to write a slow version with as many nested for loops as you need. Then, see if you can eliminate some of the nested loops by batching computations with numpy array operations. Because the rest of the assignment depends heavily on this function, it’s worth some effort to optimize it. One way to go about this is to look in the code for computations that could be batched together as array operations. Another would be to play around with the equation for calculating cross correlation and try rearranging terms to minimize repetition.

A full-credit solution will use only two for loops to loop over the values in the kernel (not the image), for a total of only 9 Python for-loop iterations given a 3x3 kernel. That said, most of the efficiency points are awarded for an asymptotically efficient approach (see the rubric for details on the efficiency points). Try not to sacrifice readability: make sure your approach is well-commented if you’re making any nontrivial optimizations.
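To make this concrete, here is a minimal sketch of the two-loop strategy for a grayscale image; the skeleton’s actual signature and requirements (e.g., handling color channels) may differ, so treat this as an illustration rather than the required structure:

    import numpy as np

    def cross_correlation_2d(img, kernel):
        # Sketch: grayscale only, "same" output size, zero padding.
        H, W = img.shape
        kh, kw = kernel.shape
        ph, pw = kh // 2, kw // 2
        padded = np.zeros((H + 2 * ph, W + 2 * pw), dtype=np.float64)
        padded[ph:ph + H, pw:pw + W] = img
        out = np.zeros((H, W), dtype=np.float64)
        # Loop over the kernel (K*K iterations), not the image: each
        # iteration adds one shifted, scaled copy of the entire image.
        for i in range(kh):
            for j in range(kw):
                out += kernel[i, j] * padded[i:i + H, j:j + W]
        return out

Each kernel entry still touches every pixel, so the work remains \(O(MNK^2)\), but the inner M x N loop now runs inside numpy instead of the Python interpreter.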

2. convolve_2d

This one’s not so bad - you should make use of your cross-correlation function to implement this in just a few lines.
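For instance, convolution is just cross-correlation with the kernel flipped along both axes; a sketch, assuming the cross-correlation sketch above:

    def convolve_2d(img, kernel):
        # Convolution is cross-correlation with the kernel flipped
        # both vertically and horizontally.
        return cross_correlation_2d(img, kernel[::-1, ::-1])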

3. gaussian_blur_kernel_2d

This function generates a Gaussian blur filter of a given size. The coordinate system of a filter places (0,0) at the middle of its center pixel, and pixel centers are assumed to be spaced 1 unit apart. Evaluate the Gaussian function (given in the lecture slides) with the given \(\sigma\) at the position of each pixel in a filter of the given dimensions.

We’d like our filter values to sum to one so that filtering preserves overall image brightness. A Gaussian integrates to one over its entire (infinite) domain, so a filter of finite size truncates the tails and its values won’t (quite) sum to 1. You should therefore re-normalize the values in your Gaussian kernel so that they do sum to exactly 1.
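A minimal sketch follows; the parameter names and their order are assumptions, so match your skeleton’s actual signature. Note that the Gaussian’s constant \(1/(2\pi\sigma^2)\) factor can be omitted, since the kernel is re-normalized anyway:

    import numpy as np

    def gaussian_blur_kernel_2d(sigma, height, width):
        # Pixel-center coordinates, with (0, 0) at the filter's center.
        ys = np.arange(height) - (height - 1) / 2.0
        xs = np.arange(width) - (width - 1) / 2.0
        yy, xx = np.meshgrid(ys, xs, indexing="ij")
        kernel = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
        # Re-normalize so the finite filter sums to exactly 1.
        return kernel / kernel.sum()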

4. low_pass

Recall that a low-pass filter leaves lower frequencies alone while attenuating high frequencies. This is exactly what blurring does, so using the functions you’ve already implemented makes this one pretty short.

5. high_pass

A high-pass filter does the opposite of a low-pass filter: it preserves high frequencies while eliminating low frequencies. We could achieve this with a single filter, but it’s easier and barely less efficient to simply subtract the low-pass image from the original to get the high-pass result.
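Built from the earlier functions, both filters reduce to a line or two; a sketch, with assumed parameter names:

    def low_pass(img, sigma, size):
        # Blurring keeps the low frequencies and attenuates the rest.
        return convolve_2d(img, gaussian_blur_kernel_2d(sigma, size, size))

    def high_pass(img, sigma, size):
        # Whatever the blur removed is exactly the high-frequency content.
        return img - low_pass(img, sigma, size)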

Testing

test.py provides you with a non-exhaustive set of unit tests that you may find helpful for debugging.

Hybrid Images

Hybrid images like the one at the top of this page are made by combining two different images, one low-pass filtered and one high-pass filtered. For example, the image at the top of the page is generated from these two images:

Adding the two together gives you the image at the top of this page. One easy way to visualize the effect is to view progressively downsampled versions of the image (i.e., a Gaussian pyramid):

You can also use your browser’s zoom functionality to resize images when viewing them in a web browser.

The function create_hybrid_images has been implemented for you. Its parameters permit the caller to choose whether each image is high- or low-pass filtered, as well as the sigma and kernel size for each filter.

The file hybrid_gui.py is a program that lets you see the results of creating hybrid images, and play with parameters interactively. This is where it becomes really nice to have your cross-correlation running blazing fast.

Using the GUI

The GUI allows you to load two images, align them, then combine different frequency content from each image into a single hybrid image. Start by clicking “Load First Image” and loading resources/dog.jpg, then click “Load Second Image” and load resources/cat.jpg. Next, you’ll click three points on each image to help the program align them: click the dog’s left eye, right eye, and tongue (in that order), then the cat’s left eye, right eye, and nose. Now click “View Hybrid” and you’ll see the combined image in the middle. You can then play around with the filter size and blur width (sigma) of each filter.

You can also save and load correspondences and filter configurations. A preset that gets you something somewhat close to the image at the top of this page can be loaded with the following command:

python3 hybrid_gui.py -t resources/sample-correspondence.json -c resources/sample-config.json

Note: The result from this preset will not match the result on this webpage (which comes directly from the SIGGRAPH paper) because our implementation is a simplified version of theirs.

Part 2: Laplacian Detail Enhancement

In class we talked about how Laplacian pyramids can be used to separate out different slices of frequency content in an image. One straightforward application is to apply weights that boost or attenuate individual levels of the pyramid during reconstruction. For example, I took this image of an undisclosed beach in the vicinity of Bellingham:

I built a 7-level Laplacian pyramid, then chose weights that were less than one for the two lowest-frequency levels, greater than one for two middle-frequency levels, and less than one for the highest three. Reconstructing with the levels weighted this way results in an image with the low and high frequencies muted, and the mid-frequency contrasts enhanced:

Your task in this section is to implement the two functions construct_laplacian and reconstruct_laplacian. For an overview of how this is done, take a look back at the lecture slides. The sections below specify the differences from the basic version presented in lecture. To keep things simple, these functions assume that the dimensions of your image are divisible by 2 enough times to do simple 2x down- and up-sampling for each level of the pyramid without any integer division roundoff error in dimensions.

1. construct_laplacian

In lecture, we computed the high-pass image as L_i = f - blur(f). The problem with this approach is that at reconstruction time, the upsampled image that gets added to the current high-pass image doesn’t exactly equal blur(f), so L_i + upsample(rec) won’t exactly equal the original f (here, rec refers to the reconstruction so far from the next smaller level of the pyramid). We can solve this by tweaking the algorithm a little to compute the high-pass image from exactly what we’ll have when reconstructing, namely upsample(rec). Instead of L_i = f - blur(f), we do the down- and upsampling up front so we can save the precise difference: L_i = f - upsample(subsample(blur(f))).

You will probably find it helpful to make some helper methods here for down- and up-sampling. For best results, I recommend using the blur filter proposed in the original paper, which is a separable filter built from the following approximation of a 1D Gaussian: [0.0625, 0.25, 0.375, 0.25, 0.0625]. Whatever filter you use, you will need to use the same one for downsampling and upsampling (reconstruction) in order to achieve an accurate reconstruction.
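Putting this together, here is one possible shape for the helpers and the construction loop. This is a sketch assuming grayscale input, dimensions divisible by 2, and hypothetical helper names (downsample, upsample), not the required structure:

    import numpy as np

    g1 = np.array([0.0625, 0.25, 0.375, 0.25, 0.0625])
    g2 = np.outer(g1, g1)  # separable 2D blur filter from the 1D weights

    def downsample(img):
        # Blur, then keep every other pixel in each dimension.
        return convolve_2d(img, g2)[::2, ::2]

    def upsample(img):
        # Insert zeros between samples, then blur; the factor of 4
        # compensates for the three zeros around each original pixel.
        up = np.zeros((2 * img.shape[0], 2 * img.shape[1]), dtype=np.float64)
        up[::2, ::2] = img
        return convolve_2d(up, 4.0 * g2)

    def construct_laplacian(img, levels):
        pyramid = []
        f = img.astype(np.float64)
        for _ in range(levels - 1):
            small = downsample(f)
            # L_i = f - upsample(subsample(blur(f))): store exactly the
            # detail that upsampling the next level will fail to recover.
            pyramid.append(f - upsample(small))
            f = small
        pyramid.append(f)  # the last level is the remaining low-pass image
        return pyramid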

2. reconstruct_laplacian

The reconstruction procedure follows the one presented in lecture with one modification: each level of the pyramid can be multiplied by a scalar weight before being added back into the result image, allowing each frequency slice to be manipulated independently. The function takes a weights parameter that can be None, in which case the weights are assumed to be all 1, or a list containing one weight per level of the pyramid.
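A sketch of the weighted reconstruction, reusing the hypothetical upsample helper from the construction sketch:

    def reconstruct_laplacian(pyramid, weights=None):
        if weights is None:
            weights = [1.0] * len(pyramid)
        # Start from the smallest (low-pass) level and work back up,
        # scaling each level by its weight as it is added in.
        rec = weights[-1] * pyramid[-1]
        for lap, w in zip(pyramid[-2::-1], weights[-2::-1]):
            rec = upsample(rec) + w * lap
        return rec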

When both methods are implemented, you should be able to call reconstruct on the output of construct with weights=None and get a visually identical image back (there may be imperceptible differences due to compression and quantization).

GUI

A barebones GUI program is provided in laplacian_gui.py that allows you to interactively edit images using your Laplacian pyramid function. You can run the GUI with no arguments and load an image using the button. You can also specify an input image to load immediately with the --image (-i) flag, and a number of pyramid levels to compute (the default is 5) with the --levels (-l) flag. For example, running:

python3 laplacian_gui.py -i resources/beach.jpg -l 7

loads the beach image and allows you to edit it using a 7-level pyramid. As with the hybrid GUI, you can save out the edited image with the Save Image button.

Artifacts

Now that you’ve made some nifty image editing tools, use them to make some cool images. Find your own source images and create your own hybrid image. I suggest reading Section 2.2 of the paper and taking a look at their hybrid images gallery for guidance and inspiration on what kinds of image pairs make good hybrids.

Also pick a photo and use the Laplacian pyramid editor to edit different frequency bands to create an interesting result.

Optionally, post your artifacts on Instagram and use a hashtag such as #laplacianfilter or #highpassfilter to make sure people know how clever you are.

I will collect the artifacts into a showcase webpage and the class will get the opportunity to vote on their favorites. The top three artifacts will receive a nominal amount of extra credit.

Submission

  1. On the first line of hours.txt, write an estimated (integer) number of hours you spent on this assignment. On the line below the number of hours, feel free to tell me how the assignment went, any parts you found particularly challenging, or anything else you want me to know.
  2. Push your final changes to the P1 repository to GitHub before the code deadline.
  3. Submit hybrid.jpg, your hybrid image artifact, and laplacian.jpg, your Laplacian edited artifact, to the P1 assignment on Canvas by the artifact deadline.

Rubric

Points are awarded for correctness and efficiency, and deducted for issues with clarity or submission mechanics.

Correctness (30 points)
  Filtering (20 points): correctness as determined by automated tests.
  Laplacian construction (4 points): construction produces a correct Laplacian pyramid.
  Laplacian reconstruction (4 points): the unweighted reconstructed image is visually identical to the original.
  Laplacian editing (2 points): reconstruction correctly applies weights to individual levels.
Efficiency (15 points)
  10 points: filtering routines are asymptotically efficient.
  3 points: cross_correlation_2d uses vectorization to avoid quadruply-nested for loops.
  2 points: cross_correlation_2d uses no more than 2 nested Python loops, and they traverse the kernel, not the image.
Artifacts (2 points each): artifacts are submitted to Canvas.
hours.txt (1 point): hours.txt contains a single integer on the first line with the approximate number of hours you spent on the assignment.

Clarity Deductions may be made for poor coding style, up to two points per issue. Please see the syllabus for general coding guidelines.

Acknowledgements

Part 1 of this assignment is based on versions developed and refined by Noah Snavely, Kavita Bala, James Hays, Derek Hoiem, and numerous underappreciated TAs.