The goal of feature detection and matching is to identify a pairing between a point in one image and a corresponding point in another image. These correspondences can then be used to stitch multiple images together into a panorama (Project 3), or in various other computer vision applications (examples given in lecture).
In this project, you will write code to detect discriminating features (which are reasonably invariant to translation, rotation, and illumination) in an image and find the best matching features in another image.
To help you visualize the results and debug your program, we provide a user interface that displays detected features and best matches in another image. We also provide an example ORB feature detector, a popular technique in the vision community, for comparison. ORB is intended as a replacement for SIFT: it builds on the FAST keypoint detector and the BRIEF feature descriptor.
The project has three parts: feature detection, feature description, and feature matching.
In this step, you will identify points of interest in the image using the Harris corner detection method. The steps are as follows (see the lecture slides and textbook for more details). For each point in the image, consider a window of pixels around that point. Compute the Harris matrix H for (the window around) that point, defined as

H = \sum_{p} w_p \begin{bmatrix} I_x(p)^2 & I_x(p)\,I_y(p) \\ I_x(p)\,I_y(p) & I_y(p)^2 \end{bmatrix}

where the summation is over all pixels p in the window, I_x(p) is the x derivative of the image at point p, and the notation is similar for the y derivative. You should use the 3x3 Sobel operator to compute the x and y derivatives, extrapolating pixels outside the image with reflection. The weights w_p should be circularly symmetric (for rotation invariance): use a 5x5 Gaussian mask with sigma 0.5 (setting Gaussian mask values more than 4 sigma from the mean to zero is within tolerance). Note that H is a 2x2 matrix.
Then use H to compute the corner strength function, c(H), at every pixel.
We will also need the orientation in degrees at every pixel. Compute the approximate orientation as the angle of the gradient. The zero angle points to the right and positive angles are counter-clockwise. Note: do not compute the orientation using eigen analysis of the structure tensor.
We will select the strongest keypoints (according to c(H)) which are local maxima in a 7x7 neighborhood.
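To make these steps concrete, here is a minimal sketch of per-pixel Harris values, orientations, and local maxima using scipy.ndimage. The function names mirror the skeleton's computeHarrisValues and computeLocalMaxima but are only stand-ins, and the harmonic-mean strength det(H)/trace(H) is an assumed choice of c(H); use whichever corner strength function is defined in lecture.

```python
import numpy as np
import scipy.ndimage as ndimage

def compute_harris_values(image):
    """Per-pixel Harris strength and gradient orientation (degrees) -- a sketch."""
    # 3x3 Sobel derivatives; 'reflect' extrapolates pixels outside the image.
    Ix = ndimage.sobel(image, axis=1, mode='reflect')  # x derivative
    Iy = ndimage.sobel(image, axis=0, mode='reflect')  # y derivative

    # Weighted sums of derivative products (the entries of H at every pixel),
    # using a circularly symmetric Gaussian window with sigma = 0.5
    # (scipy's default truncation makes this a 5x5 mask).
    A = ndimage.gaussian_filter(Ix * Ix, sigma=0.5, mode='reflect')
    B = ndimage.gaussian_filter(Ix * Iy, sigma=0.5, mode='reflect')
    C = ndimage.gaussian_filter(Iy * Iy, sigma=0.5, mode='reflect')

    # Corner strength c(H); det(H)/trace(H) is shown here as one common choice.
    harris = (A * C - B * B) / (A + C + 1e-10)

    # Gradient orientation: zero points right, positive is counter-clockwise.
    orientation = np.degrees(np.arctan2(Iy, Ix))
    return harris, orientation

def compute_local_maxima(harris):
    """True wherever a pixel equals the maximum of its 7x7 neighborhood -- a sketch."""
    return harris == ndimage.maximum_filter(harris, size=7)
```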
Now that you have identified points of interest, the next step is to come up with a descriptor for the feature centered at each interest point. This descriptor will be the representation you use to compare features in different images to see if they match.
You will implement two descriptors for this project. For starters, you will implement a simple descriptor consisting of the pixel intensity values in the 5x5 neighborhood around the keypoint. This should be easy to implement and should work well when the images you are comparing are related by a simple translation.
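As an illustration, here is a minimal sketch of the 5x5 intensity descriptor, assuming grayscale input, cv2.KeyPoint-style keypoints, and zero padding at the image border (the padding choice is an assumption, not a requirement of the assignment).

```python
import numpy as np

def describe_simple(image, keypoints):
    """Illustrative 5x5 intensity descriptor: one flattened 25-value row per keypoint."""
    padded = np.pad(image, 2, mode='constant')  # so border windows are defined
    desc = np.zeros((len(keypoints), 25))
    for i, kp in enumerate(keypoints):
        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        # With 2 pixels of padding, rows y..y+4 / cols x..x+4 of the padded
        # image form the 5x5 window centered on (x, y) in the original image.
        desc[i] = padded[y:y + 5, x:x + 5].flatten()
    return desc
```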
Second, you will implement a simplified version of the MOPS descriptor. You will compute an 8x8 oriented patch sub-sampled from a 40x40 pixel region around the feature. You have to come up with a transformation matrix which transforms the 40x40 rotated window around the feature to an 8x8 patch rotated so that its keypoint orientation points to the right. You should also normalize the patch to have zero mean and unit standard deviation. If the standard deviation is very close to zero (less than 10^-5 in magnitude), you should return an all-zeros vector to avoid a divide-by-zero error.
You will use cv2.warpAffine to perform the transformation. warpAffine takes a 2x3 forward-warping affine matrix, which is multiplied from the left with the input coordinates given as a column vector. The easiest way to generate the 2x3 matrix is by combining a sequence of translation (T1), rotation (R), scaling (S) and translation (T2) matrices. Recall that when a sequence of transformation matrices is applied to a column vector by left-multiplication, the transformations take effect right-to-left; this means that if the transformation matrix is the matrix product T2 S R T1, then the translation T1 is applied first, followed by the rotation R, then the scale S, then another translation T2. The figures below illustrate the sequence. You are provided with some utility methods (more details below) in transformations.py that may help you in constructing your transformation matrix.
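To make the composition concrete, here is a minimal sketch of the MOPS patch extraction under a few assumptions: the keypoint is a cv2.KeyPoint storing its orientation in degrees in kp.angle, the image is a single-channel array, and the 2x3 matrix is built directly with numpy rather than with the transformations.py helpers (which you may use instead).

```python
import numpy as np
import cv2

def mops_patch(image, kp, patch_size=8, window_size=40):
    """Illustrative sketch (not the skeleton's exact interface): warp the
    40x40 window around a keypoint into a normalized 8x8 canonical patch."""
    x, y = kp.pt
    theta = np.radians(kp.angle)  # keypoint orientation, assumed stored in degrees

    # T1: translate the keypoint to the origin.
    T1 = np.array([[1.0, 0.0, -x],
                   [0.0, 1.0, -y],
                   [0.0, 0.0, 1.0]])
    # R: rotate by -theta so the keypoint orientation points to the right.
    c, s = np.cos(-theta), np.sin(-theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    # S: scale the 40x40 window down to 8x8 (a factor of 1/5).
    scale = patch_size / float(window_size)
    S = np.diag([scale, scale, 1.0])
    # T2: translate the origin to the center of the output patch.
    T2 = np.array([[1.0, 0.0, patch_size / 2.0],
                   [0.0, 1.0, patch_size / 2.0],
                   [0.0, 0.0, 1.0]])

    # Composite forward warp, applied right-to-left: T1, then R, then S, then T2.
    M = (T2 @ S @ R @ T1)[:2, :]  # warpAffine expects the top 2x3 block
    patch = cv2.warpAffine(image.astype(np.float32), M,
                           (patch_size, patch_size), flags=cv2.INTER_LINEAR)

    # Normalize to zero mean / unit standard deviation, guarding against a
    # nearly constant patch as described above.
    std = patch.std()
    if std < 1e-5:
        return np.zeros(patch_size * patch_size)
    return ((patch - patch.mean()) / std).flatten()
```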
Now that you have detected and described your features, the next step is to write code to match them (i.e., given a feature in one image, find the best matching feature in another image).
The simplest approach is the following: compare two features and calculate a scalar distance between them. The best match is the feature with the smallest distance. You will implement two distance functions:
We have implemented tests for TODOs 1-6. You should design your own tests for TODOs 7 and 8. Lastly, be aware that the supplied tests are very simple: they are meant as an aid to help you get started, and passing them does not guarantee that the graded test cases will pass.
Now you are ready to go! Using the UI and skeleton code that we provide, you can load in a set of images, view the detected features, and visualize the feature matches that your algorithm computes. Watch this video to see the UI in action.
By running featuresUI.py, you will see a UI where you have the following choices:
The UI is a tool to help you visualize the keypoint detections and feature matching results. Keep in mind that your code will be primarily evaluated numerically, not visually.
We provide a set of benchmark images to be used to test the performance of your algorithm as a function of different types of controlled variation (i.e., rotation, scale, illumination, perspective, blurring). For each of these images, we know the correct transformation and can therefore measure the accuracy of each of your feature matches. This is done using a routine that we supply in the skeleton code.
You should also go out and take some photos of your own to see how well your approach works on more interesting data sets. For example, you could take images of a few different objects (e.g., books, offices, buildings, trees, etc.) and see how well it works.
Skeleton. Find the Github Classroom invitation link in the Project 2 assignment on Canvas. You will be writing code in features.py. The user interface is given in featuresUI.py; run this file with Python to start up the user interface. Sample images and benchmarks are provided in the resources subdirectory. Please keep track of the approximate number of hours you spend on this assignment, as you will be asked to report this in your writeup and in a file called hours.txt when you submit.
Software: The lab machines should have all the necessary software to run the code for this project. The dependencies for this project should be the same as the ones listed in Project 1; please see the Project 1 writeup for details. If you encounter issues (especially if you find that there are additional dependencies missing), please post on Piazza so the writeup can be updated accordingly.
We have given you a number of classes and methods. You are responsible for reading the code documentation and understanding the role of each class and function. The code you need to write will be for your feature detection methods, your feature descriptor methods and your feature matching methods. All your required edits will be in features.py. The feature computation and matching methods are called from the UI functions in featuresUI.py. Feel free to extend the functionality of the UI code, but remember that only the code in features.py will be graded.
597P students have one additional task (#4 below).
The function detectKeypoints in HarrisKeypointDetector is one of the main ones you will complete, along with the helper functions computeHarrisValues (computes Harris scores and an orientation for each pixel in the image) and computeLocalMaxima (computes a Boolean array which tells, for each pixel, whether it is equal to the maximum in its local neighborhood). Together these implement Harris corner detection.
You may find the following functions helpful:
scipy.ndimage.sobel: filters the input image with a Sobel filter.
scipy.ndimage.gaussian_filter: filters the input image with a Gaussian filter.
scipy.ndimage.filters.maximum_filter: filters the input image with a maximum filter.
scipy.ndimage.filters.convolve: filters the input image with the selected filter.

Next, you will implement the describeFeatures function of the SimpleFeatureDescriptor and MOPSFeaturesDescriptor classes. This function takes the location and orientation information already stored in a set of keypoints (e.g., Harris corners), computes a descriptor for each keypoint, and stores these descriptors in a numpy array with one row per keypoint and one column per descriptor dimension (e.g., 25 for the 5x5 simple feature descriptor). For the MOPS implementation, you should create a matrix which transforms the 40x40 patch centered at the keypoint to a canonical orientation and scales it down by a factor of 5, as described in the lecture slides. You may find the functions in transformations.py helpful; they implement 3D affine transformations (you will need to convert them to 2D manually). Reading the opencv documentation for cv2.warpAffine is recommended.
Finally, you will implement the matchFeatures function of SSDFeatureMatcher and RatioFeatureMatcher. These functions return a list of cv2.DMatch objects, which are simply struct-like objects with three fields: queryIdx, trainIdx and distance. Set the queryIdx attribute to the index of the feature in the first image, the trainIdx attribute to the index of the feature in the second image, and the distance attribute to the distance between the two features as defined by the particular distance metric (e.g., SSD or ratio). You may find scipy.spatial.distance.cdist and numpy.argmin helpful when implementing these functions.
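As a rough illustration, here is a minimal sketch of the two matchers under the assumptions that the SSD distance is the squared Euclidean distance between descriptors, that the ratio distance is the usual best-to-second-best SSD ratio, and that each image contributes at least two features. The function names are illustrative stand-ins for the matchFeatures methods.

```python
import numpy as np
import cv2
from scipy.spatial.distance import cdist

def match_ssd(desc1, desc2):
    """For each feature in image 1, match the closest feature in image 2 by SSD."""
    dists = cdist(desc1, desc2, 'sqeuclidean')  # all pairwise SSD distances
    matches = []
    for i in range(desc1.shape[0]):
        j = int(np.argmin(dists[i]))
        matches.append(cv2.DMatch(i, j, float(dists[i, j])))
    return matches

def match_ratio(desc1, desc2):
    """Same nearest neighbor, but scored by the ratio of best to second-best SSD."""
    dists = cdist(desc1, desc2, 'sqeuclidean')
    matches = []
    for i in range(desc1.shape[0]):
        # Indices of the two smallest distances in row i (ordered below).
        a, b = np.argpartition(dists[i], 1)[:2]
        if dists[i, a] > dists[i, b]:
            a, b = b, a
        ratio = dists[i, a] / (dists[i, b] + 1e-10)
        matches.append(cv2.DMatch(i, int(a), float(ratio)))
    return matches
```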
For 597P students only:
Once you have completed all of the above tasks, make a copy of your features.py and save it as features_scale_invariant.py, being sure to add it to your git repository. In this file, you will improve your
implementation by making the descriptor scale-invariant. This means you will
need to change the way features are detected, described, or both. How you do
it is your design choice, but it should be well motivated. You may refer to the
original MOPS
paper to see how the authors accomplished scale invariance.
You may choose to implement that, something else, or invent your own approach.
You will describe your implementation and how it works in your writeup.
In our codebase, the simplest approach is probably something like the following (see the sketch after this list):
In detectKeypoints, compute features on each layer of a Gaussian pyramid.
For each detected keypoint, set the octave field to indicate which level of the pyramid it was detected in.
In describeFeatures, use the octave value of the keypoint to modify the feature transformation matrix so that it extracts a patch from the original image that's equivalent to a 40x40 patch in the given level of the Gaussian pyramid. Alternatively, you could compute the Gaussian pyramid and use octave to decide which pyramid level to extract the descriptor from. You may want to look up OpenCV's built-in functions for creating Gaussian pyramids.
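Here is a minimal sketch of that detection step, assuming your single-scale detector returns cv2.KeyPoint objects and that cv2.pyrDown is used to build the pyramid; the function name and the number of octaves are illustrative choices, not part of the skeleton.

```python
import cv2

def detect_multiscale(image, detect_keypoints, num_octaves=4):
    """Run a single-scale detector (detect_keypoints) on each Gaussian pyramid level."""
    keypoints = []
    level_image = image
    for octave in range(num_octaves):
        for kp in detect_keypoints(level_image):
            # Record the pyramid level and map coordinates back to the
            # original image resolution (each level halves the size).
            scale = 2 ** octave
            kp.octave = octave
            kp.pt = (kp.pt[0] * scale, kp.pt[1] * scale)
            kp.size *= scale
            keypoints.append(kp)
        level_image = cv2.pyrDown(level_image)  # blur and downsample by 2
    return keypoints
```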
For all students:
Create a short report (details below), which includes benchmarking results in
terms of ROC curves and AUC on the Yosemite
dataset provided in
the resources
directory. You can obtain ROC curves by running
featuresUI.py, switching to the "Benchmark" tab, pressing "Run Benchmark",
selecting the directory "resources/yosemite" and waiting a short while. Then
the AUC will be shown on the bottom of the screen and you may save the ROC
curve by pressing "Screenshot". You will also need to include one Harris image; this is saved as harris.png every time the Harris keypoints are computed.
You may use NumPy, SciPy and OpenCV2 functions to implement mathematical, filtering and transformation operations. Do not use functions which implement keypoint detection or feature matching.
When using the Sobel operator or Gaussian filter, you should use the 'reflect' mode, which gives a zero gradient at the edges.
Here is a list of potentially useful functions (you are not required to use them):
scipy.ndimage.sobel
scipy.ndimage.gaussian_filter
scipy.ndimage.filters.convolve
scipy.ndimage.filters.maximum_filter
scipy.spatial.distance.cdist
cv2.warpAffine
np.max, np.min, np.std, np.mean, np.argmin, np.argpartition
np.degrees, np.radians, np.arctan2
transformations.get_rot_mx (in transformations.py)
transformations.get_trans_mx
transformations.get_scale_mx
All Students: Push your finished code to Github. Create a file called hours.txt in the repository's root directory that contains a single integer estimate of the number of hours you worked on the assignment.
597P Students: Make sure your features_scale_invariant.py and readme_scale_invariant.txt are tracked in your repo and pushed to Github.
If you attempted the extra credit (see below) make sure extracredit.py and extracredit_readme.txt are tracked in your repo and pushed to Github. If you did not attempt the extra credit then you do not need to submit extracredit.py or extracredit_readme.txt (please do not include empty extra credit files).
Create a report in form of a pdf document or a small website describing your work and submit it to Canvas. If you submit a webpage, you should submit it as a zip file containing an index.html
file along with any embedded files e.g. images. PDF reports should be submitted as report.pdf
.
Your report should contain the following deliverables:
We will give extra credit for solutions (feature detection and descriptors) that improve mean AUC for the ratio test by at least 15%. A 15% improvement is interpreted as a 15% reduction in (1 - AUC). i.e. the area above the curve. Here is one suggestion (and you are encouraged to come up with your own ideas as well!)
If you attempt the extra credit then you must submit two solutions to CMS. The first solution will be graded by the project rubric so it must implement the algorithm described in this document. Extra credit solutions should include a readme that describes your changes to the algorithm and the thresholds for each benchmark image. The submitted code will be checked against this description. Extra credit submissions which do not improve mean AUC by at least 15% will not be graded. You can measure your mean AUC by running the UI benchmark on the 5 datasets (bikes, graf, leuven, wall, yosemite). We will specifically check performance on the yosemite dataset.
The extra credit will be given based on the merit of your improvement and its justification. Simple hyperparameter changes will receive less extra credit than meaningful improvements to the algorithm.
It is important that you solve the main assignment first before attempting the extra credit. The main assignment will be worth much more points than the extra credit.
This project is worth a total of 50 points for 497P, and 60 points for 597P. Points are earned for correctness and efficiency, while deductions are possible for issues with clarity, coding style, or submission mechanics.
Correctness (30 points)
30 points | Correctness as determined by automated tests. Some tests have large tolerances, so approximately correct solutions may receive partial credit.
Efficiency (10 points)
3 points | Corner detection is done with filtering routines.
3 points | Descriptor is computed with a single call to cv2.warpAffine.
4 points | Tests run within an order of magnitude of the runtime of the solution code, which contains no special optimizations.
Writeup (10 points)
4 points | ROC curves and AUC scores are reported for all four combinations of descriptor and distance metric.
3 points | harris.png for yosemite/yosemite1.jpg and commentary.
2 points | Feature matching example on your own images.
1 point | Approximate number of hours you spent on the assignment is listed at the top of the writeup.
597P Only (10 points)
5 points | Multiscale feature implementation.
5 points | Writeup: description of your multi-scale approach and ROC curves for at least one dataset.
Clarity: Deductions may be made for poor coding style. Please see the syllabus for general coding guidelines. Up to two points may be deducted for each of the following:
Submission Mechanics: Up to 10 points may be deducted for problems with submission mechanics that require manual handling - for example, problems with your git repository, code that fails to run with the automated test suite, failure to notify me of your late submission, etc.