Spring 2023
You are provided with a partial implementation of a Binary Search
tree in AVL.java. Your task will be to complete this
implementation and turn the BST into an AVL tree by maintaining the AVL
balance property on insertion. You will use this AVL tree to efficiently
count the number of unique lines in a text document.
Putting elements into a set is one way to find how many unique items are in the set. Though we aren’t using the name, you’ll notice that our AVL tree is implementing the Set ADT here, storing a set of Strings with no duplicates allowed.
The GitHub Classroom invitation link for this assignment is in
Assignment 2 on Canvas. Begin by accepting the invitation and cloning a
local working copy of your repository as you did in Assignment 1. Make
sure to clone it somewhere outside the local working copies for other
assignments and labs (e.g., clone to ~/csci241/a2) to avoid
nesting local repositories.
The main program is found in Unique.java. This program
takes two command line arguments: a mode and the name of a text file.
The mode is either naive or avl (it defaults to avl if anything other
than those two is given). The program reads lines from the text file
one by one and prints the number of unique lines. A naive \(O(n^2)\)
solution is already implemented for you in naiveUnique.
Your job is to implement the avlUnique method such that it
counts the number of unique lines in the file in \(O(n \log n)\) time.
To do this, you’ll complete the AVL tree implementation in AVL.java.
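To see where a quadratic running time can come from, here is a minimal sketch of one plausible naive approach. It is an illustration only: the class and method names are hypothetical, and the provided naiveUnique may be written differently. Each new line is compared against every distinct line seen so far, which costs \(O(n)\) per line and \(O(n^2)\) overall.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the provided naiveUnique.
class NaiveUniqueSketch {
    // Counts distinct lines by scanning a list of lines seen so far.
    static int countUnique(List<String> lines) {
        List<String> seen = new ArrayList<>();
        for (String line : lines) {
            // contains() is a linear scan, so the whole loop is quadratic.
            if (!seen.contains(line)) {
                seen.add(line);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(countUnique(List.of("apple", "pear", "apple", "fig")));
    }
}
```

Replacing the linear scan with an \(O(\log n)\) lookup in a balanced tree is what brings the total down to \(O(n \log n)\).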
Two sample text files are provided to demo the difference between the
naive and efficient solutions. prefixes.txt contains the
first 5 characters of each word in a large list of English words (you
can find a similar one on the lab systems at
/usr/share/dict/american-english).
prefixes_small.txt contains the first 50,000 lines of that
file. On my laptop, the naive implementation takes around 26 seconds on
the complete file and about 3 seconds for the small file. The efficient
implementation takes a second or less on either file, thanks to its
\(O(n \log n)\) runtime.
$ gradle run --args "avl prefixes.txt"
> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE
> Task :run
Finding unique lines in prefixes.txt
prefixes.txt
AVL:
65137
BUILD SUCCESSFUL in 1s
2 actionable tasks: 1 executed, 1 up-to-date
$ gradle run --args "naive prefixes.txt"
> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE
> Task :run
Finding unique lines in prefixes.txt
Naive:
65137
BUILD SUCCESSFUL in 26s
2 actionable tasks: 1 executed, 1 up-to-date
$
Skeleton code is provided in your repository. The AVL class in
app/src/main/java/avl/AVL.java currently implements the
search functionality for a BST.
Implement standard BST (not AVL) insert functionality in
the provided bstInsert method stub. As with search, the AVL class has
a public bstInsert(String w) method that calls a private
bstInsert(Node n, String w) method that recursively inserts
on nodes. Notice that the AVL class has a size field that
should be kept up to date as words are inserted. Note:
bstInsert does not need to keep heights up to date; this is
only necessary in avlInsert, and you should assume that
bstInsert calls are not mixed with avlInsert calls.
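The public/private split described above can be sketched as follows. This is a standalone illustration under assumptions: the SketchBst class, its Node fields (word, left, right, parent), and the choice to ignore duplicates silently are hypothetical and may not match the skeleton in AVL.java.

```java
// Hypothetical standalone BST; field and class names are assumptions.
class SketchBst {
    static class Node {
        String word;
        Node left, right, parent;

        Node(String word, Node parent) {
            this.word = word;
            this.parent = parent;
        }
    }

    Node root;
    int size;

    // Public entry point: handles the empty tree, then delegates.
    void bstInsert(String w) {
        if (root == null) {
            root = new Node(w, null);
            size = 1;
        } else {
            bstInsert(root, w);
        }
    }

    // Recursive helper. Precondition: n is not null.
    private void bstInsert(Node n, String w) {
        int cmp = w.compareTo(n.word);
        if (cmp == 0) {
            return; // duplicate: a set stores each word once, size unchanged
        }
        if (cmp < 0) {
            if (n.left == null) {
                n.left = new Node(w, n);
                size++;
            } else {
                bstInsert(n.left, w);
            }
        } else {
            if (n.right == null) {
                n.right = new Node(w, n);
                size++;
            } else {
                bstInsert(n.right, w);
            }
        }
    }
}
```

Note that size changes only when a new node is actually created, so inserting a duplicate leaves it untouched.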
Implement leftRotate and rightRotate
helper methods to perform a rotation on a given node. Use the lecture
slides as a reference.
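As a rough picture of what a rotation does to the links, here is a simplified sketch. It assumes a hypothetical Node without parent pointers or heights and returns the new subtree root; the versions in AVL.java will likely need to update parent pointers (and the root reference) as well, which this sketch deliberately omits.

```java
// Simplified rotations on a hypothetical Node type (no parent pointers).
class RotationSketch {
    static class Node {
        String word;
        Node left, right;

        Node(String word) {
            this.word = word;
        }
    }

    // Left rotation at x: x's right child y moves up, x becomes y's left
    // child, and y's old left subtree becomes x's right subtree.
    static Node leftRotate(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        return y; // new root of this subtree
    }

    // Right rotation at y: the mirror image of leftRotate.
    static Node rightRotate(Node y) {
        Node x = y.left;
        y.left = x.right;
        x.right = y;
        return x; // new root of this subtree
    }
}
```

The two rotations are inverses of each other: rotating left and then right at the new subtree root restores the original shape.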
Implement rebalance to fix a violation of the AVL
property caused by an insertion. In the process, you’ll need to
correctly maintain the height field of each node. Remember
that height needs to be updated any time the tree’s structure
changes.
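The height bookkeeping can be sketched with a few small helpers. The conventions here are assumptions, not taken from the skeleton: an empty subtree has height -1, a leaf has height 0, and the balance factor is right height minus left height. The Node type and helper names are hypothetical.

```java
// Hypothetical helpers for cached heights; conventions are assumptions.
class HeightSketch {
    static class Node {
        Node left, right, parent;
        int height; // a leaf has height 0 in this sketch

        Node(Node parent) {
            this.parent = parent;
        }
    }

    // Height of a possibly-empty subtree, read from the cached field.
    static int height(Node n) {
        return n == null ? -1 : n.height;
    }

    // Positive means right-heavy, negative means left-heavy.
    static int balanceFactor(Node n) {
        return height(n.right) - height(n.left);
    }

    // Recompute one node's height from its children in O(1).
    static void updateHeight(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // Refresh heights from n up to the root, one O(1) step per ancestor.
    static void updateHeightsUpward(Node n) {
        for (Node cur = n; cur != null; cur = cur.parent) {
            updateHeight(cur);
        }
    }
}
```

Because each step reads only the children's cached heights, refreshing a whole insertion path costs time proportional to the path length rather than the size of the tree.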
Implement avlInsert to maintain AVL balance in the
tree after insertions using the rebalance method.
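For orientation, here is a compact sketch of how insertion, height updates, and rebalancing fit together. It is not the structure of the provided skeleton: it uses a hypothetical AvlSketch class with a recursive insert that returns the new subtree root and no parent pointers, and it assumes the same height and balance factor conventions as a leaf having height 0 and balance being right minus left.

```java
// Hypothetical recursive AVL insert; not the skeleton's structure.
class AvlSketch {
    static class Node {
        String word;
        Node left, right;
        int height; // leaf = 0, empty subtree = -1

        Node(String word) {
            this.word = word;
        }
    }

    Node root;
    int size;

    static int height(Node n) { return n == null ? -1 : n.height; }

    static int balance(Node n) { return height(n.right) - height(n.left); }

    static void update(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    static Node leftRotate(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        update(x); // x is now lower, so fix it first
        update(y);
        return y;
    }

    static Node rightRotate(Node y) {
        Node x = y.left;
        y.left = x.right;
        x.right = y;
        update(y);
        update(x);
        return x;
    }

    // Restores the AVL property at n, assuming both subtrees are AVL.
    static Node rebalance(Node n) {
        update(n);
        if (balance(n) > 1) {                 // right-heavy
            if (balance(n.right) < 0) {       // right-left case
                n.right = rightRotate(n.right);
            }
            return leftRotate(n);
        }
        if (balance(n) < -1) {                // left-heavy
            if (balance(n.left) > 0) {        // left-right case
                n.left = leftRotate(n.left);
            }
            return rightRotate(n);
        }
        return n;
    }

    void avlInsert(String w) {
        root = insert(root, w);
    }

    private Node insert(Node n, String w) {
        if (n == null) {
            size++;
            return new Node(w);
        }
        int cmp = w.compareTo(n.word);
        if (cmp < 0) {
            n.left = insert(n.left, w);
        } else if (cmp > 0) {
            n.right = insert(n.right, w);
        } else {
            return n; // duplicate: nothing changes
        }
        return rebalance(n); // runs bottom-up as the recursion unwinds
    }
}
```

Inserting sorted input is a useful check: a plain BST would degenerate into a chain, while this sketch keeps the height logarithmic.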
Use your completed AVL tree class to implement the
avlUnique method in Unique.java such that it
runs in \(O(n \log n)\) time.
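The overall shape of such a method can be sketched with a library set standing in for your tree. This is an assumption-laden illustration: java.util.TreeSet (a balanced tree from the standard library) plays the role of the AVL class, and the UniqueLinesSketch class and countUniqueLines method are hypothetical names, not the signature of avlUnique.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.TreeSet;

// Hypothetical stand-in: TreeSet plays the role of the AVL tree.
class UniqueLinesSketch {
    // Reads lines until end of input and returns how many are distinct.
    static int countUniqueLines(BufferedReader reader) {
        TreeSet<String> seen = new TreeSet<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                seen.add(line); // O(log n); duplicates are ignored
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new StringReader("a\nb\na\n"));
        System.out.println(countUniqueLines(in));
    }
}
```

With n lines and an \(O(\log n)\) insert, the loop performs \(O(n \log n)\) work in total.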
For up to 3 points of extra credit, you may complete some or all of the enhancements described below.
Submit the A2 Survey quiz, including the estimated total number of hours you spent on this assignment.
Add private helper methods as you need. You are especially encouraged
to use helper methods for things like calculating balance factors,
updating heights, etc., in order to keep the code for intricate
procedures like rebalance easy to read.
Error catching beyond what is already in Unique.java is not
required: you may assume well-formed user input and that method
preconditions will not be violated.
printTree never follows parent pointers; this means parent pointers
can be misplaced and the printTree output will still look normal.
The height method from Lab 3 is \(O(n)\), which means it’s not suitable for
our purposes. To maintain efficiency, you’ll need to update the height
of each node along the insertion path from the bottom up.
Tests are provided in app/src/test/java/avl/AVLTest.java. Use
gradle test often and pass the tests for each task before
moving on to the next.
Enhancements and git
The base project will be graded
based on the master branch of your repository. Before you change your
code in the process of completing enhancements, create a new branch in
your repository (e.g., git checkout -b enhancements). Keep
all changes related to enhancements on this branch; this way you can add
functionality without affecting your score on the base project. Make
sure you’ve pushed both master and enhancements branches to GitHub
before the submission deadline.
You can earn up to 3 points of extra credit for completing the following:
(1 point) Implement remove, maintaining AVL balance.
(1 point) The base assignment implements a Set of Strings. In
your enhancements branch, modify your code to instead implement a Map
from strings to integers. This will allow it to behave something like a
HashMap<String, Integer> would. In the context of
lines of a document, this means you’ll keep track of the number of
occurrences of each line you’ve seen. Modify your main program to use
this to calculate the most frequently occurring line in addition to the
number of unique lines.
(1 point) Modify your tree to fully support the following semantics for removal: removing an element either decrements its count (if count was greater than 1) or removes it from the tree entirely (if the count was 1).
If you complete any of the above, explain what you did, how you did it, and instructions for testing your enhancements in a comment at the top of the corresponding java file.
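The counting and removal semantics described in the enhancements can be illustrated with a library map. This is only a behavioral sketch: java.util.TreeMap stands in for a modified AVL tree, and the LineCountsSketch class and its method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in: TreeMap plays the role of the modified AVL tree.
class LineCountsSketch {
    private final TreeMap<String, Integer> counts = new TreeMap<>();

    // First occurrence stores 1; later occurrences increment the count.
    void add(String line) {
        counts.merge(line, 1, Integer::sum);
    }

    // Decrements the count, or deletes the entry when the count is 1.
    // Removing an absent line does nothing.
    void remove(String line) {
        counts.computeIfPresent(line, (key, c) -> c > 1 ? c - 1 : null);
    }

    int count(String line) {
        return counts.getOrDefault(line, 0);
    }

    int uniqueCount() {
        return counts.size();
    }

    // Returns a line with the highest count, or null if the map is empty.
    String mostFrequent() {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

In the sketch, returning null from the computeIfPresent function removes the mapping, which mirrors deleting the node from the tree when its count reaches zero.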
Start small, test incrementally, and git commit often. Please keep track of the number of hours you spend on this assignment, as you will be asked to report it in A2 Survey. Hours spent will not affect your grade.
The tasks are best completed in the order presented. Make sure you pass the tests for the current task before moving on to the next. Rotations and rebalancing are the trickiest part. Visit the mentors, come to office hours, or post on Piazza if you are stuck. A suggested timeline for completing the assignment in a stress-free manner is given below:
Submit the assignment by pushing your final changes to GitHub before the deadline, then submitting A2 Survey on Canvas. If you completed any enhancements, be sure to push your enhancements branch as well.
You can earn points for the correctness and efficiency of your program, and points can be deducted for errors in commenting, style, clarity, and following assignment instructions. A2 is out of a total of 50 points.
Submission
Code: Correctness
Code: Efficiency
avlInsert maintains \(O(\log n)\) performance by keeping track of
node heights and updating them as necessary
Unique processes a document with \(n\) words in \(O(n \log n)\) time
Clarity deductions (up to 2 points each)