Spring 2022
You are provided with a partial implementation of a binary search tree (BST) in AVL.java. Your task is to complete this implementation and turn the BST into an AVL tree by maintaining the AVL balance property on insertion. You will use this AVL tree to efficiently count the number of unique lines in a text document.
Putting elements into a set is one way to find how many unique items a collection contains. Though we aren’t using the name, you’ll notice that our AVL tree is implementing the Set ADT here, storing a set of Strings with no duplicates allowed.
The GitHub Classroom invitation link for this assignment is in Assignment 2 on Canvas. Begin by accepting the invitation and cloning a local working copy of your repository as you did in Assignment 1. Make sure to clone it somewhere outside the local working copies for other assignments and labs (e.g., clone to ~/csci241/a2) to avoid nesting local repositories.
The main program is found in Unique.java. This program takes two command line arguments: a mode and the name of a text file. The mode is either naive or avl (it defaults to avl if anything other than those two is given). The program reads lines from the text file one by one and prints the number of unique lines. A naive \(O(n^2)\) solution is already implemented for you in naiveUnique. Your job is to implement the avlUnique method such that it counts the number of unique lines in the file in \(O(n \log n)\) time. To do this, you’ll complete the AVL tree implementation in AVL.java.
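To make the complexity gap concrete, here is a minimal sketch contrasting the two strategies. It is not the provided code: UniqueSketch, naiveCount, and setCount are hypothetical names, and java.util.TreeSet (a balanced search tree from the standard library) only stands in for the AVL tree you will write.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class UniqueSketch {
    // Quadratic approach: each line is compared against every distinct line kept so far.
    static int naiveCount(List<String> lines) {
        List<String> seen = new ArrayList<>();
        for (String line : lines) {
            if (!seen.contains(line)) { // linear scan: O(n) per line
                seen.add(line);
            }
        }
        return seen.size();
    }

    // Set-based approach: a balanced tree makes each insertion O(log n).
    // TreeSet is an assumption used for illustration, not the assignment's AVL class.
    static int setCount(List<String> lines) {
        TreeSet<String> set = new TreeSet<>();
        for (String line : lines) {
            set.add(line); // adding a duplicate leaves the set unchanged
        }
        return set.size();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("apple", "bread", "apple", "crane", "bread");
        System.out.println(naiveCount(lines)); // prints 3
        System.out.println(setCount(lines));   // prints 3
    }
}
```

Both methods return the same count; only the cost per line differs, which is why the gap widens as the input grows.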
Two sample text files are provided to demo the difference between the naive and efficient solutions. prefixes.txt contains the first 5 characters of each word in a large list of English words (you can find a similar one on the lab systems at /usr/share/dict/american-english). prefixes_small.txt contains the first 50,000 lines of that file. On my laptop, the naive implementation takes around 26 seconds on the complete file and about 3 seconds for the small file. The efficient implementation takes a second or less on either file, thanks to its \(O(n \log n)\) runtime.
$ gradle run --args "avl prefixes.txt"
> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE
> Task :run
Finding unique lines in prefixes.txt
prefixes.txt
AVL:
65137
BUILD SUCCESSFUL in 1s
2 actionable tasks: 1 executed, 1 up-to-date
$ gradle run --args "naive prefixes.txt"
> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE
> Task :run
Finding unique lines in prefixes.txt
Naive:
65137
BUILD SUCCESSFUL in 26s
2 actionable tasks: 1 executed, 1 up-to-date
$
Skeleton code is provided in your repository. The AVL class in app/src/main/java/avl/AVL.java currently implements the search functionality for a BST.
Implement standard BST (not AVL) insert functionality in the provided bstInsert method stub. As with search, the AVL class has a public bstInsert(String w) method that calls a private bstInsert(Node n, String w) method that recursively inserts on nodes. Notice that the AVL class has a size field that should be kept up to date as words are inserted. Note: bstInsert does not need to keep heights up to date; this is only necessary in avlInsert, and you should assume that bstInsert calls are not mixed with avlInsert calls.
Implement leftRotate and rightRotate helper methods to perform a rotation on a given node. Use the lecture slides as a reference.
Implement rebalance to fix a violation of the AVL property caused by an insertion. In the process, you’ll need to correctly maintain the height field of each node. Remember that height needs to be updated any time the tree’s structure changes.
Implement avlInsert to maintain AVL balance in the tree after insertions using the rebalance method.
Use your completed AVL tree class to implement the avlUnique method in Unique.java such that it runs in \(O(n \log n)\) time.
For up to 3 points of extra credit, you may complete some or all of the enhancements described below.
Submit the A2 Survey quiz, including the estimated total number of hours you spent on this assignment.
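The rotation, height, and rebalancing ideas in the tasks above can be sketched in a small standalone class. This is one possible shape under stated assumptions, not the skeleton's design: AvlSketch, Node, rotateLeft, rotateRight, updateHeight, and balance are hypothetical names; an empty subtree is assumed to have height -1; and the sketch returns new subtree roots instead of maintaining parent pointers or a size field, both of which the provided AVL class expects you to handle.

```java
public class AvlSketch {
    static class Node {
        String word;
        Node left, right;
        int height; // a leaf has height 0; an empty subtree counts as -1 (assumption)
        Node(String word) { this.word = word; }
    }

    // O(1): reads the stored field instead of recomputing recursively.
    static int height(Node n) { return n == null ? -1 : n.height; }

    static void updateHeight(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // Positive means right-heavy, negative means left-heavy.
    static int balance(Node n) { return height(n.right) - height(n.left); }

    // Left rotation: n's right child y becomes the root of this subtree.
    static Node rotateLeft(Node n) {
        Node y = n.right;
        n.right = y.left;
        y.left = n;
        updateHeight(n); // n now sits below y, so fix it first
        updateHeight(y);
        return y;
    }

    // Right rotation: mirror image of rotateLeft.
    static Node rotateRight(Node n) {
        Node x = n.left;
        n.left = x.right;
        x.right = n;
        updateHeight(n);
        updateHeight(x);
        return x;
    }

    // Restores the AVL property at n, assuming both subtrees are already balanced.
    static Node rebalance(Node n) {
        updateHeight(n);
        if (balance(n) > 1) {
            if (balance(n.right) < 0) n.right = rotateRight(n.right); // right-left case
            return rotateLeft(n);
        }
        if (balance(n) < -1) {
            if (balance(n.left) > 0) n.left = rotateLeft(n.left); // left-right case
            return rotateRight(n);
        }
        return n;
    }

    // Recursive insert: heights are repaired bottom-up as the recursion unwinds.
    static Node insert(Node n, String w) {
        if (n == null) return new Node(w);
        int cmp = w.compareTo(n.word);
        if (cmp < 0) n.left = insert(n.left, w);
        else if (cmp > 0) n.right = insert(n.right, w);
        else return n; // duplicate: the set is unchanged
        return rebalance(n);
    }

    public static void main(String[] args) {
        Node root = null;
        for (String w : new String[] {"a", "b", "c", "d", "e", "f", "g"}) {
            root = insert(root, w);
        }
        System.out.println(root.word + " " + root.height); // prints "d 2"
    }
}
```

Inserting seven keys in sorted order would give a plain BST a height of 6; with rebalancing the tree ends up with height 2, which is what keeps each insertion logarithmic.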
Add as many private helper methods as you need. You are especially encouraged to use helper methods for things like calculating balance factors, updating heights, etc., in order to keep the code for intricate procedures like rebalance easy to read.
[...] Unique.java. Error catching beyond this is not required; you may assume well-formed user input and that method preconditions will not be violated.
printTree never follows parent pointers; this means parent pointers can be misplaced and the printTree output will still look normal.
The height method from Lab 3 is \(O(n)\), which means it’s not suitable for our purposes. To maintain efficiency, you’ll need to update the height of each node along the insertion path from the bottom up.
Tests are provided in app/src/test/java/avl/AVLTest.java. Use gradle test often and pass tests for each task before moving on to the next.
Enhancements and git
The base project will be graded based on the master branch of your repository. Before you change your code in the process of completing enhancements, create a new branch in your repository (e.g., git checkout -b enhancements). Keep all changes related to enhancements on this branch; this way you can add functionality without affecting your score on the base project. Make sure you’ve pushed both master and enhancements branches to GitHub before the submission deadline.
You can earn up to 3 points of extra credit for completing the following:
(1 point) Implement remove, maintaining AVL balance.
(1 point) The base assignment implements a Set of Strings. In your enhancements branch, modify your code to instead implement a Map from strings to integers. This will allow it to behave something like a HashMap<String, Integer> would. In the context of lines of a document, this means you’ll keep track of the number of occurrences of each line you’ve seen. Modify your main program to use this to calculate the most frequently-occurring line in addition to the number of unique lines.
(1 point) Modify your tree to fully support the following semantics for removal: removing an element either decrements its count (if count was greater than 1) or removes it from the tree entirely (if the count was 1).
If you complete any of the above, explain what you did and how you did it, and give instructions for testing your enhancements, in a comment at the top of the corresponding Java file.
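The map-style behavior described in the enhancements can be illustrated with a short sketch. It assumes java.util.TreeMap as a stand-in for your modified tree; LineCountSketch, countLines, mostFrequent, and removeOne are hypothetical names, and ties for the most frequent line are broken here by taking the first line in sorted order, which the assignment does not specify.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LineCountSketch {
    // Maps each distinct line to the number of times it occurs.
    static Map<String, Integer> countLines(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum); // insert with 1, or add 1 to the existing count
        }
        return counts;
    }

    // Returns the line with the highest count, or null for an empty map.
    static String mostFrequent(Map<String, Integer> counts) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Removal semantics: decrement the count, or drop the entry when the count was 1.
    static void removeOne(Map<String, Integer> counts, String line) {
        counts.computeIfPresent(line, (k, c) -> c > 1 ? c - 1 : null);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            countLines(Arrays.asList("b", "a", "b", "c", "a", "b"));
        System.out.println(counts.size());        // prints 3
        System.out.println(mostFrequent(counts)); // prints b
    }
}
```

The number of unique lines is still the number of keys, so the base behavior is preserved while the counts add the frequency information.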
Start small, test incrementally, and git commit often. Please keep track of the number of hours you spend on this assignment, as you will be asked to report it in A2 Survey. Hours spent will not affect your grade.
The tasks are best completed in the order presented. Make sure you pass the tests for the current task before moving on to the next. Rotations and rebalancing are the trickiest part. Visit the mentors, come to office hours, or post on Piazza if you are stuck. A suggested timeline for completing the assignment in a stress-free manner is given below:
Submit the assignment by pushing your final changes to GitHub before the deadline, then submitting A2 Survey on Canvas. If you completed any enhancements, be sure to push your enhancements branch as well.
You can earn points for the correctness and efficiency of your program, and points can be deducted for errors in commenting, style, clarity, and following assignment instructions. A2 is out of a total of 50 points.
Submission
Code : Correctness
Code : Efficiency
avlInsert maintains \(O(\log n)\) performance by keeping track of node heights and updating them as necessary
Unique processes a document with \(n\) words in \(O(n \log n)\) time
Clarity deductions (up to 2 points each)