In this sequence of problems, we’ll try to show that the worst-case runtime of AVL insertion is O(log n).
What’s the runtime of performing a left or right rotation?
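To make the question concrete, here's one way a right rotation might look, sketched in Java and assuming each node stores its own height (the class and method names are illustrative, not necessarily the ones from lab); a left rotation is the mirror image. Counting the pointer and height updates should point you toward the answer.

```java
class AvlSketch {
    static class Node {
        int key;
        Node left, right;
        int height;   // height of the subtree rooted at this node; a new leaf has height 0
    }

    // Stored heights make this lookup O(1); an empty subtree has height -1.
    static int height(Node n) {
        return (n == null) ? -1 : n.height;
    }

    // Right rotation around `root`; returns the new root of the rotated subtree.
    static Node rotateRight(Node root) {
        Node pivot = root.left;     // the left child becomes the new subtree root
        root.left = pivot.right;    // pivot's right subtree is re-attached under the old root
        pivot.right = root;
        // Only these two nodes' heights can change; recompute each from its children.
        root.height = 1 + Math.max(height(root.left), height(root.right));
        pivot.height = 1 + Math.max(height(pivot.left), height(pivot.right));
        return pivot;
    }
}
```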
In the lecture video, I argued that each node needs to keep track of its own height. If this is done properly and thus height lookups are O(1), what’s the runtime of AVL insert, in terms of h, the height of the tree?
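For reference, here is a sketch of how `insert` might maintain those stored heights as the recursion unwinds, reusing the illustrative `Node` class and `height` helper from the sketch above. Rebalancing is elided, since the point here is just where the constant-time height updates happen along the insertion path.

```java
// Continuing the AvlSketch class above: BST insertion that refreshes the stored
// height on the way back up the insertion path, so later height lookups stay O(1).
static Node insert(Node root, int key) {
    if (root == null) {
        Node leaf = new Node();
        leaf.key = key;            // a new leaf; its height field defaults to 0
        return leaf;
    }
    if (key < root.key) {
        root.left = insert(root.left, key);
    } else if (key > root.key) {
        root.right = insert(root.right, key);
    }
    // Constant work per node on the path: update the stored height...
    root.height = 1 + Math.max(height(root.left), height(root.right));
    // ...then check the balance factor and rotate if needed (rebalancing omitted here).
    return root;
}
```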
Suppose instead of storing height in nodes, you used the Lab 3 `height` method, which calculates the height recursively. The exact analysis is a little tricky, because `height` is called on subtrees with different numbers of nodes as you walk up the tree. To simplify things a little, suppose you only consider the calls to `height` made when checking the root's balance factor. In this case, what would be the overall runtime of `insert` using the naive `height` method? Your answer to this should clarify why we can't achieve O(log n) runtime for `insert` unless we have O(1) access to heights.
At this point, we need to link the height \(h\) with the number of nodes \(n\). Whereas the worst-case height of a plain BST is O(n), we can do better by relying on the AVL property. Notice that the worst-case runtime for a tree of height \(h\) comes from the tree with the fewest possible nodes that still has that height. We'll start by building some intuition for this: for each height from 0 to 4, determine the smallest (i.e., fewest-node) AVL tree you can build with that height.
Did you notice that the children of a minimal tree of height \(h\) are themselves minimal trees of heights \(h-1\) and \(h-2\)? That suggests we could write down a recursive definition of a function \(N(h)\), representing the minimum number of nodes in an AVL tree of height \(h\). I'll give you the base cases and let you write the recursive case:
\(N(0) = 1\)
\(N(1) = 2\)
\(N(h) = ??\)
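If you'd like to check your recursive case empirically, here is a sketch that follows the construction described above, building a minimal tree of height \(h\) out of minimal trees of heights \(h-1\) and \(h-2\) (again using the illustrative `Node` class from the earlier sketches). Comparing its node count against your formula is a quick sanity check.

```java
// Build a minimal AVL tree of height h: a root whose children are minimal trees
// of heights h-1 and h-2. Keys are irrelevant to the shape, so they're left unset.
static Node minimalAvl(int h) {
    if (h < 0) {
        return null;               // height -1 is the empty tree
    }
    Node root = new Node();
    root.left = minimalAvl(h - 1);
    root.right = minimalAvl(h - 2);
    root.height = h;
    return root;
}

// Count the nodes in a tree; compare countNodes(minimalAvl(h)) against your N(h).
static int countNodes(Node n) {
    return (n == null) ? 0 : 1 + countNodes(n.left) + countNodes(n.right);
}
```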
The last step is to come up with a non-recursive bound on \(N(h)\) based on this definition. Someone will actually teach you how to do this when you get to CSCI 305, but see if you can invent your own solution. Hint: rather than maintaining the equality, switch to an inequality between \(N(h)\) and a simplified version of the right-hand side that (a) is definitely smaller than the exact quantity and (b) is easier to manipulate into a closed form you can calculate. For example:
\(N(h) =\) your answer from above
\(N(h) >\) something definitely smaller than your answer from above
\(N(h) >\) a closed-form expression equivalent to the previous
\(h =\) something that is \(O(\log N(h))\)
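To see the shape of that argument without spoiling your answer, here is the same trick carried out on an unrelated, made-up recurrence \(M(h)\):

\(M(0) = 1, \quad M(h) = 2\,M(h-1) + 5\)

\(M(h) > 2\,M(h-1) > 4\,M(h-2) > \dots > 2^h\,M(0) = 2^h\)

\(\log_2 M(h) > h\), so \(h = O(\log M(h))\)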