Handout 37

Heap Sort


Heaps

A heap is a binary tree (not a binary search tree) with the following properties:

Full and dense means that only the bottom level of the tree has gaps, and the gaps are all on the right.

Being dense means the linear array representation is the most appropriate. A linked representation would waste space.

A min-heap contains the minimum value as the root, a max-heap contains the maximum value as the root.

Here are some heaps of varying sizes

    2     2   2    2     2      2
   / \        |   / \   / \    / \
  3   7       3  3   7 3   7  3   7
 / \  |               /      / \
9   7 8              9      9   7

Priority Queue

A heap is a great data structure for implementing a priority queue, because

It also turns out we can use the routines for adding and removing items to create an N log2N sort algorithm called heap sort.

Adding Data (min-heap)

The algorithm for adding data to a min-heap is simple, with one unintuitve aspect: we add an item to the bottom first then move it upwards, if necessary:

Walking up the tree will take at most log2N steps (compares and data transfers), because the tree is always balanced, so the height of the tree is log2N.

Here's an example, based on inserting the values 5, 6, 4, 4, 7, 2

Deleting Data (min-heap)

Both uses of heaps (priority queues and heap sort) only need to delete the root, so that's the only case we'll consider here. Deleting is somewhat similar to adding in that we'll replaced the deleted root with another item from the tree and bubble it down. The algorithm is made only slightly more complex because we have two children to worry about instead of just one parent.

  1. Remove the root.
  2. Replace it with the rightmost item in the bottom level of the tree.

    The new item is almost certainly in the wrong place because it's one of the larger data elements, but this keeps the heap full and dense.

  3. If the new root has no children smaller than it is, we're done.
  4. If it has just a left child that is smaller, swap it with the child and go back to step 2 with the left subtree.
  5. If it has two children that are smaller, swap it with the smallest child and go back to step 2 with the affected subtree.

Note that

Walking down the tree will take at most log2N steps (compares and data transfers). We may have to do three comparisons (parent with each child and child against child) but 3 log2N is still O(log2N).

Here's a sample case of removal of least element.

    2          ?          5          3          3
   / \        / \        / \        / \        / \
  3   4  to  3   4  to  3   4  to  5   4  to  4   4
 / \  |     / \  |     / \        / \        / \
6   4 5    6   4 5    6   4      6   4      6   5

Implementing Heaps

As mentioned above, heaps are best implemented using a linear array. The move up and down the tree are simple jumps around the array and the array is mostly filled. The only downside is that data has to be moved around, unless, of course, what we really have is an array of pointers to data, which is the best way to go with non-numeric data.

heap template implementation using an array

Some Tips on Array Implementation

Consider the following heap:

              a 
             / \
            b   c
           / \ / \              
           d c f g 

We can implement this heap as an array:

Index

0

1

2

3

4

5

6

Value

a

b

c

d

e

f

g

From any element in the array, we may find the elements parent by the relationship:

parent_index = (current_index - 1) / 2

We can the left and right children of any element by the relationships:

left_child_index = 2 * current_index + 1right_child_index = 2 * current_index + 2


sources include this and this and this