Handout 1 - Sorting


Selection Sort

The basic idea of the selection sort algorithm is:

1. Find ("Select") the largest element of the array or list of data and place it at the end of the array by swapping it with the last element of the array.

2. Now find the largest element in the subarray that is the original array minus the last element. Place this element as the last element of the subarray.

3. Now find the largest element in the subarray that is the original array minus the last two elements. Place this element at the end of this subarray.
4. Etc.

29 10 14 37 13 Initial array
29 10 14 13 37 Find largest element (37) and swap with the last element.

Next we consider the subarray consisting of the original array minus the last element:
13 10 14 29 37 Find the largest element of the subarray (29) and swap with the last element of this subarray

Now we consider the subarray that is the original array minus the last two elements:
13 10 14 29 37 The largest element in this subarray (14) is already in its proper place: no swap needed

Now we consider the subarray that is the original array minus the last three elements:
10 13 14 29 37 Find the largest element (13) and place at the end of this subarray. Done.

code analysis

Insertion Sort

Insertion Sort is probably what you would intuitively do if you were sorting a deck of cards that has spilled on the floor: You pick up one item (card) at a time and place it in its proper place in a sorted stack that you create as you pick up the cards.

In sorting an array with this method, we will divide the array into to two regions: a sorted and an unsorted region:

Initially, we assume that the first element is the whole unsorted region, and all the other elements are in the unsorted region. Insertion Sort now loops through the elements of the unsorted region, inserting each one into its proper place in the sorted region. The code given below inserts the element into the correct location in the sorted region by shifting the elements in the sorted region, one at a time, to the right until it detects an element that is less than the element to be inserted, or the beginning of the array is reached. It then inserts the element at this location.

Here is a small example worked out in detail:

Sort the 10: 29 10 14 37 13 Initial array: The 29 is the only element in the sorted region, the 10 needs to be inserted in the sorted region.
29 29 14 37 13 We save a copy of the 10 and compare it to the element to its left: the 29. Since the 10 is less than the 29, we "shift" the 29 to the right by copying it into the element to its right (where the 10 was)
Sort the 14: 10 29 14 37 13 Now we have reached the beginning of the array so we insert the 10 in this location. The sorted region now includes the 10 and the 29. We now consider the 14. We save a copy of the 14 and since the 14 is less than the element to its left (the 29), we shift the 29 to the right by copying it into the element to its right.
10 29 29 37 13 Now we compare the 14 to the 10. Since the 14 is greater than the 10, we insert the 14 in the current location - the second element of the array.
Sort the 37: 10 14 29 37 13 The sorted region now consists of the 10, 14, and 29 and we need to insert the 37. Since the 37 is greater than the last element of the sorted region (the 29) we leave it where it is.
Sort the 13:
10 14 29 37 13 Now the sorted region consists of the 10, 14, 29 and 37 and we consider the 13: We save a copy of the 13, and since it is less than the element to its left (the 37), we shift the 37 by copying it into the element to its right.
10 14 29 37 37 Now we compare the 13 to the next element, the 29. Again the 13 is less and we shift the 29 by copying it to the right
10 14 29 29 37 Now we compare the 14 to the next element, the 14. The 14 is still greater than the 13 so we shift the 14 to the right by copying.
10 14 14 29 37 Now finally the 13 is greater than the 10 and we insert the 13 in the current location (second element of the array).
10 13 14 29 37 The array is sorted.

code analysis

Bubble Sort

We make multiple passes through the list. For each pass do the following:

1. Compare the first element with the second element, and if the first element is larger, swap the two elements.

2. Now compare the second element with the third, again swapping the two if necessary.

3. This process is continued for all pairs of elements until the end of the list is reached

On the second pass, because the largest element "bubbled" to the end of the list on the first pass, we only need to compare pairs of elements up through the second to last element.

Likewise on the third pass, we do not need to consider the last two elements. Etc.

We continue making passes through the loop until either:

a) If during a pass through the list, no swaps need to be made, the data must be sorted and we are done.

b) In the worst case, the maximum possible number of passes that would have to be made is N-1 for a list with N elements.

Pass # 1:
29 10 14 37 13 Initial array: Compare the first pair of elements (the 29 and the 10): Swap the 29 with the 10
10 29 14 37 13 Compare the second pair of elements: Swap the 29 with the 14
10 14 29 37 13 Compare the third pair of elements: Do not need to swap the 29 with the 37.
10 14 29 37 13 Compare the last pair of elements: Swap the 13 and the 37. Note that
the largest element (the 37) has ""bubbled" to the end.
10 14 29 13 37 (End of pass #1)
Pass #2:
10 14 29 13 37 Now again start at the beginning of the list and compare the first pair of numbers:
Do not need to swap the 10 and the 14
10 14 29 13 37 Now consider the second pair, the 14 and the 29: Do not need to swap
10 14 29 13 37 Now the third pair: Need to swap the 29 and the 13
10 14 13 29 37 Since the largest element (the 37) bubbled to the end on the first pass we do not need to
consider the last pair.
10 14 13 29 37 (End of pass #2)
Pass #3:
10 14 13 29 37 Again, starting at the beginning, compare the 10 and the 14: Do not need to swap.
10 14 13 29 37 Now compare the 14 and the 13: Swap
10 13 14 29 37 Since the previous two passes placed the two largest (29 and 37) elements, done with this pass
Pass #4:
10 13 14 29 37 Compare the first pair: Do not need to swap
10 13 14 29 37 Since the previous three passes placed the last three elements. we are done.

Note that we had to go through this list with the maximum possible number of passes (N-1 or 4 in this case).

code analysis

Quick Sort Algorithm

Quick Sort is another very efficient, recursive sorting algorithm. The key step in Quick Sort is a "partitioning" of the array into two sections: One section where all the elements are less than a chosen "pivot" element, and another section where all the elements are greater than the pivot element:

This partitioning step therefore does a "rough" sort of the array into elements larger and elements smaller than the pivot element. After the array has been partitioned, the two partitions are sorted - using Quick Sort! The code for the Quick Sort method looks like this:

void QuickSort(int A[], int F, int L) {
        int PivotIndex;

        if (F<L) {
                // Create the partitions
                Partition(A,F,L,PivotIndex);

                // Call QuickSort on S1 (the lower partition)
                QuickSort(A,F,PivotIndex-1);
                
                // Call QuickSort on S2 (the upper partition)
                QuickSort(A, PivotIndex+1, L);
        }
}

We pass the function the array A[], and the first and last indices (F and L) of the subarray of A[] that should be sorted. The function "Partition" is the function that performs the partitioning step. QuickSort will make recursive calls to itself until the subarray only has one element (F == L) (this is the recursive base case). When Partition is passed a two element subarray (the recursive level just before the base case) it will actually sort the subarray. So it is through the Partition function that the array actually gets rearranged and sorted.

The Partition function looks like this:

     1  ////////////////////////////////////////////////////
     2  // FUNCTION: Partition(int A[], int F, int L, int& PivotIndex)
     3  // Partitions an array for QuickSort
     4  // ARGUMENTS:
     5  //      A[]:    The array
     6  //      F:      First index of the part to be partitioned
     7  //      L:      Last index of the part to be partitioned
     8  //      PivotIndex:     Index of the pivot element
     9  // POSTCONDITION: The array is partitioned into elements that
    10  //      are less than the pivot element, and elements that are
    11  //      greater than the pivot element.  The boundary is at the
    12  //      PivotIndex.
    13  ////////////////////////////////////////////////////
    14  void Partition(int A[], int F, int L, int& PivotIndex)
    15  {
    16
    17          PivotIndex = F; // Can try others,
    18                          // but must put as first element
    19
    20          int Pivot = A[F];       // Pivot element
    21
    22          int LastS1 = F;         // Will be the border between S1, S2
    23          int FirstUnknown = F+1; // First "unknown" element index
    24                                  // (to be placed)
    25
    26          // move one element at a time until unknown region is empty:
    27          for (; FirstUnknown <= L; FirstUnknown++) {
    28
    29                  // Move item from unknow to proper region
    30                  // Increment the number of compares for the following:
    31                  if (A[FirstUnknown] < Pivot) {  // belongs in S1
    32                          LastS1++;
    33                          Swap(A[FirstUnknown], A[LastS1]);
    34                  }
    35                  // else item belongs in S2 and will increment
    36                  // FirstUnknown anyways.
    37
    38          } // end for
    39
    40          // At this point the array is partitioned and only
    41          // need to move the pivot to its proper place:
    42          Swap(A[F], A[LastS1]);
    43          PivotIndex = LastS1;
    44
    45  } // end of Partition

Partition has to first choose an element to be the pivot element. If the original array is truly random, then one choice for the pivot element is as good as any other. So, partition chooses the first element to be the pivot element (on line 20). Partition then proceeds to examine each element in the array starting with the second element (loop on lines 27-38). Partition places elements in either the partition containing elements less than the pivot (called "S1"), or in the partition containing elements greater than or equal to the pivot (called "S2"). The index variables "LastS1" and "FirstUnknown" keep track of the end of the S1 region and the end of the S2 region ("FirstUnknown" is the element after the last element in S2) (see diagram below).

The key to the efficiency of Partition and Quick Sort is due to the way Partition has been designed and the minimal number of swaps or copies needed to place an element in S1 or S2. Initially the pivot element is placed or chosen as the first element and the rest of the array is unplaced or "unknown":

As Partition progresses, it will place elements in their proper locations, and at some point in the process will look something like this:

Now consider the placement of the next unknown element (with index "FirstUnknown"). There are two possibilities: The element belongs in S1 or in S2.

A. Suppose it belongs in S1 (x < p). One method would be to shift all the elements in S2 to the right in order to make room for the element "x". This is similar to what happens in the Insertion Sort algorithm and is not very efficient. Instead, the much more efficient procedure here is this:

1. Swap the element "x" with the first element in S2 (labelled with "z" above).

Now we have this situation:

But "x" belongs in S1 and "z" belongs in S2. So we do this:

2. Increment "LastS1" so that S1 now contains "x".

3. Increment "FirstUnknown" so that S2 contains "z".

These three operations are much simpler and less time consuming than, say, shifting all the elements in S2 to the right to make room for the new element.

B. Suppose "x" belongs in S2 (x>p). We again have this situation:

This time the procedure is even more trivial:

1. We simply increment "FirstUnknown" so that now S2 includes "x".

After all of the elements have been placed in S1 or S2 we have this diagram:

To put the pivot element between the S1 and S2 partitions we simply swap the first (pivot) element with the element with index "LastS1":

Partition then "returns" the value of the Pivot Element index ("LastS1") (through pass by reference) back to QuickSort, which then calls QuickSort on the lower and upper partitions.

code analysis

Merge Sort Algorithm

Merge Sort is a very efficient, recursive sorting algorithm. The basic idea is to divide the list or array into two halves, sort each half, and then merge the two halves. The key is the merging process. This is accomplished by comparing the first elements of each half, choosing the smaller one, and placing this as the next element of a temporary array. This process is continued until the two halves are empty.

How do we sort the two halves in the first place? We use Merge Sort! The code for the Merge Sort algorithm looks like this:

void MergeSort(int A[], int F, int L) {
        int Mid;        // Will be index of the middle element

        if (F < L) {    // only do if array more than one element

                Mid = (F + L)/2;

                // Sort the first half:
                MergeSort(A,F,Mid);

                // Sort the second half:
                MergeSort(A, Mid+1, L);

                // Now merge the two halves
                Merge(A,F,Mid,L);
        }
        return;
} // end MergeSort

We pass the function the array A[], and the first and lastindices (F and L) of the subarray of A[] that should be sorted. MergeSort will make recursive calls to itself until the subarray only has one element (F == L) (this is the recursive base case). At some point in the recursive process, the Merge function will be merging single element subarrays, creating sorted two element arrays. The "backing out" stage of the recursions will then merge, in a sorted manner, larger and larger subarrays until the whole array is sorted. It is the Merge function that is really doing the comparing and placing of the elements in their correct order.

MergeSort Example

code analysis

Comparison of Sorting Algorithms

----------------------------------------------------------------
Algorithm     Average     Best     Worst     In-place     Stable
----------------------------------------------------------------
Selection      O(n^2)   O(n^2)    O(n^2)       Yes          No
Insertion      O(n^2)   O(n^2)    O(n^2)       Yes          Yes
Bubble         O(n^2)   O(n^2)    O(n^2)       Yes          No
Quicksort      O(nlgn)  O(nlgn)   O(n^2)       Yes          No
Heapsort       O(nlgn)  O(nlgn)   O(nlgn)      Yes          No
Merge          O(nlgn)  O(nlgn)   O(nlgn)      No           Yes
----------------------------------------------------------------


sources include this and this and this