Algorithm and Blues

I’m not really an algorithms kinda guy. To be honest, they have always felt a bit like brain teasers and other such puzzles which I’ve never had any use for. This is probably a result of having never studied computers despite making a career out of using them. Computers are tools to be used not something to appreciate in their own right and the “algorithm” is a tool I’ve never needed and so never invested in.

And yet, computers are a fascinating piece of artwork. They are in fact more art than engineering, especially when it comes to coding, and I do enjoy art. Since I’ve made a career out of programming and configuring computers and I have a real appreciation of art, I’ve felt increasingly compelled to dig into algorithms and similar programming theory.

Oh, and I got my butt handed to me once in an interview because I blew the Bubble Sort coding test…

What got me (besides the fact I never studied it) was I got hung up on the word “Bubble” rather than the word “Sort.” Lame, I know, but there it is. I had this vision of grabbing the first word in the array and “bubbling it up” to its correct and final location in the, soon to be sorted, array. But that’s not how it works.

What is really happening is you are sorting your array in reverse. What bubbles up after each run is what will be the LAST [sorted] word in the array. So first you find the last word, then the second to the last and so on until you finally place the second (and by default the first) words into their final place in the array. It takes little thinking to realize you have to run through the array no more than n-1 times where n is the number of words in the array. Dr. Donald Knuth, in the Art of Computer Programming describes it this way (notice the beautiful styling of the algorithm):

Bubble Sort Records R₁, … , R_N are rearranged in place; after sorting is complete their keys will be in order, K₁ ≤ … ≤ K_N.

Initialize bound. Set bound ← N. bound is the highest index for which the record is not known to be in its final position; thus we are indicating that nothing is known at this point.
Loop on j. Set t ← 0. Perform step B3 for j = 1, 2, … , bound – 1, and then go to step B4. If bound = 1, this means go directly to B4.
Compare/exchange R_j:R_j+1. if K_j > K_j+1, interchange R_j ↔ R_j+1 and set t ← j.
Any exchanges? If t = 0, terminate the algorithm. Otherwise set bound ← t and return to step B2.

The Art of Computer Programming, Volume 3, Sorting and Searching, Second Edition

The first part is just a general description of the purpose of the algorithm. Note, this is not specifically a word sort, but a record sort. The list is composed of records “R” and is sorted alphabetically based on a key “K.” In a list of words, the entire record [word] is the key.

The first step does nothing more than set the upper boundary, aka. the index of the last word in the array.

The second step looks a bit confusing, but is actually pretty straight forward. We will loop on the index j, performing the next step (B3). Additionally, we set a temporary variable t equal to 0.

Step three is the bubbly part. We compare they values of our first and second records and if needed, we swap them. Each time we exchange two records, we set the variable t equal to the current index j.

At the end of the loop we check our variable t. If it equals 0, it means no records were exchanged and thus our list is sorted! That means, even if we have not looped n-1 times, we got lucky, our list is sorted so we can short circuit this process and exit.

You might notice the part where we set BOUND = t. What’s up with that? While strictly speaking unnecessary, it is a small optimization. Remember how I said we are sorting our list from back to front? What that means is, at any given time in our looping, some part of our array to the end is already sorted. That starting point is represented by the index of the last records we sorted. By resetting BOUND at the end of each loop, we make each loop there after shorter by at least 1 iteration which is pretty clever.

So, having looked this over, let’s see what it looks like in PERL.

Before diving into the algorithm itself, let’s do a little setup. Most books and tutorials tend to jump right into the discussion and the end result is a code snippet, rather than fully functional, running code. So, let’s start from the beginning.

#!C:\Strawberry\perl\bin\perl.exe -w

use strict;
use warnings;
use diagnostics;
use English;

local $OUTPUT_AUTOFLUSH = 1;

my $DEBUG = 1;

The first line is the “shebang” line and tells the operating system where your Perl interpreter is. On most operating systems, this allows you to invoke the script directly rather than having to call Perl first (e.g. perl -w script.pl). The ‘-w‘ command line parameter is similar to ‘use warnings‘ except it extends the warnings pragma to any called scripts as well as the script itself. Additionally, ‘use warnings‘ can be modified to include/exclude certain warnings as well as modifying their behavior slightly if you want. Including (use) strict, warnings, diagnostics and English are all fairly common additions to any Perl script. There are plenty of discussions out there on these so I won’t go into them here. The local $OUTPUT_AUTOFLUSH = 1 line forces any output to be immediately flushed (printed to the screen), rather than buffered. This is most helpful when using debugging print statements. If you do not include use English; this would be accomplished with $| = 1 where ‘|’ is the vertical pipe symbol. Finally, we create our own DEBUG flag. A value of 1 means print the debug statement, 0 means skip it. In normal run mode, this will be set to zero.

Next, let’s decide to put our algorithm into a separate subroutine. This will allow us later to create alternative designs and compare them. We’ll name our subroutine ‘bubblesort_don‘ for Dr. Donald Knuth and we will pass it a reference to an array. For time traveling reasons (i.e. I did more things with this later so went back and changed it), we will build that array directly in the call to the subroutine, rather than creating the array first.

#!C:\Strawberry\perl\bin\perl.exe -w

use strict;
use warnings;
use diagnostics;
use English;

local $OUTPUT_AUTOFLUSH = 1;
my $DEBUG = 1;

bubblesort_don(["wilma", "pebbles", "fred", "dino", "barney", "bam-bam"]);

sub bubblesort_don{
}

Just like our program starts off with some “standard” environmental setup, so does our subroutine. So let’s set the stage so to speak.

sub bubblesort_don{
    use Array::Base +1; # Start array index at 1 to match Algorithm description
    my $R = shift;      # reference to array of records (words)
    my $K = $R;         # secondary reference to records array.  used as the "key" for each record (word)
    my $BOUND;          # highest index for which the record is not known to be in its final position
    my $j;              # loop index
    my $t;              # last swapped value array index
    my $N = @$R;        # highest array index (aka, number of array elements)

    DeBug($R, "presort", "bubblesort_don") if $DEBUG;


    DeBug($R, "postsort", "bubblesort_don") if $DEBUG;

    no Array::Base;
}

So, the weird thing first. use Array::Base +1; Perl, like most (but not all) programming languages starts its array indexing at zero, not one. So the first element of a Perl array is usually, $array[0] and not $array[1]. However, Dr. Knuth describes his array index as starting from one. While not normally recommended, using the Array::Base Perl module allows us to write our code to more closely reflect the algorithm as described. The no Array::Base; line tells Perl to go back to normal mode.

The variable names declared after that directly correspond to those in the algorithm description. $R is a reference to our unsorted array and is passed as part of the subroutine call (discussed later). $K is a second reference to the same array. In the algorithm description, R is used to refer to the value of each record and K the key to that record. In our example however, each word is both record and key. Like changing the starting index, this is strictly speaking unnecessary, but allows our code to map more directly to the algorithm.

For more time traveling reasons we’ll create a subroutine named DeBug() which will output our array both before and after we sort it. I’ll show that subroutine later, for now, just know this is how we will call it.

Now, let’s add the code between the DeBug() calls. However, a quick Public Safety Announcement. The following code (like the use of Array::Base) will NOT follow Perl best practices. This code is specifically laid out to mimic the algorithm, which itself is unnecessary when implementing algorithms. It does however, facilitate discussion of the algorithm by demonstrating it with working code.

Step B1 is simple. It just says to set $BOUND = $N; where $BOUND and $N are already described above. The line before, that while uncommon in modern Perl is nothing more than a label with a comment following it.

    B1: # [Initialize BOUND.]
        $BOUND = $N;

Step B2 sets up the loop to execute step B3. As mentioned above, to match the algorithm description, our array index will start from 1 and increment by 1 with each loop. As step B3 says, we will be comparing each word in the array to the following word. Obviously we cannot compare the last word to something that does not exist so the maximum number of times we will loop through our comparison logic is the total number of words minus one.

    B2: # [Loop on j.]
        $t = 0;
        for ($j=1; $j<=$BOUND-1; $j++){
    B3: # [Compare/exchange Rj:Rj+1.]

        }

Step B3, inside the loop is where all the work happens. Very simply, each word is compared to the word after it in the array. If the first is greater than the second, it will “bubble up.” i.e. they are swapped. We then set t, our last swapped index, to the current index which indicates both that a swap occurred, and the index it occurred at.

            if ($K->[$j] gt $K->[$j+1]) {
                @$R[$j, $j+1] = @$R[$j+1, $j];
                $t = $j;
            }

After the entire array has been checked ($j<=$BOUND-1), we exit the loop and execute step B4. We check to see if our last swap index ($t) is equal to anything other than zero. If it is, reset $BOUND to the value of $t. Remember, this is a slight optimization that may allow for fewer loops in the next execution of step B3 if some words are already in alphabetical order. If $t does equal zero, it means no swaps were made, therefore, the list must be in order and we are done.

    B4: # [Any exchanges?]
    if ($t) {
        $BOUND = $t;
        goto B2;
    }

The entire code, all together looks like this (with our DeBug subroutine added for completness):

#!C:\Strawberry\perl\bin\perl.exe -w

use strict;
use warnings;
use diagnostics;
use English;

local $OUTPUT_AUTOFLUSH = 1;

my $DEBUG = 1;

bubblesort_don(["wilma", "pebbles", "fred", "dino", "barney", "bam-bam"]);

sub bubblesort_don{
    use Array::Base +1; # Start array index at 1 to match Algorithm description
    my $R = shift;      # reference to array of records (words)
    my $K = $R;         # secondary reference to records array.  used as the "key" for each record (word)
    my $BOUND;          # highest index for which the record is not known to be in its final position
    my $j;              # lopp index
    my $t;              # last swapped value array index
    my $N = $#$R;       # highest array index (aka, number of array elements)

    DeBug($R, "presort", "bubblesort_don") if $DEBUG;
    B1: # [Initialize BOUND.]
        $BOUND = $N;
    B2: # [Loop on j.]
        $t = 0;
        for ($j=1; $j<=$BOUND-1; $j++){
    B3: # [Compare/exchange Rj:Rj+1.]
            if ($K->[$j] gt $K->[$j+1]) {
                @$R[$j, $j+1] = @$R[$j+1, $j];
                $t = $j;
            }
        }
    B4: # [Any exchanges?]
    if ($t) {
        $BOUND = $t;
        goto B2;
    }
    DeBug($R, "postsort", "bubblesort_don") if $DEBUG;
    no Array::Base;
}

sub DeBug {
    my $array = shift;
    my $sorting = shift;
    my $calling_subroutine = shift;
    my $tab = "";
    $tab = "     " if ($sorting eq "postsort");
    print $tab . " Called by: " . $calling_subroutine . "\n";
    print $tab . "Word count: " . @$array . "\n";
    print $tab . "Sort State: " . $sorting . "\n";
    print $tab;
    print join("\n$tab", @$array),"\n";
    print "\n";
}

Running this yields the following:

PS C:\Dev\PerlPlay\TinyBubbles> .\TinyBubbles.pl
 Called by: bubblesort_don
Word count: 6
Sort State: presort
wilma
pebbles
fred
dino
barney
bam-bam

      Called by: bubblesort_don 
     Word count: 6
     Sort State: postsort       
     bam-bam
     barney
     dino
     fred
     pebbles
     wilma

PS C:\Dev\PerlPlay\TinyBubbles>

And we have sorted! Time to open some bubbly…

3 thoughts on “Algorithm and Blues”