Doers of Stuff.org

A place to Do Stuff and see Stuff Done…

Marking the Bench

I know it seems like I’m really milking this Bubble Sort thing, but we did just walk through three different implementations of the Bubble Sort (Algorithm and Blues, part 1, part 2 and part 3). That naturally raises the question: which one is better? In fact, if you still have that copy of Jon Orwant’s book, Mastering Algorithms with Perl, handy, flip back to page 11. Right up front, before even getting into the algorithms, he talks about the importance of actually benchmarking your implementations. If optimization is important, make sure you actually run the code! Since Perl has some built-in benchmarking tools, why don’t we take a look at them, using our favorite TinyBubbles.pl script.

Our TinyBubbles.pl script includes the following code to execute our subroutines and output the results.

local $OUTPUT_AUTOFLUSH = 1;

my $DEBUG = 1;

bubblesort_me(["wilma", "pebbles", "fred", "dino", "barney", "bam-bam"]);
bubblesort_jon(["wilma", "pebbles", "fred", "dino", "barney", "bam-bam"]);
bubblesort_don(["wilma", "pebbles", "fred", "dino", "barney", "bam-bam"]);

We will replace this with the following to execute our benchmarking.

use Benchmark; 

local $OUTPUT_AUTOFLUSH = 1;

my $DEBUG = 0;
my $sortCycles = 1000000;

my $results = timethese ( $sortCycles
            , { '1_bubbleMe'  => 'bubblesort_me(["wilma", "pebbles", "fred", "dino", "barney", "bam-bam"])'
              , '2_bubbleJon' => 'bubblesort_jon(["wilma", "pebbles", "fred", "dino", "barney", "bam-bam"])'
              , '3_bubbleDon' => 'bubblesort_don(["wilma", "pebbles", "fred", "dino", "barney", "bam-bam"])'
            }
          );
Benchmark::cmpthese($results);

Obviously, to use the Benchmark module, we must include it with the use Benchmark; line. We also introduce a new variable, $sortCycles, and set it to 1,000,000. This is the number of times each subroutine will be run. To be clear, benchmarking is another of those areas I’ve made no real study of. The little I do understand is that we run the code many, many times to smooth out the side effects of our script competing with the operating system and other processes for CPU time. In fact, the Benchmark module will complain with “(warning: too few iterations for a reliable count)” if the number of cycles you selected is too small. All I did was keep upping the count until the run took longer than I was willing to sit around waiting.
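As an aside, the Benchmark perldoc offers an alternative to guessing at an iteration count: pass a negative number and timethese() will run each snippet for at least that many CPU seconds instead of a fixed number of iterations. A quick sketch (the key name and snippet here are just placeholders, not from our script):

```perl
use strict;
use warnings;
use Benchmark;

# A negative count means "run each snippet for at least this many
# CPU seconds" rather than a fixed number of iterations.
my $results = timethese( -3, {
    'reverse_sort' => sub { my @s = sort ("wilma", "pebbles", "fred") },
});
```

This sidesteps the "keep upping the count" guesswork entirely, at the cost of a less predictable total run time.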

We then set our output variable ($DEBUG) to zero since we really don’t want to see that information scroll by a million times. Plus, the actual time needed to dump the results to the screen is considerable.

The timethese() function is exported by the Benchmark module, so it can be invoked directly. It takes two parameters. The first is the number of times we want to execute each subroutine, set by our $sortCycles variable. The second is a hash reference containing the code we want to execute. Each hash key is used as the title for that run, and the keys are sorted alphabetically to determine which code runs first; that’s why I put numbers at the beginning of each title. The value for each key is the actual code. It is passed as a string and eval’d by the Benchmark module, so in our case we can take the exact line of code from our previous script and put it here inside single quotes.

Called by itself, timethese() will pass each key/value pair of title/code to Benchmark::timethis() and produce some nice stats for each subroutine. However, timethis(), and consequently timethese(), also returns a data structure we can use for other purposes, so we create a variable, $results, to capture it. The structure itself is documented in the perldoc for the Benchmark module, so you can do anything you want with it. For now, we will simply pass it to another Benchmark function, cmpthese(). This function is not exported by default, so you need to either import it explicitly (it sits in Benchmark’s @EXPORT_OK list) or, as we did here, invoke it with the fully qualified Benchmark::cmpthese() syntax.
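For reference, since cmpthese() is in Benchmark’s @EXPORT_OK list, the explicit-import route looks like this (the key name and trivial snippet are placeholders, not our sort code):

```perl
use strict;
use warnings;

# Ask for cmpthese() explicitly alongside the default exports.
use Benchmark qw(timethese cmpthese);

# Something trivial to time, just to demonstrate the call shape.
my $results = timethese( 200000, {
    'sort_builtin' => sub { my @s = sort ("wilma", "fred", "barney") },
});

cmpthese($results);   # no Benchmark:: prefix needed now
```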

You can now run this script, but it turns out it has one potential flaw in need of correction first. The problem is actually hinted at in the code above, but let’s take a look at the beginning of one of our subroutines to help make it more obvious.

sub bubblesort_me{
    my $words = shift;
    ...
}

We are passing our array to the subroutine as a reference. This means the subroutine does not make a temporary copy of the array, work on it, and pass it back to the calling program; it works directly on the original array. Which means subsequent calls passing this same array would actually be passing one that is already sorted!
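A tiny illustration of that in-place behavior (the sub name here is hypothetical, not from our script):

```perl
use strict;
use warnings;

# A sub that receives an array reference sorts the caller's
# array in place -- no copy is ever made.
sub sort_in_place {
    my $words = shift;          # an array reference, not a copy
    @$words = sort @$words;     # modifies the original array
}

my @names = ("wilma", "fred", "barney");
sort_in_place(\@names);
print "@names\n";               # prints: barney fred wilma
```

After the call, the caller’s @names is permanently reordered, which is exactly the hazard for repeated benchmark runs.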

It turns out this particular version of my code does not actually suffer from this problem; that’s why I wrote it the way I did. Since the entire line of code passed to timethese() is eval’d, the array passed in is actually a brand-new array each time that just happens to look a lot like the one used in the previous runs. However, earlier versions of my script did suffer from it, and I know what is coming next. So, for safety’s sake, let’s take care of it now. We will simply copy the array we are passed into a new array. And yes, doing so would be cause for conversation, as it is a potentially unnecessary performance hit. Setting that discussion aside for the moment, the variable declarations for each subroutine will now look something like this.

sub bubblesort_me{
    my $unsortedWords = shift;

    my @words = @$unsortedWords;

    my $words = \@words;

    my $wordCount = @$words;
    my $lastIndex = $#$words;
...
}

The above allows for the fewest changes to our code, while making it obvious we are making a local copy of our array (if not exactly explaining why).

Running this produces the following:

PS D:\Dev\PerlPlay\TinyBubbles> .\TinyBubblesB0.pl
Benchmark: timing 1000000 iterations of 1_bubbleMe, 2_bubbleJon, 3_bubbleDon...
1_bubbleMe: 12 wallclock secs (12.25 usr +  0.00 sys = 12.25 CPU) @ 81639.32/s (n=1000000)
2_bubbleJon:  9 wallclock secs (10.11 usr +  0.00 sys = 10.11 CPU) @ 98921.75/s (n=1000000)
3_bubbleDon: 24 wallclock secs (24.55 usr +  0.00 sys = 24.55 CPU) @ 40738.18/s (n=1000000)
               Rate 3_bubbleDon  1_bubbleMe 2_bubbleJon
3_bubbleDon 40738/s          --        -50%        -59%
1_bubbleMe  81639/s        100%          --        -17%
2_bubbleJon 98922/s        143%         21%          --
PS D:\Dev\PerlPlay\TinyBubbles>

The first line of output (starting with “Benchmark: …”) is from the timethese() call and tells us what it is about to do. The next three lines are the output from each individual call to timethis() and give the results for each subroutine. The output is described more completely in the perldoc, but the short form is: bubblesort_jon() was the fastest, followed by bubblesort_me(), and finally bubblesort_don().

The rest of the output is from the cmpthese() call, which creates a comparison table. Tested subroutines are listed in order from slowest to fastest, and the columns show how each subroutine compares to the others. From this we can see bubblesort_jon() ran 143% faster than bubblesort_don() and 21% faster than bubblesort_me().
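Those percentages are just ratios of the rates. Assuming the rounded rates from the table above, here is a sketch of the arithmetic (hypothetical helper name; the module’s own rounding may differ slightly):

```perl
use strict;
use warnings;

# Rates (iterations/sec) taken from the cmpthese table above.
my %rate = (
    bubbleDon => 40738,
    bubbleMe  => 81639,
    bubbleJon => 98922,
);

# Percent difference of one rate relative to another:
# positive means "this many percent faster".
sub pct_vs {
    my ($x, $y) = @_;
    return sprintf "%.0f", 100 * ($rate{$x} / $rate{$y} - 1);
}

print pct_vs('bubbleJon', 'bubbleDon'), "\n";   # 143
print pct_vs('bubbleJon', 'bubbleMe'),  "\n";   # 21
print pct_vs('bubbleDon', 'bubbleJon'), "\n";   # -59
```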

Now, we could consider this the end of the story. But was anyone besides me surprised that bubblesort_don() performed so poorly? If you have read the sections of Jon Orwant’s book on benchmarking and sorting (or if you just already know the answer), you might not be. But to me, it appeared that bubblesort_don() would perform the fewest iterations in all cases but one. True, that’s the case we tested, but it still seems it should have done no more iterations than bubblesort_jon().

So let’s run a few more tests. The test above had our word list in precisely reverse order. So let’s try a more random starting order. Consider instead the following calls.

my $results = timethese ( $sortCycles
          , { '1_bubbleMe'  => 'bubblesort_me(["wilma", "fred", "barney", "bam-bam", "pebbles", "dino"])'
            , '2_bubbleJon' => 'bubblesort_jon(["wilma", "fred", "barney", "bam-bam", "pebbles", "dino"])'
            , '3_bubbleDon' => 'bubblesort_don(["wilma", "fred", "barney", "bam-bam", "pebbles", "dino"])'
            }
          );

Doing so nets us the following results, which do show an improvement. Now bubblesort_jon() is only 77% faster than bubblesort_don().

PS D:\Dev\PerlPlay\TinyBubbles> .\TinyBubblesB0.pl
Benchmark: timing 1000000 iterations of 1_bubbleMe, 2_bubbleJon, 3_bubbleDon...
1_bubbleMe: 11 wallclock secs (10.72 usr +  0.00 sys = 10.72 CPU) @ 93300.99/s (n=1000000)
2_bubbleJon: 10 wallclock secs ( 9.55 usr +  0.00 sys =  9.55 CPU) @ 104744.95/s (n=1000000)
3_bubbleDon: 17 wallclock secs (16.91 usr +  0.00 sys = 16.91 CPU) @ 59150.60/s (n=1000000)
                Rate 3_bubbleDon  1_bubbleMe 2_bubbleJon
3_bubbleDon  59151/s          --        -37%        -44%
1_bubbleMe   93301/s         58%          --        -11%
2_bubbleJon 104745/s         77%         12%          --
PS D:\Dev\PerlPlay\TinyBubbles>

The real kicker, however, is what happens when we start with a list that is already sorted. Jon Orwant actually discusses scenarios in his book that make this a valid and useful test. So, we use the following call.

my $results = timethese ( $sortCycles
          , { '1_bubbleMe'  => 'bubblesort_me(["bam-bam", "barney", "dino", "fred", "pebbles", "wilma"])'
            , '2_bubbleJon' => 'bubblesort_jon(["bam-bam", "barney", "dino", "fred", "pebbles", "wilma"])'
            , '3_bubbleDon' => 'bubblesort_don(["bam-bam", "barney", "dino", "fred", "pebbles", "wilma"])'
            }
          );

We see bubblesort_don() now moves into FIRST position, running 194% faster than bubblesort_me() and 95% faster than bubblesort_jon().

PS D:\Dev\PerlPlay\TinyBubbles> .\TinyBubblesB0.pl
Benchmark: timing 1000000 iterations of 1_bubbleMe, 2_bubbleJon, 3_bubbleDon...
1_bubbleMe:  9 wallclock secs ( 9.56 usr +  0.00 sys =  9.56 CPU) @ 104569.70/s (n=1000000)
2_bubbleJon:  7 wallclock secs ( 6.34 usr +  0.00 sys =  6.34 CPU) @ 157629.26/s (n=1000000)
3_bubbleDon:  3 wallclock secs ( 3.25 usr +  0.00 sys =  3.25 CPU) @ 307692.31/s (n=1000000)
                Rate  1_bubbleMe 2_bubbleJon 3_bubbleDon
1_bubbleMe  104570/s          --        -34%        -66%
2_bubbleJon 157629/s         51%          --        -49%
3_bubbleDon 307692/s        194%         95%          --
PS D:\Dev\PerlPlay\TinyBubbles> 

So what gives? Well, first of all, there is more to it than just what I’ve shown. On consecutive runs with the same data set, bubblesort_jon() varied the most in its timing. Occasionally bubblesort_jon() and bubblesort_me() would post the same time, but usually the order stayed as shown above. Second, this is a really small data set, which may simply make accurate timing difficult. But setting that aside, there seem to be two obvious places to look first in bubblesort_don(). First, we use a goto rather than an outer loop, which may just be massively inefficient. Second, we have an additional variable to manage and check to control how many times we loop; neither of the other two solutions did this, and that extra manipulation may be costing us more than we think. One might even argue we are comparing apples to oranges here, especially since we could just as easily have interpreted Dr. Knuth’s description as a while or even a for loop rather than a goto.
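To illustrate that last point, here is a sketch of the same BOUND/t bookkeeping with the goto replaced by a do/while loop. The sub name is hypothetical, and it uses Perl’s native 0-based indexing rather than Array::Base, so it is an interpretation of the description, not a drop-in replacement:

```perl
use strict;
use warnings;

sub bubblesort_don_while {
    my @R = @{ shift() };        # local copy of the records
    my $BOUND = $#R;             # highest index not known to be final
    my $t;                       # index of the last swap this pass

    do {
        $t = -1;
        for my $j (0 .. $BOUND - 1) {
            if ($R[$j] gt $R[$j + 1]) {
                @R[$j, $j + 1] = @R[$j + 1, $j];
                $t = $j;
            }
        }
        $BOUND = $t;             # nothing past the last swap can still move
    } while ($t >= 0);           # a pass with no swaps means we are done

    return \@R;
}

my $sorted = bubblesort_don_while(
    ["wilma", "pebbles", "fred", "dino", "barney", "bam-bam"]);
print "@$sorted\n";   # bam-bam barney dino fred pebbles wilma
```

Whether this closes the performance gap is exactly the kind of question the benchmark harness above can answer.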

So what does this all tell us? Well, not much, really. Mostly it tells us we need to do more work. First of all, code efficiency cannot be judged by just reading the code. The environment matters; in this case, the data. A word list is not just a word list: it comes in different states (unsorted, partially sorted, mostly or fully sorted), and that state can drastically affect actual performance. This article did not define a scope for the environment. Let’s also be clear: the three Bubble Sorts we described are NOT the same algorithm! They have minor variations that make a difference. Finally, our interpretation of Dr. Knuth’s algorithm may be valid, but it may not be the best one for Perl. As mentioned above, using two loops would not necessarily have deviated from the given description.

The one thing it does tell us is that testing can be enlightening. It gives credence to the warning against premature optimization. It tells us there is value in taking a moment to walk through the park, take a seat on the bench, and watch our code play.

See the code here…

#!C:\Strawberry\perl\bin\perl.exe -w

use strict;
use warnings;
use diagnostics;
use English;
use Benchmark; 

local $OUTPUT_AUTOFLUSH = 1;

my $DEBUG = 0;
my $sortCycles = 1000000;

my $results = timethese ( $sortCycles
          , { '1_bubbleMe'  => 'bubblesort_me(["wilma", "fred", "barney", "bam-bam", "pebbles", "dino"])'
            , '2_bubbleJon' => 'bubblesort_jon(["wilma", "fred", "barney", "bam-bam", "pebbles", "dino"])'
            , '3_bubbleDon' => 'bubblesort_don(["wilma", "fred", "barney", "bam-bam", "pebbles", "dino"])'
            }
          );
Benchmark::cmpthese($results);

sub bubblesort_me{
    my $unsortedWords = shift;
    my @words = @$unsortedWords;

    my $words = \@words;
    my $wordCount = @$words;
    my $lastIndex = $#$words;

    DeBug($words, "presort", "bubblesort_me") if $DEBUG;
    for (my $i = 1; $i <= $wordCount -1; $i++) {
        printf( "%08.4f%%", ($i/$wordCount)*100 ) if $DEBUG;
        for (my $j=0; $j <= $lastIndex - 1; $j++){
            if ($words->[$j] gt $words->[$j+1]) {
                @$words[$j, $j+1] = @$words[$j+1, $j];
            }
        }
        printf ("\b\b\b\b\b\b\b\b\b") if $DEBUG;
    }
    DeBug($words, "postsort", "bubblesort_me") if $DEBUG;
}

sub bubblesort_jon{
    my $u = shift;
    my @array = @$u;

    my $array = \@array;
    my $i;
    my $j;

    DeBug($array, "presort", "bubblesort_jon") if $DEBUG;
    for ($i = $#$array; $i; $i--) {
        printf( "%08.4f%%", ($i/$#$array)*100 ) if $DEBUG;
        for ($j=1; $j<=$i; $j++){
            if ($array->[$j-1] gt $array->[$j]) {
                @$array[$j, $j-1] = @$array[$j-1, $j];
            }
        }
        printf ("\b\b\b\b\b\b\b\b\b") if $DEBUG;
    }
    DeBug($array, "postsort", "bubblesort_jon") if $DEBUG;

}

sub bubblesort_don{
    use Array::Base +1; # Start array index at 1 to match Algorithm description
    my $u = shift;
    my @R = @$u;

    my $R = \@R;
    my $K = $R;         # secondary reference to records array
    my $BOUND;          # highest index for which the record is not known to be in its final position
    my $j;              # loop index
    my $t;              # last swapped value array index
    my $N = @$R;        # number of records (with 1-based indexing, also the highest index)

    DeBug($R, "presort", "bubblesort_don") if $DEBUG;
    B1: # [Initialize BOUND.]
        $BOUND = $N;
    B2: # [Loop on j.]
        $t = 0;
        printf( "%08.4f%%", ($BOUND/$N)*100 ) if $DEBUG;
        for ($j=1; $j<=$BOUND-1; $j++){
    B3: # [Compare/exchange Rj:Rj+1.]
            if ($K->[$j] gt $K->[$j+1]) {
                @$R[$j, $j+1] = @$R[$j+1, $j];
                $t = $j;
            }
        }
        printf ("\b\b\b\b\b\b\b\b\b") if $DEBUG;
    B4: # [Any exchanges?]
    if ($t) {
        $BOUND = $t;
        goto B2;
    }

    DeBug($R, "postsort", "bubblesort_don") if $DEBUG;

    no Array::Base;
}

sub DeBug {
    my $array = shift;
    my $sorting = shift;
    my $calling_subroutine = shift;
    my $tab = "";
    $tab = "     " if ($sorting eq "postsort");
    print $tab . " Called by: " . $calling_subroutine . "\n";
    print $tab . "Word count: " . @$array . "\n";
    print $tab . "Sort State: " . $sorting . "\n";
    print $tab . "First word: " . $array->[0] . "\n";
    print $tab . " Last word: " . $array->[-1] . "\n";
    print "\n";
}

by Robert