Sorting with Sort::Maker

Perl Best Practices writes "Consider building your sorting routines with Sort::Maker". As of version 1.1, Foswiki ships CPAN:Sort::Maker with the default libraries. This topic shows a couple of examples to get you started using this module.

Introduction

Using a subroutine like make_sorter( ) to create efficient sorts is a very good practice. It allows you to focus on specifying your sort criteria correctly, instead of on the mechanics of sorting. It also factors out the comparatively large amounts of coding infrastructure needed to optimize your sorts.

You don't even have to write make_sorter( ) yourself. The Sort::Maker CPAN module provides a very sophisticated implementation of the subroutine. It has options for building sorting subroutines using Orcish or Schwartzian optimizations, as well as the more advanced Guttman-Rosler Transform.

from Perl Best Practices

Which sorting routine performs best for the task can only be measured. Because Sort::Maker does all the low level work for you, it is quite easy to change the sorting algorithm to test which one performs best.

Examples

See CPAN:Sort::Maker for detailed documentation.

The sub make_sorter is exported by Sort::Maker. It makes a sort sub and returns a reference to it.

For instance, to create a sort sub that sorts case insensitive, you write:
my $sorter = make_sorter(qw( plain string no_case ));
my @sortedTopics = $sorter->(@topics);

Plain text sort

Plain sorting doesn't do any key caching. It is fine for short input lists and functions as benchmark against other sorting routines.

use Sort::Maker;

my @topics = qw(rJNXR thuJGf yHrmq cNmN nQMkT whWSpNOZQq GLtiYESuzPVJ OmRidI nqwv vnZwqHnJAH
gRzBDzu aHNymPrDEcqe BuLuTtXSLYI MQm qVv QrC TrGeqgzXO VtaTM QEx rAoCc);

# create plain sorter, sorting case insensitive
my $sorter = make_sorter(qw( plain string no_case ));
die "make_sorter: $@" unless $sorter;

# sort
my @sortedTopics = $sorter->(@topics);

use Data::Dumper;
print Dumper @sortedTopics;

Key sort

This example sorts a list of hashes, each with keys 'name' (string) and 'date' (number). The sort sub uses Schwarzian Transform (param ST) that consumes and outputs an array reference (params ref_in ref_out). See Wikipedia:Schwartzian_transform.
use Sort::Maker;

# create a list of hashes
# each with keys 'name' (string) and 'date' (number)
my $itemCount = 15;

my @topics =
  map { { name => "Topic$_", date => int rand 1000000 } } 1 .. $itemCount;

# create Schwarzian Transform sorter
# that consumes and outputs an array reference
my $sorter = make_sorter( qw( ST ref_in ref_out ), number => '$_->{date}' );

# sort
my $sortedTopicsRef = $sorter->( \@topics );

use Data::Dumper;
print Dumper $sortedTopicsRef;

Key sort 2

This example sorts a list of hashes, each with keys 'name' (string) and 'bug_severity' (string), similar to the example above. It uses init_code to declare lexical variables that the extraction code can use.

use Sort::Maker;

my $itemCount = 15;

my @values = qw(severe noncritical critical);
my @bugs =
  map { { name => "Topic$_", bug_severity => $values[ int rand 3 ] } }
  1 .. $itemCount;

my $sorter = make_sorter(
   'ST',
   name      => 'severitySorter',
   init_code => 'my %lookup = (
                critical => 3,
                severe => 2,
                noncritical => 1 );',
   number => [ code => '$lookup{$_->{bug_severity}}' ],
);

# sort
my @sortedBugs = severitySorter(@bugs);

use Data::Dumper;
print Dumper @sortedBugs;

Sort on multiple keys

This example sorts hashes on 2 keys: "date" (number) and "name" (string, case insensitive). The sort keys are enumerated in the make_sorter parameters:
my $sorter = make_sorter(
   qw( ST ),
   number => '$_->{date}',
   string => {
      code    => '$_->{name}',
      no_case => 1,
   }
);

The complete example:
use Sort::Maker;

sub generate_random_string {
    my $length_of_randomstring = shift;    # the length of
                                           # the random string to generate

    my @chars = ( 'a' .. 'z', 'A' .. 'Z' );
    my $random_string;
    foreach ( 1 .. $length_of_randomstring ) {

        # rand @chars will generate a random
        # number between 0 and scalar @chars
        $random_string .= $chars[ rand @chars ];
    }
    return $random_string;
}

# create a list of hashes
# each with keys 'name' (string) and 'date' (number)
my $itemCount = 15;

my @topics =
  map { { name => &generate_random_string(5), date => int rand 3 } }
  1 .. $itemCount;

# create Schwarzian Transform sorter
# with 2 sort keys
my $sorter = make_sorter(
   qw( ST ),
   number => '$_->{date}',
   string => {
      code    => '$_->{name}',
      no_case => 1,
   }
);

# sort
my @sortedTopics = $sorter->(@topics);

use Data::Dumper;
print Dumper @sortedTopics;

Sort with GRT

The example above can be quickly changed to use a Guttman-Rosler Transform.

The Guttman-Rosler Transform (popularized by Uri Guttman and Larry Rosler) uses a string to cache the extracted keys as well as either the record or its index. [...] The GRT gains its speed by using a single byte string to cache all of the extracted keys from a given input record. Packing keys into a string such that it will lexically sort the correct way requires some deep mojo and data munging. But that is why this module was written - to hide all that from the coder.

from CPAN:Sort::Maker

The call to make_sorter can simply be changed by substituting ST with GRT:
my $sorter = make_sorter(
   qw( GRT ),
   number => '$_->{date}',
   string => {
      code    => '$_->{name}',
      no_case => 1,
   }
);

If you use GRT, you can pass additional parameters, for example unsigned so the key can be packed in 4 bytes, or varying to tell the GRT that the key value is of varying length.

With these extra parameters, the call becomes:
my $sorter = make_sorter(
   qw( GRT ),
   number => {
      code     => '$_->{date}',
      unsigned => 1,
   },
   string => {
      code    => '$_->{name}',
      no_case => 1,
      varying => 1,
   }
);

Contributors: -- ArthurClemens - 03 Apr 2010

Since this was written we have removed the default dependency on Sort::Maker, as it was major overkill for the job. Please feel free to install and use this module.

-- CrawfordCurrie - 14 Mar 2014
Topic revision: r2 - 14 Mar 2014, CrawfordCurrie
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement. Creative Commons License    Legal Imprint    Privacy Policy