So what is the fastest way to count how often a substring occurs inside a string, e.g. how many dashes can be found in “1-4-7-8-37-5-7-8-3-42″? I identified five ways to make perl say “9″ to this challenge:

  1. my $size = (scalar(@{[$string =~ /-/g]}) + 1);
  2. my $size = scalar(split /-/, $string);
  3. my $size = (($string =~ s/-//g) + 1);
  4. my $size = (($string =~ tr/-//) + 1);
  5. my $size = 1; $size++ while $string =~ /-/g;

I would have assumed that number 1 would be the fastest way as it requires no modifcation of the string. A quick run with “cmpthese” from the Benchmark module revealed that my assumption was wrong:

Rate match (1) split (2) while (5) repl (3) tr (4)
match (1) 51106/s -90% -93% -94% -95%
split (2) 495540/s 870% -36% -45% -47%
while (5) 771605/s 1410% 56% -15% -18%
repl (3) 908265/s 1677% 83% 18% -4%
tr (4) 941620/s 1742% 90% 22% 4%

The table shows a clear advantage when counting the occurences via a while-loop (solution 5), count the replaced characters (solution 3) or truncating all occurences (solution 4). While “tr” was the fastest way during this test run, it wasn’t the best way on other benchmark runs or others systems. If you want to try it yourself you can do so with the attached test-script.

Attached Files:

test.pl 0K