opi: Setting the pointer to the end of the file with seek


Hello again,

Column 1  Column 2  Column 3  Column 4  Column 5

dies      ist       ein       test      nummer 1
und       noch      ein       test      nummer 2
schon     wieder    ein       test      nummer 3

split on its own would not work here, because any delimiter could
itself occur inside a column.
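
To illustrate that point with the last column of the sample rows: a minimal sketch, assuming every column is padded to exactly 10 characters. The variable names ($line, $width, @by_split, @by_substr) are made up for this example.

```perl
use strict;
use warnings;

# One row of the sample table; each column is assumed to be 10 characters wide.
my $line  = sprintf '%-10s' x 5, 'dies', 'ist', 'ein', 'test', 'nummer 1';
my $width = 10;

# Whitespace split: the space inside "nummer 1" produces an extra field.
my @by_split = split /\s+/, $line;

# Fixed-width extraction with substr: one field per column,
# trailing padding stripped afterwards.
my @by_substr;
for my $col (0 .. 4) {
    my $field = substr $line, $col * $width, $width;
    $field =~ s/\s+$//;
    push @by_substr, $field;
}

printf "split:  %d fields\n", scalar @by_split;
printf "substr: %d fields, last = '%s'\n", scalar @by_substr, $by_substr[-1];
```

With split the fifth column falls apart into "nummer" and "1", while substr keeps it in one piece because it only looks at positions, not at content.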

I took the liberty of testing once more whether a file that stores
the data in table form with fixed string lengths is really worth it,
or whether I should use an arbitrary delimiter as the separator and
simply break the data apart with split or something similar.

My result is that substr is faster in both tests (81 % to 94 % faster
than split, and well ahead of the regexp) and that a data file in
table form with fixed string lengths is worth it in either case.

Either like this:

use Benchmark;

my $string = "string0    string1   string2   string3   string4   string5   string6   string7   string8   string9   ";

Benchmark::cmpthese(-1, {
           'substr'  =>  sub { my $f0 = substr($string, 0, 10);
                               my $f1 = substr($string, 11, 10);
                               my $f2 = substr($string, 21, 10);
                               my $f3 = substr($string, 31, 10);
                               my $f4 = substr($string, 41, 10);
                               my $f5 = substr($string, 51, 10);
                               my $f6 = substr($string, 61, 10);
                               my $f7 = substr($string, 71, 10);
                               my $f8 = substr($string, 81, 10);
                               my $f9 = substr($string, 91, 10);
                             },
           'split'   =>  sub { my ($f0, $f1, $f2, $f3, $f4, $f5, $f6, $f7, $f8, $f9) = split /\s+/, $string; },
           'regexp'  =>  sub { $string =~ /^(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+$/;
                               my $f0 = $1;
                               my $f1 = $2;
                               my $f2 = $3;
                               my $f3 = $4;
                               my $f4 = $5;
                               my $f5 = $6;
                               my $f6 = $7;
                               my $f7 = $8;
                               my $f8 = $9;
                               my $f9 = $10;
                         },
   });

Benchmark: running regexp, split, substr for at least 1 CPU seconds...
    regexp:  1 wallclock secs ( 1.00 usr +  0.06 sys =  1.06 CPU) @ 40572.64/s (n=43007)
     split:  1 wallclock secs ( 1.02 usr +  0.03 sys =  1.05 CPU) @ 51199.05/s (n=53759)
    substr:  1 wallclock secs ( 1.04 usr +  0.04 sys =  1.08 CPU) @ 99554.63/s (n=107519)
          Rate regexp  split substr
regexp 40573/s     --   -21%   -59%
split  51199/s    26%     --   -49%
substr 99555/s   145%    94%     --

Or like this:

use Benchmark;

my $string1 = "string0    string1   string2   string3   string4   string5   string6   string7   string8   string9   ";
my $string2 = "string0 string1 string2 string3 string4 string5 string6 string7 string8 string9";

Benchmark::cmpthese(-1, {
           'substr'  =>  sub { my $f0 = substr($string1, 0, 10);
                               my $f1 = substr($string1, 11, 10);
                               my $f2 = substr($string1, 21, 10);
                               my $f3 = substr($string1, 31, 10);
                               my $f4 = substr($string1, 41, 10);
                               my $f5 = substr($string1, 51, 10);
                               my $f6 = substr($string1, 61, 10);
                               my $f7 = substr($string1, 71, 10);
                               my $f8 = substr($string1, 81, 10);
                               my $f9 = substr($string1, 91, 10);
                             },
           'split'   =>  sub { my ($f0, $f1, $f2, $f3, $f4, $f5, $f6, $f7, $f8, $f9) = split /\s+/, $string2; },
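           # Note: the pattern ends in \s+$, but $string2 has no trailing
           # whitespace, so this match fails and the rate below is that of
           # a failed match.
           'regexp'  =>  sub { $string2 =~ /^(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+$/;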
                               my $f0 = $1;
                               my $f1 = $2;
                               my $f2 = $3;
                               my $f3 = $4;
                               my $f4 = $5;
                               my $f5 = $6;
                               my $f6 = $7;
                               my $f7 = $8;
                               my $f8 = $9;
                               my $f9 = $10;
                         },
   });

Benchmark: running regexp, split, substr for at least 1 CPU seconds...
    regexp:  2 wallclock secs ( 1.22 usr +  0.03 sys =  1.25 CPU) @ 21504.00/s (n=26880)
     split:  1 wallclock secs ( 1.03 usr +  0.02 sys =  1.05 CPU) @ 54613.33/s (n=57344)
    substr:  2 wallclock secs ( 1.03 usr +  0.06 sys =  1.09 CPU) @ 98642.20/s (n=107520)
          Rate regexp  split substr
regexp 21504/s     --   -61%   -78%
split  54613/s   154%     --   -45%
substr 98642/s   359%    81%     --
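
As a side note that is not part of the measurements above: the ten substr calls can presumably also be written as a single unpack with a fixed-width template. This is only a minimal sketch, assuming every field is exactly 10 characters wide; $record and $template are names made up for this example, and how it compares in speed would have to be measured.

```perl
use strict;
use warnings;

# Hypothetical record: ten fields, each padded to exactly 10 characters.
my $record = join '', map { sprintf '%-10s', "string$_" } 0 .. 9;

# 'A10' reads 10 characters and strips trailing spaces;
# the template repeats it once per field.
my $template = 'A10' x 10;
my @fields   = unpack $template, $record;

print join('|', @fields), "\n";
```

Such a sub could be added as one more entry in the hash passed to Benchmark::cmpthese to compare it with the three variants above.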

Greez,
opi

--
Selfcode: ie:( fl:( br:^ va:) ls:] fo:) rl:( n4:? ss:| de:] ch:? mo:|