How do I split fixed-width columns in Perl?


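I am so new to programming that I apologize for not knowing how to phrase this question.

I have a Perl script that gets a variable from an internal tool. The contents will not always look exactly like this, but they will always follow this pattern: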

darren.local           1987    A      Sentence1
darren.local           1996    C      Sentence2
darren.local           1991    E      Sentence3
darren.local           1954    G      Sentence4
darren.local           1998    H      Sentence5

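Using Perl, what is the easiest way to put each of these lines into a variable of its own? Depending on what the internal tool spits out, each line will be different, and there can be more than five lines. The capital letter in each line is what the lines will ultimately be sorted by (all As, all Cs, all Es, and so on). Should I be looking at regular expressions?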

use strict;
use warnings;

# this puts each line in the array @lines
my @lines = <DATA>; # <DATA> is a special filehandle that treats
                    # everything after __END__ as if it was a file
                    # It's handy for testing things

# Iterate over the array of lines and for each iteration
# put that line into the variable $line
foreach my $line (@lines) {
   chomp $line;   # remove the trailing newline so it does not end up in $col4

   # Use split to 'split' each $line with the regular expression /\s+/
   # /\s+/ means match one or more whitespace characters.
   # The limit of 4 means split returns at most 4 fields, so any
   # whitespace after the third separator stays inside $col4
   my ($col1, $col2, $col3, $col4) = split(/\s+/, $line, 4);

   # here you can do whatever you need to with the data
   # in the columns. I just print them out
   print "$col1, $col2, $col3, $col4\n";
}


__END__
darren.local           1987    A      Sentence1
darren.local           1996    C      Sentence2
darren.local           1991    E      Sentence3
darren.local           1954    G      Sentence4
darren.local           1998    H      Sentence5

For each line of text, something like this:

my ($domain, $year, $grade, @text) = split /\s+/, $line;

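I used an array for the sentence because it is not clear whether the trailing sentence contains spaces. If needed, you can then join the @text array back into a single string. If the trailing sentence never contains spaces, you can change @text to $text.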
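A minimal sketch of that idea, assuming the trailing sentence may contain spaces. The sub name parse_line and the sample line are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: split one line into its three leading fields
# and rebuild whatever is left as a single sentence string.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ($domain, $year, $grade, @text) = split /\s+/, $line;
    # join puts exactly one space between the words of the sentence
    my $text = join ' ', @text;
    return ($domain, $year, $grade, $text);
}

my @parsed = parse_line("darren.local           1987    A      A longer  sentence\n");
print join('|', @parsed), "\n";   # prints darren.local|1987|A|A longer sentence
```

Note that joining with a single space does not preserve the original spacing inside the sentence. If that matters, passing a limit to split is probably the better choice.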

Assuming the text is in a single variable $info, you can split it into separate lines with Perl's built-in split function:

my @lines = split("\n", $info);

where @lines is an array of lines. The "\n" is the regex for a newline. You can loop over each line as follows:

foreach my $line (@lines) {
   # do something with $line....
}

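You can then split each line on whitespace into an array such as @fields (the regex is \s+, where \s is a whitespace character and + means one or more times).

Each field can then be accessed directly by array index: $fields[0], $fields[1], and so on.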
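As a small illustration of the split and the indexing just described (the sub name fields_of is hypothetical, not part of the original answer):

```perl
use strict;
use warnings;

# Hypothetical helper: return the whitespace-separated fields of a line.
sub fields_of {
    my ($line) = @_;
    chomp $line;
    return split /\s+/, $line;
}

my @fields = fields_of("darren.local           1996    C      Sentence2\n");
print "host=$fields[0] year=$fields[1] letter=$fields[2]\n";
# prints host=darren.local year=1996 letter=C
```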

Alternatively, you can do the following:

my ($var1, $var2, $var3, $var4) = split(/\s+/, $line);

This puts the fields of each line into separate named variables.

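Now, if you want to sort the lines by the character in the third column, you can do the following:

my @lines = split("\n", $info);
my @arr = ();    # declare new array

foreach (@lines) {
   my @fields = split(/\s+/, $_);   # use a regex here: "\s+" in a string becomes "s+"
   push(@arr, \@fields);            # add @fields REFERENCE to @arr
}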

Now you have an "array of arrays". It is easy to sort, as follows:

# the third column holds letters, so compare as strings with cmp
my @sorted = sort { $a->[2] cmp $b->[2] } @arr;

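You can also group the lines into a hash keyed by the third-column character, where each key's value holds the lines containing that character. You can then loop over the hash and print it, or use the hash values in some other way.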
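One way such a hash could be built is sketched below. The sub name group_by_letter and the sample data are assumptions for illustration, and each hash value is an array reference holding the matching lines:

```perl
use strict;
use warnings;

# Hypothetical helper: group lines by the character in the third column.
# Returns a hash reference mapping each letter to an array of lines.
sub group_by_letter {
    my (@lines) = @_;
    my %by_letter;
    foreach my $line (@lines) {
        my @fields = split /\s+/, $line;
        next unless @fields >= 3;          # skip blank or short lines
        push @{ $by_letter{ $fields[2] } }, $line;
    }
    return \%by_letter;
}

my $info = join "\n",
    'darren.local           1991    E      Sentence3',
    'darren.local           1987    A      Sentence1',
    'darren.local           1954    E      Sentence4';

my $groups = group_by_letter( split /\n/, $info );
foreach my $letter ( sort keys %$groups ) {
    print "$letter:\n";
    print "  $_\n" for @{ $groups->{$letter} };
}
```

Sorting the keys gives the "all As, then all Cs" ordering the question asks for, and the lines within each group keep their original order.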

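I like to use unpack for this sort of thing. It is fast, flexible, and reversible. You only need to know the position of each column, and unpack can automatically strip the extra whitespace from each column.

If you change something in one of the columns, it is easy to get back to the original format by repacking with the same format: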

use strict;
use warnings;

my $format = 'A23 A8 A7 A*';
my @grades;

while( my $line = <DATA> ) {
    # the A format strips trailing whitespace, including the newline
    my( $machine, $year, $letter, $sentence ) =
        unpack( $format, $line );

    # save the original line too, which might be useful later
    push @grades, [ $machine, $year, $letter, $sentence, $line ];
    }

my @sorted = sort { $a->[2] cmp $b->[2] } @grades;

foreach my $tuple ( @sorted ) {
    print $tuple->[-1];
    }

# go the other way, especially if you changed things
foreach my $tuple ( @sorted ) {
    print pack( $format, @$tuple[0..3] ), "\n";
    }

__END__
darren.local           1987    A      Sentence1
darren.local           1996    C      Sentence2
darren.local           1991    E      Sentence3
darren.local           1954    G      Sentence4
darren.local           1998    H      Sentence5
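To illustrate the "reversible" part, here is a short sketch that changes one column and repacks the line with the same format. The sub name set_year is hypothetical, and the column widths are the ones used above:

```perl
use strict;
use warnings;

my $format = 'A23 A8 A7 A*';   # same column widths as above

# Hypothetical helper: replace the year column of one fixed-width line
# and rebuild the line with the same format.
sub set_year {
    my ($line, $new_year) = @_;
    my ($machine, $year, $letter, $sentence) = unpack $format, $line;
    return pack $format, $machine, $new_year, $letter, $sentence;
}

my $line = 'darren.local           1987    A      Sentence1';
print set_year($line, 2001), "\n";
```

Because pack pads each A field with spaces to its declared width, the columns should stay aligned as long as the new value fits in its column.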

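Now, there is one additional consideration. It sounds like you might have this big chunk of multi-line text in a single variable. You can handle that just as you would a file by opening a filehandle on a reference to the scalar. The filehandle machinery takes care of the rest: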

 my $lines = '...multiline string...';

 open my($fh), '<', \ $lines;

 while( <$fh> ) {
      ... same as before ...
      }
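A fuller sketch of the same technique, assuming the tool's output has already been copied into a string. The sub name lines_from_string is made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: read a multi-line string through an in-memory
# filehandle and return the lines with their newlines removed.
sub lines_from_string {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory filehandle: $!";
    my @lines;
    while ( my $line = <$fh> ) {
        chomp $line;
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

my $lines = "darren.local           1987    A      Sentence1\n"
          . "darren.local           1996    C      Sentence2\n";
print "$_\n" for lines_from_string($lines);
```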

Where are these data lines? Does your internal tool put them into a single variable? Or is this text data in a file that you need to read in?

The tool puts them into a variable.

+1 for the Perl newcomer. Does SentenceX mean there is a multi-word sentence at the end of each line? Remember that the sentence will also be split by \s+.

If you are going to use split in this case, use the third argument to limit the number of elements it returns. If the last column contains whitespace, you will otherwise lose part of the data.

Thanks, Richard. Each line needs to be grouped by the capital letter. Depending on the output of that query, I could have as many as 20 lines or as few as 2.

Using CPAN and my module, DataExtract::FixedWidth, it couldn't be simpler:

#!/usr/bin/env perl
use strict;
use warnings;
use DataExtract::FixedWidth;

my @rows = <DATA>;

my $defw = DataExtract::FixedWidth->new({ heuristic => \@rows, header_row => undef });

use Data::Dumper;

print Dumper $defw->parse( $_ ) for @rows;

__DATA__
darren.local           1987    A      Sentence1
darren.local           1996    C      Sentence2
darren.local           1991    E      Sentence3
darren.local           1954    G      Sentence4
darren.local           1998    H      Sentence5