Perl: parsing file text into a hash


I want to parse the text of a file and put it into a hash. My file looks like this:

key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key3 val
key4 val,val
key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val
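My keys come before the space, and my values are the comma-separated elements after the space. Some lines have no key because the values continue over several lines.

So I want a hash that maps each key to the list of its values (I am most familiar with Python).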

My code:

my %hashNames;
open INFILE, "./file.txt" or die $!;
my @temp = ();
while (my $line = <INFILE>)
{
    my @names = split /[\t,]/, $line;
    my $ID = $names[0];
    if ($line =~ /\t/)
    {
        my @temp = ();
        for (my $i = 1; $i < @names; $i += 1)
        {
            push(@temp, $names[$i]);
        }
    }
    else
    {
        for (my $i = 0; $i < @names; $i += 1)
        {
            push(@temp, $names[$i]);
        }
    }
}
Here you go:

my %results;
my $key;
while(my $line = <INFILE>) {
    chomp($line);
    my @items = split(/[\s,]+/, $line);   # split on whitespace and commas so the key becomes its own field
    $key = shift @items;
    $results{$key} = \@items;
}
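This works for the simple case, but not for the situation you describe:

I have some lines without a key, because the values continue over several lines

To handle that, you first have to decide how to tell whether a line starts with a key or only continues the previous values. Once you know that, you can put the test in an if statement and append the new values to the hash under the previous key: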

my %results;
my $key;
while(my $line = <INFILE>) {
    chomp($line);
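    my @items = split(/[\s,]+/, $line);   # split on whitespace and commas
    my $tmpkey = shift @items;
    if (is_real_key($tmpkey)) {           # is_real_key() is a placeholder for your own test
        $key = $tmpkey;                   # the first field is the key itself
        $results{$key} = \@items;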
    } else {
        push (@{$results{$key}}, $tmpkey, @items);
    }
}
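
For data shaped like the sample, one workable rule is that a line starts a new record when it begins with a comma-free word followed by whitespace, and otherwise continues the previous record. This rule is an assumption, not something the question states. The sketch below applies it; `parse_records` is a hypothetical helper name, and the input is an invented string read through an in-memory filehandle rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "key val,val,..." records whose value lists
# may continue on following lines that carry no key.
sub parse_records {
    my ($fh) = @_;
    my (%results, $key);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;                  # skip blank lines
        if ($line =~ /^([^\s,]+)\s+(.*)$/) {        # "key values": start a new record
            $key  = $1;
            $line = $2;
            $results{$key} = [];
        }
        elsif (!defined $key) {
            next;                                   # continuation before any key: ignore
        }
        # Append this line's values to the current key; grep drops any
        # empty field produced by stray commas.
        push @{ $results{$key} }, grep { length } split /\s*,\s*/, $line;
    }
    return \%results;
}

# Invented sample input, shaped like the data in the question.
my $text = "key1 a,b,\nc,d\nkey2 e\nkey3 f,g\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my $parsed = parse_records($fh);
close $fh;

print "$_ => @{ $parsed->{$_} }\n" for sort keys %$parsed;
```

The rule breaks down if a value can itself contain whitespace, in which case a different key test is needed.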

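Your problem is that a newline no longer delimits a record. One way to handle this is to disable the default input record separator $/, which is no longer useful here, and emulate a working one: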

use strict;
use warnings;
use Data::Dumper;

my %hash;
my $file;
{
    local $/;         # disable input record separator
    $file = <DATA>;   # entire file here now!
}

for my $line (split /^(?=\S+ )/m, $file) {  # records begin this way now
    $line =~ s/\n//g;                       # remove newlines
    my ($key, $val) = split ' ', $line, 2;  # divide into two fields
    $hash{$key} = [ split /,/, $val ];      # store the data
}

print Dumper \%hash;

__DATA__
key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key3 val
key4 val,val
key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val
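Explanation:

  • Splitting on /^(?=\S+ )/m with the /m modifier means that ^ also matches after newlines inside the string, which emulates the input record separator.
  • To split a string into two fields, pass a limit of 2 as the third argument to split.
  • We use an anonymous array [ ... ] that contains a split statement to store the values directly in the hash.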
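
The three points above can be tried on a small string. This is only a sketch: the sample text is invented for illustration, and it assumes, as the regex in the answer does, that a key is always followed by a single space.

```perl
use strict;
use warnings;

# Invented sample: key1 continues on a second line, key2 does not.
my $file = "key1 a,b,\nc\nkey2 d\n";

# ^ with /m matches at the start of every line; the lookahead keeps only
# those line starts that are followed by "word + space", i.e. a key.
my @records = split /^(?=\S+ )/m, $file;

my %hash;
for my $record (@records) {
    (my $joined = $record) =~ s/\n//g;           # join continuation lines
    my ($key, $val) = split ' ', $joined, 2;     # LIMIT 2: key, then the rest
    $hash{$key} = [ split /,/, $val ];           # anonymous array of values
}

print scalar(@records), " records\n";
print "$_ => @{ $hash{$_} }\n" for sort keys %hash;
```

The lookahead is zero-width, so nothing is consumed by the split and each record keeps its key.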

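The difficulty here is that your records end with "a newline that is not preceded by a comma". Unfortunately, the input record separator $/ cannot be set to a regular expression. That leaves three comfortable solutions:

  • Load the whole file into memory. This is not as bad as it sounds, because we hold the same amount of information in the hash later anyway. We can then split on /(?<!,)\n/ to get the actual records: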

    my %hash = do {
      local $/; # set to undef, for slurp
      map {
        my ($key, $vals) = split /\s+/, $_, 2; # split on first whitespace, into two strings
        $key => [ split /\s*,\s*/, $vals ];    # return a list of a key and a value array
      } split /(?<!,)\n/, <FILE>;              # split the file into records
    };
    
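  • Read the file line by line, and keep appending the next line for as long as the current line ends with a comma:

    my %hash;
    while(<FILE>) {
      $_ .= <FILE> while /,\n\z/;
      chomp;                                           # drop the newline that ends the record
      my ($key, $value) = split /\s+/, $_, 2;
      push @{ $hash{$key} }, split /\s*,\s*/, $value; # allow multiple occurrences of one key, simply append values to list.
    }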
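
The line-by-line approach can be tried without a file by reading from an in-memory filehandle. The sketch below is one way to write it, not the answerer's exact code: the sample text is invented, and the end-of-file guard and the skipping of blank lines are additions that the snippet above does not have.

```perl
use strict;
use warnings;

# Invented sample: key1 spans two lines and also occurs a second time.
my $text = "key1 a,b,\nc,d\nkey2 e\nkey1 f\n";
open my $in, '<', \$text or die "cannot open in-memory file: $!";

my %hash;
while (defined(my $record = <$in>)) {
    # A trailing comma means the record continues on the next line.
    while ($record =~ /,\s*\z/) {
        my $next = <$in>;
        last unless defined $next;       # stop at end of file instead of looping forever
        $record .= $next;
    }
    chomp $record;
    my ($key, $value) = split /\s+/, $record, 2;
    next unless defined $value;          # skip blank or key-only lines
    push @{ $hash{$key} }, split /\s*,\s*/, $value;   # repeated keys accumulate
}
close $in;

print "$_ => @{ $hash{$_} }\n" for sort keys %hash;
```

Because the comma pattern also consumes surrounding whitespace, the newline left inside a joined record disappears when the values are split.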
    
    A further answer tracks the current key while reading line by line:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use feature 'say';
    
    use Data::Dumper;
    
    my $res_hash = {};
    my ($current_key, $values);
    my $push_again;
    while ( my $line = <DATA>) {
      chomp $line;
      push ( @{ $res_hash->{$current_key} }, split(/,/, $values) ) if ( $current_key and $values and ( index($line, ' ') > 0) );
      if ( index($line, ' ') > 0 ){
        $push_again = 0;
        ($current_key, $values) = split( /\s/, $line);    
      } else {
        $values .= $line;
        $push_again = 1;
      }
    
    };
    push ( @{ $res_hash->{$current_key} }, split(/,/, $values) ) if defined $current_key;  # always flush the last record, even a single-line one
    
    say "result:".Dumper($res_hash);
    
    
    
    __DATA__
    key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
    val,val,val,val
    key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
    val,val,val,val
    key3 val
    key4 val,val
    key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
    val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
    val,val,val,val,val,val,val,val,val,val,val,val,val,val,val
    
    Using the Parse::RecDescent module:

    #! /usr/bin/env perl
    
    use strict;
    use warnings;
    
    use Parse::RecDescent;
    
    our %hash;
    my $p = Parse::RecDescent->new(q!
      hash: entry(s?)
      entry: key value(s /,/)  { $::hash{$item[1]} = [ @{ $item[2] } ] }
      key: /\S+/
      value: /([^,\n]|\\,])+/
    !);
    die "$0: failed to create parser" unless defined $p;
    
    my $text = do {{ local $/; <DATA> }};
    $p->hash($text) or die "$0: parse failed";
    
    for (sort keys %hash) {
      print "$_ => val x ", scalar @{ $hash{$_} }, "\n";
    }
    
    __DATA__
    key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
    val,val,val,val
    key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
    val,val,val,val
    key3 val
    key4 val,val
    key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
    val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
    val,val,val,val,val,val,val,val,val,val,val,val,val,val,val
    
    Output:

    key1 => val x 22
    key2 => val x 22
    key3 => val x 1
    key4 => val x 2
    key5 => val x 52