Perl: how do I replace a specific word when it occurs only once in a given subset of the data?

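Consider the data set below. Each block that begins with a number is a "case". In the real data set I have hundreds of thousands of cases. When a case (for example, case 10001) contains the word "Exclusion" only once, I want to replace the word "Exclusion" with "0".

If I loop over the lines, I can count how many times "Exclusion" occurs in each case. But when only one line contains the word "Exclusion", I don't know how to go back to that line and replace the word.

How can I do that?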

10001
M1|F1|SP1;12;12;12;11;13;10;Exclusion;D16S539
M1|F1|SP1;12;10;12;9;11;9;3.60;D16S
M1|F1|SP1;12;10;10;7;11;7;20.00;D7S
M1|F1|SP1;13;12;12;12;12;12;3.91;D13S
M1|F1|SP1;11;11;13;11;13;11;3.27;D5S
M1|F1|SP1;14;12;14;10;12;10;1.99;CSF
10002
M1|F1|SP1;8;13;13;8;8;12;2.91;D16S
M1|F1|SP1;13;11;13;10;10;10;4.13;D7S
M1|F1|SP1;12;9;12;10;11;16;Exclusion;D13S
M1|F1|SP1;12;10;12;10;14;15;Exclusion;D5S
M1|F1|SP1;13;10;10;10;17;18;Exclusion;CSF

While reading the file, buffer all the lines of a case and count the exclusions.

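my ($case, $buf, $count) = (undef, "", 0);
while (my $ln = <>) {
    # A digits-only line starts a new case: flush the previous case first
    if ($ln =~ /^\d+$/) {
        $buf =~ s/;Exclusion;/;0;/ if $count == 1;
        print $buf;
        ($case, $buf, $count) = ($ln, "", 0);
    }

Now use a regular expression to detect "Exclusion":

    elsif ($ln =~ /;Exclusion;/) { $count++; }
    $buf .= $ln;
}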

When the loop is done, you may still have one last case to process:

if (length($buf) > 0) {
    $buf =~ s/;Exclusion;/;0;/ if $count == 1;
    print $buf;
}
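
The fragments above can be assembled into one piece. The following is only a sketch of the buffer-and-count idea, not the answerer's exact code: the sub name replace_single_exclusion is made up for illustration, and it takes the lines as a list and returns the rewritten text instead of reading from <> and printing.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of the file (newlines included)
# and returns the rewritten text as one string.
sub replace_single_exclusion {
    my @lines = @_;
    my ($buf, $count, $out) = ("", 0, "");

    # Flush the buffered case, replacing the word only if it was seen exactly once.
    my $flush = sub {
        $buf =~ s/;Exclusion;/;0;/ if $count == 1;
        $out .= $buf;
        ($buf, $count) = ("", 0);
    };

    for my $ln (@lines) {
        if    ($ln =~ /^\d+$/)        { $flush->(); }   # digits-only line: new case
        elsif ($ln =~ /;Exclusion;/)  { $count++;   }
        $buf .= $ln;
    }
    $flush->();    # the last case has no following header line
    return $out;
}

my @sample = map { "$_\n" } (
    "10001", "M1;Exclusion;D16S539", "M1;3.60;D16S",
    "10002", "M1;Exclusion;D13S",    "M1;Exclusion;D5S",
);
print replace_single_exclusion(@sample);
```

Only case 10001 in the sample is changed, because case 10002 contains the word twice.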

This is the best I could come up with. It assumes you have read the file into @lines.

# Separate the lines into blocks, keyed by case number
my (%block, $key);
foreach my $line (@lines) {
    chomp($line);
    if ($line =~ m/^(\d+)/) {
        $key = $1;
    }
    else {
        push @{ $block{$key} }, $line;
    }
}

# Go through each block. keys %block alone returns the cases in no particular
# order, so sort numerically (this matches the input order only if the cases
# are numbered in ascending order).
foreach my $key (sort { $a <=> $b } keys %block) {
    print "$key\n";
    my @matched = grep { /exclusion/i } @{ $block{$key} };
    foreach my $line (@{ $block{$key} }) {
        # Replace only when exactly one line of this case matched
        $line =~ s/Exclusion/0/i if @matched == 1;
        print "$line\n";
    }
}

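sub process_block {
   my ($block) = @_;
   # Replace the word unless it occurs at least twice in the block
   $block =~ s/\bExclusion\b/0/
      if $block !~ /\bExclusion\b.*\bExclusion\b/s;
   print($block);
}

my $buf;
while (<>) {
    # A line starting with a digit begins a new case: flush the previous one
    if (/^\d/) {
        process_block($buf) if $buf;
        $buf = '';
    }

    $buf .= $_;
}

# Flush the last case
process_block($buf) if $buf;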
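There are already many correct answers here that use a buffer to store the contents of a "case".

Here is another solution that uses tell and seek to rewind the file, so no buffer is needed. This can be useful when your "cases" are very large and you are sensitive to performance or memory usage.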

use strict;
use warnings;

open(FILE, '<', 'text.txt') or die "Cannot open text.txt: $!";
open(REPLACE, '>', 'replace.txt') or die "Cannot open replace.txt: $!";

my $count = 0;      # count of 'Exclusion' in the current case
my $position = 0;
my $prev_position = 0;
my $first_occur_position = 0;   # first occurrence of 'Exclusion' in the current case
my $visited = 0;    # whether the current line is visited before

while (<FILE>) {
    # keep track of the position before reading
    # the current line
    $prev_position = $position;
    $position = tell FILE;

    if ($visited == 0) {
        if (/^\d+/) {
            # new case
            if ($count == 1) {
                # rewind to the first occurrence
                # of 'Exclusion' in the previous case
                seek FILE, $first_occur_position, 0;
                $visited = 1;
            }
            else {
                print REPLACE $_;
            }
        }
        elsif (/Exclusion/) {
            $count++;
            if ($count > 1) {
                seek FILE, $first_occur_position, 0;
                $visited = 1;
            }
            elsif ($count == 1) {
                $first_occur_position = $prev_position;
            }
        }
        else {
            print REPLACE $_ if ($count == 0);
        }

        if (eof FILE && $count == 1) {
            seek FILE, $first_occur_position, 0;
            $visited = 1;
        }
    }
    else {
        if ($count == 1) {
            s/Exclusion/0/;
        }
        if (/^\d+/) {
            $position = tell FILE;
            $visited = 0;
            $count = 0;
        }
        print REPLACE $_;
    }
}

close REPLACE;
close FILE;
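
To see the tell/seek mechanics in isolation, here is a minimal sketch. It is an illustration only: the sub name reread_first_match and the sample data are made up, and it reads from an in-memory filehandle rather than text.txt so that it is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: find the first line matching $pattern, remember the
# offset where that line starts (via tell), then jump back with seek and
# read the line a second time.
sub reread_first_match {
    my ($text, $pattern) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

    my $start = tell($fh);           # offset before the first line
    my $match_pos;
    while (my $line = <$fh>) {
        if (!defined $match_pos && $line =~ $pattern) {
            $match_pos = $start;     # this line began at $start
        }
        $start = tell($fh);          # offset where the next line begins
    }
    return unless defined $match_pos;

    seek($fh, $match_pos, 0) or die "seek failed: $!";   # 0 = from start of file
    my $again = <$fh>;
    close($fh);
    return $again;
}

my $data = "10001\nA;1.00;X\nB;Exclusion;Y\nC;2.00;Z\n";
print reread_first_match($data, qr/Exclusion/);   # prints the B line again
```

The answer's script applies the same idea: it records the offset of the first "Exclusion" line, and seeks back to it once it knows how many occurrences the case contains.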

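Elegant! Performance-wise, I would expect this to be very similar to @ChuckCottrill's solution, maybe even slightly better?

Thank you all for the solutions. ikegami, would it be a simple step to modify this so that it replaces the word "Exclusion" not only when it occurs once, but when it occurs at most twice? I tried "if $block !~ /\bExclusion\b.*\bExclusion\b.*\bExclusion\b/s;". It works, but it only replaces the first occurrence.

Use s///g: $block =~ s/\bExclusion\b/0/g if $block !~ /\bExclusion\b(?:.*\bExclusion\b){2}/s. At that point it might be simpler to count the occurrences: my $count = () = $block =~ /\bExclusion\b/g; $block =~ s/\bExclusion\b/0/g if 0 < $count && $count <= 2;

@ChuckCottrill, not true. The approach would be the same for that hypothetical format. To count the occurrences within a block, the blocks have to be split apart.

Very nice. Very similar to @ikegami's, but with far less regex magic. I think this is more approachable for beginners.

@Mikko Lipasti - Thanks! My plan was to make the solution both easy to understand and approachable for beginners.

"Far less regex magic" must mean "no \b" (since the regex is simple), which is a bad thing. He uses an alternative instead in one place (good), but not in the other. His coupling is also poor (process_case depends heavily on code outside it), which makes it harder to understand, harder to maintain and more error-prone.

The OP did not provide a specification of the file format, only an example.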
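
The counting idiom from the comments can be tried on its own. This is a sketch under assumptions: the sub name replace_if_few and its $max parameter are made up for illustration, and the block is passed in as a single string.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every "Exclusion" in a block with "0", but
# only when the block contains between 1 and $max occurrences.
sub replace_if_few {
    my ($block, $max) = @_;

    # Assigning the match to an empty list forces list context, so $count
    # receives the number of matches rather than a true/false value.
    my $count = () = $block =~ /\bExclusion\b/g;

    $block =~ s/\bExclusion\b/0/g if $count >= 1 && $count <= $max;
    return $block;
}

print replace_if_few("10003\na;Exclusion;x\nb;Exclusion;y\n", 2);
```

With $max set to 2, the sample block has both occurrences replaced; a block with three occurrences would be returned unchanged.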
