Finding extra, missing, and invalid strings when comparing two lists in Perl


List-1    List-2
one       one
two       three
three     three
four      four
five      six
six       seven
eight     eighttt
nine      nine

Expected output:

one       | one        PASS
two       | *          FAIL MISSING
three     | three      PASS
*         | three      FAIL EXTRA
four      | four       PASS
five      | *          FAIL MISSING
six       | six        PASS
*         | seven      FAIL EXTRA
eight     | eighttt    FAIL INVALID
nine      | nine       PASS

Actually, what my current solution returns is a reference to each of the two modified lists plus a reference to a "fail" list that describes the failure at each index as "no fail", "missing", "extra", or "invalid", which is (obviously) also fine output.

My current solution is:

sub compare {
    local $thisfound = shift;
    local $thatfound = shift;
    local @thisorig = @{ $thisfound };
    local @thatorig = @{ $thatfound };
    local $best = 9999; 

    foreach $n (1..6) {
        local $diff = 0;
        local @thisfound = @thisorig;
        local @thatfound = @thatorig;
        local @fail = ();
        for (local $i=0;$i<scalar(@thisfound) || $i<scalar(@thatfound);$i++) {
            if($thisfound[$i] eq $thatfound[$i]) { 
                $fail[$i] = 'NO_FAIL';
                next;
            }
            if($n == 1) {      # 1 2 3
                next unless __compare_missing__();
                next unless __compare_extra__();
                next unless __compare_invalid__();
            } elsif($n == 2) { # 1 3 2
                next unless __compare_missing__();
                next unless __compare_invalid__();
                next unless __compare_extra__();
            } elsif($n == 3) { # 2 1 3
                next unless __compare_extra__();
                next unless __compare_missing__();
                next unless __compare_invalid__();
            } elsif($n == 4) { # 2 3 1
                next unless __compare_extra__();
                next unless __compare_invalid__();
                next unless __compare_missing__();
            } elsif($n == 5) { # 3 1 2
                next unless __compare_invalid__();
                next unless __compare_missing__();
                next unless __compare_extra__();
            } elsif($n == 6) { # 3 2 1
                next unless __compare_invalid__();
                next unless __compare_extra__();
                next unless __compare_missing__();
            }
            push @fail,'INVALID'; 
            $diff += 1;
        }
        if ($diff<$best) {
            $best = $diff;
            @thisbest = @thisfound;
            @thatbest = @thatfound;
            @failbest = @fail;
        }
    }
    return (\@thisbest,\@thatbest,\@failbest)
}

sub __compare_missing__ {
    my $j;
    ### Does that command match a later this command? ###
    ### If so most likely a MISSING command           ###
    for($j=$i+1;$j<scalar(@thisfound);$j++) {
        if($thisfound[$j] eq $thatfound[$i]) {
            $diff += $j-$i;
            for ($i..$j-1) { push(@fail,'MISSING'); }
            @end = @thatfound[$i..$#thatfound];
            @thatfound = @thatfound[0..$i-1];
            for ($i..$j-1) { push(@thatfound,'*'); }
            push(@thatfound,@end);
            $i=$j-1;
            last;
        }
    }
    $j == scalar(@thisfound);
}

sub __compare_extra__ {
    my $j;
    ### Does this command match a later that command? ###
    ### If so, most likely an EXTRA command           ###
    for($j=$i+1;$j<scalar(@thatfound);$j++) {
        if($thatfound[$j] eq $thisfound[$i]) { 
            $diff += $j-$i;
            for ($i..$j-1) { push(@fail,'EXTRA'); }
            @end = @thisfound[$i..$#thisfound];
            @thisfound = @thisfound[0..$i-1];
            for ($i..$j-1) { push (@thisfound,'*'); }
            push(@thisfound,@end);
            $i=$j-1;
            last; 
        }
    }
    $j == scalar(@thatfound);
}

sub __compare_invalid__ {
    my $j;
    ### Do later commands match?                      ###
    ### If so most likely an INVALID command          ###
    for($j=$i+1;$j<scalar(@thisfound);$j++) {
        if($thisfound[$j] eq $thatfound[$j]) { 
            $diff += $j-$i;
            for ($i..$j-1) { push(@fail,'INVALID'); }
            $i=$j-1;
            last;
        }
    }
    $j == scalar(@thisfound);
}

The trick in Perl (and similar languages) is the hash, which doesn't care about order.

Suppose the first array holds the valid elements. Construct a hash with those values as keys:

  my @valid = qw( one two ... );
  my %valid = map { $_, 1 } @valid;

Now, to find the invalid elements, you just have to find the ones that are not in the %valid hash:

  my @invalid = grep { ! exists $valid{$_} } @array;

If you want to know the array indices of the invalid elements:

  # look up the element at each index, not the index itself
  my @invalid_indices = grep { ! exists $valid{ $array[$_] } } 0 .. $#array;

Now you can expand that to find the repeated elements, too. Not only do you check the %valid hash, you also keep track of what you have already seen:

  my %Seen;
  $Seen{$_}++ for @array;    # count every element, valid or not
  my @invalid_indices = grep { ! exists $valid{ $array[$_] } } 0 .. $#array;

The repeated valid elements are the ones whose value in %Seen is greater than 1:

  my @repeated_valid = grep { $Seen{$_} > 1 } @valid;

To find the missing elements, look in %Seen to check which valid elements are not in there:

  my @missing = grep { ! $Seen{$_} } @valid;
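
The hash-based checks above can be combined into one small script. This is only a minimal sketch, not the answerer's code: it assumes List-1 from the question is the reference list of valid strings and List-2 is the list under test, and the names classify, @list1, and @list2 are made up for illustration. Because a hash ignores position, the sketch reports which strings are invalid, repeated, or missing, but it does not reproduce the row-by-row alignment of the expected output.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the strings in @$tested against @$valid.
sub classify {
    my ($valid, $tested) = @_;

    my %valid = map { $_ => 1 } @$valid;   # lookup table of valid strings
    my %seen;
    $seen{$_}++ for @$tested;              # how often each tested string occurs

    return {
        invalid  => [ grep { !exists $valid{$_} } @$tested ],
        repeated => [ grep { ($seen{$_} // 0) > 1 } @$valid ],
        missing  => [ grep { !$seen{$_} } @$valid ],
    };
}

my @list1 = qw(one two three four five six eight nine);
my @list2 = qw(one three three four six seven eighttt nine);

my $result = classify(\@list1, \@list2);
print "$_: @{ $result->{$_} }\n" for qw(invalid repeated missing);
# invalid: seven eighttt
# repeated: three
# missing: two five eight
```

Turning these three sets back into aligned PASS/FAIL rows would still need an order-aware comparison such as the one in the question.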
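
From perlfaq4's answer to the question of whether a certain element is contained in a list or array:

(portions of this answer contributed by Anno Siegel and brian d foy)

Hearing the word "in" is an indication that you probably should have used a hash, not a list or array, to store your data. Hashes are designed to answer this question quickly and efficiently. Arrays aren't.

That being said, there are several ways to approach this. In Perl 5.10 and later, you can use the smart match operator to check that an item is contained in an array or a hash:

use 5.010;

if( $item ~~ @array )
    {
    say "The array contains $item"
    }

if( $item ~~ %hash )
    {
    say "The hash contains $item"
    }

With earlier versions of Perl, you have to do a bit more work. If you are going to make this query many times over arbitrary string values, the fastest way is probably to invert the original array and maintain a hash whose keys are the first array's values:

@blues = qw/azure cerulean teal turquoise lapis-lazuli/;
%is_blue = ();
for (@blues) { $is_blue{$_} = 1 }

Now you can check whether $is_blue{$some_color}. It might have been a good idea to keep the blues all in a hash in the first place.

If the values are all small integers, you could use a simple indexed array. This kind of array takes up less space:

@primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
@is_tiny_prime = ();
for (@primes) { $is_tiny_prime[$_] = 1 }
# or simply  @is_tiny_prime[@primes] = (1) x @primes;

Now you check whether $is_tiny_prime[$some_number].

If the values in question are integers instead of strings, you can save quite a lot of space by using bit strings instead:

@articles = ( 1..10, 150..2000, 2017 );
undef $read;
for (@articles) { vec($read,$_,1) = 1 }

Now check whether vec($read,$n,1) is true for some $n.

These methods guarantee fast individual tests but require a re-organization of the original list or array. They only pay off if you have to test multiple values against the same array.

If you are testing only once, the standard module List::Util exports the function first for this purpose. It works by stopping once it finds the element. It's written in C for speed, and its Perl equivalent looks like this subroutine:

sub first (&@) {
    my $code = shift;
    foreach (@_) {
        return $_ if &{$code}();
    }
    undef;
}

If speed is of little concern, the common idiom uses grep in scalar context (which returns the number of items that passed its condition) to traverse the entire list. This does have the benefit of telling you how many matches it found, though.

my $is_there = grep $_ eq $whatever, @array;

If you want to actually extract the matching elements, simply use grep in list context.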
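
The membership tests described in this answer can be tried out in one short script. This is a hedged sketch rather than part of the FAQ text: the sample array and the value of $whatever are made up for illustration, and it deliberately leaves out the smart match operator, which has been marked experimental since Perl 5.18. It shows the hash lookup, the bit string, List::Util's first, and grep in both scalar and list context.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hash lookup: invert the list once, then test membership without scanning it.
my @blues   = qw/azure cerulean teal turquoise lapis-lazuli/;
my %is_blue = map { $_ => 1 } @blues;
print "teal is blue\n"        if $is_blue{teal};
print "crimson is not blue\n" unless $is_blue{crimson};

# Bit string: one bit per integer, set and tested with vec.
my @articles = ( 1..10, 150..2000, 2017 );
my $read = '';
vec($read, $_, 1) = 1 for @articles;
print "article 2017 was read\n"   if vec($read, 2017, 1);
print "article 11 was not read\n" unless vec($read, 11, 1);

# One-off tests on a plain array (illustrative data).
my @array    = qw(one three three four six seven eighttt nine);
my $whatever = 'three';

my $found    = first { $_ eq $whatever } @array;   # stops at the first match
my $is_there = grep  { $_ eq $whatever } @array;   # scalar context: match count
my @matches  = grep  { $_ eq $whatever } @array;   # list context: the matches

print "found: $found\n" if defined $found;
print "count: $is_there\n";
print "matches: @matches\n";
# found: three
# count: 2
# matches: three three
```

The hash and the bit string cost a pass over the data up front, so they are the better choice only when the same list is queried repeatedly. For a single query, first or grep on the original array is enough.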