Arrays: counting how many times two array elements occur close together


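I have a large array of words. I want to count how many times two specific words occur within less than a given distance of each other.

For example, if "time" and "late" are no more than three words apart, I want to increment a counter. The words "time" and "late" can each appear hundreds of times in the array. How can I find the number of times they occur close to each other?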

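You didn't ask a question, so I assume you have already come up with an algorithm.

  • Iterate over the indexes.
  • If the first word is found at that index, note the index.
  • If the second word is found at that index, note the index.
  • Subtract one index from the other.
  • Notes:

    • You may need to add checks to make sure each word was found.
    • You didn't specify what should happen when one of the words occurs more than once.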
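The steps above can be sketched in Perl roughly as follows. This is a minimal sketch, assuming only the first occurrence of each word matters; `first_index_distance` is a hypothetical helper name, not something from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: distance between the first occurrences of two words.
# Returns nothing (undef in scalar context) when either word is missing.
sub first_index_distance {
    my ($words, $w1, $w2) = @_;
    my ($i1, $i2);
    for my $i (0 .. $#$words) {
        $i1 = $i if !defined $i1 && $words->[$i] eq $w1;
        $i2 = $i if !defined $i2 && $words->[$i] eq $w2;
        last if defined $i1 && defined $i2;   # both found, stop early
    }
    return unless defined $i1 && defined $i2;
    return abs($i1 - $i2);
}

my @sample = qw( the time is getting late so hurry );
my $d = first_index_distance(\@sample, 'time', 'late');
print defined $d ? "Distance: $d\n" : "At least one word was not found\n";
```

The early `last` avoids scanning the rest of the list once both words have been seen.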

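    To address the question asked in the comments:

  • Iterate over the indexes.
  • If the first word is found at that index, note the index.
  • If the second word is found at that index and the difference between the current index and the noted index is ≤ 3, increment the counter.
  • Note:

    • This assumes you only care about the distance between the second word and the previous instance of the first word.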
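A minimal sketch of this counting loop, under the same assumption (only the distance from each second word back to the most recent first word counts). The name `count_close_pairs` and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count occurrences of $second that appear at most
# $max_dist positions after the most recent occurrence of $first.
sub count_close_pairs {
    my ($words, $first, $second, $max_dist) = @_;
    my $last_first;      # index of the latest $first seen so far
    my $count = 0;
    for my $i (0 .. $#$words) {
        if ($words->[$i] eq $first) {
            $last_first = $i;
        }
        elsif ($words->[$i] eq $second) {
            $count++ if defined $last_first && $i - $last_first <= $max_dist;
        }
    }
    return $count;
}

my @sample = qw( time is late but next time be less late than late );
print count_close_pairs(\@sample, 'time', 'late', 3), "\n";
```

This is a single pass over the array and needs no extra storage. It only counts "late" following "time"; if the opposite order should count as well, calling it a second time with the two words swapped and adding the results is one possible approach.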

    Using an index hash would be a very efficient solution:

    my @words = qw( word1 word2 word3 word4 word5 word6 );
    
    # That can be expensive, but you do it only once
    my %index;
    @index{@words} = (0..$#words);
    
    # That will be real quick
    my $distance = $index{"word6"} - $index{"word2"};
    print "Distance: $distance \n";
    
    The output of the above script will be:

    Distance: 4
    

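    Note: creating the index hash can be expensive. However, if you plan to do many distance checks it may be worth it, because every lookup is fast (constant time on average, not even O(log n)).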
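One caveat with the hash slice above: when a word occurs more than once, later assignments overwrite earlier ones, so `%index` remembers only the last position of each word. A sketch of a variant that keeps every position per word (a hash of arrays) might look like the following; `build_positions` and `count_within` are hypothetical names, and the nested loop assumes the two words are different.

```perl
use strict;
use warnings;

# Hypothetical helper: map each word to the list of all indexes where it occurs.
sub build_positions {
    my ($words) = @_;
    my %positions;
    push @{ $positions{ $words->[$_] } }, $_ for 0 .. $#$words;
    return \%positions;
}

# Hypothetical helper: count (w1, w2) index pairs at most $max_dist apart,
# in either order, with a simple nested loop over the two position lists.
sub count_within {
    my ($positions, $w1, $w2, $max_dist) = @_;
    my $count = 0;
    for my $i (@{ $positions->{$w1} || [] }) {
        for my $j (@{ $positions->{$w2} || [] }) {
            $count++ if abs($i - $j) <= $max_dist;
        }
    }
    return $count;
}

my @words = qw( time is late but late time flies );
my $pos = build_positions(\@words);
print count_within($pos, 'time', 'late', 3), "\n";
```

Building the position lists still costs one pass over the whole array, but afterwards each query touches only the positions of the two words involved.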

    In case you need to support repeated words:

    #! /usr/bin/perl
    use strict;
    use warnings;
    use constant DEBUG => 0;
    
    my @words;
    if( $ARGV[0] && -f $ARGV[0] ) {
        open my $fh, "<", $ARGV[0] or die "Could not read $ARGV[0], because: $!\n";
        my $hughTestFile = do { local $/; <$fh> };
        @words = split ' ', $hughTestFile;  # split on runs of whitespace; ~10M words with my test.log
        # Test words (below) were manually placed at equal distances (~every 900K words) in test.log
        # With above, TESTS ran in avg of 15 seconds.  Likely test.log was in buffers/cache.
    } else {
        @words = qw( word1 word2 word3 word4 word5 word6 word7 word8 word4 word9 word0 );
    }
    
    sub IndexOf {
        my $searchFor = shift;
        return undef if( !$searchFor );
        my $Nth = shift || 1;
    
        my $cntr = 0;  # index of the word currently being examined
        for my $word (@words) {
            if( $word eq $searchFor ) {
                $Nth--;
                return $cntr if( $Nth == 0 );
            }
            $cntr++;
        }
        return undef;
    }
    
    sub Distance {
    # args:  <1st word>, <2nd word>, [occurrence_of_1st_word], [occurrence_of_2nd_word]
    # for occurrence counts:  0, 1 & undef - all have the same effect (1st occurrence)
        my( $w1, $w2 ) = ($_[0], $_[1]);
        my( $n1, $n2 ) = ($_[2] || undef, $_[3] || undef );
        die "Missing words\n" if( !$w1 );
        $w2 = $w1 if( !$w2 );
    
        my( $i1, $i2 ) = ( IndexOf($w1, $n1), IndexOf($w2, $n2) );
        if( defined($i1) && defined($i2) ) {
            my $offset = $i1-$i2;
            print "  Distance (offset) = $offset\n";
            return $offset;  # signed: positive when the 1st word comes after the 2nd
        } elsif( !defined($i1) && !defined($i2) ) {
            print "  Neither words were ";
        } elsif( !defined($i1) ) {
            print "  First word was not ";
        } else {
            print "  Second word was not ";
        }
        print "found in list\n";
    
        return undef;
    }
    
    # TESTS
    print "Your array has ".scalar(@words)." words\n";
    print "When 1st word is AFTER 2nd word:\n";
    Distance( "word7", "word3" );
    print "When 1st word is BEFORE 2nd word:\n";
    Distance( "word2", "word5" );
    print "When 1st word == 2nd word:\n";
    Distance( "word4", "word4" );
    print "When 1st word doesn't exist:\n";
    Distance( "word00", "word6" );
    print "When 2nd word doesn't exist:\n";
    Distance( "word1", "word99" );
    print "When neither 1st or 2nd words exist:\n";
    Distance( "word00", "word99" );
    print "When the 1st word is AFTER the 2nd OCCURRENCE of 2nd word:\n";
    Distance( "word9", "word4", 0, 2 );
    print "When the 1st word is BEFORE the 2nd OCCURRENCE of the 2nd word:\n";
    Distance( "word7", "word4", 1, 2 );
    print "When the 2nd OCCURRENCE of the 2nd word doesn't exist:\n";
    Distance( "word7", "word99", 0, 2 );
    print "When the 2nd OCCURRENCE of the 1st word is AFTER the 2nd word:\n";
    Distance( "word4", "word2", 2, 0 );
    print "When the 2nd OCCURRENCE of the 1st word is BEFORE the 2nd word:\n";
    Distance( "word4", "word0", 2, 0 );
    print "When the 2nd OCCURRENCE of the 1st word exists, but 2nd doesn't:\n";
    Distance( "word4", "word99", 2, 0 );
    print "When neither of the 2nd OCCURRENCES of the words exist:\n";
    Distance( "word00", "word99", 2, 2 );
    print "Distance between 2nd and 1st OCCURRENCES of the same word:\n";
    Distance( "word4", "", 2, 1 );
    
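    Comments:

  • Possible duplicate question.
  • Updated to match the question asked in the comments below.
  • Downside: you have to process the whole list even if the two words you want are the first two words.
  • That's true. On the other hand, once the index hash has been created it can be used for fast lookups, so it is a trade-off. I will note that in the answer.
  • If you don't reserve the necessary number of hash buckets this could be O(n log n), so add that to the answer.
  • Inserting n elements into a hash is O(n), so it is O(n) even without reserving in advance.
  • @ikegami Yes, in case of rehashing. The OP said the array is large, so rehashing will happen. Also, if the array is sorted, you can use binary search to do this as well.
  • If the distance between "time" and "late"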