Perl: how to list multiple sentences that contain the same word, with that word as the heading

At the moment it prints all of the nouns and sentences, as can be seen below.

#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
my $search_key = "expend";    ## CHANGE "..." to <>

open(my $tag_corpus, '<', "ch13tagged.txt") or die $!;

my @sentences = <$tag_corpus>;    # This breaks up each line into list
my @words;
my %seens = ();
my %seenw = ();

for (my $i = 0; $i <= @sentences; $i++) {
    if (defined($sentences[$i]) and $sentences[$i] =~ /($search_key)_VB.*/i) {
        @words = split /\s/, $sentences[$i];    ## \s is a whitespace
        for (my $j = 0; $j <= @words; $j++) {
            #FILTER if word is noun, and therefore will end with _NN:
            if (defined($words[$j]) and $words[$j] =~ /_NN/) {
                #PRINT word (without _NN) and sentence (without any _ENDING):
                next if $seenw{$words[$j]}++;    ## How to include plural etc
                push @words, $words[$j];
                print "**", split(/_\S+/, $words[$j]), "**", "\n";
                ## next if $seens{ $sentences[$i] }++;
                ## push @sentences, $sentences[$i];
                print split(/_\S+/, $sentences[$i]), "\n"
                ## HOW PRINT bold or specifically word bold?
                #FILTER if word has been output, add sentence under that heading
            }
        }    ## put print sentences here to print each sentence after all the nouns inside
    }
}
close $tag_corpus || die "Can't close $tag_corpus: $!";

Your original:
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";

This is a good start.
my $search_key = "expend";    ## CHANGE "..." to <>

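This is largely the same, but can be done with less overhead.
If a line contains the record separator (and it will, unless you chomp it), you will always
get a defined line until the end of the file. There is no need to test for defined lines.
Also, you don't need the .* after the search term,
nor do you need to capture $search_key.
Neither has any effect here.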
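As a minimal sketch of the two points above, the pattern can be compiled once with qr// and tested against each line without a defined() check. The sample lines are made up, and the definition of $verb_regex is an assumption: the name is used in the cleaned-up loop later in this answer, but its definition is not shown there.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $search_key = "expend";

# Assumed definition. The braces in ${search_key} stop Perl from reading
# the variable name as $search_key_VB; no capture group and no trailing
# .* are needed because we only ask whether the line matches.
my $verb_regex = qr/${search_key}_VB/i;

# Made-up sample lines in the word_TAG format used in the question.
my @lines = (
    "They_PRP expend_VBP energy_NN daily_RB ._.\n",
    "The_DT cat_NN sleeps_VBZ ._.\n",
    "We_PRP expend_VB money_NN ._.\n",
);

# The simplified pattern: lines read from a file are always defined,
# so there is no defined() test.
my @matched;
for my $line (@lines) {
    push @matched, $line if $line =~ $verb_regex;
}

# The pattern from the question, for comparison.
my @old_matched = grep { /($search_key)_VB.*/i } @lines;

print scalar(@matched), " line(s) matched by the simplified pattern\n";
print scalar(@old_matched), " line(s) matched by the original pattern\n";
```

Both patterns select the same lines here; the capture and the .* only add work.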
        @words = split /\s/, $sentences[$i];    ## \s is a whitespace

You don't want to split on a single whitespace character. You should use /\s+/, or
better yet: @words = split ' ', $sentences[$i];

But you don't even need this.
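To illustrate why split ' ' is the safer choice, the sketch below compares the three forms on a made-up tagged sentence that has a leading space and a doubled space. The sample string and variable names are illustrative only.

```perl
use strict;
use warnings;

# Made-up tagged sentence with a leading space and a doubled space.
my $sentence = " They_PRP  expend_VBP energy_NN\n";

my @single = split /\s/,  $sentence;   # one field break per whitespace character
my @plus   = split /\s+/, $sentence;   # runs collapse, but a leading empty field remains
my @awk    = split ' ',   $sentence;   # runs collapse and leading whitespace is skipped

printf "%-6s %d fields\n", '/\s/',  scalar @single;
printf "%-6s %d fields\n", '/\s+/', scalar @plus;
printf "%-6s %d fields\n", "' '",   scalar @awk;
```

Only the ' ' form returns exactly the three words; the other two include empty strings that the loop would then have to skip.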
        for (my $j = 0; $j <= @words; $j++) {
            #FILTER if word is noun, and therefore will end with _NN:
            if (defined($words[$j]) and $words[$j] =~ /_NN/) {
                #PRINT word (without _NN) and sentence (without any _ENDING):

Unless you want to reset %seenw after each sentence,
you will only ever process each _NN
word once per file.
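The difference between the two scopes can be sketched as follows, assuming made-up sentences and illustrative hash names (%seen_in_file plays the role of %seenw as written; %seen_in_sentence is the reset-per-sentence variant).

```perl
use strict;
use warnings;

# Made-up tagged sentences.
my @sentences = (
    "They_PRP expend_VBP energy_NN ._.",
    "We_PRP expend_VBP energy_NN and_CC time_NN ._.",
);

# One hash for the whole input: each noun is reported once overall.
my %seen_in_file;
my @per_file;
for my $sentence (@sentences) {
    push @per_file,
        grep { /_NN\z/ && !$seen_in_file{$_}++ } split ' ', $sentence;
}

# A fresh hash for every sentence: each noun is reported once per sentence.
my @per_sentence;
for my $sentence (@sentences) {
    my %seen_in_sentence;
    push @per_sentence,
        grep { /_NN\z/ && !$seen_in_sentence{$_}++ } split ' ', $sentence;
}

print "per file:     @per_file\n";
print "per sentence: @per_sentence\n";
```

With the file-wide hash, energy_NN is picked up only from the first sentence, which matters if every sentence containing a noun is supposed to be listed under it.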
                push @words, $words[$j];

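I don't see how this push serves any possible purpose by appending the noun
back onto the list of words. Doing the uniqueness check before the push
keeps you out of an infinite loop if there are any _NN words, but it only means you end up with
all the words in the sentence followed by all the "nouns". Not only that, but when the loop
reaches those appended words it simply tests that each one is a noun and does nothing with it,
not to mention that you overwrite the list when the next sentence is split.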
                print "**", split(/_\S+/, $words[$j]), "**", "\n";

                ## next if $seens{ $sentences[$i] }++; 

You don't want to do this in the word loop.
                ## push @sentences, $sentences[$i];

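Again, even if this were not commented out, I don't think you would want to do it
inside the word loop. It seems that everything from two lines back onward belongs
after the word loop.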
                print split(/_\S+/, $sentences[$i]), "\n"
                ## HOW PRINT bold or specifically word bold?
                #FILTER if word has been output, add sentence under that heading
            }
        }    ## put print sentences here to print each sentence after all the nouns inside
    }
}
close $tag_corpus || die "Can't close $tag_corpus: $!";

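No. This will not handle a bad return from close. || binds too
tightly: you are closing the result of $tag_corpus || die. Luckily (or perhaps unluckily)
die will never be called, because if we got this far, $tag_corpus should be a
true value.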
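The precedence difference can be demonstrated without a real file. In this sketch, always_fails is a hypothetical stand-in for a close that returns false, and $handle stands in for an open filehandle.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a close() that fails: it always returns false.
sub always_fails { return 0 }

my $handle = "a true value";   # like an open filehandle, it is true

# '||' binds tighter than the list operator, so this parses as
# always_fails( $handle || die ... ); $handle is true, so die is never reached.
my $tight = eval { always_fails $handle || die "died\n"; "no die" };

# 'or' has very low precedence, so this parses as
# always_fails($handle) or die ...; the false return triggers die.
my $loose = eval { always_fails $handle or die "died\n"; "no die" };

print "with ||: ", ( defined $tight ? $tight : "died" ), "\n";
print "with or: ", ( defined $loose ? $loose : "died" ), "\n";
```

Writing close($tag_corpus) || die ... with explicit parentheses would also work; the point is only that the bare || form never checks the return value of close.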
Here is a sort of cleaned-up version of what you are trying to do, with
the parts I could understand left in.
my @sentences;
# We're processing a single line at a time.
while ( <$tag_corpus> ) { 
    # Test if we want to work with the line
    next unless m/$verb_regex/;
    # If we do, then test that we haven't dealt with it before
    # Although I suspect that this may not be needed as much if we're not 
    # pushing to a queue that we're reading from.
    next if    $seens{ $_ }++;

    # split -> split ' ', $_
    # pass through only those words that match _NN at the end and
    # are unique so far. We test on a substitution, because the result
    # still uniquely identifies a noun
    foreach my $noun ( grep { s/_NN$// && !$seenw{ $_ }++ } split ) { 
        print "**$noun**\n";
    }
    # This will omit any adjacent punctuation you have after the word--if 
    # that's a problem.
    print split( /_\S+/ ), "\n";
    # Here we save the sentence.
    push @sentences, $_;
}
close $tag_corpus or die "Can't close ch13tagged.txt: $!";
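The loop above prints each noun once and then the sentence, which is not quite the grouping the title asks for. One possible way to get each noun as a heading with all of its sentences underneath is to collect the sentences in a hash of arrays first and print afterwards. This is only a sketch under stated assumptions: the sample text stands in for ch13tagged.txt, and %sentences_for, @heading_order and the definition of $verb_regex are names introduced here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $search_key = "expend";
my $verb_regex = qr/${search_key}_VB/i;   # assumed definition

# Made-up sample standing in for ch13tagged.txt.
my $sample = <<'END';
They_PRP expend_VBP energy_NN on_IN the_DT project_NN ._.
The_DT cat_NN sleeps_VBZ ._.
We_PRP expend_VBP energy_NN wisely_RB ._.
END

# Read from the in-memory string so the sketch runs without a data file.
open my $tag_corpus, '<', \$sample or die "Can't open sample: $!";

my %sentences_for;   # noun => [ plain sentences that contain it ]
my @heading_order;   # nouns in order of first appearance

while ( my $line = <$tag_corpus> ) {
    next unless $line =~ $verb_regex;
    chomp $line;

    # Strip every _TAG to get the readable sentence.
    ( my $plain = $line ) =~ s/_\S+//g;

    my %in_this_line;
    for my $word ( split ' ', $line ) {
        next unless $word =~ s/_NN\z//;      # keep words tagged exactly _NN
        next if $in_this_line{$word}++;      # list a sentence once per noun
        push @heading_order, $word unless $sentences_for{$word};
        push @{ $sentences_for{$word} }, $plain;
    }
}
close $tag_corpus or die "Can't close sample: $!";

# Print each noun as a heading, followed by its sentences.
for my $noun (@heading_order) {
    print "**$noun**\n";
    print "$_\n" for @{ $sentences_for{$noun} };
    print "\n";
}
```

As in the loop above, plural nouns tagged _NNS are not matched by _NN\z; whether they should be folded into the same heading depends on what you want to count as the same word.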

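Provide sample data to work with, and use it to point out what you consider "common words", "headings", and so on.

Your question needs clarification. "Listed below the heading"?

Clarify the title and add a short description, preferably in a normal font size. There is no need to cram everything into the title.

Thanks for the in-depth solution. I can't seem to get it to print the sentences, but you left it as $sentences[$i]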