Perl: how to list multiple sentences that contain the same word, with that word as the heading
Currently it prints all the nouns and sentences, as can be seen below:
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
my $search_key = "expend"; ## CHANGE "..." to <>
open(my $tag_corpus, '<', "ch13tagged.txt") or die $!;
my @sentences = <$tag_corpus>; # This breaks up each line into list
my @words;
my %seens = ();
my %seenw = ();
for (my $i = 0; $i <= @sentences; $i++) {
if (defined($sentences[$i]) and $sentences[$i] =~ /($search_key)_VB.*/i) {
@words = split /\s/, $sentences[$i]; ## \s is a whitespace
for (my $j = 0; $j <= @words; $j++) {
#FILTER if word is noun, and therefore will end with _NN:
if (defined($words[$j]) and $words[$j] =~ /_NN/) {
#PRINT word (without _NN) and sentence (without any _ENDING):
next if $seenw{$words[$j]}++; ## How to include plural etc
push @words, $words[$j];
print "**", split(/_\S+/, $words[$j]), "**", "\n";
## next if $seens{ $sentences[$i] }++;
## push @sentences, $sentences[$i];
print split(/_\S+/, $sentences[$i]), "\n"
## HOW PRINT bold or specifically word bold?
#FILTER if word has been output, add sentence under that heading
}
} ## put print sentences here to print each sentence after all the nouns inside
}
}
close $tag_corpus || die "Can't close $tag_corpus: $!";
Your original:
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
That's a good start.
my $search_key = "expend"; ## CHANGE "..." to <>
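Processing the file one line at a time, as the cleaned-up version further down does, is much the same but with less overhead.
If the line contains the record separator (and it will, unless you chomp it), you will always get a defined line until the end of the file. There is no need to test for defined lines.
Also, you don't need the .* after the search term, nor do you need to capture $search_key; the capture has no effect here.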
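The cleaned-up version further down uses $verb_regex without showing how it is built. Here is a minimal sketch, assuming it is meant to match the search key followed by a _VB tag. The sample lines and the @matched name are illustrative and not from the original post.

```perl
use strict;
use warnings;

my $search_key = "expend";

# Precompile the pattern once. \Q...\E quotes any regex metacharacters
# in the key. No capture group and no trailing .* are needed just to
# test whether a line matches.
my $verb_regex = qr/\Q$search_key\E_VB/i;

# Hypothetical tagged lines standing in for ch13tagged.txt.
my @lines = (
    "They_PRP expend_VBP energy_NN ._.\n",
    "The_DT cat_NN sleeps_VBZ ._.\n",
);

# Keep only the lines that contain the tagged verb.
my @matched = grep { /$verb_regex/ } @lines;
print scalar(@matched), " matching line(s)\n";
```

Compiling the pattern with qr// once keeps the match inside the loop short, and quoting the key guards against a search term that happens to contain regex metacharacters.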
@words = split /\s/, $sentences[$i]; ## \s is a whitespace
You don't want to split on a single whitespace character. You should use /\s+/, but even better is: @words = split ' ', $sentences[$i];
But you don't even need this.
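The difference between the three split forms can be seen on a line with leading and repeated whitespace. This is a small sketch with an invented sample line; the array names are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical line with leading spaces, a double space and a tab.
my $line = "  The_DT  cost_NN\twas_VBD high_JJ\n";

# One field boundary per whitespace character: empty fields appear
# wherever two whitespace characters are adjacent or lead the line.
my @single = split /\s/, $line;

# Runs of whitespace collapse, but a leading empty field remains.
my @plus = split /\s+/, $line;

# The special ' ' pattern also skips leading whitespace.
my @awk = split ' ', $line;

printf "%d %d %d\n", scalar(@single), scalar(@plus), scalar(@awk);
```

Only the split ' ' form returns exactly the four tagged words with no empty strings mixed in.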
for (my $j = 0; $j <= @words; $j++) {
#FILTER if word is noun, and therefore will end with _NN:
if (defined($words[$j]) and $words[$j] =~ /_NN/) {
#PRINT word (without _NN) and sentence (without any _ENDING):
Unless you want to reset %seenw after every sentence, you will only process each _NN word once per file.
push @words, $words[$j];
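I don't understand this push. I don't see how appending the noun back onto the word list serves any possible purpose. Granted, you do the uniqueness check before the push, which saves you from an infinite loop if there are any _NN words, but it only means you will have all the words in the sentence followed by all the "nouns". Not only that, but you are simply going to test that it is a noun and do nothing, not to mention that you blank the list with the next sentence.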
print "**", split(/_\S+/, $words[$j]), "**", "\n";
## next if $seens{ $sentences[$i] }++;
You don't want to do this inside the word loop.
## push @sentences, $sentences[$i];
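Again, if this were uncommented, I don't think you would want to do it anywhere but outside the word loop. It seems that everything from two lines back belongs after the word loop.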
print split(/_\S+/, $sentences[$i]), "\n"
## HOW PRINT bold or specifically word bold?
#FILTER if word has been output, add sentence under that heading
}
} ## put print sentences here to print each sentence after all the nouns inside
}
}
close $tag_corpus || die "Can't close $tag_corpus: $!";
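No. This will not handle an error return from close. The || operator "binds" too tightly: you are closing the result of $tag_corpus || die .... Fortunately (or perhaps unfortunately), die will never be called, because if we got this far, $tag_corpus should be a true value.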
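The precedence point can be sketched with an in-memory filehandle, so that nothing depends on a file on disk. The $data and $closed_ok names are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Open an in-memory filehandle so the sketch is self-contained.
my $data = "one\ntwo\n";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";

# The line below would parse as close($fh || die ...). Because $fh is
# true, die is never evaluated and a failing close goes unnoticed.
# close $fh || die "never reached";

# Low-precedence 'or' applies to the return value of close itself.
my $closed_ok = close($fh) or die "Can't close in-memory file: $!";

# An equivalent spelling keeps || but adds parentheses:
# close($fh) || die "Can't close: $!";

print "closed cleanly\n" if $closed_ok;
```

Either the low-precedence or, or explicit parentheses around the argument of close, makes the error check apply to close rather than to the filehandle.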
Here is a sort of cleaned-up version of what you were trying to do, with the parts I could understand left in.
my @sentences;
# We're processing a single line at a time.
while ( <$tag_corpus> ) {
# Test if we want to work with the line
next unless m/$verb_regex/;
# If we do, then test that we haven't dealt with it before
# Although I suspect that this may not be needed as much if we're not
# pushing to a queue that we're reading from.
next if $seens{ $_ }++;
# split -> split ' ', $_
# pass through only those words that match _NN at the end and
# are unique so far. We test on a substitution, because the result
# still uniquely identifies a noun
foreach my $noun ( grep { s/_NN$// && !$seenw{ $_ }++ } split ) {
print "**$noun**\n";
}
# This will omit any adjacent punctuation you have after the word--if
# that's a problem.
print split( /_\S+/ ), "\n";
# Here we save the sentence.
push @sentences, $_;
}
close $tag_corpus or die "Can't close ch13tagged.txt: $!";
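If the goal is one heading per noun with every matching sentence listed under it, one option is to collect the sentences in a hash of arrays and print only after the loop has finished. This is a minimal sketch under stated assumptions: the sample lines are invented stand-ins for ch13tagged.txt, and %sentences_for and strip_tags are hypothetical names that do not appear in the original.

```perl
use strict;
use warnings;

my $search_key = "expend";
my $verb_regex = qr/\Q$search_key\E_VB/i;

# Hypothetical tagged input; the real script would read the file.
my @lines = (
    "We_PRP expend_VBP energy_NN on_IN work_NN\n",
    "They_PRP expend_VBP energy_NN daily_RB\n",
    "The_DT dog_NN barks_VBZ\n",
);

# Remove every _TAG suffix and the newline, leaving the plain sentence.
sub strip_tags {
    my ($line) = @_;
    (my $plain = $line) =~ s/_\S+//g;
    chomp $plain;
    return $plain;
}

my %sentences_for;    # noun => [ sentences that contain it ]
for my $line (@lines) {
    next unless $line =~ $verb_regex;
    my $plain = strip_tags($line);
    my %seen_here;    # avoid listing a sentence twice under one noun
    for my $word (split ' ', $line) {
        next unless $word =~ s/_NN$//;    # keep singular nouns only
        next if $seen_here{$word}++;
        push @{ $sentences_for{$word} }, $plain;
    }
}

# Print each noun as a heading followed by its sentences.
for my $noun (sort keys %sentences_for) {
    print "**$noun**\n";
    print "$_\n" for @{ $sentences_for{$noun} };
}
```

Deferring the printing until all lines have been read is what lets several sentences appear under the same heading. As in the version above, only words tagged exactly _NN are treated as nouns.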
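Provide sample data to work with, and use it to point out what you consider "ordinary words", "headings", and so on. Your question needs clarification.
"The headings are listed below"? Clarify the title and add a short description, preferably in normal font size. There is no need to cram everything into the title.
Thanks for the in-depth solution. I can't seem to get it to print the sentences, though you left it as $sentences[$i].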