Perl: count how many times a word occurs in a text and print the surrounding words


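I want to do two things:

1) Count the number of times a given word occurs in a text file.

2) Print that word together with its context.

This is the code I am currently using: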

my $word_delimiter = qr{
  [^[:alnum:][:space:]]*
  (?: [[:space:]]+ | -- | , | \. | \t | ^ )
  [^[:alnum:]]*
 }x;

my $word = "hello";
my $count = 0;

#
# here, a file's contents are loaded into $lines, code not shown
#

$lines =~ s/\R/ /g; # replace all line breaks with blanks (cannot just erase them, because this might connect words that should not be connected)
$lines =~ s/\s+/ /g; # replace all multiple whitespaces (incl. blanks, tabs, newlines) with single blanks
$lines = " ".$lines." "; # add a blank at beginning and end to ensure that first and last word can be found by regex pattern below

while ($lines =~ m/$word_delimiter$word$word_delimiter/g ) {
    ++$count;
    # here, I would like to print the word with some context around it (i.e. a few words before and after it)
}

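Three questions:

1) Does my `$word_delimiter` pattern capture all the characters that can reasonably be expected to separate words? Of course, I do not want to split hyphenated words and the like. [Note: I use UTF-8 throughout, but only with English and German texts; I understand that what counts as a reasonable word separator may be a matter of judgement.]

2) When the file to be analyzed contains text like "goodbye hello hello goodbye", the counter is incremented only once, because the regex matches only the first occurrence of "hello". After all, by the time it could find "hello" a second time, there is no other whitespace in front of it. Any ideas on how to catch the second occurrence? Should I somehow reset pos()?

3) How can I (reasonably efficiently) print a few words before and after each matched word?

Thanks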

1. Does my `$word_delimiter` pattern capture all the reasonable characters I want to separate words?

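Word characters are denoted by the character class
```
\w
```
It matches letters, digits, and the underscore, including letters and digits from non-Roman scripts.
```
\W
```
is the negated form (non-word characters).
```
\b
```
denotes a word boundary and is zero-width: it matches a position without consuming any characters.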

Using these built-in character classes should be sufficient.
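As a rough illustration of how these three constructs behave, here is a minimal sketch. The sample text is made up for the demonstration, and Greek letters are used only to show that `\w` is not limited to ASCII.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical sample text; \x{3b1}\x{3b2}\x{3b3} are the Greek letters alpha, beta, gamma.
my $text = "hello_world 42, \x{3b1}\x{3b2}\x{3b3}!";

# \w+ matches runs of letters, digits and underscores, in any script.
my @words = $text =~ /(\w+)/g;
print scalar(@words), " words: @words\n";

# \W+ matches the runs in between, so it works as a split pattern.
my @parts = split /\W+/, $text;
print scalar(@parts), " parts\n";

# \b is zero-width: it matches between \w and \W and consumes no characters.
(my $marked = $text) =~ s/\b/|/g;
print "$marked\n";
```

The last line prints the text with a `|` at every word boundary, which makes it easy to check where `\b` would let a match start or end.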

2. Any ideas on how to catch the second occurrence? Use the zero-width word boundary:

while ( $lines =~ /\b$word\b/g ) {
    
    ++$count;
}
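To see why the zero-width boundary makes a difference, the following sketch compares a pattern that consumes a delimiter on both sides with one that uses `\b`. The sample string is an assumption, and `\s` stands in for the longer delimiter pattern from the question to keep the example short.

```perl
use strict;
use warnings;

my $word  = "hello";
my $lines = " goodbye hello hello goodbye ";   # hypothetical sample text

# Consuming the delimiter on both sides: the blank between the two
# occurrences is used up by the first match, so the second is missed.
my $consumed = 0;
$consumed++ while $lines =~ /\s\Q$word\E\s/g;

# \b consumes nothing, so adjacent occurrences are both found.
my $boundary = 0;
$boundary++ while $lines =~ /\b\Q$word\E\b/g;

print "consuming delimiters: $consumed\n";
print "word boundaries:      $boundary\n";
```

`\Q...\E` quotes any regex metacharacters in `$word`, which matters as soon as the search term is not a plain alphanumeric string. No manual reset of pos() is needed: a failed match under `/g` resets it automatically.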

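Is there any reason not to use `\b` as the word delimiter?

One problem is that if I search for, say, "jump", I want to match "jump" but not "jumps" (which works with `\b`) or "jump-suit" (which does not work with `\b`). Also, `\b` treats "you're" as two words, and I would rather count it as one.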
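One possible way around the hyphen and apostrophe problem, which also makes printing context straightforward, is to split the text into tokens first and then compare whole tokens. This is only a sketch under stated assumptions: the token definition (word characters joined by single hyphens or apostrophes), the sample text, and the `$context` setting are all illustrative choices, not something given in the thread.

```perl
use strict;
use warnings;

my $word    = "jump";
my $context = 2;    # hypothetical setting: words to show on each side
my $lines   = "You're about to jump. A jump-suit won't jump, but we jump anyway.";

# Assumption: a word is a run of word characters, optionally joined by
# single hyphens or apostrophes, so "jump-suit" and "You're" stay whole.
my @tokens = $lines =~ /(\w+(?:['-]\w+)*)/g;

# Indices of tokens that are exactly the search word (case-sensitive).
my @hits = grep { $tokens[$_] eq $word } 0 .. $#tokens;
print "count: ", scalar(@hits), "\n";

for my $i (@hits) {
    # Clamp the context window to the bounds of the token list.
    my $from = $i - $context < 0        ? 0        : $i - $context;
    my $to   = $i + $context > $#tokens ? $#tokens : $i + $context;
    print join(' ', @tokens[$from .. $to]), "\n";
}
```

Because the comparison is on whole tokens, "jump-suit" is never counted as "jump", and a double hyphen such as "a--b" still separates two words. Punctuation is dropped from the printed context; keeping it would require recording match offsets instead of tokens.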