Counting the occurrences of a word in a string in Perl

regex perl

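I am trying to count how many times "the"/"The" occurs in a string. Here is the code I tried:

print ("Enter the String.\n");
$inputline = <STDIN>;
chop($inputline);
$regex = "\[Tt\]he";
if ($inputline ne "")
{
@splitarr = split(/$regex/, $inputline);
}
$scalar = @splitarr;
print $scalar;

The string is:

Hello the how are you the wanna work on the project but i the u the The

The output it gives is 7. But for the string:

Hello the how are you the wanna work on the project but i the u the

the output is 5. I suspect something is wrong with my regex. Can anyone help me point out what is wrong?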

With split, you are counting the substrings between the matches of the. Use a match instead:

#!/usr/bin/perl
use warnings;
use strict;

my $regex = qr/[Tt]he/;

for my $string ('Hello the how are you the wanna work on the project but i the u the The',
                'Hello the how are you the wanna work on the project but i the u the',
                'the theological cathedral'
               ) {
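    # Count the matches directly: the empty list forces list context,
    # and the scalar assignment yields the number of matches.
    my $count = () = $string =~ /$regex/g;
    print $count, "\n";

    # For comparison, count the pieces that split produces.
    my @between = split /$regex/, $string;
    print 0 + @between, "\n";

    # Show the pieces themselves, separated by '|'.
    print join '|', @between;
    print "\n";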
}

Note that both methods return the same number for the two inputs you mentioned (and the first one returns 6, not 7).
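
To see why the piece count from split can drift away from the match count, here is a minimal sketch. The helper names count_matches and count_pieces are hypothetical, introduced only for illustration. It relies on the documented behaviour that split keeps a leading empty field but drops trailing empty fields unless a negative limit is passed:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $regex = qr/[Tt]he/;

# Hypothetical helper: count the matches of $regex in $string.
sub count_matches {
    my ($string) = @_;
    my $count = () = $string =~ /$regex/g;
    return $count;
}

# Hypothetical helper: count the pieces split returns.
# A limit of 0 (the default) drops trailing empty fields; -1 keeps them.
sub count_pieces {
    my ($string, $limit) = @_;
    $limit = 0 unless defined $limit;
    my @pieces = split /$regex/, $string, $limit;
    return scalar @pieces;
}

for my $string ('a the b', 'a the', 'the b') {
    printf "%-10s matches=%d pieces=%d pieces(limit -1)=%d\n",
        "'$string'", count_matches($string),
        count_pieces($string), count_pieces($string, -1);
}
```

For 'a the' the default split yields one piece while the match count is also one, whereas for 'a the b' and 'the b' split yields two pieces for a single match. That inconsistency is why counting pieces is unreliable.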

What you need is the "countof" operator to count the number of matches:

my $string = "Hello the how are you the wanna work on the project but i the u the The";
my $count = () = $string =~/[Tt]he/g;
print $count;
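
As a rough illustration of what the "countof" idiom does, the sketch below compares it with the longer form that stores the matches in an array first. The sub names are hypothetical. The short form works because a list assignment evaluated in scalar context returns the number of elements on its right-hand side:

```perl
use strict;
use warnings;

# Long form: collect the matches in an array, then count them.
sub count_with_array {
    my ($string) = @_;
    my @matches = $string =~ /[Tt]he/g;
    return scalar @matches;
}

# Short form: assign the matches to an empty list and take the
# result of that list assignment in scalar context.
sub count_with_empty_list {
    my ($string) = @_;
    my $count = () = $string =~ /[Tt]he/g;
    return $count;
}

my $string = 'Hello the how are you the wanna work on the project but i the u the The';
print count_with_array($string), "\n";        # prints 6
print count_with_empty_list($string), "\n";   # prints 6
```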

If you want to select only the words the or The, add word boundaries:
my $string = "Hello the how are you the wanna work on the project but i the u the The";
my $count = () = $string =~/\b[Tt]he\b/g;
print $count;
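
The effect of the word boundaries is easier to see on a string where the letters "the" also occur inside longer words. A small sketch, using a hypothetical helper count_matches that takes the pattern as an argument:

```perl
use strict;
use warnings;

# Hypothetical helper: count how often $pattern matches in $string.
sub count_matches {
    my ($string, $pattern) = @_;
    my $count = () = $string =~ /$pattern/g;
    return $count;
}

my $string = 'the theological cathedral';

# Without \b the pattern also matches inside "theological" and "cathedral".
print count_matches($string, qr/[Tt]he/), "\n";       # prints 3

# With \b only the standalone word is counted.
print count_matches($string, qr/\b[Tt]he\b/), "\n";   # prints 1
```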

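I get the correct number, 6, for the first string.

However, your method is wrong: if you count the number of pieces you get by splitting on the regex pattern, the result depends on where the word appears in the string. In particular, split discards trailing empty fields, so a match at the very end of the string yields one piece fewer. You should also put word boundaries \b into your regular expression to prevent it from matching something like theory.

Also, there is no need to escape the square brackets, and you can use the /i modifier to do a case-independent match.

Try something like this instead: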
use strict;
use warnings;

print 'Enter the String: ';
my $inputline = <>;
chomp $inputline;

my $regex = 'the';

if ( $inputline ne '' ) {
    my @matches = $inputline =~ /\b$regex\b/gi;
    print scalar @matches, " occurrences\n";
}
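
If the word to search for comes from user input rather than a fixed literal, one possible refinement is to quote it with \Q...\E so that any regex metacharacters in it are matched literally. This is a sketch under that assumption, not part of the answer above, and count_word is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: count whole-word, case-insensitive occurrences
# of $word in $text. \Q...\E escapes regex metacharacters in $word.
sub count_word {
    my ($text, $word) = @_;
    my $count = () = $text =~ /\b\Q$word\E\b/gi;
    return $count;
}

print count_word('The theory of the thing', 'the'), "\n";   # prints 2
print count_word('a.b axb a.b', 'a.b'), "\n";                # prints 2
```

In the second call the dot is treated as a literal character, so axb is not counted.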

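The following snippet uses a code side effect to increment a counter, followed by a match that always fails so that the search continues. It produces the correct answer for overlapping matches (for example, "aaaa" contains "aa" 3 times, not 2). The split-based answers do not get this right.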
my $i;
my $string;

$i = 0;
$string = "aaaa";
$string =~ /aa(?{$i++})(?!)/;
print "'$string' contains /aa/ x $i (should be 3)\n";

$i = 0;
$string = "Hello the how are you the wanna work on the project but i the u the The";
$string =~ /[tT]he(?{$i++})(?!)/;
print "'$string' contains /[tT]he/ x $i (should be 6)\n";

$i = 0;
$string = "Hello the how are you the wanna work on the project but i the u the";
$string =~ /[tT]he(?{$i++})(?!)/;
print "'$string' contains /[tT]he/ x $i (should be 5)\n";
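
A different technique that also counts overlapping occurrences is to wrap the pattern in a zero-width lookahead and combine it with the list-assignment counting idiom. The sketch below is an alternative to the snippet above, not the same mechanism. The name count_overlapping is hypothetical, and the pattern is assumed to contain no capture groups:

```perl
use strict;
use warnings;

# Hypothetical helper: count overlapping occurrences of $pattern.
# The lookahead matches at every position where $pattern could start
# without consuming any characters, so overlapping hits are all found.
# Assumes $pattern has no capture groups.
sub count_overlapping {
    my ($string, $pattern) = @_;
    my $count = () = $string =~ /(?=$pattern)/g;
    return $count;
}

print count_overlapping('aaaa', qr/aa/), "\n";       # prints 3
print count_overlapping('abababa', qr/aba/), "\n";   # prints 3
```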

There are 5 words; you should use match, not split. You should get 7 for the first one and 6 for the second. Try print @splitter, or better yet, use Data::Dumper.

Thanks for all the answers. They helped me understand what I was doing. I learned a lot. :)