Perl regex: searching a text file for keywords from an array

regex arrays perl

How do I search a file with a regex using keywords from an array?

I am trying to scan a text file to see whether, and where, certain keywords occur. There are two files:

keyword.txt
word1
word2
word3

filestosearchon.txt
a lot of words that go on and one and contain linebreaks and linebreaks (up to 100000   characters)

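I want to find each keyword and the position where it matches. This works for a single word, but I can't work out how to iterate over the keywords in the regex.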

#!/usr/bin/perl

# open profanity list
open(FILE, "keywords.txt") or die("Unable to open file");
@keywords = <FILE>; 
close(FILE);

# open text file
local $/=undef; 
open(txt, "filetosearchon.txt") or die("Unable to open file");
$txt = <txt>;

$regex = "keyword";


push @section,[length($`),length($&),$1]    
while ($txt =~ m/$regex/g);

foreach $element(@section)  
{
print (join(", ",@$element), $regex, "\n");    
}

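How can I iterate over the keywords in the array inside the while loop, so that I get both the matching keyword and its position?

Thanks for any help.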

One approach is to build a single regex that contains every word:

(alpha|bravo|charlie|delta|echo|foxtrot|...|zulu)

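Perl's regex compiler is quite clever and will optimize an alternation like this as far as it can, so the regex will be more efficient than you might expect. For example, the following regex: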

(cat|rat|sat|mat)

will be compiled to something roughly equivalent to:

(c|r|s|m)at

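which runs efficiently. This approach is likely to beat the "search for each keyword in turn" approach, because it needs only a single pass over the input string; the naive approach needs one pass for every keyword you search for.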

By the way: if you are building a profanity filter, as your sample code suggests, remember to account for deliberate misspellings: "pron", "p0rn", etc.
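As a rough sketch of how the single-regex idea could be wired up to report positions: the helper name `find_keywords` and the sample text are made up for illustration, and the keywords are assumed to be fixed strings (so they are escaped with `quotemeta`). Trailing whitespace is stripped with a substitution rather than `chomp`, because the question's script sets `$/` to `undef`, which makes `chomp` a no-op.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one [offset, length, keyword] triple per match.
sub find_keywords {
    my ($text, @keywords) = @_;

    # Strip trailing newlines/whitespace and drop empty entries.
    my @clean = grep { length } map { my $k = $_; $k =~ s/\s+\z//; $k } @keywords;
    return () unless @clean;

    # Escape metacharacters; longer keywords first so "foobar" is tried before "foo".
    my $alternation = join '|',
        map { quotemeta($_) } sort { length($b) <=> length($a) } @clean;
    my $regex = qr/($alternation)/;

    my @hits;
    while ($text =~ /$regex/g) {
        # $-[1] is the start offset of capture group 1, so $` and $& are not needed.
        push @hits, [ $-[1], length($1), $1 ];
    }
    return @hits;
}

my $text = "the cat sat\non the mat";
for my $hit (find_keywords($text, "cat\n", "mat\n", "dog\n")) {
    print join(", ", @$hit), "\n";
}
```

Each printed line holds the character offset, the match length and the keyword that matched. Because the keyword is captured in group 1, there is no need to loop over the array inside the `while`; the regex engine reports which alternative matched.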

Try grep:

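s/\s+\z// for @keywords;   # strip the newlines left by <FILE> (chomp is a no-op once $/ is undef)
@words = split(/\s+/, $txt);

for ($i = 0; $i < scalar(@words); ++$i) {
    # compare exactly; /$words[$i]/ would treat the word as a regex and match substrings
    print "word \#$i\n" if grep { $_ eq $words[$i] } @keywords;
}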

This gives you the word positions in the text at which a keyword was found. That may or may not be more useful than character-based positions.

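I'm not sure what output you expect, but something like this may be useful. I store the keywords in a hash, read the second file, split each line into words, and look each word up in the hash.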

Contents of script.pl:

use warnings;
use strict;

die qq[Usage: perl $0 <keyword-file> <search-file>\n] unless @ARGV == 2;

open my $fh, q[<], shift or die $!;

my %keyword = map { chomp; $_ => 1 } <$fh>;

while ( <> ) {
        chomp;
        my @words = split;
        for ( my $i = 0; $i <= $#words; $i++ ) {
                if ( $keyword{ $words[ $i ] } ) {
                        printf qq[Line: %4d\tWord position: %4d\tKeyword: %s\n], 
                                $., $i, $words[ $i ];
                }
        }
}

The output should look something like this:

Line:    7      Word position:    7     Keyword: will
Line:    8      Word position:    8     Keyword: the
Line:    8      Word position:   10     Keyword: will
Line:   10      Word position:    4     Keyword: the
Line:   14      Word position:    1     Keyword: compile
Line:   18      Word position:    9     Keyword: the
Line:   20      Word position:    2     Keyword: the
Line:   20      Word position:    5     Keyword: the
Line:   22      Word position:    1     Keyword: the
Line:   22      Word position:   25     Keyword: the
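If character offsets are wanted as well as word positions, the hash lookup can be combined with a global match over the words. This is only a sketch: `locate_words` and the sample text are hypothetical, and the keywords are assumed to have had their newlines removed already.

```perl
use strict;
use warnings;

# Hypothetical helper: whole-word hash lookup that also records where each hit starts.
sub locate_words {
    my ($text, @keywords) = @_;
    my %keyword = map { $_ => 1 } @keywords;

    my @hits;
    my $index = 0;                      # zero-based word counter
    while ($text =~ /(\S+)/g) {
        # $-[1] is the character offset at which the current word begins.
        push @hits, { word => $1, index => $index, offset => $-[1] }
            if $keyword{$1};
        $index++;
    }
    return @hits;
}

for my $hit (locate_words("we will see the\nresult", qw(will the))) {
    printf "Keyword: %s\tWord position: %d\tOffset: %d\n",
        @{$hit}{qw(word index offset)};
}
```

Words here are runs of non-whitespace, so a keyword followed directly by punctuation ("the,") would not be found; matching on `\w+` instead of `\S+` is one way around that, at the cost of splitting hyphenated or apostrophised words.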

If you only need to match whole words from keyword.txt against whole words in filestosearch.txt, you may not need a regex at all. I would just create a hash with the keywords as keys and 1 as the values, then try to look up each word of filestosearchon.txt in the hash. If the lookup succeeds, you have a match.

@BrianSwift: Probably not the most efficient solution, because it requires one pass over the string for every keyword. The finite-automaton approach (i.e. a regex) needs only a single pass.

@李昂业: My approach needs only one pass over the input string/file: it parses the input into words and looks each word up in a hash keyed by keyword. One benefit of the regex approach is that the keywords can be regular expressions rather than only fixed strings. However, using a regexp may require extra syntax so that only whole words match, so that sex does not match misexplain.

@BrianSwift: Whoops, I slightly misread the approach you proposed. I agree that a single pass is enough once all the keywords are in a hash, but the OP also wants to know where the matches occur (if there are any).
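The whole-word concern raised in the comments can be handled in the regex itself with `\b` anchors. A minimal sketch, assuming the keywords are fixed strings that begin and end with word characters and that the list is not empty (`whole_word_regex` is a made-up name):

```perl
use strict;
use warnings;

# Hypothetical helper: compile the keywords into one whole-word pattern.
# Assumes at least one keyword, each starting and ending with a word character.
sub whole_word_regex {
    my @keywords = @_;
    my $alternation = join '|', map { quotemeta($_) } @keywords;
    # \b on both sides stops a keyword from matching inside a longer word.
    return qr/\b(?:$alternation)\b/;
}

my $re = whole_word_regex(qw(sex cat));
for my $sample ("misexplain", "the cat sat", "concatenate") {
    printf "%-12s %s\n", $sample, ($sample =~ $re ? "match" : "no match");
}
```

Only "the cat sat" matches here. `\b` is a boundary between a word character and a non-word character, so keywords that start or end with punctuation would need a different boundary test, such as lookarounds on whitespace.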