Regex: Searching for substring matches from a hash in Perl

regex perl hash


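I have a file containing substrings that I need to match against given strings. The given strings come from another file that holds the actual data; they are one column of a CSV file. If a given string contains any of these substrings, it should be marked TRUE. What is the best way to do this in Perl?

This is what I have so far. It still seems to have some problems: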

#!/usr/bin/perl

use warnings;
use strict;

if ($#ARGV+1 != 1) {
 print "usage: $0 inputfilename\n";
 exit;
}

our $inputfile = $ARGV[0];
our $outputfile = "$inputfile" . '.ads';
our $ad_file = "C:/test/easylist.txt";  
our %ads_list_hash = ();

our $lines = 0;

# Create a list of substrings in the easylist.txt file
 open ADS, "$ad_file" or die "can't open $ad_file";
 while(<ADS>) {
        chomp;
        $ads_list_hash{$lines} = $_;
        $lines ++;
 }  

 for(my $count = 0; $count < $lines; $count++) {
            print "$ads_list_hash{$count}\n";
       }
 open IN,"$inputfile" or die "can't open $inputfile";       
 while(<IN>) {      
       chomp;       
       my @hhfile = split /,/;       
       for(my $count = 0; $count < $lines; $count++) {
            print "$hhfile[10]\t$ads_list_hash{$count}\n";

            if($hhfile[9] =~ /$ads_list_hash{$count}/) {
                print "TRUE !\n";
                last;
            }
       }
 }

 close IN;


See Text::CSV, a comma-separated values manipulator:

use 5.010;
use Text::CSV;
use Data::Dumper;
my @rows;
my %match;
my @substrings = qw/Hello Stack overflow/;
my $csv = Text::CSV->new ( { binary => 1 } )  # should set binary attribute.
                 or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
while ( my $row = $csv->getline( $fh ) ) {
        if($row->[0] ~~ @substrings){ # 1st field; true only when the whole field equals one of the entries
            say "match " ;
            $match{$row->[0]} = 1;
        }
 }
$csv->eof or $csv->error_diag();
close $fh;
print Dumper(\%match);
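
One thing to watch with the code above: `$row->[0] ~~ @substrings` is true only when the whole field is equal to one of the entries, so a field like "Hello world" would not be flagged. If a plain substring test is what's wanted, a minimal sketch with core Perl could look like this. The helper name `contains_any` and the sample fields are made up for illustration.

```perl
use strict;
use warnings;

# Return 1 if $field contains at least one of @substrings as a literal
# substring, 0 otherwise. index() does a plain string search, so no
# regex escaping is needed.
sub contains_any {
    my ( $field, @substrings ) = @_;
    return 0 unless defined $field;
    for my $substring (@substrings) {
        next unless length $substring;    # an empty entry would match everything
        return 1 if index( $field, $substring ) >= 0;
    }
    return 0;
}

my @substrings = qw/Hello Stack overflow/;
for my $field ( 'Hello world', 'Stack', 'nothing here' ) {
    printf "%-14s %s\n", $field,
        contains_any( $field, @substrings ) ? 'TRUE' : 'FALSE';
}
```

Inside the Text::CSV loop, the test would then be `contains_any($row->[0], @substrings)` instead of the smart match.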

You can use selectcol_arrayref, or fetchrow* and a loop, to get an array of the words to search for. Then build a regex pattern by joining that array with '\b)|(?:\b' and wrapping the result in '(?:\b' and '\b)' (or something better suited to your needs).
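
A rough sketch of that pattern-building step, assuming the words are already in an array (the word list and the name `build_word_pattern` are invented here; in the answer the list would come from the database query). It also runs each word through quotemeta, which the answer does not mention, so that characters such as `.` are matched literally.

```perl
use strict;
use warnings;

# Wrap each word in (?:\b...\b) and join the groups with '|'.
# quotemeta escapes regex metacharacters so each word is matched literally.
# Assumes at least one word is passed in.
sub build_word_pattern {
    my @words = @_;
    my $alternation = join '|', map { '(?:\b' . quotemeta($_) . '\b)' } @words;
    return qr/$alternation/;
}

# Hypothetical word list standing in for the query result.
my @words   = ( 'ad', 'banner', 'track.js' );
my $pattern = build_word_pattern(@words);

for my $string ( 'show banner here', 'loading', 'src=track.js' ) {
    print "$string\t", ( $string =~ $pattern ? 'TRUE' : 'FALSE' ), "\n";
}
```

Because of the `\b` anchors, "loading" is not flagged even though it contains "ad". Note that `\b` only behaves this way for entries that begin and end with a word character; for entries like `/ads/` you would probably want to drop the anchors.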
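Here's some cleaned-up code which does the same thing as the code you posted, except that it doesn't print $hhfile[10] along with each ad pattern before testing them. If you need that output, you'll have to loop over all the patterns and test each one individually, in essentially the same way you already do. (Even in that case, though, your loop would be better written as for my $count (0 .. $lines - 1) rather than the C-style for (...; ...; ...).)

Instead of testing each pattern individually, I've used Regexp::Assemble, which builds a single pattern that is equivalent to testing all of the individual substrings at once. The smart match operator (~~) in Nikhil Jain's answer also checks against the whole list in one expression, but it compares the entire field for equality rather than searching for substrings, and it requires Perl 5.10 or later, whereas Regexp::Assemble will still work if you're on 5.8 or (heaven forbid!) 5.6.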
#!/usr/bin/env perl

use warnings;
use strict;
use Regexp::Assemble;

die "usage: $0 inputfilename\n" unless @ARGV == 1;

my $inputfile  = $ARGV[0];
my $outputfile = $inputfile . '.ads';
my $ad_file    = "C:/test/easylist.txt";
my @ad_list;

# Create a list of substrings in the easylist.txt file
open my $ads_fh, ...    # the listing breaks off here
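
The answer's listing is cut off at the `open` call in this copy, so the rest of it can't be recovered. As a rough idea of how the overall flow might go, here is a sketch that is not the answerer's code: it swaps Regexp::Assemble for a plain quotemeta alternation from core Perl so that it runs without CPAN modules, it assumes the lines of easylist.txt are literal substrings rather than regexes, and it splits on commas the way the question does (so it assumes no quoted commas in the data). The names `load_pattern` and `flag_line` and the in-memory sample data are invented for the example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read one substring per line from $fh and return a single compiled
# alternation, or undef if the list is empty.
sub load_pattern {
    my ($fh) = @_;
    my @ad_list;
    while ( my $line = <$fh> ) {
        chomp $line;
        push @ad_list, $line if length $line;    # skip blank lines
    }
    return unless @ad_list;
    my $alternation = join '|', map { quotemeta($_) } @ad_list;
    return qr/$alternation/;
}

# Return 'TRUE' or 'FALSE' for one CSV line, testing the 0-based $column.
sub flag_line {
    my ( $line, $pattern, $column ) = @_;
    my @fields = split /,/, $line, -1;
    my $field  = $fields[$column];
    return ( defined $field && defined $pattern && $field =~ $pattern )
        ? 'TRUE'
        : 'FALSE';
}

# In-memory data standing in for easylist.txt and the input file.
my $ads_text = "/banner/\nadserver.\n";
open my $ads_fh, '<', \$ads_text or die "can't open in-memory list: $!";
my $pattern = load_pattern($ads_fh);
close $ads_fh;

for my $line ( 'a,b,http://adserver.example/x', 'a,b,http://example.org/' ) {
    print "$line\t", flag_line( $line, $pattern, 2 ), "\n";
}
```

With real files you would open $ad_file and $inputfile in place of the in-memory strings and pass 9 as the column, since the question tests $hhfile[9].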
@Ed I have put in my code, but it still has a lot of errors.