Perl: how can I use grep to check whether words in an array match a dictionary word list, and extract the exact single word?

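Thanks for your answers... I am trying out different possibilities with all of them. One thing I should have made clearer when asking: I am using my local script/characters (similar to Tibetan), not English words.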

foreach my $word (@list)
{
    if (grep(/$word/, $dict))    # I have the dictionary in a scalar ($dict)
    {
        print "Matched and Found\n";
    }
    else
    {
        print "Not Matched\n";
    }
}

The point is to extract single, exactly matching words. I tried /\b$word\b/... but it does not seem to work with our script... Our words are made up of several syllables, and the syllables are separated by (.).
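
One possible reason \b misbehaves with dot-separated syllables, sketched with ASCII placeholders: \b treats the "." between syllables as a word boundary, so a single syllable also matches inside a longer word. The helper name has_exact_word and the assumption that dictionary entries are separated by whitespace are illustrative, not taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word occurs in $dict as a complete,
# whitespace-delimited entry. Assumes entries in $dict are separated by
# whitespace and that syllables inside an entry are joined by ".".
sub has_exact_word {
    my ($dict, $word) = @_;
    # \Q...\E makes the "." in $word literal; the lookarounds require
    # whitespace (or the edge of the string) on both sides instead of \b.
    return $dict =~ /(?<!\S)\Q$word\E(?!\S)/ ? 1 : 0;
}

my $dict = "a.b b.a.b";
for my $word (qw(a a.b b.a)) {
    my $loose = $dict =~ /\b$word\b/ ? "match" : "no match";
    my $exact = has_exact_word($dict, $word) ? "match" : "no match";
    print "$word: \\b gives $loose, lookarounds give $exact\n";
}
```

With this dictionary string, only "a.b" is a complete entry, yet the \b version also reports "a" and "b.a" because they sit next to a dot.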

Additional information:

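For beginners, the most challenging feature of Tibetan sentences is that there is no separation between words.... Because words are not followed by spaces, the reader has to work out each word from the context and from its position in the sentence. Looking these two letters up in a dictionary might lead you to think that the sentence begins with "the surface of the earth". However, the rest of the sentence, the context, and the absence of an agentive case connective show that these two letters are not a word by themselves but part of the word "yesterday". From this you can see that it is best to evaluate a sentence as a whole first, by identifying its various elements, rather than translating word by word.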

Emphasis added. See

The simple answer: nothing is wrong.
I can run that code just fine in Perl, and it works as expected. The problem must be somewhere else. Do you have "use strict;" at the top of your file?

I prefer

grep { $_ =~ /blah/} @foo

That makes it easier for me to modify the condition later than the plain form:

grep(/blah/, @foo)

But I cannot see anything wrong with your syntax.

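There is nothing wrong with your syntax. It is just not ideal. In fact, your code says "Hi, I have a C background!". So, first of all, I would get rid of the parentheses after grep.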

But what really needs thought is the regular expression. What if @list contains "sex" but @dict contains "Essex"? I would change the regular expression to:

m/^$word$/i
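
A minimal sketch of what the anchors change, assuming the dictionary is held in an array with one word per element; the sample words are illustrative only. It also wraps $word in \Q...\E so that characters such as "." are matched literally.

```perl
use strict;
use warnings;

my @dict = qw(Essex sex sextant);
my $word = 'sex';

# Unanchored: every entry that merely contains $word is returned.
my @loose = grep { /\Q$word\E/i } @dict;

# Anchored at both ends: only entries that are exactly $word (ignoring case).
# \z is used rather than $ so that a trailing newline does not slip through.
my @exact = grep { /^\Q$word\E\z/i } @dict;

print "loose: @loose\n";
print "exact: @exact\n";
```

If case does not matter, comparing with eq instead of a regex gives the same exact-match behaviour without any escaping.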

Instead of writing your own code to compare every element of @list with every element of @dict, use a module that already does this work for you, such as:

I would use first. It stops processing the list after the first match; grep does not do that.

use List::Util qw(first);    # first() is provided by List::Util

if( defined first { /$word/ } @list ) {
    print "Matched and Found\n";
}
else {
    print "Not Matched\n";
}
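
A small sketch of the difference in how many elements each function inspects. The word list and the two counters are illustrative only; the comparison uses eq rather than a regex to keep the example about short-circuiting.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @dict = qw(squirrel in my pocket);
my $word = 'in';

# Count how many elements each approach looks at.
my ($first_checks, $grep_checks) = (0, 0);

# first() returns the first element for which the block is true, then stops.
my $hit = first { $first_checks++; $_ eq $word } @dict;

# grep() always visits every element; in scalar context it returns a count.
my $count = grep { $grep_checks++; $_ eq $word } @dict;

print "first looked at $first_checks element(s)\n";
print "grep looked at $grep_checks element(s)\n";
```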

Your grep syntax is fine.

However, I feel compelled to comment on your algorithm. It is very wasteful.

For every word in @list, you iterate over @dict once.

It would be faster to assign one array to the keys of a hash and look the words up in the hash:

my %lut;
@lut{@list} = ();

for my $word ( @dict ) {
    print exists $lut{$word} ? "Matched and Found\n" : "Not Matched\n";
}

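Hash lookups take constant time, so instead of nested loops you have a single flat loop. As the word lists grow, the difference in speed should become very noticeable.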
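
The same lookup-table idea can also cover the "extract the exact words" part of the question. The sketch below builds the hash from the dictionary rather than from the list; the sample data and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my @dict = qw(squirrel in my pocket);
my @list = qw(squirrel in me poc);

# Build the lookup table once: each dictionary word becomes a hash key.
my %in_dict = map { ( $_ => 1 ) } @dict;

# One pass over @list; each test is a single hash lookup, and only
# whole-word (string-equal) matches succeed.
my @found   = grep {  $in_dict{$_} } @list;
my @missing = grep { !$in_dict{$_} } @list;

print "Matched: @found\n";
print "Not matched: @missing\n";
```

Because hash keys are compared as whole strings, "poc" does not match "pocket" and no regex anchoring is needed.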

In Perl 5.10 we have smart matching:

use 5.010;    # enables say

foreach my $word (@list) {
  say $word ~~ @dict ? 'Matched and Found' : 'Not Matched';
}

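Keeping the dictionary in a string and searching it with grep will be very slow for a dictionary of any significant size. Have you considered using a hash for the dictionary? For example: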

$dict = { word1 => 1, word2 => 1....... etc } # for example...

for my $word (@list) 
{ 
   if ($dict->{$word})
   {
      print "Matched\n";
   }
   else
   {
      print "Not matched\n";
   }
}

Note that I am not advocating creating the hash this way. It is just an example showing how a hash can be used as a dictionary, where the keys are the words and the values are a constant "true" value. If matching must be case-insensitive, lowercase the dictionary words before inserting them into the hash, and lowercase $word before doing the lookup.
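
The case-insensitive variant described above might look like the following sketch. The helper name is_known and the sample words are hypothetical.

```perl
use strict;
use warnings;

my @dict_words = qw(Squirrel IN my Pocket);

# Normalise once while building the hash: every key is stored lowercased.
my %dict = map { ( lc($_) => 1 ) } @dict_words;

# Hypothetical helper: lowercase the query the same way before the lookup.
sub is_known {
    my ($word) = @_;
    return exists $dict{ lc($word) } ? 1 : 0;
}

for my $word (qw(squirrel POCKET poc)) {
    print "$word: ", ( is_known($word) ? "Matched" : "Not matched" ), "\n";
}
```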

Edit: here is some code to load the dictionary from a file that has one word per line:

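# Open the word list for reading and stop with a message if that fails.
open(my $fh, '<', 'dictionary.txt') or die "Cannot open dictionary.txt: $!";
# Strip the newline from every line and use the word as a key with value 1.
my $dict = { map { chomp; $_, 1 } <$fh> };
close($fh);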

Explanation:

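  • The whole file is read in list context
  • The map function evaluates the block (the part in braces) for each line
  • The block removes the newline and returns a two-element list containing the word and "1"
  • The whole returned list is used to initialise the hash
  • A reference to the hash is stored in $dict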
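
Since the words in question are not ASCII, the steps above can be sketched with an explicit decoding layer so that keys and lookups are compared as characters rather than raw bytes. This is an assumption-laden illustration: it presumes the dictionary file is UTF-8, uses an in-memory string as a stand-in for dictionary.txt, and the helper name lookup is hypothetical.

```perl
use strict;
use warnings;

# Stand-in for dictionary.txt: two UTF-8 encoded lines. The byte escapes
# spell U+0F56 U+0F7C U+0F51 and U+0F41, arbitrary Tibetan code points
# chosen only for illustration.
my $file_bytes = "\xE0\xBD\x96\xE0\xBD\xBC\xE0\xBD\x91\n\xE0\xBD\x81\n";

# The :encoding(UTF-8) layer decodes the bytes into characters on read.
open(my $fh, '<:encoding(UTF-8)', \$file_bytes)
    or die "Cannot open in-memory file: $!";
my $dict = { map { chomp; ( $_ => 1 ) } <$fh> };
close($fh);

# Hypothetical helper: exact, whole-string lookup of a decoded word.
sub lookup {
    my ($word) = @_;
    return exists $dict->{$word} ? 1 : 0;
}

print "three-character word: ", ( lookup("\x{0F56}\x{0F7C}\x{0F51}") ? "found" : "not found" ), "\n";
print "its first character alone: ", ( lookup("\x{0F56}") ? "found" : "not found" ), "\n";
```

For a real file, the in-memory reference would be replaced by the file name, and the words being looked up would need to be decoded in the same way.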

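I know nothing about Tibetan. The example below assumes that your dictionary consists of one entry per line: a word, followed by an equals sign, followed by the definition of the word.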

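    It uses File::Slurp to efficiently slurp the file as a list of lines, chomps each line and splits it, so that the words become the keys and the definitions the values of the %dict hash.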

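    It assumes that @words already contains individual words, and that there is no need to identify words in arbitrary text such as "a.a.b.a.b.b.a.a.b.a" (see my comment pointing out that words are not separated in Tibetan, only syllables).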

    To modify the code to read the dictionary from an external file, replace \*DATA with the file name.

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    use File::Slurp;
    
    my @words = qw( a b a.b b.a a.a b.a.b);
    
    my %dict = map { chomp; split /\s*=\s*/ } read_file \*DATA;
    
    for my $word ( @words ) {
        if ( defined(my $defn = $dict{$word}) ) {
            print "'$word' means $defn\n";
        }
        else {
            print "'$word' not found\n";
        }
    }
    
    __DATA__
    a = Letter 1
    b = Letter 2
    a.b = Letter 1 and Letter 2
    b.a = Letter 2 and Letter 1
    a.b.a = Letter 1 and Letter 2 and Letter 1
    b.a.b = Letter 2 and Letter 1 and Letter 2
    
    Output:

    'a' means Letter 1
    'b' means Letter 2
    'a.b' means Letter 1 and Letter 2
    'b.a' means Letter 2 and Letter 1
    'a.a' not found
    'b.a.b' means Letter 2 and Letter 1 and Letter 2
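
    The answer above deliberately leaves out splitting unsegmented text into words. As a rough sketch only, one simple heuristic is greedy longest-match over the syllables; this is a technique swapped in for illustration, not something the answer proposes, and it cannot use context the way a human reader of Tibetan would. The helper name segment is hypothetical, and the keys are the dictionary entries from the example.

```perl
use strict;
use warnings;

# Dictionary entries from the example above (keys only; definitions omitted).
my %dict = map { ( $_ => 1 ) } qw(a b a.b b.a a.b.a b.a.b);

# Hypothetical helper: split dot-separated syllables into dictionary words,
# always taking the longest entry that matches at the current position.
# A syllable that starts no dictionary entry is emitted on its own.
sub segment {
    my ($text, $dict) = @_;
    my @syllables = split /\./, $text;
    my @words;
    my $i = 0;
    while ($i < @syllables) {
        my $taken = 1;                      # fall back to a single syllable
        my $max   = @syllables - $i;        # syllables left from position $i
        for my $len (reverse 1 .. $max) {   # try the longest candidate first
            my $candidate = join '.', @syllables[ $i .. $i + $len - 1 ];
            if (exists $dict->{$candidate}) {
                $taken = $len;
                last;
            }
        }
        push @words, join '.', @syllables[ $i .. $i + $taken - 1 ];
        $i += $taken;
    }
    return @words;
}

print join(' | ', segment('a.a.b.a.b.b.a.a.b.a', \%dict)), "\n";
```

    Greedy matching can pick a long word where two shorter ones were intended, so the result should be treated as a first guess rather than a correct segmentation.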
    
    You can use a word boundary to match the words in the dictionary. (A word is surrounded by one or more spaces, except at the beginning and end of the page.)

    You can use this:

    my $dict = "squirrel in my pocket ";

    # qw() quotes the words; the original barewords fail under "use strict"
    my @list = qw(squirrel in me poc);

    foreach my $word (@list)
    {
      if(grep(/\b$word\b/, $dict))
      {
        print "\$word:$word  Matched with     \$dict :$dict \n";
      }
      else
      {
       print "\$word:$word  Not Matched with \$dict :$dict \n";
      }
    }
    
    Output:

    $word:squirrel  Matched     with  $dict :squirrel in my pocket
    $word:in        Matched     with  $dict :squirrel in my pocket
    $word:me        Not Matched with  $dict :squirrel in my pocket
    $word:poc       Not Matched with  $dict :squirrel in my pocket
    

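    That would be the next logical step.

    Yes... that is exactly my problem... I only want "sex", and it shows every word that contains "sex". I have a list of words, and each one is searched for in $wordlist.... Thanks for your answer...

    Why do you think that makes it easier to edit the condition? I prefer the block form too, but it is only easier when you want to have multiple statements.

    I can do things like change it to the eq operator, or call a complex matching function. It is simpler if I just use the general block form throughout.

    I am trying to extract complete and exact words... I even tried /^word$/.... One thing: the words/characters are not from the English alphabet... I am using my local language.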