Ruby: check whether array elements are partially contained in a given string


ruby arrays


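I have a line of text:

this is the line

and I want to return true if one of the elements of this array:

['hey', 'format', 'qouting', 'this']

is part of the string given above. So for the line above it should return true. For the line `hello my name is martin` it should not.

I know about `include?`, but I don't know how to use it here, if it helps at all.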

Using `any?` with a regular-expression match:

>> s = "this is the line"
=> "this is the line"
>> ['hey', 'format', 'qouting', 'this'].any? { |w| s =~ /#{w}/ }
=> true
>> ['hey', 'format', 'qouting', 'that'].any? { |w| s =~ /#{w}/ }
=> false
>> s2 = 'hello my name is martin'
=> "hello my name is martin"
>> ['hey', 'format', 'qouting', 'this'].any? { |w| s2 =~ /#{w}/ }
=> false

Alternatively, you can split the string into an array and check the intersection between that array and your word array. This is handy because it gives you more than just true/false: it gives you the matching strings:

> "this is the line".split(' ') & ["hey", "format", "quoting", "this"]
=> ["this"]

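If you need a true/false value, you can easily get one. Here `empty?` returns `false`, which means at least one word matched: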

> s = "this is the line"
=> "this is the line" 
> intersection = s.split(' ') & ["hey", "format", "quoting", "this"]
=> ["this"] 
> intersection.empty?
=> false
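If you only need the boolean, a small wrapper around the split-and-intersect idea might look like the sketch below. The method name `any_word_in?` is hypothetical. `Array#intersect?` (Ruby 3.1+) performs the same check without building the intersection array. On older Rubies the sketch falls back to `!(a & b).empty?`. Because `split` breaks on whitespace, this matches whole words only, and punctuation attached to a word (e.g. `"this,"`) prevents a match.

```ruby
# Hypothetical helper: true if any element of `words` appears as a
# whitespace-separated token in `text`.
def any_word_in?(text, words)
  tokens = text.split(" ")
  if tokens.respond_to?(:intersect?)
    tokens.intersect?(words)   # Ruby 3.1+: no intermediate array
  else
    !(tokens & words).empty?   # older Rubies
  end
end

words = ["hey", "format", "quoting", "this"]
puts any_word_in?("this is the line", words)         # true
puts any_word_in?("hello my name is martin", words)  # false
puts any_word_in?("this, is the line", words)        # false ("this," != "this")
```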

The simplest way I know to test whether one string is contained in another is:

text = 'this is the line'
words = ['hey', 'format', 'qouting', 'this']

words.any? { |w| text[w] }  #=> true

No need for regular expressions or anything complicated.
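`text[w]` works because `String#[]` with a string argument returns that substring if it occurs in `text`, and `nil` otherwise. `any?` treats the returned substring as truthy and `nil` as falsy. A quick illustration:

```ruby
text = "this is the line"

p text["this"]   # "this"  (substring found)
p text["that"]   # nil     (not found)
p text["is"]     # "is"    (first found inside "this")

words = ["hey", "format", "qouting", "this"]
p words.any? { |w| text[w] }   # true
```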

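require 'benchmark'

# Assumes `text` and `words` from the snippet above.
n = 200_000
Benchmark.bm(3) do |x|
  # 1: build a regex from each word and match it
  x.report("1:") { n.times { words.any? { |w| text =~ /#{w}/ } } }
  # 2: split into tokens, find the first token that is in words
  x.report("2:") { n.times { text.split(" ").find { |item| words.include? item } } }
  # 3: split into tokens, take the array intersection
  x.report("3:") { n.times { text.split(' ') & words } }
  # 4: String#[] substring lookup
  x.report("4:") { n.times { words.any? { |w| text[w] } } }
  # 5: String#include? substring test
  x.report("5:") { n.times { words.any? { |w| text.include?(w) } } }
end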

>>          user     system      total        real
>> 1:   4.170000   0.160000   4.330000 (  4.495925)
>> 2:   0.500000   0.010000   0.510000 (  0.567667)
>> 3:   0.780000   0.030000   0.810000 (  0.869931)
>> 4:   0.480000   0.020000   0.500000 (  0.534697)
>> 5:   0.390000   0.010000   0.400000 (  0.476251)
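Approach 1 above builds a new `Regexp` for every word on every iteration, because `/#{w}/` is interpolated at run time. That plausibly accounts for much of its cost. Below is a sketch of a variant that compiles one pattern up front with `Regexp.union`. No timings are claimed here, so run it on your own Ruby to compare. `n` is reduced for a quicker run.

```ruby
require "benchmark"

text  = "this is the line"
words = ["hey", "format", "qouting", "this"]
n     = 20_000

# Regexp.union escapes each string and joins them with |, giving one
# precompiled pattern equivalent to /hey|format|qouting|this/.
union = Regexp.union(words)

Benchmark.bm(3) do |x|
  x.report("1:") { n.times { words.any? { |w| text =~ /#{w}/ } } }  # regex rebuilt per word
  x.report("6:") { n.times { text.match?(union) } }                 # compiled once
end
```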

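Wow, that was fast, thanks. Could you explain what is happening here: `s =~ /#{w}/`?

It's a regex match. Since regular expressions in Ruby support string interpolation, I use interpolation to build a regex from each string in the array. I should add that this will also return true if a word is part of a longer word in the string. If you don't want that, you have to match word boundaries: `/\b#{w}\b/`.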
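The sketch below shows two refinements of the interpolated regex. `Regexp.escape` keeps characters such as `.` or `+` in a word from being treated as regex syntax. `\b` on both sides restricts matches to whole words. The sample sentences are made up for illustration.

```ruby
words = ["hey", "format", "quoting", "this"]

# Substring match: "this" is found inside "thistle".
substring = ->(text) { words.any? { |w| text =~ /#{Regexp.escape(w)}/ } }

# Whole-word match: \b requires a word boundary before and after.
whole_word = ->(text) { words.any? { |w| text =~ /\b#{Regexp.escape(w)}\b/ } }

puts substring.call("a thistle grows here")   # true
puts whole_word.call("a thistle grows here")  # false
puts whole_word.call("this is the line")      # true

# Without escaping, "a.b" also matches "axb", because . means "any character".
p "axb" =~ /a.b/                      # 0
p "axb" =~ /#{Regexp.escape("a.b")}/  # nil
```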

Why not use `include?` instead of a regex? It would be more readable (I don't know about performance), like this:

['hey', 'format', 'qouting', 'this'].any? { |w| s.include? w }

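@rubyprince: I find regular expressions very readable, especially simple ones like this. When someone talks about matching something in a string, I almost always go down the regex route, because that's what regexes are for. You can also go the other way:

arr.any? { |item| str.include? item }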
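Like the intersection approach, the `include?` version can return the matching words instead of a boolean. Swap `any?` for `select` to get all matches, or for `find` to get only the first. In the sketch below, `"line"` is added to the word list purely to show more than one match.

```ruby
text  = "this is the line"
words = ["hey", "format", "qouting", "this", "line"]

p words.any? { |w| text.include?(w) }     # true
p words.find { |w| text.include?(w) }     # "this" (first match only)
p words.select { |w| text.include?(w) }   # ["this", "line"]
```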

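Yes, that's nice :). I've only just started with Ruby and am still writing Ruby the C# way :(

I don't find `include?` any more readable than `text[w]`, but it is a bit faster.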

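`text.include? w` makes it clear that it returns a boolean indicating whether `w` is contained in `text`. By contrast, `text[w]` might at first glance be read as returning the starting position of `w` in `text`.

This will also return true if a word from the array is part of a longer word in the string. Unlike the regex solution, there is no way to make it match only whole words.

I see :) Among these substring searches, a regex pattern is the only way to test for whole-word matches. However, that solution can make tests run extremely slowly unless the pattern is written correctly. If the pattern is anchored to the start or end of the string, the engine can search very quickly. If it can't be anchored, the engine slows down considerably and a simple substring match will beat it. Also, the more complex the pattern, the slower it runs, and trying to use lookahead/lookbehind makes things worse. A smart programmer will use benchmarks to find the fastest path.

I like this approach for single-word checks (e.g. "hey"). But if you need phrase checks (e.g. "hey you"), it falls apart.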
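The sketch below illustrates the phrase caveat for the split-based approach. Splitting on spaces turns the text into single tokens, so a multi-word entry such as `"hey you"` can never appear in the intersection. Substring-based checks (`include?`, `text[w]`, or a regex) still find it. The sample text is made up.

```ruby
text    = "well hey you there"
phrases = ["hey you", "format"]

# Split-and-intersect compares single tokens, so "hey you" never matches.
p text.split(" ") & phrases                  # []

# Substring checks see the whole string, so the phrase is found.
p phrases.any? { |ph| text.include?(ph) }    # true

# A whole-phrase regex with word boundaries also works.
p phrases.any? { |ph| text =~ /\b#{Regexp.escape(ph)}\b/ }  # true
```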