Ruby: searching a large JSON file for matches


json ruby regex search


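I have a fairly large JSON file containing short lines from a screenplay. I'm trying to match keywords against the lines in the JSON file so that I can pull out a matching line.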

The JSON file is structured like this:

[
 "Yeah, well I wasn't looking for a long term relationship. I was on TV. ",
 "Ok, yeah, you guys got to put a negative spin on everything. ",
 "No no I'm not ready, things are starting to happen. ",
 "Ok, it's forgotten. ",
 "Yeah, ok. ",
 "Hey hey, whoa come on give me a hug... "
]

(plus many more; 2444 lines in total)
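For reference, a minimal sketch of what JSON.parse yields for a file shaped like this, assuming the file is a single top-level array of strings as shown. The inline json_text is a hypothetical stand-in for the file contents; the question reads the real data with @jsonfile.read.

```ruby
require 'json'

# Hypothetical inline stand-in for the screenplay file contents
json_text = <<~TEXT
  [
    "Ok, it's forgotten. ",
    "Yeah, ok. ",
    "Hey hey, whoa come on give me a hug... "
  ]
TEXT

# A top-level JSON array of strings parses into a plain Ruby Array of Strings,
# so each element can be matched directly as a line of text.
screenplay_lines = JSON.parse(json_text)

puts screenplay_lines.class   # prints Array
puts screenplay_lines.length  # prints 3
```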

So far I have the following, but it doesn't find any matches:

require 'json'

# screenplay is read in from a json file
@screenplay_lines = JSON.parse(@jsonfile.read)
@text_to_find = ["relationship","negative","hug"]

@matching_results = []
@screenplay_lines.each do |line|
  if line.match(Regexp.union(@text_to_find))
    @matching_results << line
  end
end

puts "found #{@matching_results.length} matches..."
puts @matching_results
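The loop above does case-sensitive substring matching: "hug" also hits "hugely" and misses "HUG". Whether that matters for this screenplay is an assumption, not something the question confirms. A hedged sketch of a variant that builds the union once and matches whole words case-insensitively (the sample lines are hypothetical):

```ruby
keywords = ["relationship", "negative", "hug"]

# Regexp.union escapes each keyword and joins them with "|"
union = Regexp.union(keywords)
p union  # => /relationship|negative|hug/

# Hypothetical sample lines, not taken from the screenplay
sample_lines = [
  "Give me a HUG!",
  "That was hugely awkward.",
  "Ok, it's forgotten. "
]

# Plain union: case-sensitive substring match
p sample_lines.select { |line| union.match?(line) }  # => ["That was hugely awkward."]

# Assumption: whole words only (\b) and case-insensitive (/i).
# Reusing the union's source keeps the keywords escaped.
word_regexp = /\b(?:#{union.source})\b/i

matching = sample_lines.select { |line| word_regexp.match?(line) }
p matching  # => ["Give me a HUG!"]
```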

Here is one possible approach:
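Yes, Regexp matching is slower than simply checking whether a string is contained in the line of text. But that also depends on the number of keywords, the length of the lines, and so on, so it is best to run at least a micro-benchmark: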
lines = [
 "Yeah, well I wasn't looking for a long term relationship. I was on TV. ",
 "Ok, yeah, you guys got to put a negative spin on everything. ",
 "No no I'm not ready, things are starting to happen. ",
 "Ok, it's forgotten. ",
 "Yeah, ok. ",
 "Hey hey, whoa come on give me a hug... "
]
keywords = ["relationship","negative","hug"]


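# find1: Regexp built once, Regexp#match (returns a MatchData or nil)
def find1(lines, keywords)
  regexp = Regexp.union(keywords)

  lines.select { |line| regexp.match(line) }
end

# find2: no Regexp, plain substring checks with String#include?
def find2(lines, keywords)
  lines.select { |line| keywords.any? { |keyword| line.include?(keyword) } }
end

# find3: Regexp built once, Regexp#match? (Ruby 2.4+, returns only true/false)
def find3(lines, keywords)
  regexp = Regexp.union(keywords)

  lines.select { |line| regexp.match?(line) }
end

# benchmark/ips comes from the third-party benchmark-ips gem (gem install benchmark-ips)
require 'benchmark/ips'

Benchmark.ips do |x|
  x.compare!
  x.report('match') { find1(lines, keywords) }
  x.report('include?') { find2(lines, keywords) }
  x.report('match?') { find3(lines, keywords) }
end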

In this setup, the include? variant is much faster:
Comparison:
            include?:   288083.4 i/s
              match?:    91505.7 i/s - 3.15x  slower
               match:    65866.7 i/s - 4.37x  slower
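If adding the benchmark-ips gem is not an option, Ruby's standard-library Benchmark module gives a rougher comparison. A sketch, assuming the same lines and keywords as above (repeated so it runs on its own); the iteration count is arbitrary, and the output is total elapsed time rather than iterations per second, so its numbers are not directly comparable to the table above.

```ruby
require 'benchmark'

lines = [
  "Yeah, well I wasn't looking for a long term relationship. I was on TV. ",
  "Ok, yeah, you guys got to put a negative spin on everything. ",
  "No no I'm not ready, things are starting to happen. ",
  "Ok, it's forgotten. ",
  "Yeah, ok. ",
  "Hey hey, whoa come on give me a hug... "
]
keywords = ["relationship", "negative", "hug"]

# Substring check, no Regexp involved
def find_include(lines, keywords)
  lines.select { |line| keywords.any? { |keyword| line.include?(keyword) } }
end

# One Regexp per call, boolean-only matching (Ruby 2.4+)
def find_match_p(lines, keywords)
  regexp = Regexp.union(keywords)
  lines.select { |line| regexp.match?(line) }
end

iterations = 10_000 # arbitrary; raise it for steadier timings

# Benchmark.bm prints user/system/total/real times per report;
# the argument is the width of the label column.
Benchmark.bm(10) do |x|
  x.report('include?') { iterations.times { find_include(lines, keywords) } }
  x.report('match?')   { iterations.times { find_match_p(lines, keywords) } }
end
```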

Note:

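I have moved the creation of the regexp out of the loop; there is no need to create it for every line. Creating a regexp is an expensive operation (your variant, which builds it inside the loop, ran at about 1/5 the speed of the version with the regexp outside the loop).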
match? is only available in Ruby 2.4+. It is faster because it does not allocate a match result (no side effects such as setting $~).
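A small sketch of the side-effect difference mentioned in the second note: Regexp#match builds a MatchData and stores it in $~, while Regexp#match? returns only true or false and leaves $~ untouched. The method name last_match_demo is purely illustrative.

```ruby
# Wrapped in a method because $~ is local to the frame it is set in
def last_match_demo
  regexp = Regexp.union(["relationship", "negative", "hug"])
  line = "Hey hey, whoa come on give me a hug... "

  regexp.match(line)              # builds a MatchData and also stores it in $~
  after_match = $~ && $~[0]       # the matched text, "hug"

  regexp.match("Yeah, ok. ")      # no match: sets $~ to nil
  found = regexp.match?(line)     # true, but builds no MatchData and leaves $~ nil
  after_match_p = $~

  [after_match, found, after_match_p]
end

p last_match_demo  # => ["hug", true, nil]
```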

I wouldn't worry too much about performance with 2500 lines of text. If it is fast enough, stop looking for a better solution.
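Applied to the loop from the question, the notes above amount to building the regexp once and collecting matches with select. A sketch, with local variables standing in for the question's @screenplay_lines, @text_to_find and @matching_results, and a hypothetical three-line sample instead of the real file:

```ruby
# Hypothetical sample data; the question parses the real lines from JSON
screenplay_lines = [
  "Yeah, well I wasn't looking for a long term relationship. I was on TV. ",
  "Ok, it's forgotten. ",
  "Hey hey, whoa come on give me a hug... "
]
text_to_find = ["relationship", "negative", "hug"]

# Built once, outside the iteration
regexp = Regexp.union(text_to_find)

# select replaces the manual each + << accumulation; match? needs Ruby 2.4+
matching_results = screenplay_lines.select { |line| regexp.match?(line) }

puts "found #{matching_results.length} matches..."  # prints: found 2 matches...
puts matching_results
```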
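Thanks, this looks interesting, but I'd like to see whether a solution with less code is feasible before I resort to third-party gems. Thanks.
Great insight. The find1() method works fine.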