Ruby: searching for matches in a large JSON file

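I have a fairly large JSON file containing short lines from a screenplay. I am trying to match a set of keywords against the lines in the JSON file so that I can pull the matching lines out of it.

The JSON file is structured like this: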

[
 "Yeah, well I wasn't looking for a long term relationship. I was on TV. ",
 "Ok, yeah, you guys got to put a negative spin on everything. ",
 "No no I'm not ready, things are starting to happen. ",
 "Ok, it's forgotten. ",
 "Yeah, ok. ",
 "Hey hey, whoa come on give me a hug... "
]
(plus many more... 2444 lines in total)

So far I have this, but it does not find any matches:

# screenplay is read in from a json file
@screenplay_lines = JSON.parse(@jsonfile.read)
@text_to_find = ["relationship","negative","hug"]

@matching_results = []
@screenplay_lines.each do |line|
  if line.match(Regexp.union(@text_to_find))
    @matching_results << line
  end
end

puts "found #{@matching_results.length} matches..."
puts @matching_results
Here is one possible solution; try the following:


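Yes, Regexp matching is slower than simply checking whether a string is contained in the line. But it also depends on the number of keywords, the length of the lines, and so on. So it is best to run at least a micro-benchmark: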

lines = [
 "Yeah, well I wasn't looking for a long term relationship. I was on TV. ",
 "Ok, yeah, you guys got to put a negative spin on everything. ",
 "No no I'm not ready, things are starting to happen. ",
 "Ok, it's forgotten. ",
 "Yeah, ok. ",
 "Hey hey, whoa come on give me a hug... "
]
keywords = ["relationship","negative","hug"]


def find1(lines, keywords)
  regexp = Regexp.union(keywords)

  lines.select { |line| regexp.match(line) }
end


def find2(lines, keywords)
  lines.select { |line| keywords.any? { |keyword| line.include?(keyword) } }
end

def find3(lines, keywords)
  regexp = Regexp.union(keywords)

  lines.select { |line| regexp.match?(line) }
end

require 'benchmark/ips'

Benchmark.ips do |x|
  x.compare!
  x.report('match') { find1(lines, keywords) }
  x.report('include?') { find2(lines, keywords) }
  x.report('match?') { find3(lines, keywords) }
end
In this setup, the include? variant is considerably faster:

Comparison:
            include?:   288083.4 i/s
              match?:    91505.7 i/s - 3.15x  slower
               match:    65866.7 i/s - 4.37x  slower
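Please note:

  • I moved the creation of the regexp out of the loop. There is no need to build it for every line. Creating a regexp is an expensive operation (your variant ran at roughly 1/5 the speed of the version with the regexp created outside the loop).
  • match? is only available in Ruby 2.4+. It is faster because it does not allocate a MatchData result or set the last-match globals (no side effects).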
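
The difference between match and match? described in the notes above can be seen in a small sketch. The sample line and keywords are taken from the question; the variable names are only illustrative. It assumes Ruby 2.4 or newer.

```ruby
# Build the pattern once, outside any loop.
pattern = Regexp.union(["relationship", "negative", "hug"])
line = "Hey hey, whoa come on give me a hug... "

# Reset the last-match global so the difference is visible.
$~ = nil

# match? only answers true/false and leaves $~ untouched.
matched = pattern.match?(line)
last_match_after_predicate = $~

# match returns a MatchData object and also sets $~.
match_data = pattern.match(line)
last_match_after_match = $~

puts matched
puts last_match_after_predicate.inspect
puts match_data[0]
puts last_match_after_match.class
```

If only a yes/no answer is needed, as when filtering lines, match? avoids the extra allocation.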

I would not worry too much about performance for 2,500 lines of text. If it is fast enough, stop looking for a better solution.
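
For completeness, here is a minimal end-to-end sketch that combines JSON parsing with the include? variant. The inline JSON string is a hypothetical stand-in for the screenplay file; in practice the text would come from reading the file, which is not shown here.

```ruby
require 'json'

# Hypothetical stand-in for the contents of the screenplay file.
json_text = <<~JSON
  [
    "Yeah, well I wasn't looking for a long term relationship. I was on TV. ",
    "Ok, yeah, you guys got to put a negative spin on everything. ",
    "No no I'm not ready, things are starting to happen. ",
    "Hey hey, whoa come on give me a hug... "
  ]
JSON

screenplay_lines = JSON.parse(json_text)
keywords = ["relationship", "negative", "hug"]

# Plain substring check, as in find2. Note that include? is case-sensitive.
matching_results = screenplay_lines.select do |line|
  keywords.any? { |keyword| line.include?(keyword) }
end

puts "found #{matching_results.length} matches..."
puts matching_results
```

Because include? is case-sensitive, a keyword such as "Hug" would not match the last line; whether that matters depends on the data.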

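Thanks, this looks interesting, but before I resort to third-party gems I would like to see whether a solution with less code is feasible.

Thanks. Great insight. The find1() method works fine.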