Ruby: searching a large JSON file for matches
I have a fairly large JSON file containing short lines from a screenplay. I am trying to match a list of keywords against the lines in the JSON file so that I can pull out the lines that contain them. The JSON file is structured like this:
[
"Yeah, well I wasn't looking for a long term relationship. I was on TV. ",
"Ok, yeah, you guys got to put a negative spin on everything. ",
"No no I'm not ready, things are starting to happen. ",
"Ok, it's forgotten. ",
"Yeah, ok. ",
"Hey hey, whoa come on give me a hug... "
]
(plus many more... 2444 lines in total)
So far I have this, but it doesn't find any matches:
# screenplay is read in from a json file
@screenplay_lines = JSON.parse(@jsonfile.read)
@text_to_find = ["relationship","negative","hug"]
@matching_results = []
@screenplay_lines.each do |line|
  if line.match(Regexp.union(@text_to_find))
    @matching_results << line
  end
end
puts "found #{@matching_results.length} matches..."
puts @matching_results
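To narrow down where the problem lies, the matching logic can be checked in isolation against an inline JSON string instead of the real file. This is only a minimal sketch: the `find_matches` helper and the `sample_json` data are illustrative stand-ins and are not part of the original code.

```ruby
require 'json'

# Hypothetical helper: returns every line that contains at least one keyword.
def find_matches(lines, keywords)
  pattern = Regexp.union(keywords) # build the pattern once, outside the loop
  lines.select { |line| line.match(pattern) }
end

# Illustrative sample with the same shape as the real file (an array of strings).
sample_json = <<~SAMPLE
  [
    "Yeah, well I wasn't looking for a long term relationship. I was on TV. ",
    "Ok, it's forgotten. ",
    "Hey hey, whoa come on give me a hug... "
  ]
SAMPLE

matching_results = find_matches(JSON.parse(sample_json), ["relationship", "negative", "hug"])
puts "found #{matching_results.length} matches..."
puts matching_results
```

If this finds matches but the real run does not, the difference is more likely in how the file is read or in the data itself than in the matching step.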
Here is one possible solution. Try the following:
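Yes, Regexp matching is slower than just checking whether a string is included in a line of text. But that also depends on the number of keywords, the length of the lines, and more. So it is best to run at least a micro-benchmark: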
lines = [
  "Yeah, well I wasn't looking for a long term relationship. I was on TV. ",
  "Ok, yeah, you guys got to put a negative spin on everything. ",
  "No no I'm not ready, things are starting to happen. ",
  "Ok, it's forgotten. ",
  "Yeah, ok. ",
  "Hey hey, whoa come on give me a hug... "
]
keywords = ["relationship","negative","hug"]

# Regexp#match: returns a MatchData object (or nil) for each line
def find1(lines, keywords)
  regexp = Regexp.union(keywords)
  lines.select { |line| regexp.match(line) }
end

# String#include?: plain substring check for each keyword
def find2(lines, keywords)
  lines.select { |line| keywords.any? { |keyword| line.include?(keyword) } }
end

# Regexp#match?: returns only true or false (Ruby 2.4+)
def find3(lines, keywords)
  regexp = Regexp.union(keywords)
  lines.select { |line| regexp.match?(line) }
end

require 'benchmark/ips' # provided by the benchmark-ips gem

Benchmark.ips do |x|
  x.compare!
  x.report('match') { find1(lines, keywords) }
  x.report('include?') { find2(lines, keywords) }
  x.report('match?') { find3(lines, keywords) }
end
In this setup the include? variant is much faster:
Comparison:
include?: 288083.4 i/s
match?: 91505.7 i/s - 3.15x slower
match: 65866.7 i/s - 4.37x slower
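Please note:
- I moved the creation of the regexp out of the loop. It does not need to be created for every line. Creating a regexp is an expensive operation (your variant, which builds the regexp inside the loop, clocked in at about 1/5 of the speed of the version with the regexp outside the loop).
- match? is only available in Ruby 2.4+. It is faster because it does not assign any match results to globals such as $~ (no side effects).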
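The difference between match and match? described in the notes above can be observed directly. The following is a small sketch, assuming Ruby 2.4 or newer; the two helper method names are made up for illustration.

```ruby
# Hypothetical helper: reports what $~ holds after calling Regexp#match?.
def last_match_after_predicate(regexp, line)
  $~ = nil              # clear any earlier match data in this scope
  regexp.match?(line)   # returns true/false only
  $~                    # still nil: match? leaves $~ untouched
end

# Hypothetical helper: reports what $~ holds after calling Regexp#match.
def last_match_after_match(regexp, line)
  $~ = nil
  regexp.match(line)    # builds a MatchData object and stores it in $~
  $~
end

regexp = Regexp.union(["relationship", "negative", "hug"])
line = "Hey hey, whoa come on give me a hug... "

p last_match_after_predicate(regexp, line)
p last_match_after_match(regexp, line)
```

When only a yes/no answer per line is needed, as in the filtering above, match? avoids that extra work.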
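I would not worry too much about performance for 2500 lines of text. If it is fast enough, then stop looking for a better solution.
Thanks, this looks interesting, but before I resort to third-party gems I would like to see whether a solution with less code is feasible.
Thanks. Great insight. The find1() method works fine.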
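For a rough timing check without any third-party gem, Ruby's standard-library Benchmark module can be used instead of benchmark-ips. The sketch below is an assumption-laden illustration: the method names, the repeated sample data, and the iteration count are invented here, and the numbers it prints will vary by machine.

```ruby
require 'benchmark' # standard library, no gem needed

# Illustrative variant using a single combined regexp (Ruby 2.4+ for match?).
def find_with_regexp(lines, keywords)
  regexp = Regexp.union(keywords)
  lines.select { |line| regexp.match?(line) }
end

# Illustrative variant using plain substring checks.
def find_with_include(lines, keywords)
  lines.select { |line| keywords.any? { |keyword| line.include?(keyword) } }
end

# Made-up data: six sample lines repeated to roughly the size of the real file.
lines = [
  "Yeah, well I wasn't looking for a long term relationship. I was on TV. ",
  "Ok, yeah, you guys got to put a negative spin on everything. ",
  "No no I'm not ready, things are starting to happen. ",
  "Ok, it's forgotten. ",
  "Yeah, ok. ",
  "Hey hey, whoa come on give me a hug... "
] * 400
keywords = ["relationship", "negative", "hug"]
iterations = 100 # arbitrary; raise it for more stable timings

Benchmark.bm(10) do |x|
  x.report('match?')   { iterations.times { find_with_regexp(lines, keywords) } }
  x.report('include?') { iterations.times { find_with_include(lines, keywords) } }
end
```

Both variants should return the same lines, so the choice between them can be made on readability unless the timings show a difference that matters for the real data.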