Ruby: using a regular expression to parse data out of text
I am using Ruby 2.2 to parse the following text:
[key1: this is a bunch of text that can
span multiple lines.
key2: foo
key2: bar
key3: this can span multiple lines
as well
]
into an array of hashes like the following:
[
key1: "this is a bunch of text that can span multiple lines."
key2: ["foo", "bar"]
key3: "this can span multiple lines as well"
]
My first goal is to parse out the key/value pairs with a regular expression, and this is where I am stuck:
/\[((key1|key2|key3): (.+?))+(?=(?:key1:|key2:|key3:|\]))/m
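It does not work, because the lookahead I use to find the next key or the closing bracket seems to be matching text. My understanding was that a lookahead should not do that.
Any suggestions would be appreciated. Thanks.
One thing to change is to make the second group of keys non-capturing: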
\[((key1|key2|key3): (.+?))+(?=(?:key1:|key2:|key3:|\]))
No problem! You could also make the outer group non-capturing (unless you need the whole "key: whatever text" match):
\[(?:(key1|key2|key3): (.+?))+(?=(?:key1:|key2:|key3:|\]))
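To illustrate the point made in these comments: a lookahead is zero-width, so it only marks where a value ends, and wrapping its alternatives in `(?:...)` keeps them out of the captures. Note also that a capture group repeated with `+` retains only its last iteration, which is one reason a single match may not return every pair. The sketch below is a simplified variant, not the asker's exact regex; the `text` and `pattern` names, the `key[123]` shorthand and the `\s*` trimming are assumptions made for the example.

```ruby
text = "[key1: this is a bunch of text that can\nspan multiple lines.\n" \
       "key2: foo\nkey2: bar\nkey3: this can span multiple lines\nas well\n]"

# Two capture groups (key and value). The lookahead is zero-width and its
# inner group is non-capturing, so it contributes nothing to the result.
pattern = /(key[123]):\s+(.+?)\s*(?=(?:key[123]:|\]\z))/m

# String#scan applies the pattern repeatedly and returns one [key, value]
# array per match, instead of relying on a repeated group inside one match.
pairs = text.scan(pattern)
pairs.each { |key, value| puts "#{key} => #{value.inspect}" }
```

Using `scan` here sidesteps the repeated-group issue, at the cost of no longer checking that the text starts with `[`.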
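Sounds good. Either way, Rubular is a great site for testing regexps. :)
Would this work for you?
I assumed the closing bracket is on its own line.
But with a small change:
I like the simplicity of this solution. Thanks.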
data = %Q|[key1: this is a bunch of text that can
span multiple lines.
key2: foo
key2: bar
key3: this can span multiple lines
as well
]|
p data[1..-2] #Remove square brackets [...]
.split(/(key\d):\s+/)[1..-1] #regexp out keys and values. (And get rid of initial empty string)
.each_slice(2) #Group into key-value lists
.group_by(&:shift) # Group by first values
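The chain above prints values that are still wrapped in nested arrays and that keep their line breaks. One possible way to finish the job is sketched below, under the assumption that a repeated key should become an array and a single key a plain string. The `parse_pairs` name is hypothetical, and `group_by(&:first)` is swapped in for `group_by(&:shift)` so the pairs are not mutated.

```ruby
# Hypothetical helper: turns the bracketed text into a single hash.
def parse_pairs(data)
  grouped = data[1..-2]                  # drop the surrounding [ and ]
            .split(/(key\d):\s+/)[1..-1] # alternating keys and values, minus the leading ""
            .each_slice(2)               # [[key, value], ...]
            .group_by(&:first)           # key => [[key, value], ...]

  entries = grouped.map do |key, pairs|
    # Trim each value and join wrapped lines with a single space.
    values = pairs.map { |_, value| value.strip.gsub(/\s*\n\s*/, " ") }
    [key.to_sym, values.size == 1 ? values.first : values]
  end
  entries.to_h
end

data = %Q|[key1: this is a bunch of text that can
span multiple lines.
key2: foo
key2: bar
key3: this can span multiple lines
as well
]|

result = parse_pairs(data)
p result
```

Like the original chain, this assumes every key looks like `key` followed by one digit and that the text begins with `[` and ends with `]`.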