Ruby Regexp: multiple matches


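I have a line like the following:

"word1 word2 word3 (compound word) ..."
I need a regular expression that splits the words into an array, treating the words inside parentheses as a single word and separating the remaining words on spaces.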

Try the following:


Since split also accepts a Regexp, it is easy to split the string as requested:

irb> "word1 word2 word3 (compound word)".split(/ *\((.*)\) *| /)
=> ["word1", "word2", "word3", "compound word"]
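That is, split either on a parenthesized group surrounded by any number of spaces, or on a single space. Because the pattern contains a capture group, split also includes the captured text (the contents of the parentheses) in the result.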
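One caveat with the pattern above: .* is greedy, so if a line contains more than one parenthesized group it captures everything from the first "(" to the last ")". A minimal sketch of the difference, assuming a variant that uses [^)]* instead; the variable names and the sample line are made up for illustration:

```ruby
# Hypothetical variant of the split pattern: [^)]* stops at the first
# closing paren, so each parenthesized group is captured separately.
split_pattern = / *\(([^)]*)\) *| /

line = "word1 (first group) word2 (second group)"

greedy = line.split(/ *\((.*)\) *| /)
fixed  = line.split(split_pattern)

p greedy
#=> ["word1", "first group) word2 (second group"]
p fixed
#=> ["word1", "first group", "word2", "second group"]
```

For a line with a single group at the end, as in the question, both patterns give the same result.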

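# A bare word: starts with anything except "(" or whitespace and runs to the next whitespace.
BARE_WORD     = /([^\(\s]\S*)/
# A parenthesized group: captures the text between "(" and the first ")".
COMPOUND_WORD = /\(([^\)]*)\)/
SCANNER       = /(?:#{BARE_WORD})|(?:#{COMPOUND_WORD})/

# Returns the words of str, treating each parenthesized group as one word.
def split_bare_and_parenthesized_words(str)
  # scan yields one [bare, compound] pair per match, with nil for the group that did not match
  str.scan(SCANNER).flat_map(&:compact)
end

split_bare_and_parenthesized_words "word1 word2 word3 (compound word) ..."
#=> ["word1", "word2", "word3", "compound word", "..."]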
This implementation does not handle nested parentheses. Matching nested structures is inherently hard for regular languages.


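(Edit: @DavidUnric pointed out that the OP implied he does not want the parens in the result. So we added captures and flat_map to reduce each match to the alternative that actually matched.)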
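To see why the flat_map(&:compact) step is needed, it may help to look at what scan returns when the pattern has two capture groups. A small sketch, with the two alternatives written inline rather than interpolated; the local variable names and the shortened sample line are only for illustration:

```ruby
# Same two alternatives as BARE_WORD and COMPOUND_WORD, written inline.
scanner = /([^\(\s]\S*)|\(([^\)]*)\)/

pairs = "word1 (compound word) ...".scan(scanner)
# Each element has one slot per capture group; the alternative that
# did not match leaves nil in its slot.
p pairs
#=> [["word1", nil], [nil, "compound word"], ["...", nil]]

# compact drops the nil from each pair, flat_map flattens the result.
words = pairs.flat_map(&:compact)
p words
#=> ["word1", "compound word", "..."]
```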

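Can compound words be nested? For example:
"word1 (compound (another compound)) word2"
No, only one level. It is a list of Portuguese words with their translations in parentheses. The result should contain no parentheses, only the text they enclose.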
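The OP only needs one level, so the regex answers are enough here. If nesting ever did matter, one option is to drop the regex and count parenthesis depth by hand. The following is only a rough sketch under that assumption; the method name split_with_nested_parens is hypothetical and not from any answer:

```ruby
# Hypothetical helper: splits on whitespace outside parentheses and keeps
# each balanced parenthesized group as one token, without its outer parens.
def split_with_nested_parens(str)
  tokens  = []
  current = String.new
  depth   = 0

  str.each_char do |ch|
    if ch == "("
      current << ch if depth > 0      # keep inner parens, drop the outer one
      depth += 1
    elsif ch == ")" && depth > 0
      depth -= 1
      current << ch if depth > 0
      if depth.zero?                  # outer group closed: emit it
        tokens << current
        current = String.new
      end
    elsif ch =~ /\s/ && depth.zero?   # whitespace outside parens ends a word
      tokens << current unless current.empty?
      current = String.new
    else
      current << ch
    end
  end

  tokens << current unless current.empty?
  tokens
end

p split_with_nested_parens("word1 (compound (another compound)) word2")
#=> ["word1", "compound (another compound)", "word2"]
```

Unbalanced input is not validated in this sketch; an unclosed group is simply emitted with whatever text follows it.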
"word1 word2 word3 (compound word) ...".scan(/\(.*?\)|\S+/)