Regex: how to improve a regular expression that separates characters from digits
Tags: regex, ruby
I'm a bit rusty with regular expressions and have the following problem. Here is the text I want to split:
INVOICE # 2599
INVOICE 0185570
INVOICE: 1739
INVOICE- 45441
INVOICE:# 1234
INVOICE :# 5678
What I need to do is find two exact matches in each line. For example, I'd like to get the following:
[INVOICE#, 2599]
[INVOICE, 0185570]
[INVOICE:, 1739]
[INVOICE-, 45441]
[INVOICE:#, 45441]
[INVOICE:#, 5678]
So far I'm having trouble with characters like : and with any other characters that can separate INVOICE and the #.
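The digits are easy. All I need is (\d+), but how do I get the first part? I know I need (\w+), but these non-word characters are tripping me up. Can I get a push in the right direction?
This should do the trick:
text.delete(' ').scan(/([^\d\s]+)(\d+)/)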
Output:
[
[0] [
[0] "INVOICE#",
[1] "2599"
],
[1] [
[0] "INVOICE",
[1] "0185570"
],
[2] [
[0] "INVOICE:",
[1] "1739"
],
[3] [
[0] "INVOICE-",
[1] "45441"
],
[4] [
[0] "INVOICE:#",
[1] "1234"
],
[5] [
[0] "INVOICE:#",
[1] "5678"
]
]
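As a runnable sketch of the delete-then-scan idea: the variable names text and pairs are assumptions, as is the layout of one invoice per line. The first group uses [^\d\s]+ rather than a bare non-digit class so that line breaks cannot leak into the label.

```ruby
# Sample input, assumed here to hold one invoice label per line.
text = <<~END
  INVOICE # 2599
  INVOICE 0185570
  INVOICE: 1739
  INVOICE- 45441
  INVOICE:# 1234
  INVOICE :# 5678
END

# Drop every space, then capture a run of characters that are neither
# digits nor whitespace, followed by a run of digits.
pairs = text.delete(' ').scan(/([^\d\s]+)(\d+)/)

pairs.each { |label, number| puts "#{label} -> #{number}" }
```

Because String#scan returns one array per match when the pattern contains groups, the result is already in the [label, number] shape the question asks for.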
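You can use \D to match non-digits. Capture the word and the non-digits in the first group and the digits in the second group, then remove the spaces from the first capture group. Here is an example of what that could look like: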
text.scan(/(\w+\D+)(\d+)/).each { |group_1,| group_1.delete!(' ') }
#=> [["INVOICE#", "2599"], ["INVOICE", "0185570"], ["INVOICE:", "1739"], ["INVOICE-", "45441"], ["INVOICE:#", "1234"], ["INVOICE:#", "5678"]]
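If mutating the captured strings in place feels awkward, a non-mutating variant is sketched below. It assumes the same sample input; the names text, pairs, label and number are illustrative only.

```ruby
# Sample input, assumed here to hold one invoice label per line.
text = <<~END
  INVOICE # 2599
  INVOICE 0185570
  INVOICE: 1739
  INVOICE- 45441
  INVOICE:# 1234
  INVOICE :# 5678
END

# Group 1: the word plus any following non-digits; group 2: the digits.
# map builds new arrays instead of modifying the scanned strings.
pairs = text.scan(/(\w+\D+)(\d+)/).map do |label, number|
  [label.delete(' '), number]
end

p pairs
```

Unlike delete!, String#delete always returns a new string, so the arrays returned by scan are left untouched.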
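Alternative methods can be used here as well. Replacing \D with \W (non-word characters) also works.
Keep in mind that \w is equivalent to [A-Za-z0-9_], so it also matches digits and underscores.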
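A small sketch of why this matters. The input 'INVOICE0185570' (no separator at all) is a made-up case, not one of the lines in the question, chosen to show how a greedy \w+ can eat into the number.

```ruby
# \w accepts letters, digits and the underscore.
word_chars = 'aZ09_'.scan(/\w/)
p word_chars        # every one of the five characters matches

# Hypothetical input with no separator between label and number.
# \w+ is greedy and also matches digits, so it leaves only the last
# digit for the second group.
greedy = 'INVOICE0185570'.match(/(\w+)(\d+)/).captures
p greedy            # => ["INVOICE018557", "0"]

# Restricting the first group to letters avoids that.
letters_only = 'INVOICE0185570'.match(/([A-Za-z]+)(\d+)/).captures
p letters_only      # => ["INVOICE", "0185570"]
```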
text =<<~END
INVOICE # 2599
INVOICE 0185570
INVOICE: 1739
INVOICE- 45441
INVOICE:# 1234
INVOICE :# 5678
END
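The regular expression /\s+(?!\d)/ passed to gsub reads "match one or more whitespace characters that are not followed by a digit"; (?!\d) is a negative lookahead. This differs slightly from s.gsub(/\s+(?=\D)/, ''), "match one or more whitespace characters followed by a non-digit", in that the former removes the newline at the end of each line whereas the latter does not.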
The steps are as follows:
enum1 = text.each_line
#=> #<Enumerator: "INVOICE # 2599\nINVOICE 0185570\nINVOICE: 1739\n
# INVOICE- 45441\nINVOICE:# 1234\nINVOICE :# 5678\n":each_line>
and so on.
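To see the difference between the negative and the positive lookahead on a single line, here is a minimal sketch. The variable names line, negative and positive are assumptions made for illustration.

```ruby
line = "INVOICE :# 5678\n"

# Negative lookahead: whitespace NOT followed by a digit. The trailing
# newline is followed by nothing, so it qualifies and is removed.
negative = line.gsub(/\s+(?!\d)/, '')
p negative  # => "INVOICE:# 5678"

# Positive lookahead: whitespace followed by a non-digit. The trailing
# newline has no character after it, so it does not match and stays.
positive = line.gsub(/\s+(?=\D)/, '')
p positive  # => "INVOICE:# 5678\n"
```

Either result splits into the same two elements, since String#split without arguments ignores trailing whitespace.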
Another way is to remove the spaces that are not followed by a digit before the string is split into lines:
text.gsub(/ +(?!\d)/, '').each_line.map(&:split)
#=> [["INVOICE#", "2599"], ["INVOICE", "0185570"], ["INVOICE:", "1739"],
# ["INVOICE-", "45441"], ["INVOICE:#", "1234"], ["INVOICE:#", "5678"]]
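One detail worth checking when the substitution runs over the whole string: \s also matches "\n". The sketch below, which assumes the sample input is held in text (the names joined and pairs are illustrative), compares a \s+ pattern with a plain-space pattern.

```ruby
text = <<~END
  INVOICE # 2599
  INVOICE 0185570
  INVOICE: 1739
  INVOICE- 45441
  INVOICE:# 1234
  INVOICE :# 5678
END

# With \s+, every line break is whitespace not followed by a digit,
# so all of them are removed and the lines run together.
joined = text.gsub(/\s+(?!\d)/, '')
p joined.lines.size  # => 1

# Matching only spaces keeps the line breaks, so each_line still
# yields one invoice per line.
pairs = text.gsub(/ +(?!\d)/, '').each_line.map(&:split)
p pairs
```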
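Thanks for the comment, but that doesn't solve my problem. I need the # and : characters as well as INVOICE.
That's right, it's more elegant. Thanks @CarySwoveland
I see what you mean. I'll edit now. Thanks.
The element [INVOICE:#, 45441] in the array you wish to obtain should be [INVOICE:#, 1234], since you selected an answer that returns that as part of the array. I racked my brain trying to work out the rule that would produce it.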
enum1 = text.each_line
#=> #<Enumerator: "INVOICE # 2599\nINVOICE 0185570\nINVOICE: 1739\n
# INVOICE- 45441\nINVOICE:# 1234\nINVOICE :# 5678\n":each_line>
enum2 = enum1.map
#=> #<Enumerator: #<Enumerator: "INVOICE # 2599\nINVOICE 0185570\nINVOICE: 1739\n
# INVOICE- 45441\nINVOICE:# 1234\nINVOICE :# 5678\n":each_line>:map>
s = enum2.next
#=> "INVOICE # 2599\n"
t = s.gsub(/\s+(?!\d)/,'')
#=> "INVOICE# 2599"
t.split
#=> ["INVOICE#", "2599"]
s = enum2.next
#=> "INVOICE 0185570\n"
t = s.gsub(/\s+(?!\d)/,'')
#=> "INVOICE 0185570"
t.split
#=> ["INVOICE", "0185570"]
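The steps above can be folded into a single expression that applies gsub and split to each line in turn. This is a sketch assembled from those steps, not necessarily the exact code the answer used; the block variable s follows the walkthrough and pairs is an illustrative name.

```ruby
text = <<~END
  INVOICE # 2599
  INVOICE 0185570
  INVOICE: 1739
  INVOICE- 45441
  INVOICE:# 1234
  INVOICE :# 5678
END

# For each line: remove whitespace that is not followed by a digit
# (including the trailing newline), then split on the remaining space.
pairs = text.each_line.map { |s| s.gsub(/\s+(?!\d)/, '').split }

p pairs
```

Working line by line means the \s in the pattern can only ever touch the newline at the end of the current line, so neighbouring invoices are never merged.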