Regex: How to improve separating characters from digits in a regular expression (Regex, Ruby) - Fatal编程技术网

I'm a bit rusty with regular expressions and have the following problem. Below is the text I want to split:

INVOICE # 2599
INVOICE 0185570
INVOICE: 1739
INVOICE- 45441
INVOICE:# 1234
INVOICE :# 5678
What I need is to find exactly two matches per line. For example, I want to get the following:

[INVOICE#, 2599]
[INVOICE, 0185570]
[INVOICE:, 1739]
[INVOICE-, 45441]
[INVOICE:#, 45441]
[INVOICE:#, 5678]
So far I've been having trouble with these characters, and with any other characters that could separate INVOICE and #.


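The numbers are easy; all I need for them is `(\d+)`. But how do I get the first part? I know I need `(\w+)`, but these non-word characters are tripping me up. Can someone point me in the right direction?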
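To make the problem concrete, here is a small check on one of the sample lines (the variable names are just for illustration). `(\w+)` stops at the first non-word character, so the `:#` never ends up in the first capture, whereas `\D+` keeps every non-digit up to the number, including the space that still has to be removed afterwards.

```ruby
line = "INVOICE:# 1234"

# \w+ matches only letters, digits and underscores, so ":# " is eaten by \W*
word_only = line.match(/(\w+)\W*(\d+)/).captures

# \D+ matches every non-digit, so the punctuation (and the space) is kept
non_digits = line.match(/(\D+)(\d+)/).captures

p word_only   # => ["INVOICE", "1234"]
p non_digits  # => ["INVOICE:# ", "1234"]
```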

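This does the trick:

# remove spaces and newlines, then pair each run of non-digits with the digits after it
text.delete(" \n").scan(/(\D+)(\d+)/)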

Output:

[
    [0] [
        [0] "INVOICE#",
        [1] "2599"
    ],
    [1] [
        [0] "INVOICE",
        [1] "0185570"
    ],
    [2] [
        [0] "INVOICE:",
        [1] "1739"
    ],
    [3] [
        [0] "INVOICE-",
        [1] "45441"
    ],
    [4] [
        [0] "INVOICE:#",
        [1] "1234"
    ],
    [5] [
        [0] "INVOICE:#",
        [1] "5678"
    ]
]
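Processing the text line by line avoids depending on how the lines run together once separators are removed. The sketch below applies the same delete-then-match idea to each line separately; it is an assumption rather than the answer's code, uses `filter_map` (Ruby 2.7+), and skips any line that does not look like a label followed by a number.

```ruby
text = <<~END
  INVOICE # 2599
  INVOICE 0185570
  INVOICE: 1739
  INVOICE- 45441
  INVOICE:# 1234
  INVOICE :# 5678
END

pairs = text.each_line.filter_map do |line|
  # drop spaces and the trailing newline, then require "non-digits, digits"
  # to cover the whole line; lines that don't match give nil and are dropped
  line.delete(" \n").match(/\A(\D+)(\d+)\z/)&.captures
end

p pairs
```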

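You can use `\D` to match non-digits. Capture the word plus the non-digits after it in the first group and the digits in the second group, then remove the spaces from the first capture group. Here is an example of what that could look like: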

text.scan(/(\w+\D+)(\d+)/).each { |group_1,| group_1.delete!(' ') }
#=> [["INVOICE#", "2599"], ["INVOICE", "0185570"], ["INVOICE:", "1739"], ["INVOICE-", "45441"], ["INVOICE:#", "1234"], ["INVOICE:#", "5678"]]
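The `each` call above works because `delete!` modifies the captured strings in place and `each` returns the array it iterated over. A non-mutating sketch that builds new pairs with `map` instead gives the same result:

```ruby
text = <<~END
  INVOICE # 2599
  INVOICE 0185570
  INVOICE: 1739
  INVOICE- 45441
  INVOICE:# 1234
  INVOICE :# 5678
END

# group 1: a word plus the non-digits after it; group 2: the digits
pairs = text.scan(/(\w+\D+)(\d+)/).map { |label, number| [label.delete(" "), number] }

p pairs
```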
Replacing `\D` with `\W` (a non-word character) also works.
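A quick check on the sample text that the `\W` form of the pattern yields the same pairs as the `\D` form. This holds here because every label is followed by at least one non-word character before its number; the `clean` helper is only for illustration.

```ruby
text = <<~END
  INVOICE # 2599
  INVOICE 0185570
  INVOICE: 1739
  INVOICE- 45441
  INVOICE:# 1234
  INVOICE :# 5678
END

# strip spaces from the label of each [label, number] pair
clean = ->(pairs) { pairs.map { |label, number| [label.delete(" "), number] } }

with_d = clean.(text.scan(/(\w+\D+)(\d+)/))
with_w = clean.(text.scan(/(\w+\W+)(\d+)/))

p with_d == with_w  # => true
```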

Keep in mind that `\w` is equivalent to `[A-Za-z0-9_]`, so it also matches digits and underscores.
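A small illustration of that caveat, restricted to the ASCII range: `\w` selects exactly the same characters as `[A-Za-z0-9_]`, so a label joined to its number by an underscore is swallowed whole by `\w+`. The sample string is hypothetical.

```ruby
ascii = (0..127).map(&:chr)

# compare the two classes character by character over ASCII
same = ascii.select { |c| c.match?(/\w/) } == ascii.select { |c| c.match?(/[A-Za-z0-9_]/) }
p same  # => true

# \w+ runs straight through the underscore and the digits
p "INVOICE_2599"[/\w+/]        # => "INVOICE_2599"
p "INVOICE_2599"[/[A-Za-z]+/]  # => "INVOICE"
```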

text = <<~END
INVOICE # 2599
INVOICE 0185570
INVOICE: 1739
INVOICE- 45441
INVOICE:# 1234
INVOICE :# 5678
END
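Reassembled from the step-by-step walkthrough in this answer (each line goes through `gsub(/\s+(?!\d)/, '')` and then `split`), the per-line approach can be written as the following sketch; the exact one-liner is an assumption pieced together from those steps.

```ruby
text = <<~END
  INVOICE # 2599
  INVOICE 0185570
  INVOICE: 1739
  INVOICE- 45441
  INVOICE:# 1234
  INVOICE :# 5678
END

# For each line: remove whitespace not followed by a digit (this also drops
# the trailing "\n"), then split on the single space that remains
pairs = text.each_line.map { |s| s.gsub(/\s+(?!\d)/, '').split }

p pairs
```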
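The regex used by `gsub` reads "match one or more whitespace characters that are not followed by a digit"; `(?!\d)` is a negative lookahead. This is slightly different from `s.gsub(/\s+(?=\D)/, '')`, "match one or more whitespace characters followed by a non-digit", because the former removes the newline at the end of each line and the latter does not (there is no character after that newline for `\D` to match).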
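The difference between the two lookaheads shows up on a single line from the sample text: the negative lookahead also succeeds at the end of the string, while the positive one needs an actual non-digit character to follow.

```ruby
line = "INVOICE # 2599\n"

# negative lookahead: whitespace NOT followed by a digit (end of string counts)
a = line.gsub(/\s+(?!\d)/, '')

# positive lookahead: whitespace followed by a non-digit character
b = line.gsub(/\s+(?=\D)/, '')

p a  # => "INVOICE# 2599"
p b  # => "INVOICE# 2599\n"
```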

The steps are as follows:

enum1 = text.each_line
  #=> #<Enumerator: "INVOICE # 2599\nINVOICE 0185570\nINVOICE: 1739\n
  #     INVOICE- 45441\nINVOICE:# 1234\nINVOICE :# 5678\n":each_line>
enum2 = enum1.map
  #=> #<Enumerator: #<Enumerator: "INVOICE # 2599\nINVOICE 0185570\nINVOICE: 1739\n
  #     INVOICE- 45441\nINVOICE:# 1234\nINVOICE :# 5678\n":each_line>:map>

s = enum2.next
  #=> "INVOICE # 2599\n"
t = s.gsub(/\s+(?!\d)/,'')
  #=> "INVOICE# 2599"
t.split
  #=> ["INVOICE#", "2599"]

s = enum2.next
  #=> "INVOICE 0185570\n"
t = s.gsub(/\s+(?!\d)/,'')
  #=> "INVOICE 0185570"
t.split
  #=> ["INVOICE", "0185570"]

and so on.

Another way is to remove the spaces that are not followed by a digit before the string is split into lines. Here the pattern must match spaces only (` +` rather than `\s+`): a newline is also whitespace that is not followed by a digit, so `\s+` would remove the line breaks and run all the lines together.

text.gsub(/ +(?!\d)/, '').each_line.map(&:split)
 #=> [["INVOICE#", "2599"], ["INVOICE", "0185570"], ["INVOICE:", "1739"],
 #    ["INVOICE-", "45441"], ["INVOICE:#", "1234"], ["INVOICE:#", "5678"]]

Thanks for the comment, but that doesn't solve my problem. I need the # character as well as INVOICE.

True, it's more elegant. Thanks @CarySwoveland.

I see what you mean. Will edit now. Thanks.

The element [INVOICE:#,45441] of the array you want to obtain should be [INVOICE:#,1234], since you selected an answer that returns the latter as part of the array. I racked my brain trying to figure out what rule would produce [INVOICE:#,45441].
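For the whole-string alternative (removing spaces before splitting into lines), a quick check on the first two sample lines shows why the pattern should match only spaces: with `\s+`, the newline between lines is also whitespace not followed by a digit, so it is removed and the lines are glued together.

```ruby
text = <<~END
  INVOICE # 2599
  INVOICE 0185570
END

with_s     = text.gsub(/\s+(?!\d)/, '')  # also deletes the "\n" characters
with_space = text.gsub(/ +(?!\d)/, '')   # deletes spaces only, keeps lines

p with_s.each_line.map(&:split)      # => [["INVOICE#", "2599INVOICE", "0185570"]]
p with_space.each_line.map(&:split)  # => [["INVOICE#", "2599"], ["INVOICE", "0185570"]]
```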