Python/Regex - Match .#,#. in a String

What regular expression can I use to match ".#,#." within a string? It may or may not be present in the string. Some examples of the expected output:

Test1.0,0.csv      -> ('Test1', '0,0', 'csv')         (Basic Example)
Test2.wma          -> ('Test2', 'wma')                (No Match)
Test3.1100,456.jpg -> ('Test3', '1100,456', 'jpg')    (Basic with Large Number)
T.E.S.T.4.5,6.png  -> ('T.E.S.T.4', '5,6', 'png')     (Doesn't strip all periods)
Test5,7,8.sss      -> ('Test5,7,8', 'sss')            (No Match)
Test6.2,3,4.png    -> ('Test6.2,3,4', 'png')          (No Match, too many commas)
Test7.5,6.7,8.test -> ('Test7', '5,6', '7,8', 'test') (Double Match?)
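The last one isn't too important; I would only expect .#,#. to appear once. Most of the files I'm processing fall into the first through fourth examples, so those are the ones I'm most interested in.

Thanks for the help.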

Use the regular expression pattern
^([^,]+)\.(\d+,\d+)\.([^,]+)$

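The third capture group should contain the pair of numbers. If you have multiple pairs, you should get multiple matches. The third capture always contains the pair.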

^(.*?)\.(\d+,\d+)\.(.*?)$
This passes your tests, at least with the pattern above.
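As a rough sketch of how the lazy pattern `^(.*?)\.(\d+,\d+)\.(.*?)$` might be used from Python: the function name `split_name` is hypothetical, and the fallback to `rsplit` for names without a number pair is an assumption added here to cover the "No Match" rows of the question, not part of this answer.

```python
import re

# Lazy first group so that only the dot right before the number pair is used.
PAIR_RE = re.compile(r'^(.*?)\.(\d+,\d+)\.(.*?)$')

def split_name(filename):
    """Return (stem, pair, ext) when a .#,#. pair is present, else (stem, ext)."""
    m = PAIR_RE.match(filename)
    if m:
        return m.groups()
    # Assumed fallback: no pair found, so only split off the extension.
    return tuple(filename.rsplit('.', 1))

print(split_name('Test1.0,0.csv'))       # ('Test1', '0,0', 'csv')
print(split_name('Test2.wma'))           # ('Test2', 'wma')
print(split_name('Test7.5,6.7,8.test'))  # ('Test7', '5,6', '7,8.test')
```

Note that a single match only pulls out the first pair, so the "Double Match" example is not split into four parts by this approach.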


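You can use the regular expression
\.\d+,\d+\.
to find all matches of that pattern, but you will need to do some extra work to get the expected output, especially since you want
.5,6.7,8.
to be treated as two matches.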
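To illustrate why the extra work is needed, here is a small sketch (the variable names are only for illustration). Because the pattern consumes the trailing dot, that dot is no longer available as the leading dot of the next pair, so `re.findall` reports only one match for the double-pair file name.

```python
import re

name = 'Test7.5,6.7,8.test'

# The match '.5,6.' uses up the dot that '.7,8.' would need as its leading dot,
# and matches returned by findall never overlap.
found = re.findall(r'\.\d+,\d+\.', name)
print(found)  # ['.5,6.']
```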

Here is a potential solution:

import re

def transform(s):
    # Turn the dots around each run of ".#,#" pairs into newlines, then split on them.
    s = re.sub(r'(\.\d+,\d+)+\.', lambda m: m.group(0).replace('.', '\n'), s)
    return tuple(s.split('\n'))
Examples:

>>> transform('Test1.0,0.csv')
('Test1', '0,0', 'csv')
>>> transform('Test2.wma')
('Test2.wma',)
>>> transform('Test3.1100,456.jpg')
('Test3', '1100,456', 'jpg')
>>> transform('T.E.S.T.4.5,6.png')
('T.E.S.T.4', '5,6', 'png')
>>> transform('Test5,7,8.sss')
('Test5,7,8.sss',)
>>> transform('Test6.2,3,4.png')
('Test6.2,3,4.png',)
>>> transform('Test7.5,6.7,8.test')
('Test7', '5,6', '7,8', 'test')
To split off the file extension when there is no match, you can use the following:

def transform(s):
    s = re.sub(r'(\.\d+,\d+)+\.', lambda m: m.group(0).replace('.', '\n'), s)
    groups = s.split('\n')
    groups[-1:] = groups[-1].rsplit('.', 1)
    return tuple(groups)

The results are very close to the first version, except that 'Test2.wma' becomes ('Test2', 'wma'), with similar behavior for 'Test5,7,8.sss'.

Does Python support named groups?

^.*(?P<group1>\d+(?:,\d+)?)\.(?P<group2>\d+(?:,\d+)?).*\..+$
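Python's `re` module does support named groups through the `(?P<name>...)` syntax. The following is a minimal sketch using a simpler three-part pattern than the one above; the group names `stem`, `pair`, and `ext` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical group names; each part of the file name gets its own label.
NAMED_RE = re.compile(r'^(?P<stem>.*?)\.(?P<pair>\d+,\d+)\.(?P<ext>.*?)$')

m = NAMED_RE.match('Test3.1100,456.jpg')
if m:
    parts = m.groupdict()  # maps each group name to its matched text
    print(parts['stem'], parts['pair'], parts['ext'])  # Test3 1100,456 jpg
```

Named groups can still be read by position, so `m.group(2)` and `m.group('pair')` return the same text.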

To allow multiple consecutive matches, use lookahead/lookbehind:

r'(?<=\.)\d+,\d+(?=\.)'
Tests:

>>> print(split_it('Test1.0,0.csv'))
['Test1', '0,0', 'csv']
>>> print(split_it('Test2.wma'))
['Test2', 'wma']
>>> print(split_it('Test3.1100,456.jpg'))
['Test3', '1100,456', 'jpg']
>>> print(split_it('T.E.S.T.4.5,6.png'))
['T.E.S.T.4', '5,6', 'png']
>>> print(split_it('Test5,7,8.sss'))
['Test5,7,8', 'sss']
>>> print(split_it('Test6.2,3,4.png'))
['Test6.2,3,4', 'png']
>>> print(split_it('Test7.5,6.7,8.test'))
['Test7', '5,6', '7,8', 'test']

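Oh, man. If only everyone provided such an extensive list of matching and failing examples...
@m.buettner I know, compared to 99% of regex questions this one is great.
What about Test.xx,yz.csv?
The named group syntax is (?P<name>pattern).
Also, if the last group contains multiple spaces, the last group will be split several times.
Just modified it to use \n instead of a space; you could also use something like \x00 to make sure it won't be contained in a valid string.
transform('.a.a.a.a.a.a.') == ('', 'a', 'a', 'a', 'a', 'a', '')
@nneonneo Ah, I see, I forgot about the count argument of rsplit, thanks.
Thanks for giving me an example! I really need to take some time to learn more about regex, it's so powerful.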
'/^(.+)\.((\d+,\d+)\.)?(.+)$/'
>>> re.findall(r'(?<=\.)\d+,\d+(?=\.)', 'Test7.5,6.7,8.test')
['5,6', '7,8']
import re
def split_it(s):
    pieces = re.split(r'\.(?=\d+,\d+\.)', s)
    pieces[-1:] = pieces[-1].rsplit('.', 1) # split off extension
    return pieces
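The two steps of this approach can be looked at separately. The sketch below is a Python 3 restatement under a hypothetical name, `split_on_pairs`, and prints the intermediate result of `re.split` so the role of the lookahead is visible.

```python
import re

def split_on_pairs(s):
    # Split at each dot that is immediately followed by "digits,digits.".
    # The lookahead does not consume the pair, so the dot that closes one
    # pair can itself be the split point in front of the next pair.
    pieces = re.split(r'\.(?=\d+,\d+\.)', s)
    # The last piece still carries the extension; peel it off at the final dot.
    pieces[-1:] = pieces[-1].rsplit('.', 1)
    return pieces

# Step 1 alone: the extension is still attached to the last pair.
intermediate = re.split(r'\.(?=\d+,\d+\.)', 'Test7.5,6.7,8.test')
print(intermediate)                          # ['Test7', '5,6', '7,8.test']

# Both steps together.
print(split_on_pairs('Test7.5,6.7,8.test'))  # ['Test7', '5,6', '7,8', 'test']
```

Limiting `rsplit` to one split keeps any dots in the stem intact when no pair is present, as in 'Test6.2,3,4.png'.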