Adding an optional part to a Python regular expression


I want to add an optional part to my Python regular expression:

myExp = re.compile("(.*)_(\d+)\.(\w+)")

So if my string is abc_34.txt, result.group(2) is 34. If my string is abc_2034.txt, result.group(2) should still be 34.

I tried

myExp = re.compile("(.*)_[20](\d+)\.(\w+)")

but my result.group(2) for abc_2034.txt is 034.

Thanks, F.J.

But I would like to extend your solution and add a suffix.

So if I pass abc_203422.txt, result.group(2) is still 34.

I tried "(.*)_(?:20)?(\d+)(?:22)?\.(\w+)" but I get 3422 instead of 34.

myExp = re.compile("(.*)_(?:20)?(\d+)\.(\w+)")

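The ?: at the start of the group containing 20 makes it a non-capturing group, and the ? after that group makes it optional. So (?:20)? means "optionally match 20".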
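As a quick check, the sketch below runs the pattern above against the two filenames from the question. It also tries one possible way to handle the optional 22 suffix from the follow-up: making the digit group lazy (\d+?) so that the optional suffix group gets a chance to match. The suffix pattern and the names basic and with_suffix are assumptions made for illustration, not part of the original answer.

```python
import re

# Pattern from the answer: an optional, non-capturing "20" before the digits.
basic = re.compile(r"(.*)_(?:20)?(\d+)\.(\w+)")

for name in ["abc_34.txt", "abc_2034.txt"]:
    print(name, "->", basic.match(name).group(2))

# Hypothetical extension for the follow-up question: the lazy \d+? takes as
# few digits as possible, so a trailing "22" is left for the optional
# (?:22)? group instead of being swallowed by the capturing group.
with_suffix = re.compile(r"(.*)_(?:20)?(\d+?)(?:22)?\.(\w+)")

for name in ["abc_34.txt", "abc_2034.txt", "abc_203422.txt"]:
    print(name, "->", with_suffix.match(name).group(2))
```

With the greedy \d+ the capturing group consumes 3422 and leaves nothing for the suffix group, which is why the attempt in the question returned 3422. Note that the lazy version also strips a trailing 22 when it is part of the real number (for example abc_3422.txt gives 34), so it only fits if that is acceptable for your data.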

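Not sure whether this is what you are looking for, but ? is the re symbol for "0 or 1 times". Or there is {0,2}, which is a bit hacky for up to two optional [0-9]. I'll give it some more thought.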
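One way to read the {0,2} suggestion is sketched below: allow up to two leading digits and then capture exactly two. This is an interpretation of the comment, not something the commenter wrote out, and the name bounded is made up for the example.

```python
import re

# \d{0,2} optionally skips up to two leading digits (whatever they are),
# then (\d{2}) captures the two digits that follow.
bounded = re.compile(r"(.*)_\d{0,2}(\d{2})\.(\w+)")

for name in ["abc_34.txt", "abc_2034.txt"]:
    print(name, "->", bounded.match(name).group(2))
```

Unlike (?:20)?, this version does not care whether the skipped digits are actually 20.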

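Do you always want the second group to match exactly two digits? Are the extra two digits always "20"? Do you always want the last two digits in the second group?

Do yourself a favor and use named groups instead.

"When you have a programming problem and you think, 'I'll use a regular expression', now you have two problems."

By the way, some famous people always put an r in front of the strings they use for regular expressions, e.g. r"…"

Some regex escapes are the same as string escapes, e.g. \b. You don't want an ordinary (non-raw) string literal to convert \b into the ASCII backspace character. When that happens, the regex engine never sees the two characters \b, so the \b in your pattern does not match a word boundary. You can be driven to despair trying to work out why \b is not matching a word boundary. So putting an r in front of the pattern string really is in your own interest.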
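A small sketch of the difference, using a made-up sample sentence; the variable names are only for illustration:

```python
import re

text = "a cat sat"

plain = "\bcat\b"   # Python turns each \b into a single backspace character
raw = r"\bcat\b"    # raw string: the backslash and the b reach the regex engine

# The plain string is 5 characters long, the raw one 7.
print(len(plain), len(raw))

# The plain pattern looks for backspace + "cat" + backspace and finds nothing.
print(re.search(plain, text))

# The raw pattern matches "cat" at word boundaries.
print(re.search(raw, text).group())
```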

strings = [
    "abc_34.txt", 
    "abc_2034.txt",  
]


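for string in strings:
    # Split off the extension, then split the stem at the underscore.
    first_part, ext = string.split(".")
    prefix, number = first_part.split("_")

    # Keep only the last two digits of the number.
    print(prefix, number[-2:], ext)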


--output:--
abc 34 txt
abc 34 txt



import re

strings = [
    "abc_34.txt", 
    "abc_2034.txt",  
]

pattern = r"""
    ([^_]*)     #Match not an underscore, 0 or more times, captured in group 1
    _           #followed by an underscore
    \d*         #followed by a digit, 0 or more times, greedy
    (\d{2})     #followed by a digit, twice, captured in group 2
    [.]         #followed by a period
    (.*)        #followed by any character, 0 or more times, captured in group 3
"""


regex = re.compile(pattern, flags=re.X)  #ignore whitespace and comments in regex

for string in strings:
    md = regex.match(string)  # call match() on the compiled pattern
    if md:
        print(md.group(1), md.group(2), md.group(3))

--output:--
abc 34 txt
abc 34 txt
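
One of the comments above suggests named groups. A minimal sketch of the same verbose pattern with names attached follows; the group names prefix, number and ext are chosen here for illustration and do not come from the original answer.

```python
import re

named = re.compile(r"""
    (?P<prefix>[^_]*)   # anything that is not an underscore, group "prefix"
    _                   # followed by an underscore
    \d*                 # followed by any leading digits, greedy
    (?P<number>\d{2})   # followed by the last two digits, group "number"
    [.]                 # followed by a period
    (?P<ext>.*)         # followed by the rest, group "ext"
""", re.X)

for name in ["abc_34.txt", "abc_2034.txt"]:
    md = named.match(name)
    if md:
        print(md.group("prefix"), md.group("number"), md.group("ext"))
```

Referring to md.group("number") keeps working if groups are later added or reordered, whereas md.group(2) would silently point at something else.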