Python regex to match the last occurrence of a year in a string

python regex string date filenames

I wrote a Python script with the following function, which takes a file name containing multiple dates as input.

Code

import re
from datetime import datetime

def ExtractReleaseYear(title):
    rg = re.compile('.*?([\[\(]?((?:19[0-9]|20[01])[0-9])[\]\)]?)', re.IGNORECASE|re.DOTALL)
    match = rg.search(title) # Using non-greedy match on filler
    if match:
        releaseYear = match.group(1)
        try:
            if int(releaseYear) >= 1900 and int(releaseYear) <= int(datetime.now().year) and int(releaseYear) <= 2099: # Film between 1900-2099
                return releaseYear
        except ValueError:
            print("ERROR: The film year in the file name could not be converted to an integer for comparison.")
            return ""

print(ExtractReleaseYear('2012.(2009).3D.1080p.BRRip.SBS.x264'))
print(ExtractReleaseYear('Into.The.Storm.2012.1080p.WEB-DL.AAC2.0.H264'))
print(ExtractReleaseYear('2001.A.Space.Odyssey.1968.1080p.WEB-DL.AAC2.0.H264'))

Two things need to change:
1. The first lazy .*? must become a greedy .* (in that case, the subpattern after .* will match the last occurrence in the string).
2. The group you need is group 2, not group 1, because group 2 is the one that stores the year. Alternatively, make the first capturing group non-capturing.
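The two changes can be sketched as follows. This is a minimal illustration rather than the answerer's exact code: the name extract_release_year and the choice to return an empty string on failure are assumptions, and the year alternation is kept from the question, so it only covers 1900 to 2019.

```python
import re
from datetime import datetime

# Greedy .* first swallows the whole string and then backtracks, so the year
# group ends up on the LAST candidate in the title.
# The optional-bracket wrapper is non-capturing, which makes the year group 1.
YEAR_RE = re.compile(r'.*(?:[\[(]?((?:19[0-9]|20[01])[0-9])[\])]?)', re.DOTALL)

def extract_release_year(title):
    """Return the last 1900-2019 year in title as a string, or "" if none."""
    match = YEAR_RE.search(title)
    if not match:
        return ""
    year = match.group(1)
    # Reject years in the future, as the question's range check does.
    if 1900 <= int(year) <= datetime.now().year:
        return year
    return ""

print(extract_release_year('2012.(2009).3D.1080p.BRRip.SBS.x264'))
print(extract_release_year('Into.The.Storm.2012.1080p.WEB-DL.AAC2.0.H264'))
print(extract_release_year('2001.A.Space.Odyssey.1968.1080p.WEB-DL.AAC2.0.H264'))
```

With the three sample file names this should pick 2009, 2012 and 1968, because in each case the greedy prefix leaves only the right-most year for the capturing group.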
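Consider using findall() instead of search().
It puts every value it finds into a list, from left to right, so accessing the right-most entry gives you what you want.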
import re
from datetime import datetime

def ExtractReleaseYear(title):
    # Raw string so that \[ and \( are not treated as invalid string escapes
    rg = re.compile(r'.*?([\[\(]?((?:19[0-9]|20[01])[0-9])[\]\)]?)', re.IGNORECASE|re.DOTALL)
    match = rg.findall(title)  # list of (bracketed text, year) tuples, left to right

    if match:
        try:
            releaseYear = match[-1][-1]  # year group of the right-most match
            if 1900 <= int(releaseYear) <= min(datetime.now().year, 2099): # Film between 1900-2099
                return releaseYear
        except ValueError:
            print("ERROR: The film year in the file name could not be converted to an integer for comparison.")
    return ""  # no usable year found

print(ExtractReleaseYear('2012.(2009).3D.1080p.BRRip.SBS.x264'))
print(ExtractReleaseYear('Into.The.Storm.2012.1080p.WEB-DL.AAC2.0.H264'))
print(ExtractReleaseYear('2001.A.Space.Odyssey.1968.1080p.WEB-DL.AAC2.0.H264'))

In line with @kenyanke's answer, choosing findall() over search() is the better option, because findall() returns all non-overlapping matches of the pattern. You can then take the last match as releaseYear. This is the regex I would use to find releaseYear:

rg = re.compile(r'[^a-z](\d{4})[^a-z]', re.IGNORECASE)
match = rg.findall(title)
if match:
    releaseYear = match[-1]

The regex above assumes that the characters immediately before and after releaseYear are not letters. The results (match) for the three strings are:
['2009']
['2012']
['1968']
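Note that [^a-z] has to consume a character on each side, which is why the leading 2012 and 2001 are missing from the first and third results, and why a year at the very end of a name would be skipped as well. A possible variant, not part of the original answer, replaces the two character classes with lookarounds, which check the neighbouring characters without consuming them. The name find_year_candidates is hypothetical.

```python
import re

# Lookbehind / lookahead only assert what is next to the four digits, so the
# match may sit at the very start or end of the string.
# Digits are excluded too, so a slice of a longer number is not taken as a year.
YEAR_RE = re.compile(r'(?<![a-z0-9])(\d{4})(?![a-z0-9])', re.IGNORECASE)

def find_year_candidates(title):
    """Return every stand-alone 4-digit number in title, left to right."""
    return YEAR_RE.findall(title)

for name in ('2012.(2009).3D.1080p.BRRip.SBS.x264',
             'Into.The.Storm.2012.1080p.WEB-DL.AAC2.0.H264',
             '2001.A.Space.Odyssey.1968.1080p.WEB-DL.AAC2.0.H264'):
    candidates = find_year_candidates(name)
    # The last candidate is still the release year for these three names.
    print(candidates, candidates[-1] if candidates else "")
```

The last element is unchanged for the three sample names, but the list now also contains the years that open the file name, and 1080 is still rejected because it is followed by the letter p.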

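Because you always check group(1), it gives you the first match. Check the number of groups and, if it is greater than 1, take the match value of the last group.

There is no real solution, especially since file names can take many forms; for example, it is almost impossible that every file name has a "p" after "1080".

@ProGrammer: What if the film title contains a year and the release year is missing from the file name? In the same way, you should also allow 1895 (the release year of Louis and Auguste Lumière's La Sortie de l'usine Lumière) or 1888 for the Louis Le Prince films.

@Casimirithippolyte I have extended the lower bound of the range to 1888. Suppose the title 2018.3D.1080p.BRRip.SBS.x264.AAC is run through the function: it returns 2018 (note that the film title contains a year and the release year is missing from the file name). In that case the last occurrence is also the first occurrence. It does become a problem if the last occurrence is some kind of upload or recording date, for example 2018.3D.1080p.BRRip.SBS.x264.AAC.01-01-2018, and I would like to see a smart way to filter out such instances. For now, I believe this is a rare format in film titles.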
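One way the upload-date case might be filtered is to reject any 4-digit number that is part of a dd-mm-yyyy or yyyy-mm-dd date. The following is only a sketch under that assumption: the function name last_year_outside_dates, the two date layouts, and the default lower bound of 1888 (taken from the comment above) are illustrative choices, and other date formats would slip through.

```python
import re
from datetime import datetime

YEAR_RE = re.compile(
    r'(?<![a-z0-9])'       # not glued to a letter or digit on the left
    r'(?<!\d{2}-\d{2}-)'   # not the tail of a dd-mm-yyyy date
    r'(\d{4})'
    r'(?!-\d{2}-\d{2})'    # not the head of a yyyy-mm-dd date
    r'(?![a-z0-9])',       # not glued to a letter or digit on the right
    re.IGNORECASE,
)

def last_year_outside_dates(title, earliest=1888):
    """Return the last plausible year that is not part of a full date, or ""."""
    current = datetime.now().year
    years = [y for y in YEAR_RE.findall(title) if earliest <= int(y) <= current]
    return years[-1] if years else ""

print(last_year_outside_dates('Into.The.Storm.2012.1080p.WEB-DL.01-01-2018'))
print(last_year_outside_dates('2018.3D.1080p.BRRip.SBS.x264.AAC.01-01-2018'))
```

For the second name the function still returns 2018, but it now comes from the title at the start rather than from the trailing date, so the ambiguity raised in the comments (a year in the title with no release year in the name) remains unsolved.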