
Python regex to match the last occurrence of a year in a string


I wrote a Python script with the following function, which takes a file name containing multiple dates as input.

Code

import re
from datetime import datetime

def ExtractReleaseYear(title):
    rg = re.compile(r'.*?([\[\(]?((?:19[0-9]|20[01])[0-9])[\]\)]?)', re.IGNORECASE|re.DOTALL)
    match = rg.search(title) # Using non-greedy match on filler
    if match:
        releaseYear = match.group(1)
        try:
            if int(releaseYear) >= 1900 and int(releaseYear) <= int(datetime.now().year) and int(releaseYear) <= 2099: # Film between 1900-2099
                return releaseYear
        except ValueError:
            print("ERROR: The film year in the file name could not be converted to an integer for comparison.")
            return ""

print(ExtractReleaseYear('2012.(2009).3D.1080p.BRRip.SBS.x264'))
print(ExtractReleaseYear('Into.The.Storm.2012.1080p.WEB-DL.AAC2.0.H264'))
print(ExtractReleaseYear('2001.A.Space.Odyssey.1968.1080p.WEB-DL.AAC2.0.H264'))
There are two things you need to change:

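  • The first lazy pattern, .*?, must be made greedy, .* (the subpattern after .* will then match its last occurrence in the string).
  • The group you need is group 2, not group 1, because group 2 is the one that holds the year. Alternatively, make the first capturing group non-capturing.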
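A minimal sketch of these two changes, assuming the optional brackets are simply left uncaptured so that the year becomes group 1. The names `YEAR_RE` and `extract_release_year`, and the empty-string fallback, are illustrative choices rather than the answerer's own code.

```python
import re
from datetime import datetime

# Greedy `.*` first swallows the whole title and then backtracks, so the
# year that ends up captured is the last one in the string.
# The brackets around the year are matched but not captured.
YEAR_RE = re.compile(r'.*[\[(]?((?:19[0-9]|20[01])[0-9])[\])]?', re.DOTALL)

def extract_release_year(title):
    """Return the last 1900-2019 year in `title` as a string, or ''."""
    match = YEAR_RE.search(title)
    if not match:
        return ""
    year = match.group(1)
    # Same sanity check as the question: no earlier than 1900, no later than now.
    if 1900 <= int(year) <= datetime.now().year:
        return year
    return ""

print(extract_release_year('2012.(2009).3D.1080p.BRRip.SBS.x264'))
print(extract_release_year('Into.The.Storm.2012.1080p.WEB-DL.AAC2.0.H264'))
print(extract_release_year('2001.A.Space.Odyssey.1968.1080p.WEB-DL.AAC2.0.H264'))
```

Because the year sub-pattern is unchanged from the question, it still only recognises 1900 to 2019; widening that range is a separate decision.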

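    Consider using findall() instead of search().

    It puts every value it finds into a list, from left to right, so you only need to access the rightmost element to get what you want.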

    import re
    from datetime import datetime
    
    def ExtractReleaseYear(title):
        rg = re.compile(r'.*?([\[\(]?((?:19[0-9]|20[01])[0-9])[\]\)]?)', re.IGNORECASE|re.DOTALL)
        match = rg.findall(title)
    
        if match:
            try:
                releaseYear = match[-1][-1]
                if int(releaseYear) >= 1900 and int(releaseYear) <= int(datetime.now().year) and int(releaseYear) <= 2099: # Film between 1900-2099
                    return releaseYear
            except ValueError:
                print("ERROR: The film year in the file name could not be converted to an integer for comparison.")
                return ""
    
    print(ExtractReleaseYear('2012.(2009).3D.1080p.BRRip.SBS.x264'))
    print(ExtractReleaseYear('Into.The.Storm.2012.1080p.WEB-DL.AAC2.0.H264'))
    print(ExtractReleaseYear('2001.A.Space.Odyssey.1968.1080p.WEB-DL.AAC2.0.H264'))
    
    Following @kenyanke's answer, choosing findall() over search() is the better option, because findall() returns all non-overlapping matches of the pattern. You can take the last match as releaseYear. This is the regex I would use to find releaseYear:

    rg = re.compile(r'[^a-z](\d{4})[^a-z]', re.IGNORECASE)
    match = rg.findall(title)
    if match:
            releaseYear = match[-1]
    
    The regex above assumes that the characters immediately before and after releaseYear are non-letter characters. The results (match) for the three strings are:

    ['2009']
    ['2012']
    ['1968']
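The snippet above can be wrapped into a runnable function along these lines. This is only a sketch: the name `last_year` and the empty-string fallback are assumptions, and no range check on the year is applied.

```python
import re

# Four digits with a non-letter character on each side.
YEAR_RE = re.compile(r'[^a-z](\d{4})[^a-z]', re.IGNORECASE)

def last_year(title):
    """Return the last four-digit run bounded by non-letters, or ''."""
    matches = YEAR_RE.findall(title)
    return matches[-1] if matches else ""

for title in ('2012.(2009).3D.1080p.BRRip.SBS.x264',
              'Into.The.Storm.2012.1080p.WEB-DL.AAC2.0.H264',
              '2001.A.Space.Odyssey.1968.1080p.WEB-DL.AAC2.0.H264'):
    print(last_year(title))
```

As written, the pattern needs one character on each side of the digits, so a year at the very start or very end of the title is not matched. That is why the leading 2012 and 2001 do not appear in the results above.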
    

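    Because you always check group (1), it gives you the first match. Check the number of groups and, if it is greater than 1, take the value matched by the last group.

    There is no real solution, especially since file names can take many forms; for example, it is almost impossible for every file name to have a "p" after "1080".

    @ProGrammer: What if the movie title contains a year and the release year is missing from the file name? Likewise, you should allow 1895 (the release year of Louis and Auguste Lumière's "La Sortie") or 1888, the release year of Louis Le Prince's film.

    @Casimirithippolyte I have extended the lower bound to 1888. Suppose the title 2018.3D.1080p.BRRip.SBS.x264.AAC is run through the function: it returns 2018 (note that the movie title contains a year and the release year is missing from the file name). In this case the last occurrence is also the first occurrence. If the last occurrence is some kind of upload or recording date, for example 2018.3D.1080p.BRRip.SBS.x264.AAC.01-01-2018, this does become a problem, and I would like to see a smart way to filter out such cases. For now, I believe this is a rare format in movie titles.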
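One possible way to skip such a trailing date is a negative lookbehind. The sketch below assumes the date is written as dd-mm-yyyy; other date layouts are not handled, the year range check is left out, and the names `YEAR_RE` and `last_release_year` are hypothetical. It also swaps the consuming `[^a-z]` neighbours for lookarounds, which is a change from the regex shown in the answer.

```python
import re

YEAR_RE = re.compile(
    r'(?<![a-z\d])'         # not glued to a letter or digit on the left
    r'(?<!\d{2}-\d{2}-)'    # assumption: skip the yyyy of a dd-mm-yyyy date
    r'((?:18|19|20)\d{2})'  # four digits starting with 18, 19 or 20
    r'(?![a-z\d])',         # not glued to a letter or digit on the right
    re.IGNORECASE,
)

def last_release_year(title):
    """Return the last year-like token that is not part of a dd-mm-yyyy date."""
    years = YEAR_RE.findall(title)
    return years[-1] if years else ""

print(last_release_year('2018.3D.1080p.BRRip.SBS.x264.AAC.01-01-2018'))
print(last_release_year('Film.1999.1080p.01-01-2018'))
```

Since lookarounds do not consume characters, a year at the very start or end of the title can match here too.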