Python: separating two datetime values from a single string

python datetime

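I need to write a method that receives a string containing two datetime values and separates them. The datetime values can be in any valid ISO 8601 format, which means I can't just split on a character index. The values are separated by a hyphen, which also means I can't simply use str.split().

I have already written this function using a regex, but the client has asked me to use python-dateutil instead.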

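import re
from datetime import datetime


def split_range(times):
    # Date, optionally followed by a time with optional milliseconds and "Z"
    regex = re.compile(r"[0-9]{4}-?[0-9]{2}-?[0-9]{2}([T]([0-9]{2}:?){2,3}(\.[0-9]{3})?)?Z?")
    split_times = regex.finditer(times)
    final_times = []
    for time in split_times:
        time = time.group(0)
        datetime_value = datetime.fromisoformat(time)
        final_times.append(datetime_value.isoformat())
    return final_times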

The function should accept strings like the following (these are all the strings I use in my tests):

200809-20080815

2008-08-08-2008-08-09

2008-08-08T17:21-2008-08-09T17:31

2008-08-08T17:21-2008-08-09T17:31

2008-08-08T17:21:000-2008-08-09T17:31:000

2008-08-08T17:21:000-2008-08-09T17:310:00

2008-08-08T17:21:000.000-2008-08-09T17:31:000.000

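and split each one into two separate values, for example 2019-08-08 and 2019-08-09.

The client is not keen on the regex used here and wants me to replace it with dateutil, but I haven't found anything in dateutil that does what I need. Is there a dateutil method I can use for this, and if not, is there another library that offers this kind of functionality?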

Use re.findall(). Note that the pattern below only matches the hyphenated YYYY-MM-DD form.

For example:

import re

text = "2019-08-03-2019-08-09xxxxxThis is test xxxxx---2017-01-01"
match = re.findall(r'\d{4}-\d{2}-\d{2}', text)

print(match)

Output:

['2019-08-03', '2019-08-09', '2017-01-01']

If text is just "2019-08-03-2019-08-09", the output is:

['2019-08-03', '2019-08-09']

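I think the best option is probably to ask your client to change the delimiter from - to something else, such as a space, a tab, or anything that cannot appear in an ISO 8601 string, and split on that. If you must use - as the delimiter and must support any valid ISO 8601 string, your best bet is to look for the pattern -(--|\d{4}), because every valid ISO 8601 datetime starts either with 4 digits or with --. If you find a dash followed by 4 digits, you have found either a negative time zone offset or the start of the next ISO 8601 datetime.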
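To make the ambiguity concrete, here is a small sketch (the sample string is made up for illustration) that lists every place where a dash is followed by `--` or four digits. Two of the three hits are time zone offsets and only one is the real delimiter.

```python
import re

# Illustrative input: two datetimes in basic offset form, joined by "-".
sample = "2008-08-08T17:21-0400-2008-08-09T17:31-0400"

# Every dash followed by "--" or four digits is a *candidate* delimiter.
hits = [(m.start(), m.group(0)) for m in re.finditer(r"-(?:--|\d{4})", sample)]
print(hits)
# [(16, '-0400'), (21, '-2008'), (38, '-0400')]
```

Only the hit at index 21 separates the two values, so the pattern alone is not enough to pick the split point.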

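Additionally, no valid ISO 8601 datetime format contains -\d{4}-\d{4}. So if you find a -(\d{4}) that represents a time zone offset and is itself followed by another -\d{4}, it must be at the end of the first ISO 8601 string, immediately before the delimiter and the start of the second one. A negative lookahead is therefore enough to skip it and make sure the pattern is not repeated. Putting it all together: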

import re
from dateutil.parser import isoparse


def parse_iso8601_pairs(isostr):
    # In a string containing two ISO 8601 strings delimited by -, the substring
    # "-\d{4}" is only found at the beginning of the second datetime or the
    # end of *either* datetime. If it is found at the end of the first datetime,
    # it will always be followed by `-\d{4}`, so we can use negative lookahead
    # to find the beginning of the next string.
    #
    # Note: ISO 8601 datetimes can also begin with `--`, but parsing these is
    # not supported yet in dateutil.parser.isoparse, as of version 2.8.0. The
    # regex includes this type of string in order to make at least the splitting
    # method work even if the parsing method doesn't support "missing year"
    # ISO 8601 strings.
    m = re.search(r"-(--|\d{4})(?!-(--|\d{4}))", isostr)
    dt1 = None
    dt2 = None

    if m is None:
        raise ValueError("String does not contain two ISO 8601 datetimes "
                         f"delimited by -: {isostr}")

    split_on = m.span()[0]
    str1 = isostr[0:split_on]
    str2 = isostr[split_on + 1:]

    # You may want to wrap the error handling here with a nicer message
    dt1 = isoparse(str1)
    dt2 = isoparse(str2)

    return dt1, dt2
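As a rough check of where the lookahead regex splits, the sketch below applies only the splitting step to a few inputs, without parsing. `split_only` is a hypothetical helper introduced here for illustration; it uses the standard library only, so it runs without dateutil installed.

```python
import re

# Same pattern as above: a dash followed by "--" or four digits, where that
# group is not itself followed by another dash plus "--"/four digits.
SPLIT_RE = re.compile(r"-(--|\d{4})(?!-(--|\d{4}))")


def split_only(isostr):
    """Hypothetical helper: return the two substrings without parsing them."""
    m = SPLIT_RE.search(isostr)
    if m is None:
        raise ValueError(f"No delimiter found in {isostr!r}")
    return isostr[:m.start()], isostr[m.start() + 1:]


for s in ("20080809-20080815",
          "2008-08-08-2008-08-09",
          "2008-08-08T17:21-2008-08-09T17:31",
          "2008-08-08T17:21-0400-2008-08-09T17:31-0400"):
    print(split_only(s))
```

In the last input the first `-0400` is skipped because it is followed by `-2008`, which is exactly the case the negative lookahead is meant to handle.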

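As far as I know, parse_iso8601_pairs works for any pair of ISO 8601-compliant strings delimited by -, except for the obscure "missing year" formats: --MM-?DD. The splitting part of the code works even for strings like --04-01, but isoparse does not currently support that format, so parsing will fail. Possibly more problematic, --MMDD is also a valid ISO 8601 format, and it matches -\d{4}, which can give the wrong split. If you want to support that format and have a modified parser that handles --MMDD, I believe you could craft a more complicated regex to deal with the --MMDD case (if anyone wants to do so, I'm happy to edit it into the post), or you could simply "guess and check": iterate over the matches with re.finditer until you find a split position that yields a valid ISO 8601 datetime on each side of the delimiter.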
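The "guess and check" idea might look roughly like the sketch below. It is not part of the original answer: the function name `split_iso8601_pair` and its `parse` parameter are made up for illustration, and the default parser is `datetime.fromisoformat` only so that the example runs with the standard library. Passing `dateutil.parser.isoparse` (or a modified parser that understands `--MMDD`) as `parse` should work the same way, assuming it raises `ValueError` on invalid input.

```python
import re
from datetime import datetime


def split_iso8601_pair(isostr, parse=datetime.fromisoformat):
    """Try each candidate dash until both sides parse (illustrative sketch)."""
    # The lookahead keeps the match to the dash itself, so overlapping
    # candidates are not consumed and skipped.
    for m in re.finditer(r"-(?=--|\d{4})", isostr):
        left, right = isostr[:m.start()], isostr[m.start() + 1:]
        try:
            return parse(left), parse(right)
        except ValueError:
            continue  # wrong guess, move on to the next candidate
    raise ValueError(f"No split of {isostr!r} yields two parseable datetimes")


print(split_iso8601_pair("2008-08-08T17:21-2008-08-09T17:31"))
```

The trade-off is that each wrong guess costs up to two parse attempts, but the splitting logic no longer depends on reasoning about which patterns can occur inside a valid string.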

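Note: parse_iso8601_pairs also works if you replace dateutil.parser.isoparse with datetime.datetime.fromisoformat. The difference is that the strings fromisoformat can parse are mostly a subset of those handled by isoparse. fromisoformat is the inverse of datetime.datetime.isoformat and will parse anything that can be produced by calling the isoformat method on a datetime object, whereas isoparse is intended to parse any valid ISO 8601 string. If you know that your datetimes were generated by calling isoformat(), then fromisoformat is the better choice of ISO 8601 parser.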
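A minimal round trip illustrating the "inverse of isoformat" relationship, using only the standard library (the sample value is arbitrary):

```python
from datetime import datetime, timedelta, timezone

# An aware datetime with a -04:00 offset, chosen arbitrarily for the demo.
original = datetime(2008, 8, 8, 17, 21, tzinfo=timezone(timedelta(hours=-4)))

text = original.isoformat()
restored = datetime.fromisoformat(text)

print(text)                  # 2008-08-08T17:21:00-04:00
print(restored == original)  # True
```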
