Python 3.x: How can I extract or validate date formats from text using Python?


I am trying to run the following code:

import datefinder

string_with_dates = 'The stock has a 04/30/2009 great record of positive Sept 1st, 2005 earnings surprises, having beaten the trade Consensus EPS estimate in each of the last four quarters. In its last earnings report on May 8, 2018, Triple-S Management reported EPS of $0.6 vs.the trade Consensus of $0.24 while it beat the consensus revenue estimate by 4.93%.'

matches = datefinder.find_dates(string_with_dates)

for match in matches:
    print(match)
The output is:

2009-04-30 00:00:00

2005-09-01 00:00:00

2018-05-08 00:00:00

2019-02-04 00:00:00

The last date is produced by the percentage value 4.93%. How can I avoid this?

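I could not fix the problem in the datefinder module. You said you need a solution, so I put this together for you. It is a work in progress, so you can adjust it as needed. Some of the regular expressions could have been merged, but I kept them separate so they are easier to follow. I hope this answer helps until you find another solution that better fits your needs.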

import re

string_with_dates = 'The stock has a 04/30/2009 great record of positive Sept 1st, 2005 earnings surprises having beaten the trade Consensus EPS estimate in each of the last ' \
                'four quarters In its last earnings report on March 8, 2018, Triple-S Management reported EPS of $0.6 vs.the trade Consensus of $0.24 while it beat the ' \
                'consensus revenue estimate by 4.93%. The next trading day will occur at 2019-02-15T12:00:00-06:30'


def find_dates(input):
  '''
  Extract date strings from the provided text and print them.

  Symbol references:
  YYYY = four-digit year
    MM = two-digit month (01=January, etc.)
    DD = two-digit day of month (01 through 31)
    hh = two digits of hour (00 through 23) (am/pm NOT allowed)
    mm = two digits of minute (00 through 59)
    ss = two digits of second (00 through 59)
     s = one or more digits representing a decimal fraction of a second
   TZD = time zone designator (Z or +hh:mm or -hh:mm)

  :param input: text
  :return: None; each matching date string is printed

 '''

  # Raw strings keep backslashes such as \d from being read as string escapes.
  date_formats = [
                # Matches date format MM/DD/YYYY
                r'(\d{2}/\d{2}/\d{4})',

                # Matches date format MM-DD-YYYY
                r'(\d{2}-\d{2}-\d{4})',

                # Matches date format YYYY/MM/DD
                r'(\d{4}/\d{1,2}/\d{1,2})',

                # Matches ISO 8601 format (YYYY-MM-DD)
                r'(\d{4}-\d{1,2}-\d{1,2})',

                # Matches ISO 8601 format YYYYMMDD
                r'(\d{4}\d{2}\d{2})',

                # Matches full_month_name dd, YYYY or full_month_name dd[suffix], YYYY
                r'(January|February|March|April|May|June|July|August|September|October|November|December)(\s\d{1,2}(st|nd|rd|th)?\W\s\d{4})',

                # Matches abbreviated_month_name dd, YYYY or abbreviated_month_name dd[suffix], YYYY
                # (May is left to the full-name pattern above so it is not reported twice)
                r'(Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sept?|Oct|Nov|Dec)(\s\d{1,2}(st|nd|rd|th)?\W\s\d{4})',

                # Matches ISO 8601 format with time and time zone offset
                # yyyy-mm-ddThh:mm:ss+|-hh:mm
                r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\+|-)\d{2}:\d{2}',

                # Matches ISO 8601 basic format datetime in UTC
                # yyyymmddThhmmssZ
                r'\d{8}T\d{6}Z',

                # Matches ISO 8601 basic format datetime with time zone offset
                # yyyymmddThhmmss+|-hhmm
                r'\d{8}T\d{6}(\+|-)\d{4}'
                ]

  for item in date_formats:
    date_format = re.compile(r'\b{}\b'.format(item), re.IGNORECASE|re.MULTILINE)
    # finditer reports every occurrence of a format, not only the first one
    for find_date in date_format.finditer(input):
        print(find_date.group(0))



find_dates(string_with_dates)

# outputs
04/30/2009
March 8, 2018
Sept 1st, 2005
2019-02-15T12:00:00-06:30
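
Note that the regular expressions only check the shape of a string, so something like 13/45/2009 would also be reported. If you also need to validate the candidates, one option is to try parsing each one with datetime.strptime. The following is only a minimal sketch: CANDIDATE_FORMATS and is_valid_date are names made up for this example, the format list covers just a few of the shapes above, and ordinal suffixes such as "1st" would have to be stripped before parsing.

```python
from datetime import datetime

# Hypothetical mapping of a few matched shapes to strptime formats;
# extend it to cover whichever patterns you actually keep.
CANDIDATE_FORMATS = [
    "%m/%d/%Y",   # 04/30/2009
    "%m-%d-%Y",   # 04-30-2009
    "%Y/%m/%d",   # 2009/04/30
    "%Y-%m-%d",   # 2009-04-30
    "%Y%m%d",     # 20090430
    "%B %d, %Y",  # March 8, 2018 (English month names, default locale assumed)
]


def is_valid_date(candidate):
    """Return True if the candidate parses as a real calendar date."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(candidate, fmt)
            return True
        except ValueError:
            # Wrong shape for this format, or an impossible date such as Feb 30.
            continue
    return False
```

With this, is_valid_date('04/30/2009') is accepted while is_valid_date('02/30/2009') is rejected, because strptime refuses days that do not exist in the given month.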

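What does 2019-02-04 have to do with 4.93%? And where does datefinder come from?

This module has a problem. Thanks for pointing that out.

I think the module has a problem too, but I need a way around it. I could use this code or any other code, so please suggest something. I need a count of the dates that appear in this text, and the dates may appear in different formats.

The module's code seems to apply its delimiter pattern twice, which may be causing the problem. Also, the delimiter pattern includes a ".".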
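
Since a count of the dates is what is needed, here is a rough sketch of one way to get it without datefinder: combine a few patterns into a single regular expression and collect every match in a list. MONTHS, DATE_RE and extract_dates are names invented for this example, and the pattern set is deliberately reduced, so treat it as a starting point rather than a complete solution. None of these patterns accepts "." as a separator, so a value like 4.93% is not picked up.

```python
import re

# Illustrative, reduced pattern set; extend it to the formats you expect.
MONTHS = (r"(?:January|February|March|April|May|June|July|August|"
          r"September|October|November|December|"
          r"Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sept?|Oct|Nov|Dec)")

DATE_RE = re.compile(
    r"\b(?:"
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}"    # ISO 8601 with offset
    r"|\d{2}[/-]\d{2}[/-]\d{4}"                               # MM/DD/YYYY or MM-DD-YYYY
    r"|\d{4}[/-]\d{1,2}[/-]\d{1,2}"                           # YYYY/MM/DD or YYYY-MM-DD
    r"|" + MONTHS + r"\.?\s\d{1,2}(?:st|nd|rd|th)?,\s\d{4}"   # Month D, YYYY
    r")\b",
    re.IGNORECASE,
)


def extract_dates(text):
    """Return every date-like substring, in order of appearance."""
    return [match.group(0) for match in DATE_RE.finditer(text)]


sample = ('The stock has a 04/30/2009 great record of positive Sept 1st, 2005 '
          'earnings surprises. In its last earnings report on May 8, 2018, '
          'it beat the consensus revenue estimate by 4.93%.')
dates = extract_dates(sample)
print(len(dates), dates)
```

Because the alternatives live in one expression, each stretch of text is matched at most once, so len(dates) can be used directly as the count.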