
Python: parsing date strings that contain multiple days


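I have some dates that contain multiple days, and I am trying to parse them. It seems that datetime.strptime does not support regular expressions, so I cannot get it to ignore one day at a time. Is there a simple solution that I am missing?

Here are some examples:

March 20 & June 8, 2011

September 4 & 27, 2010

February 15, December 5 & 6, 2013

I know that each of these examples is quite different, but I am hoping to find a solution for at least one of them. A method that works across a wide range of cases using a few format parameters would be great.

There may also be cases where the date format is different, which I assume should be easier to handle:

July 2, 2011 & August 9, 2011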
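
For context, here is a minimal sketch of why a single format string is not enough: `datetime.strptime` has to match the whole input against the format, so a string with a second day is rejected. The helper name `try_parse` is made up for this illustration.

```python
from datetime import datetime


def try_parse(text, fmt="%B %d, %Y"):
    """Return the parsed datetime, or None if the text does not fit the format."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


print(try_parse("September 4, 2010"))       # a single day fits the format
print(try_parse("September 4 & 27, 2010"))  # the extra "& 27" does not
```

This is why the answers below pre-process the string (splitting, regex matching, or a grammar) before handing individual dates to `strptime`.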


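First, I would split the date string into its separate parts:

import re

def split_date(d):
    # Split on commas and ampersands, then trim whitespace from each piece.
    return [part.strip() for part in re.split(r'[,&]', d)]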
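
The split alone leaves pieces such as "December 5", "6" and "2013", which still have to be recombined. A possible follow-up step is sketched below; `parse_parts` is a hypothetical helper, not part of the answer, and it assumes the last piece is always the year and that a bare number reuses the most recent month.

```python
import re
from datetime import datetime


def split_date(d):
    # Split on commas and ampersands, trimming whitespace from each piece.
    return [part.strip() for part in re.split(r"[,&]", d)]


def parse_parts(d):
    """Hypothetical follow-up: turn the split pieces into datetime objects."""
    *parts, year = split_date(d)  # assumption: the last piece is the year
    dates = []
    month = None
    for part in parts:
        tokens = part.split()
        if len(tokens) == 2:
            # "March 20": remember the month for later bare day numbers.
            month, day = tokens
        else:
            # "27": a bare day that reuses the most recent month.
            day = tokens[0]
        dates.append(datetime.strptime(f"{month} {day} {year}", "%B %d %Y"))
    return dates


print(split_date("February 15, December 5 & 6, 2013"))
print(parse_parts("February 15, December 5 & 6, 2013"))
```

Like the split itself, this sketch only handles strings with a single trailing year.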


This may not be the best approach, but here is my attempt:

import re
from datetime import datetime

date1 = "March 20 & June 8, 2011"
date2 = "September 4 & 27, 2010"
date3 = "February 15, December 5 & 6, 2013"
date_group = [date1, date2, date3]

for date in date_group:
    # Matches, in order of preference: a 4-digit year, "Month D & D", or "Month D".
    result = re.findall(r"\d{4}|[A-Z][a-z]+ \d{1,2} & \d{1,2}|[A-Z][a-z]+ \d{1,2}", date)
    year = result[-1]  # the year is the last match
    for item in result[:-1]:
        # Drop the "&" so that only the month name and the day numbers remain.
        month, *days = [tok for tok in item.split(" ") if tok != "&"]
        for day in days:
            time_date = datetime.strptime("{} {} {}".format(month, day, year), "%B %d %Y")
            print(time_date)

Result:

2011-03-20 00:00:00
2011-06-08 00:00:00
2010-09-04 00:00:00
2010-09-27 00:00:00
2013-02-15 00:00:00
2013-12-05 00:00:00
2013-12-06 00:00:00

Basically it just extracts the year first and then the dates. However, this will not work if the string contains more than one year.


Here is one approach using the datetime module.

Demo:

import datetime

d1 = "March 20 & June 8, 2011"
d2 = "February 15, December 5 & 6, 2013"


def getDate(in_value):
    result = []
    in_value = in_value.split(",")
    year = in_value.pop(-1).strip()  # the year follows the last comma
    for dateV in in_value:
        temp = []
        month = None
        for token in dateV.replace("&", " ").split():
            if token.isdigit():
                # A day number: pair it with the most recent month name.
                parsed = datetime.datetime.strptime("{} {} {}".format(month, token, year), "%B %d %Y")
                temp.append(parsed.strftime("%m/%d/%Y"))
            else:
                # A month name: the day numbers that follow belong to this month.
                month = token
        result.append(" & ".join(temp))
    return ", ".join(result)

print(getDate(d1))
print(getDate(d2))

Output:

03/20/2011 & 06/08/2011
02/15/2013, 12/05/2013 & 12/06/2013


All of the answers above are good. I came up with another approach that also works with multiple years:

from datetime import datetime
import re

date1 = "March 20 & June 8, 2011"
date2 = "September 4 & 27, 2010"
date3 = "February 15, December 5 & 6, 2013"


def extract_dates(date):
    dates = []
    last_index = 0
    # Each 4-digit year closes a segment of month names and day numbers.
    for year in re.finditer(r'\d{4}', date):
        text = date[last_index:year.start()]
        last_index = year.end()

        months = list(re.finditer(r'[A-Za-z]+', text))
        for m, month in enumerate(months):
            # The days for this month run up to the next month name (or the end of the segment).
            end = months[m + 1].start() if m + 1 < len(months) else len(text)
            text_days = text[month.end():end]

            for day in re.finditer(r'\d{1,2}', text_days):
                dates.append(datetime.strptime(month.group(0) + ' ' + day.group(0) + ', ' + year.group(0), '%B %d, %Y'))

    return dates


print(extract_dates(date1))
print(extract_dates(date2))
print(extract_dates(date3))
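
To check the multi-year case, here is a sketch of the same idea written as a single pass over tokens. This is a different formulation from the span slicing above, not the answer's own code: `iter_dates` is a name chosen for illustration, the two-year input string is made up, and the sketch assumes that every group of dates ends with a four-digit year and that the only words in the string are month names.

```python
import re
from datetime import datetime


def iter_dates(text):
    """Yield datetimes from strings such as 'March 20 & June 8, 2011'."""
    pending = []  # (month, day) pairs still waiting for their year
    month = None
    for token in re.findall(r"[A-Za-z]+|\d+", text):
        if token.isalpha():
            # A month name: it applies to the day numbers that follow.
            month = token
        elif len(token) == 4:
            # A year closes the current group: emit everything collected so far.
            for m, d in pending:
                yield datetime.strptime(f"{m} {d} {token}", "%B %d %Y")
            pending = []
        else:
            # A day number.
            pending.append((month, token))


for d in iter_dates("July 2, 2011 & August 9, 2012"):
    print(d.date())
```

Because days are held back until their year arrives, each date is paired with the first year that follows it, which is the same rule the span-based version applies.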


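Pyparsing is a handy Python module for parsing strings like these. Here is an annotated parser that processes your input strings and gives you the month, days, and year for each one: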

import pyparsing as pp
import calendar

COMMA = pp.Suppress(',')
AMP = pp.Suppress('&')
DASH = pp.Suppress('-')

# use pyparsing-defined integer expression, which auto-converts parsed str's to int's
day_number = pp.pyparsing_common.integer()
# day numbers only go from 1-31
day_number.addCondition(lambda t: 1 <= t[0] <= 31)

# not in the spec, but let's support day ranges, too!
day_range = day_number("first") + DASH + day_number("last")
# parse-time conversion from "4-6" to [4, 5, 6]
day_range.addParseAction(lambda t: list(range(t.first, t.last+1)))

# this function will come in handy to build list parsers of day numbers and month-day
expr_list = lambda expr: expr + pp.ZeroOrMore(COMMA + expr) + pp.Optional(AMP + expr)

# support "10", "10 & 11", "10, 11 & 12"
day_list = expr_list(day_range | day_number)

# get the month names from the calendar module
month_name = pp.oneOf(calendar.month_name[1:])

# an expression containing a month name and a list of 1 or more day numbers
date_expr = pp.Group(month_name("month") + day_list("days"))

# use expr_list again to support multiple date_exprs separated by commas and ampersands
date_list = expr_list(date_expr)

year_number = pp.pyparsing_common.integer()
# year numbers start with 2000
year_number.addCondition(lambda t: t[0] >= 2000)

# put all together into a single parser expression
full_date = date_list("dates") + COMMA + year_number("year")

tests = """\
March 20 & June 8, 2011
September 4 & 27, 2010
February 15, December 5 & 6, 2013
September 4-6, 2010
"""

full_date.runTests(tests)
To get (year, month, day) tuples, add another parse action and rerun the tests:

print("convert parsed fields into (year, month-name, date) tuples")
def expand_dates(t):
    return [(t.year, d.month, dy) for d in t.dates for dy in d.days]
full_date.addParseAction(expand_dates)

full_date.runTests(tests)
Prints:

March 20 & June 8, 2011
[['March', 20], ['June', 8], 2011]
- dates: [['March', 20], ['June', 8]]
  [0]:
    ['March', 20]
    - days: [20]
    - month: 'March'
  [1]:
    ['June', 8]
    - days: [8]
    - month: 'June'
- year: 2011


September 4 & 27, 2010
[['September', 4, 27], 2010]
- dates: [['September', 4, 27]]
  [0]:
    ['September', 4, 27]
    - days: [4, 27]
    - month: 'September'
- year: 2010


February 15, December 5 & 6, 2013
[['February', 15], ['December', 5, 6], 2013]
- dates: [['February', 15], ['December', 5, 6]]
  [0]:
    ['February', 15]
    - days: [15]
    - month: 'February'
  [1]:
    ['December', 5, 6]
    - days: [5, 6]
    - month: 'December'
- year: 2013


September 4-6, 2010
[['September', 4, 5, 6], 2010]
- dates: [['September', 4, 5, 6]]
  [0]:
    ['September', 4, 5, 6]
    - days: [4, 5, 6]
    - month: 'September'
- year: 2010
convert parsed fields into (year, month-name, date) tuples

March 20 & June 8, 2011
[(2011, 'March', 20), (2011, 'June', 8)]


September 4 & 27, 2010
[(2010, 'September', 4), (2010, 'September', 27)]


February 15, December 5 & 6, 2013
[(2013, 'February', 15), (2013, 'December', 5), (2013, 'December', 6)]


September 4-6, 2010
[(2010, 'September', 4), (2010, 'September', 5), (2010, 'September', 6)]

Finally, use another parse action to turn them into datetime.date objects:

print("convert (year, month-name, date) tuples into datetime.date's")
# define mapping of month-name to month number 1-12
month_map = {name: num for num,name in enumerate(calendar.month_name[1:], start=1)}
from datetime import date
full_date.addParseAction(pp.tokenMap(lambda t: date(t[0], month_map[t[1]], t[2])))
full_date.runTests(tests)

Prints:

convert (year, month-name, date) tuples into datetime.date's

March 20 & June 8, 2011
[datetime.date(2011, 3, 20), datetime.date(2011, 6, 8)]


September 4 & 27, 2010
[datetime.date(2010, 9, 4), datetime.date(2010, 9, 27)]


February 15, December 5 & 6, 2013
[datetime.date(2013, 2, 15), datetime.date(2013, 12, 5), datetime.date(2013, 12, 6)]


September 4-6, 2010
[datetime.date(2010, 9, 4), datetime.date(2010, 9, 5), datetime.date(2010, 9, 6)]
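
The last conversion step does not depend on pyparsing. As a standard-library-only sketch, the month-name lookup and the construction of `datetime.date` objects could look like this; `to_dates` is an illustrative helper name, and the input mirrors the (year, month-name, day) tuples printed above.

```python
import calendar
from datetime import date

# Map "January".."December" to 1..12 (calendar.month_name[0] is an empty string).
month_map = {name: num for num, name in enumerate(calendar.month_name[1:], start=1)}


def to_dates(tuples):
    """Convert (year, month-name, day) tuples into datetime.date objects."""
    return [date(year, month_map[month], day) for year, month, day in tuples]


print(to_dates([(2010, "September", 4), (2010, "September", 27)]))
```

Note that `calendar.month_name` follows the current locale, so this lookup assumes English month names.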


Can you split the string on & first to do some initial parsing?

\w also matches _ and digits, so it would accept something like M4r_h. I changed it to [A-Za-z].

Better would be [A-Z][a-z]+, which only allows the first character to be uppercase.
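
The difference between the three patterns mentioned in these comments can be checked directly. This is a small sketch; the candidate strings and the helper name `accepted` are chosen for illustration.

```python
import re

candidates = ["March", "M4r_h", "march"]


def accepted(pattern):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


for pattern in (r"\w+", r"[A-Za-z]+", r"[A-Z][a-z]+"):
    print(pattern, accepted(pattern))
```

Only the last pattern rejects both the garbled `M4r_h` and the lowercase `march`, at the cost of also rejecting legitimately lowercase or all-caps month names.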