Python human-format date range parsing

I have date ranges in a human format:

dt = pd.Series(['27.02-11.03.2014', '10-11.06.2014'])
I want to get a DataFrame with the event start and end dates. Currently I use:

tmp = dt.str.split('-').apply(lambda x: pd.Series(x, index=['start', 'end'])).apply(lambda x: pd.to_datetime(x, dayfirst=True))

def dt_parse(dt):
    x, y = dt
    if len(x) > 2:
        t = x.split('.')
        r = pd.to_datetime('-'.join([t[0], t[1], str(y.year)]), dayfirst = True)
    else:
        r = pd.to_datetime('-'.join([x, str(y.month), str(y.year)]), dayfirst = True)
    return r

tmp['start'] = tmp.apply(dt_parse, axis = 1)
which gives:

       start        end
0 2014-02-27 2014-03-11
1 2014-06-10 2014-06-11
Are there any other (more efficient / more elegant) ideas?


BR

You can use dt.str.extract to select the values with a regular expression:

In [108]: df = dt.str.extract(r'(?P<start_day>\d+)(?:\.(?P<start_month>\d+))?-(?P<end_day>\d+)\.(?P<end_month>\d+)\.(?P<year>\d+)')

In [109]: df
Out[109]: 
  start_day start_month end_day end_month  year
0        27          02      11        03  2014
1        10         NaN      11        06  2014
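
As a quick sanity check of the optional start_month group (a small standalone sketch using Python's re module, not something from the original answer), the pattern leaves start_month empty whenever the start date only carries a day:

import re

# Same pattern as above: the (?:\. ... )? group makes the start month optional.
pat = re.compile(r'(?P<start_day>\d+)(?:\.(?P<start_month>\d+))?-'
                 r'(?P<end_day>\d+)\.(?P<end_month>\d+)\.(?P<year>\d+)')

print(pat.match('27.02-11.03.2014').groupdict())
# {'start_day': '27', 'start_month': '02', 'end_day': '11', 'end_month': '03', 'year': '2014'}
print(pat.match('10-11.06.2014').groupdict())
# {'start_day': '10', 'start_month': None, 'end_day': '11', 'end_month': '06', 'year': '2014'}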
Then use the combine64 function (below) to combine the individual numbers into np.datetime64 values:

import numpy as np
import pandas as pd

def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
              seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
    # The year becomes a datetime64[Y] offset from the 1970 epoch; the remaining
    # components become timedelta64 offsets that are summed on top of it.
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
             '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
    vals = (years, months, days, weeks, hours, minutes, seconds,
            milliseconds, microseconds, nanoseconds)
    # Components left at None are skipped entirely.
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
               if v is not None)

dt = pd.Series(['27.02-11.03.2014', '10-11.06.2014'])

df = dt.str.extract(r'(?P<start_day>\d+)(?:\.(?P<start_month>\d+))?-(?P<end_day>\d+)\.(?P<end_month>\d+)\.(?P<year>\d+)')
df = df.astype('float')
# A missing start month means the range stays within a single month, so borrow the end month.
df['start_month'] = df['start_month'].fillna(value=df['end_month'])
df['start'] = combine64(df['year'], df['start_month'], df['start_day'])
df['end'] = combine64(df['year'], df['end_month'], df['end_day'])
df = df[['start', 'end']]
print(df)
       start        end
0 2014-02-27 2014-03-11
1 2014-06-10 2014-06-11
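
For reference, the summation inside combine64 can be written out on plain scalars; the sketch below (not part of the original answer) shows the mechanics: the year becomes a datetime64[Y] offset from the 1970 epoch, and the month and day become timedelta64 offsets added on top. Since this is ordinary NumPy arithmetic, it converts whole columns at once instead of going row by row as the apply-based dt_parse does.

import numpy as np

# Building 2014-03-11 the same way combine64 does:
start = (np.asarray(2014 - 1970, dtype='<M8[Y]')   # datetime64 year 2014
         + np.asarray(3 - 1, dtype='<m8[M]')       # + 2 months -> 2014-03
         + np.asarray(11 - 1, dtype='<m8[D]'))     # + 10 days  -> 2014-03-11
print(start)  # 2014-03-11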