Python: What is the most efficient way to split dates stored as strings in a column?


My pandas DataFrame has a column named start_date, stored as strings:

start_date
'20120212'
'20120514'
'20121124'
'20120604'

To extract the month, year, and day into separate columns, this is what I am currently doing. Is there a better way to do this?

df['start_month'] = df['start_date'].apply(lambda x: str(x)[4:6])
df['start_year'] = df['start_date'].apply(lambda x: str(x)[0:4])
df['start_day'] = df['start_date'].apply(lambda x: str(x)[6:8])
Use pd.to_datetime to parse the column, then extract the year, month, and day through the .dt accessor:

a = pd.to_datetime(df['start_date'], format='%Y%m%d')
df['start_month'] = a.dt.month
df['start_year'] = a.dt.year
df['start_day'] = a.dt.day
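
If some values might not be valid dates, one possible variation is to pass errors='coerce' so that unparseable strings become NaT instead of raising. The sketch below is only an illustration under assumptions: the extra 'unknown' row is invented for the example, and casting to the nullable Int64 dtype is just one way to keep integer columns when some rows are missing.

```python
import pandas as pd

# Sample data from the question plus one invalid value (hypothetical, for illustration).
df = pd.DataFrame(
    {"start_date": ["20120212", "20120514", "20121124", "20120604", "unknown"]}
)

# errors="coerce" converts strings that do not match the format into NaT.
parsed = pd.to_datetime(df["start_date"], format="%Y%m%d", errors="coerce")

# Collect the raw values that failed to parse so they can be inspected.
bad_rows = df.loc[parsed.isna(), "start_date"]

# With NaT present, .dt.year and friends return floats with NaN;
# the nullable Int64 dtype keeps them as integers with <NA> for missing rows.
df["start_month"] = parsed.dt.month.astype("Int64")
df["start_year"] = parsed.dt.year.astype("Int64")
df["start_day"] = parsed.dt.day.astype("Int64")

print(df)
print(bad_rows)
```

Without errors='coerce', the same call would raise on the invalid row, which may be preferable when bad input should stop the pipeline.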
Or slice with str[] and convert to int:

df['start_date'] = df['start_date'].astype(str)
df['start_month'] = df['start_date'].str[4:6].astype(int)
df['start_year'] = df['start_date'].str[:4].astype(int)
df['start_day'] = df['start_date'].str[6:8].astype(int)
print(df)
  start_date  start_month  start_year  start_day
0   20120212            2        2012         12
1   20120514            5        2012         14
2   20121124           11        2012         24
3   20120604            6        2012          4
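
A related variation, not part of the original answer, is to pull all three fields out in one pass with Series.str.extract and named groups. This is a minimal sketch assuming every value is exactly eight digits in YYYYMMDD order; the pattern and the variable names parts and out are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({"start_date": ["20120212", "20120514", "20121124", "20120604"]})

# Each named group becomes a column of the resulting DataFrame.
pattern = r"^(?P<start_year>\d{4})(?P<start_month>\d{2})(?P<start_day>\d{2})$"

# astype(int) fails if any value does not match the pattern (it would be NaN),
# which acts as a crude validity check under the eight-digit assumption.
parts = df["start_date"].str.extract(pattern).astype(int)

# Attach the new columns to the original frame.
out = df.join(parts)
print(out)
```

Unlike fixed-position slicing, the anchored pattern rejects values that are too long, too short, or contain non-digits, at the cost of regular-expression overhead.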
Comparing the solutions on a larger frame (the 4 sample rows repeated 10000 times):

df = pd.concat([df]*10000).reset_index(drop=True)
# [40000 rows x 1 columns]

def orig(df):
    # Original approach: slice each value in Python via apply
    df['start_month'] = df['start_date'].apply(lambda x: str(x)[4:6]).astype(int)
    df['start_year'] = df['start_date'].apply(lambda x: str(x)[0:4]).astype(int)
    df['start_day'] = df['start_date'].apply(lambda x: str(x)[6:8]).astype(int)
    return df

def a(df):
    # Parse once with to_datetime, then read the fields from the .dt accessor
    dates = pd.to_datetime(df['start_date'], format='%Y%m%d')
    df['start_month'] = dates.dt.month
    df['start_year'] = dates.dt.year
    df['start_day'] = dates.dt.day
    return df

def b(df):
    # Vectorized string slicing; assumes start_date already holds strings
    df['start_month'] = df['start_date'].str[4:6].astype(int)
    df['start_year'] = df['start_date'].str[:4].astype(int)
    df['start_day'] = df['start_date'].str[6:8].astype(int)
    return df
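
The timing results are not included here, so the following is a minimal, self-contained sketch of how the three functions could be timed with the standard timeit module. The harness itself (the candidates dict, the number of repeats, taking the minimum) is an assumption rather than the original benchmark, and the numbers it prints depend on the machine and pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({"start_date": ["20120212", "20120514", "20121124", "20120604"]})
big = pd.concat([base] * 10000).reset_index(drop=True)  # 40000 rows x 1 column


def orig(df):
    df["start_month"] = df["start_date"].apply(lambda x: str(x)[4:6]).astype(int)
    df["start_year"] = df["start_date"].apply(lambda x: str(x)[0:4]).astype(int)
    df["start_day"] = df["start_date"].apply(lambda x: str(x)[6:8]).astype(int)
    return df


def a(df):
    dates = pd.to_datetime(df["start_date"], format="%Y%m%d")
    df["start_month"] = dates.dt.month
    df["start_year"] = dates.dt.year
    df["start_day"] = dates.dt.day
    return df


def b(df):
    df["start_month"] = df["start_date"].str[4:6].astype(int)
    df["start_year"] = df["start_date"].str[:4].astype(int)
    df["start_day"] = df["start_date"].str[6:8].astype(int)
    return df


candidates = {"orig": orig, "a": a, "b": b}

# Sanity check: all three approaches should produce the same values
# (integer widths may differ, hence check_dtype=False).
results = {name: func(big.copy()) for name, func in candidates.items()}
for res in results.values():
    pd.testing.assert_frame_equal(res, results["orig"], check_dtype=False)

# Time each function on a fresh copy; keep the best of 5 single runs.
# The copy is included in the measurement, equally for every candidate.
timings = {}
for name, func in candidates.items():
    runs = timeit.repeat(lambda: func(big.copy()), number=1, repeat=5)
    timings[name] = min(runs)

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:5s} {seconds * 1000:8.2f} ms")
```

Each function receives a copy because all three add columns to the frame they are given; timing them on the same object would let later runs operate on an already modified frame.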
