Python: Modify a Pandas DataFrame to list year, month, and date

python pandas

I would like to modify the dataframe I am creating below:

import datetime

import numpy as np
import pandas as pd
from dateutil.rrule import rrule, DAILY

START_YR = 2010
END_YR = 2013

strt_date = datetime.date(START_YR, 1, 1)
end_date = datetime.date(END_YR, 12, 31)

# One recurrence per day between the two dates (inclusive)
dt = rrule(DAILY, dtstart=strt_date, until=end_date)

# Random daily series indexed by date
serie_1 = pd.Series(np.random.randn(dt.count()),
                    index=pd.date_range(strt_date, end_date))

How can I create a dataframe with the year, month, and date as separate columns?

Convert the series to a DataFrame, then add the new columns as periods. If you only want the month as an integer, see the 'month_int' example.

df = pd.DataFrame(serie_1)
df['month'] = [ts.to_period('M') for ts in df.index]
df['year'] = [ts.to_period('Y') for ts in df.index]
df['month_int'] = [ts.month for ts in df.index]

>>> df
Out[16]: 
                   0   month   year  month_int

2010-01-01  0.332370  2010-01  2010          1
2010-01-02 -0.036814  2010-01  2010          1
2010-01-03  1.751511  2010-01  2010          1
...              ...      ...   ...        ...
2013-12-29  0.345707  2013-12  2013         12
2013-12-30 -0.395924  2013-12  2013         12
2013-12-31 -0.614565  2013-12  2013         12
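
As a hedged aside, the same period columns can probably be built without a Python-level loop by calling `to_period` on the index itself. The sketch below is self-contained; the column name `value` and the seeded random generator are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild a daily frame comparable to the one in the question.
idx = pd.date_range("2010-01-01", "2013-12-31")
rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
df = pd.DataFrame({"value": rng.standard_normal(len(idx))}, index=idx)

# Vectorized equivalents of the list comprehensions above.
df["month"] = df.index.to_period("M")  # monthly periods, e.g. 2010-01
df["year"] = df.index.to_period("Y")   # yearly periods, e.g. 2010
df["month_int"] = df.index.month       # plain integers 1..12

print(df.head(3))
print(df.dtypes)
```

Printing `df.dtypes` shows how the installed pandas version stores the period columns.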

Simply accessing the attributes of the index is much faster:

df['date'] = df.index.date
df['year'] = df.index.year
df['month'] = df.index.month

Comparing the timings against the list comprehension approach:

In [25]:

%%timeit
df['month'] = [ts.to_period('M') for ts in df.index]
df['year'] = [ts.to_period('Y') for ts in df.index]
df['month_int'] = [ts.month for ts in df.index]
1 loops, best of 3: 664 ms per loop
In [26]:

%%timeit
df['date'] = df.index.date
df['year'] = df.index.year
df['month'] = df.index.month

100 loops, best of 3: 5.96 ms per loop

So using the datetime attributes is more than 100 times faster.
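
Since the question asks for year, month, and day as separate columns, a minimal self-contained sketch of the attribute approach might look like the following. The `day` column and the column name `value` are additions for illustration; the answer above only shows `date`, `year`, and `month`.

```python
import numpy as np
import pandas as pd

# Daily frame covering the same range as the question.
idx = pd.date_range("2010-01-01", "2013-12-31")
rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
df = pd.DataFrame({"value": rng.standard_normal(len(idx))}, index=idx)

# Each attribute is computed for the whole index at once.
df["year"] = df.index.year
df["month"] = df.index.month
df["day"] = df.index.day

print(df.tail(3))
```

All three new columns hold plain integers, which keeps them easy to filter, group, or export.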

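Yes, the attributes perform faster, but periods play better within the Pandas ecosystem (e.g. plotting, grouping, etc.). Periods are stored as objects (so they take more memory) and are not easily saved to a database. The choice between the two methods depends on preference and the amount of data.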
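
To make the trade-off concrete, here is a rough sketch of grouping by monthly period and of converting a period column to strings before exporting it. The column name `value` and the choice of string conversion are assumptions for illustration. How periods are stored depends on the pandas version (recent releases may use a dedicated period dtype rather than `object`), so inspecting `df.dtypes` is the safest check.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2010-01-01", "2013-12-31")
rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
df = pd.DataFrame({"value": rng.standard_normal(len(idx))}, index=idx)
df["month"] = df.index.to_period("M")

# Grouping by period: one row per calendar month, indexed by Period.
monthly_mean = df.groupby("month")["value"].mean()
print(monthly_mean.head())

# One possible workaround for storage: turn periods into plain strings.
df["month_str"] = df["month"].astype(str)
print(df.dtypes)
```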