Python: collapse multiple date rows in a df into two columns, the start date and end date of each interval
I have the following df:
time_series date sales q1 q2 q3
store_0025_item_85011885 2020-07-19 4.0 0.0 2.0 1.0
store_0025_item_85011885 2020-07-26 4.0 0.0 2.0 1.0
store_0025_item_85011885 2020-08-09 6.0 0.0 2.0 1.0
store_0025_item_85011885 2020-08-16 4.0 0.0 2.0 1.0
store_0053_item_85011885 2020-12-06 7.0 0.0 8.0 1.0
store_0053_item_85011885 2020-12-13 7.0 0.0 8.0 1.0
store_0053_item_85011885 2020-12-20 6.0 0.0 8.0 1.0
store_0053_item_85011885 2020-12-27 5.0 0.0 8.0 1.0
I tried using pivot_table with the following code:
df_p = pd.pivot_table(df_m, values='q2', index=['time_series'],
columns=['date'], fill_value=0)
However, this returns one column per date. What I want is the following df:
time_series start_date end_date quantity
store_0025_item_85011885 2020-07-19 2020-07-26 2.0
store_0025_item_85011885 2020-08-09 2020-08-16 2.0
store_0053_item_85011885 2020-12-06 2020-12-27 8.0
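Note that for 'time_series' = store_0025_item_85011885 there are two intervals of consecutive weeks, so we need two rows, whereas for 'time_series' = store_0053_item_85011885 there is only one consecutive interval, so we need one row. The quantity to copy is the 'q2' column. How can I achieve this?
I group by consecutive week of the year. For a detailed explanation of grouping by consecutive elements, see:
Try: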
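import pandas as pd
df.date = pd.to_datetime(df.date, format='%Y-%m-%d')
# week-of-year number (weeks start on Sunday); consecutive weeks differ by 1
u = df.date.dt.strftime('%U').astype(int)
# string aggregators give the column names 'min'/'max' in every pandas version
# (np.min/np.max yield 'amin'/'amax' only in older versions, so the rename could miss)
d = {'min':'start_date','max':'end_date','last':'quantity'}
# a new group id starts whenever the week number is not the previous one + 1
df = df.groupby(['time_series', (u != u.shift()+1).cumsum()]).agg({'date' : ['min', 'max'], 'q2': 'last'}).rename(columns=d)
df.columns = df.columns.droplevel(0)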
Resulting df:
                               start_date   end_date  quantity
time_series              date
store_0025_item_85011885 1     2020-07-19 2020-07-26       2.0
                         2     2020-08-09 2020-08-16       2.0
store_0053_item_85011885 3     2020-12-06 2020-12-27       8.0
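The grouping key `(u != u.shift()+1).cumsum()` is the core of this answer, so here is a minimal sketch of how it labels runs of consecutive numbers. The week numbers and the names `weeks`, `starts_new_run` and `run_id` are illustrative only and are not part of the original answer.

```python
import pandas as pd

# Illustrative week numbers with a gap between 30 and 32.
weeks = pd.Series([29, 30, 32, 33, 34])

# True where the value is not "previous + 1", i.e. where a new run starts.
# The first row is always True because shift() yields NaN there.
starts_new_run = weeks != weeks.shift() + 1

# The running sum of those flags gives every consecutive run its own label.
run_id = starts_new_run.cumsum()
print(run_id.tolist())  # [1, 1, 2, 2, 2]
```

Passing such a label Series to groupby alongside 'time_series' is what splits each series into its consecutive intervals.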
To turn the index levels back into regular columns, use:
df = df.reset_index()
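For reference, a self-contained sketch that reproduces the requested output from the sample rows in the question. It is a variation on the answer above, not the answerer's exact code: instead of comparing '%U' week numbers, it assumes the rows are weekly and starts a new interval whenever the gap to the previous date within the same time_series is not exactly 7 days. That choice may be safer across a year boundary, where the week number resets. The names `gap`, `interval_id`, `interval` and `result` are made up for this illustration.

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "time_series": ["store_0025_item_85011885"] * 4 + ["store_0053_item_85011885"] * 4,
    "date": ["2020-07-19", "2020-07-26", "2020-08-09", "2020-08-16",
             "2020-12-06", "2020-12-13", "2020-12-20", "2020-12-27"],
    "q2": [2.0, 2.0, 2.0, 2.0, 8.0, 8.0, 8.0, 8.0],
})
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
df = df.sort_values(["time_series", "date"])

# Gap to the previous row within the same time_series; NaT on each first row.
gap = df.groupby("time_series")["date"].diff()

# A new interval starts whenever the gap is not exactly one week
# (NaT also compares as "not equal", so each series starts a new interval).
interval_id = (gap != pd.Timedelta(days=7)).cumsum()

result = (
    df.groupby(["time_series", interval_id.rename("interval")])
      .agg(start_date=("date", "min"), end_date=("date", "max"), quantity=("q2", "last"))
      .reset_index()
      .drop(columns="interval")
)
print(result)
```

Named aggregation sets the output column names directly, so no rename or droplevel step is needed, and dropping the helper `interval` column leaves exactly the four columns asked for in the question.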