Python: collapse multiple date rows in a df into 2 columns, the start date and end date of each interval

I have the following df:

time_series                    date   sales  q1  q2  q3
store_0025_item_85011885    2020-07-19  4.0 0.0 2.0 1.0
store_0025_item_85011885    2020-07-26  4.0 0.0 2.0 1.0
store_0025_item_85011885    2020-08-09  6.0 0.0 2.0 1.0
store_0025_item_85011885    2020-08-16  4.0 0.0 2.0 1.0
store_0053_item_85011885    2020-12-06  7.0 0.0 8.0 1.0
store_0053_item_85011885    2020-12-13  7.0 0.0 8.0 1.0
store_0053_item_85011885    2020-12-20  6.0 0.0 8.0 1.0
store_0053_item_85011885    2020-12-27  5.0 0.0 8.0 1.0

I tried using pivot_table with the following code:

df_p = pd.pivot_table(df_m, values='q2', index=['time_series'],
                    columns=['date'], fill_value=0)

However, this returns one column per date. What I want is the following df:

time_series                 start_date   end_date   quantity
store_0025_item_85011885    2020-07-19   2020-07-26  2.0
store_0025_item_85011885    2020-08-09   2020-08-16  2.0
store_0053_item_85011885    2020-12-06   2020-12-27  8.0

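Note that for 'time_series' = store_0025_item_85011885 there are two intervals of consecutive weeks, so we need two rows, whereas for 'time_series' = store_0053_item_85011885 there is only one interval of consecutive weeks, so we need one row. The quantity to copy is the 'q2' column. How can I do this?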

I group by consecutive week of the year. For a detailed explanation of grouping by consecutive elements, see:

Try:

import pandas as pd

df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# Week-of-year number for each row (%U counts weeks starting on Sunday)
u = df['date'].dt.strftime('%U').astype(int)
# A new block starts wherever the week number is not the previous one plus 1
block = (u != u.shift() + 1).cumsum()
# Named aggregation sets the output column names directly,
# so no rename of 'amin'/'amax' and no droplevel are needed
df = df.groupby(['time_series', block]).agg(
    start_date=('date', 'min'), end_date=('date', 'max'), quantity=('q2', 'last'))

df:

                                start_date  end_date    quantity
time_series date            
store_0025_item_85011885    1   2020-07-19  2020-07-26  2.0
                            2   2020-08-09  2020-08-16  2.0
store_0053_item_85011885    3   2020-12-06  2020-12-27  8.0
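
The grouping key relies on a common idiom: compare each value with the previous value plus one, then take the cumulative sum of the mismatches so that every run of consecutive values gets its own label. A minimal sketch on a small illustrative series of week numbers (the variable names are made up for this example):

```python
import pandas as pd

# Illustrative week numbers: two consecutive runs (29-30 and 32-33)
weeks = pd.Series([29, 30, 32, 33])

# True wherever a value does not continue the previous one;
# the first row compares against NaN, so it is always True
starts_new_run = weeks != weeks.shift() + 1

# The cumulative sum turns the run starts into run labels
run_id = starts_new_run.cumsum()

print(run_id.tolist())  # [1, 1, 2, 2]
```

Rows that share a label belong to the same run, which is why the label can be passed to groupby alongside 'time_series'.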

To turn the index levels back into regular columns, use:

df = df.reset_index()
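
One caveat: week-of-year numbers restart at each new year, so a run that crosses from December into January would be split by the week-number comparison. A possible alternative, sketched here under the assumption that rows within a run are exactly 7 days apart, is to compare consecutive dates within each time_series instead. The sample frame rebuilds the question's data, and the names gap, block and result are illustrative, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'time_series': ['store_0025_item_85011885'] * 4
                   + ['store_0053_item_85011885'] * 4,
    'date': ['2020-07-19', '2020-07-26', '2020-08-09', '2020-08-16',
             '2020-12-06', '2020-12-13', '2020-12-20', '2020-12-27'],
    'q2': [2.0, 2.0, 2.0, 2.0, 8.0, 8.0, 8.0, 8.0],
})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df = df.sort_values(['time_series', 'date'])

# Gap to the previous row of the same time_series (NaT on each first row)
gap = df.groupby('time_series')['date'].diff()

# A new run starts on a first row or when the gap is not exactly one week
# (assumption: weekly data; adjust the threshold for other frequencies)
block = (gap != pd.Timedelta(days=7)).cumsum()

result = (
    df.groupby(['time_series', block.rename('block')])
      .agg(start_date=('date', 'min'),
           end_date=('date', 'max'),
           quantity=('q2', 'last'))
      .reset_index(level='block', drop=True)  # discard the helper label
      .reset_index()                          # time_series back to a column
)
print(result)
```

On the sample data this yields three rows with the columns time_series, start_date, end_date and quantity, matching the layout requested in the question.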