Python 3.x: Exploding data between two columns
My current DataFrame looks like this:
import pandas as pd

existing_data = {'STORE_ID': ['1234','5678','9876','3456','6789'],
                 'FULFILLMENT_TYPE': ['DELIVERY','DRIVE','DELIVERY','DRIVE','DELIVERY'],
                 'FORECAST_DATE': ['2020-08-01','2020-08-02','2020-08-03','2020-08-04','2020-08-05'],
                 'DAY_OF_WEEK': ['SATURDAY','SUNDAY','MONDAY','TUESDAY','WEDNESDAY'],
                 'START_HOUR': [8,8,6,7,9],
                 'END_HOUR': [19,19,18,19,17]}
existing = pd.DataFrame(data=existing_data)
I need to explode the data between the start and end hours so that each hour becomes its own row, like this:
needed_data = {'STORE_ID': ['1234','1234','1234','1234','1234'],
               'FULFILLMENT_TYPE': ['DELIVERY','DELIVERY','DELIVERY','DELIVERY','DELIVERY'],
               'FORECAST_DATE': ['2020-08-01','2020-08-01','2020-08-01','2020-08-01','2020-08-01'],
               'DAY_OF_WEEK': ['SATURDAY','SATURDAY','SATURDAY','SATURDAY','SATURDAY'],
               'HOUR': [8,9,10,11,12]}
required = pd.DataFrame(data=needed_data)
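I'm not sure how to achieve this. I know it should be possible with explode(), but I couldn't get it to work.

If the DataFrame is small or performance does not matter, build a range from the two columns for each row and pass the resulting column to explode().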
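The range-based approach might look like the following minimal sketch. It assumes pandas 0.25 or later (where DataFrame.explode is available) and that END_HOUR is never smaller than START_HOUR, since an empty range would explode to NaN and break the integer cast. The helper name explode_hours and the two-row sample frame are illustrative and not part of the original post.

```python
import pandas as pd

# Small illustrative sample with the same columns as the question's frame
sample = pd.DataFrame({
    'STORE_ID': ['1234', '5678'],
    'FULFILLMENT_TYPE': ['DELIVERY', 'DRIVE'],
    'FORECAST_DATE': ['2020-08-01', '2020-08-02'],
    'DAY_OF_WEEK': ['SATURDAY', 'SUNDAY'],
    'START_HOUR': [8, 8],
    'END_HOUR': [10, 9],
})

def explode_hours(frame):
    """Return one row per hour between START_HOUR and END_HOUR, inclusive."""
    # One list of hours per row, aligned to the frame's index
    hours = pd.Series(
        [list(range(start, end + 1))
         for start, end in zip(frame['START_HOUR'], frame['END_HOUR'])],
        index=frame.index,
    )
    out = frame.assign(HOUR=hours).explode('HOUR')
    # explode leaves an object column; cast the hours back to integers
    out = out.astype({'HOUR': 'int64'})
    return out.drop(columns=['START_HOUR', 'END_HOUR']).reset_index(drop=True)

result = explode_hours(sample)
print(result)
```

Building the lists in Python is easy to read but runs row by row, so it suits small frames rather than large ones.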
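If performance matters, subtract the two columns to get the number of rows each record needs, repeat each row that many times, and then add a per-row counter to the start hour: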
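# Number of hourly rows each record needs (both endpoints inclusive)
s = existing["END_HOUR"].sub(existing["START_HOUR"]) + 1
# Repeat each row that many times; the original index labels are kept
df = existing.loc[existing.index.repeat(s)].copy()
# Counter 0, 1, 2, ... within each original row
add = df.groupby(level=0).cumcount()
df['HOUR'] = df["START_HOUR"].add(add)
df = df.reset_index(drop=True).drop(['START_HOUR','END_HOUR'], axis=1)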