Python 3.x: Data to be exploded between two columns (python-3.x, pandas)

My current DataFrame looks like this:

import pandas as pd

existing_data = {'STORE_ID': ['1234','5678','9876','3456','6789'],
                 'FULFILLMENT_TYPE': ['DELIVERY','DRIVE','DELIVERY','DRIVE','DELIVERY'],
                 'FORECAST_DATE': ['2020-08-01','2020-08-02','2020-08-03','2020-08-04','2020-08-05'],
                 'DAY_OF_WEEK': ['SATURDAY','SUNDAY','MONDAY','TUESDAY','WEDNESDAY'],
                 'START_HOUR': [8,8,6,7,9],
                 'END_HOUR': [19,19,18,19,17]}

existing = pd.DataFrame(data=existing_data)

I need to split the data between the start and end hours so that each hour becomes a separate row, like this:

needed_data = {'STORE_ID': ['1234','1234','1234','1234','1234'],
               'FULFILLMENT_TYPE': ['DELIVERY','DELIVERY','DELIVERY','DELIVERY','DELIVERY'],
               'FORECAST_DATE': ['2020-08-01','2020-08-01','2020-08-01','2020-08-01','2020-08-01'],
               'DAY_OF_WEEK': ['SATURDAY','SATURDAY','SATURDAY','SATURDAY','SATURDAY'],
               'HOUR': [8,9,10,11,12]}

required = pd.DataFrame(data=needed_data)

I'm not sure how to achieve this. I know it should be possible with explode(), but I couldn't get it to work.

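If the DataFrame is small or performance is not important, use range with the two columns (one range per row, which can then be exploded).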
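
The snippet for this range-based step is not shown here, so the following is only a sketch of how it might look, not the answer's exact code. It assumes END_HOUR is inclusive, builds one list of hours per row with range, and expands it with DataFrame.explode. Only the first two stores from the question are used to keep it short; the names hours and exploded are illustrative.

```python
import pandas as pd

# First two stores from the question's sample data (shortened for brevity).
existing = pd.DataFrame({
    'STORE_ID': ['1234', '5678'],
    'FULFILLMENT_TYPE': ['DELIVERY', 'DRIVE'],
    'FORECAST_DATE': ['2020-08-01', '2020-08-02'],
    'DAY_OF_WEEK': ['SATURDAY', 'SUNDAY'],
    'START_HOUR': [8, 8],
    'END_HOUR': [19, 19],
})

# One list of hours per row; END_HOUR is treated as inclusive (assumption).
hours = pd.Series(
    [list(range(s, e + 1))
     for s, e in zip(existing['START_HOUR'], existing['END_HOUR'])],
    index=existing.index,
)

exploded = (
    existing.assign(HOUR=hours)
    .explode('HOUR')                              # one row per hour
    .reset_index(drop=True)
    .drop(columns=['START_HOUR', 'END_HOUR'])
)
# explode leaves an object column, so cast back to integers.
exploded['HOUR'] = exploded['HOUR'].astype(int)
```

Building Python lists row by row is simple to read but scales poorly, which is why this variant suits small frames.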

If performance is important, subtract the two columns to get the number of rows to repeat, then add a counter to the start hour:

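# Number of rows each record expands to (the + 1 makes END_HOUR inclusive).
s = existing["END_HOUR"].sub(existing["START_HOUR"]) + 1
# Repeat every row s times; each copy keeps its original index label.
df = existing.loc[existing.index.repeat(s)].copy()

# Counter 0, 1, 2, ... within each group of repeated rows.
add = df.groupby(level=0).cumcount()
df['HOUR'] = df["START_HOUR"].add(add)
df = df.reset_index(drop=True).drop(['START_HOUR','END_HOUR'], axis=1)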
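
For a quick check, the repeat-and-counter steps can be wrapped in a small function and run on the sample data from the question. This is only a sketch: expand_hours and its parameter names are hypothetical, and it assumes the input index is unique, since .loc with repeated labels relies on that.

```python
import pandas as pd

def expand_hours(df, start='START_HOUR', end='END_HOUR', out='HOUR'):
    """Return one row per hour in [start, end] (hypothetical helper)."""
    counts = df[end].sub(df[start]) + 1            # rows needed per record
    rep = df.loc[df.index.repeat(counts)].copy()   # assumes a unique index
    # Offset 0, 1, 2, ... within each repeated record, added to the start hour.
    rep[out] = rep[start] + rep.groupby(level=0).cumcount()
    return rep.reset_index(drop=True).drop(columns=[start, end])

existing = pd.DataFrame({
    'STORE_ID': ['1234', '5678', '9876', '3456', '6789'],
    'FULFILLMENT_TYPE': ['DELIVERY', 'DRIVE', 'DELIVERY', 'DRIVE', 'DELIVERY'],
    'FORECAST_DATE': ['2020-08-01', '2020-08-02', '2020-08-03',
                      '2020-08-04', '2020-08-05'],
    'DAY_OF_WEEK': ['SATURDAY', 'SUNDAY', 'MONDAY', 'TUESDAY', 'WEDNESDAY'],
    'START_HOUR': [8, 8, 6, 7, 9],
    'END_HOUR': [19, 19, 18, 19, 17],
})

result = expand_hours(existing)
print(result.head())
```

The first five rows of result should match the required frame from the question: store 1234 with HOUR running from 8 to 12.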