Python: How can I regroup the rows of a DataFrame of "start" and "end" events into one row per event?


python pandas datetime


I have a dataset of events in chronological order, stored in a pandas DataFrame. This is what the DataFrame looks like:

Time                         Event   Location    ID
2020-05-22 21:22:04.784622   start   UK          50
2020-05-22 21:43:07.060629   end     UK          50
2020-05-25 23:22:04.784622   start   UK          50
2020-05-25 23:43:07.060629   end     UK          50
2020-05-25 23:44:15.000566   start   US          30
2020-05-25 23:48:23.416348   start   Italy       70
2020-05-26 00:48:06.820164   end     US          30
2020-05-26 01:33:42.454450   end     Italy       70
2020-05-27 20:48:23.416348   start   Italy       30
2020-05-27 00:33:42.454450   end     Italy       30
etc

This is what I want to get:

Start_Time                   End_Time                    Location    ID
2020-05-22 21:22:04.784622   2020-05-22 21:43:07.060629  UK          50
2020-05-25 23:22:04.784622   2020-05-25 23:43:07.060629  UK          50
2020-05-25 23:44:15.000566   2020-05-26 00:48:06.820164  US          30
2020-05-25 23:48:23.416348   2020-05-26 01:33:42.45445   Italy       70
2020-05-27 20:48:23.416348   2020-05-27 00:33:42.454450  Italy       30
etc

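I tried making separate dataframes (one for the start events, one for the end events) and merging them on Location and ID, but that obviously didn't work. I have also looked at similar questions but couldn't work out an answer from them. Does anyone know how I can do this?

Edit: The dataframe will also contain multiple events with the same Location or ID. I have edited the example data to reflect my dataset more accurately.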

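One way to do it is to set the index on the last three columns and then unstack on the Event column:

import pandas as pd

# copy the sample table from the question to the clipboard first;
# columns are separated by two or more spaces
df = pd.read_clipboard(sep=r'\s{2,}', engine='python', parse_dates=['Time'])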

res = (df
       #appending Event,Location and ID with current index
       #prevents duplicate values when unstacking
       .set_index(['Event','Location','ID'], append=True)
       #get Event index as column
       .unstack('Event')
       #topmost column level redundant ... remove
       .droplevel(0,axis=1)
       #fill upwards on the end to align the dates to 
       #the appropriate positions
       .assign(end = lambda x: x['end'].bfill())
       .dropna()
       .add_suffix("_time")
       .reset_index()
       .drop("level_0", axis=1)
       .reindex(['start_time','end_time','Location','ID'], axis=1)
       .rename_axis(None,axis=1)
      )

res



          start_time                      end_time      Location    ID
0   2020-05-22 21:22:04.784622  2020-05-22 21:43:07.060629  UK      50
1   2020-05-25 23:22:04.784622  2020-05-25 23:43:07.060629  UK      50
2   2020-05-25 23:44:15.000566  2020-05-26 00:48:06.820164  US      30
3   2020-05-25 23:48:23.416348  2020-05-26 00:48:06.820164  Italy   70
4   2020-05-27 20:48:23.416348  2020-05-27 00:33:42.454450  Italy   30
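Note that row 3 in the output above (Italy, 70) ends up with the US end time rather than the 01:33:42.454450 the question asks for. That is because `bfill` takes the next end time in the frame without looking at Location or ID, so it can mismatch events that overlap. A minimal sketch of one way around this, assuming the n-th start for a given (Location, ID) pair belongs with the n-th end for that same pair; the `occurrence` column and the `paired` name are made up for this illustration:

```python
import io
import pandas as pd

# Sample rows copied from the question
raw = """\
Time                         Event   Location    ID
2020-05-22 21:22:04.784622   start   UK          50
2020-05-22 21:43:07.060629   end     UK          50
2020-05-25 23:22:04.784622   start   UK          50
2020-05-25 23:43:07.060629   end     UK          50
2020-05-25 23:44:15.000566   start   US          30
2020-05-25 23:48:23.416348   start   Italy       70
2020-05-26 00:48:06.820164   end     US          30
2020-05-26 01:33:42.454450   end     Italy       70
2020-05-27 20:48:23.416348   start   Italy       30
2020-05-27 00:33:42.454450   end     Italy       30
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s{2,}", engine="python")
df["Time"] = pd.to_datetime(df["Time"])

keys = ["Location", "ID"]

# Hypothetical helper column: number the starts and the ends separately
# within each (Location, ID) pair, so matching events share a number
df["occurrence"] = df.groupby(keys + ["Event"]).cumcount()

starts = (df[df["Event"] == "start"]
          .drop(columns="Event")
          .rename(columns={"Time": "start_time"}))
ends = (df[df["Event"] == "end"]
        .drop(columns="Event")
        .rename(columns={"Time": "end_time"}))

# Merge on Location, ID and the occurrence number; a left join keeps
# starts that have no matching end yet (their end_time is NaT)
paired = (
    starts.merge(ends, on=keys + ["occurrence"], how="left")
    .sort_values("start_time")
    .reset_index(drop=True)
    [["start_time", "end_time", "Location", "ID"]]
)
print(paired)
```

This is close to the merge the question describes trying. The extra occurrence number is what stops repeated (Location, ID) pairs from being matched against each other in every combination.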

I got this error, probably because my dataset has many events with the same Location/ID:

ValueError: Index contains duplicate entries, cannot reshape

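Are you getting this error with the data you shared, or with some other data?

Yes, I forgot to mention that there will be several events with the same Location or ID in the dataframe.

@sammywemmy I think if you add

.assign(key=df.groupby(['Event']).cumcount()+1)

and then add key to the index, you can handle the duplicate-events issue.

@sammywemmy It worked! I managed to track down the error (it was the result of my own stupidity). Thanks a lot for your help and the detailed answer :D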
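For reference, the `key` suggestion from the comments might look roughly like the sketch below. It is only one reading of that comment, not necessarily what the commenter or the asker actually ran. It assumes that end events arrive in the same relative order as their start events, since `key` simply counts starts and ends across the whole frame.

```python
import io
import pandas as pd

# Sample rows copied from the question
raw = """\
Time                         Event   Location    ID
2020-05-22 21:22:04.784622   start   UK          50
2020-05-22 21:43:07.060629   end     UK          50
2020-05-25 23:22:04.784622   start   UK          50
2020-05-25 23:43:07.060629   end     UK          50
2020-05-25 23:44:15.000566   start   US          30
2020-05-25 23:48:23.416348   start   Italy       70
2020-05-26 00:48:06.820164   end     US          30
2020-05-26 01:33:42.454450   end     Italy       70
2020-05-27 20:48:23.416348   start   Italy       30
2020-05-27 00:33:42.454450   end     Italy       30
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s{2,}", engine="python")
df["Time"] = pd.to_datetime(df["Time"])

res = (
    df
    # n-th start and n-th end in the frame get the same key (1, 2, 3, ...)
    .assign(key=df.groupby("Event").cumcount() + 1)
    # key makes every index entry unique, so unstack can reshape
    .set_index(["key", "Location", "ID", "Event"])["Time"]
    # move Event out of the index: one 'start' and one 'end' column
    .unstack("Event")
    .add_suffix("_time")
    .reset_index()
    .rename_axis(None, axis=1)
    [["start_time", "end_time", "Location", "ID"]]
)
print(res)
```

If ends can arrive in a different order from their starts, a start and an end with the same key may belong to different Location/ID pairs and would land on separate rows with missing values. Grouping by `['Event', 'Location', 'ID']` instead of `['Event']` alone is one possible way to avoid that.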