Pandas DataFrame in Python | How to compare one cell of a copied DataFrame with another cell?


python python-3.x pandas dataframe


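I have two identical DataFrames with different names (df_1 and df_2).

Suppose the DataFrames have two columns, CATEGORY and TIME. For example:

CATEGORY  TIME
A         2020-02-02 05:05:05.0000
A         2020-02-02 06:06:06.0000
A         2020-02-02 07:07:07.0000
B         2020-02-02 05:05:05.0000
B         2020-02-02 06:06:06.0000
C         2020-02-02 05:05:05.0000
C         2020-02-02 06:06:06.0000

Explanation of the solution from the original answer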

First, group each DataFrame by CATEGORY, keeping the minimum and maximum TIME for each category. This also sets CATEGORY as the index.

import pandas as pd

grouped_1 = df_1.groupby("CATEGORY").agg(["min", "max"])
grouped_2 = df_2.groupby("CATEGORY").agg(["min", "max"])

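Then perform an inner join so that only the categories present in both df_1 and df_2 are kept. By default the join is done on the index, which is what we want here (the CATEGORY column of the original DataFrames). Concatenating horizontally gives 4 columns: two minimums and two maximums per row.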

grouped_both = pd.concat([grouped_1, grouped_2], join="inner", axis=1)
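To make the effect of join="inner" visible, here is a minimal sketch with made-up data in which df_2 lacks category "C". The sample rows are assumptions for illustration only; the calls are the same ones used in the answer.

```python
import pandas as pd

# Hypothetical sample data: df_2 is missing category "C" on purpose
df_1 = pd.DataFrame({
    "CATEGORY": ["A", "A", "B", "C"],
    "TIME": pd.to_datetime([
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
        "2020-02-02 05:05:05", "2020-02-02 05:05:05",
    ]),
})
df_2 = df_1[df_1["CATEGORY"] != "C"].copy()

# Grouping moves CATEGORY into the index of each result
grouped_1 = df_1.groupby("CATEGORY").agg(["min", "max"])
grouped_2 = df_2.groupby("CATEGORY").agg(["min", "max"])

# The inner join on the index keeps only categories found in both frames
grouped_both = pd.concat([grouped_1, grouped_2], join="inner", axis=1)

print(list(grouped_both.index))  # categories that survive the join
print(grouped_both.shape)        # rows x (2 mins + 2 maxes)
```

Category "C" drops out because it only exists in df_1, and each remaining row carries four timestamp columns.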

Keep the minimum and maximum value of each row, and rename the columns.

final_df = (
    grouped_both.agg(["min", "max"], axis=1)
    .rename(columns={"min": "START TIME", "max": "END TIME"})
)
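The three steps can be tried end to end on the sample data from the question. This is only a sketch: it assumes df_2 is a plain copy of df_1, and it swaps the row-wise step for min(axis=1) and max(axis=1), which should be equivalent here but is not the answer's exact code.

```python
import pandas as pd

# Sample data from the question
df_1 = pd.DataFrame({
    "CATEGORY": ["A", "A", "A", "B", "B", "C", "C"],
    "TIME": pd.to_datetime([
        "2020-02-02 05:05:05", "2020-02-02 06:06:06", "2020-02-02 07:07:07",
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
    ]),
})
df_2 = df_1.copy()  # assumption: the second frame is an exact copy

# Step 1: min and max TIME per category
grouped_1 = df_1.groupby("CATEGORY")["TIME"].agg(["min", "max"])
grouped_2 = df_2.groupby("CATEGORY")["TIME"].agg(["min", "max"])

# Step 2: inner join on the CATEGORY index
grouped_both = pd.concat([grouped_1, grouped_2], join="inner", axis=1)

# Step 3: earliest and latest timestamp across all four columns of each row
final_df = pd.DataFrame({
    "START TIME": grouped_both.min(axis=1),
    "END TIME": grouped_both.max(axis=1),
})
print(final_df)
```

For category A this gives a start of 05:05:05 and an end of 07:07:07.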

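Note: I assumed you want the first and last timestamps taken across both DataFrames combined. If you really want the start to come from df_1 and the end from df_2, the solution would be slightly different.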
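The answer does not spell out that variant. One possible reading, sketched below, takes the earliest timestamp per category from df_1 and the latest from df_2. The one-hour shift applied to df_2 is a made-up difference so the two sources can be told apart, and the names start, end and variant_df are illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "CATEGORY": ["A", "A", "A", "B", "B", "C", "C"],
    "TIME": pd.to_datetime([
        "2020-02-02 05:05:05", "2020-02-02 06:06:06", "2020-02-02 07:07:07",
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
    ]),
})
# Hypothetical df_2: the same rows shifted by one hour
df_2 = df_1.assign(TIME=df_1["TIME"] + pd.Timedelta(hours=1))

# Start comes only from df_1, end comes only from df_2
start = df_1.groupby("CATEGORY")["TIME"].min().rename("START TIME")
end = df_2.groupby("CATEGORY")["TIME"].max().rename("END TIME")

# Inner join on the CATEGORY index, as in the main solution
variant_df = pd.concat([start, end], join="inner", axis=1)
print(variant_df)
```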

Solution with a single DataFrame and an added duration

If I understand correctly, you don't need to copy the original DataFrame.

# Group the DataFrame by CATEGORY and keep the min and max of TIME
# Selecting "TIME" before aggregating avoids a MultiIndex on the columns
joined_df = df_1.groupby("CATEGORY")["TIME"].agg(["min", "max"])
# Keep only rows where the min is different from the max
joined_df = joined_df[joined_df["min"] != joined_df["max"]].copy()
# Calculate the time deltas between min and max,
# then convert them to whole minutes
# (recent pandas versions reject astype('timedelta64[m]'))
joined_df["DURATION"] = (joined_df["max"] - joined_df["min"]).dt.total_seconds() // 60
# Rename the columns min and max
joined_df = joined_df.rename(columns={"min": "START TIME", "max": "END TIME"})
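As a quick check, the single-DataFrame approach can be run on the question's sample data. The extra category "D" with a single timestamp is a made-up row that shows the min != max filter at work. The sketch computes whole minutes with .dt.total_seconds() // 60, because pandas 2.0 and later, as far as I know, no longer accept astype('timedelta64[m]').

```python
import pandas as pd

df_1 = pd.DataFrame({
    "CATEGORY": ["A", "A", "A", "B", "B", "C", "C", "D"],
    "TIME": pd.to_datetime([
        "2020-02-02 05:05:05", "2020-02-02 06:06:06", "2020-02-02 07:07:07",
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
        "2020-02-02 05:05:05",  # hypothetical category with one timestamp
    ]),
})

# Min and max TIME per category, with flat "min" / "max" columns
joined_df = df_1.groupby("CATEGORY")["TIME"].agg(["min", "max"])

# Drop categories whose first and last timestamps coincide
joined_df = joined_df[joined_df["min"] != joined_df["max"]].copy()

# Duration in whole minutes between first and last timestamp
joined_df["DURATION"] = (joined_df["max"] - joined_df["min"]).dt.total_seconds() // 60

joined_df = joined_df.rename(columns={"min": "START TIME", "max": "END TIME"})
print(joined_df)
```

Category A spans 2 h 2 min 2 s, so its DURATION is 122 minutes, and category D is filtered out.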

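Thank you very much, this makes things really simple: just a few steps, with no for loops or if conditions. Is there a way to add another column (duration in minutes) holding the difference between the timestamps? My approach: the "2 identical DataFrames" are actually one DataFrame, and the other is a copy (df_1.copy()). The idea was to get the first timestamp from one and the last timestamp from the other. Then, in a new DataFrame, check whether an entry for the category exists; if it does not, fill the new DataFrame with the row from df_1, otherwise replace the end time with the current timestamp from the iteration over df_2.

I just edited my answer. I'm not sure it correctly addresses the need from your comment, so please let me know.

This answers my question perfectly! Thank you very much.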
