Python: creating new columns by grouping and aggregating multiple columns in pandas


I have a DataFrame with about 50 columns, including period_start_time, id, speed_througput, and others. A sample of the DataFrame:

    id     period_start_time         speed_througput    ...
0    1     2017-06-14 20:00:00              6
1    1     2017-06-14 20:00:00              10
2    1     2017-06-14 21:00:00              2
3    1     2017-06-14 21:00:00              5
4    2     2017-06-14 20:00:00              8
5    2     2017-06-14 20:00:00              12
...

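I am trying to create two new columns by grouping on two columns (id and period_start_time) and computing the mean and the minimum of speed_througput for each group. The code I tried: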

df['Throughput_avg']=df.sort_values(['period_start_time'],ascending=False).groupby(['period_start_time','id'])[['speed_througput']].max()
df['Throughput_min'] = df.groupby(['period_start_time', 'id'])[['speed_througput']].min()

As you can see, I tried two approaches, and neither works. Both attempts raise the same error:

 TypeError: incompatible index of inserted column with frame index

I think the desired output is clear from the description, so I have not posted it.
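The error most likely comes from an index mismatch: the aggregated result has one row per (period_start_time, id) group and is indexed by those keys, while the frame has one row per observation with a plain integer index, so pandas cannot align the two on assignment. The following is a minimal sketch of that mismatch; the frame is rebuilt from only the six sample rows shown above, which is an assumption about the real data.

```python
import pandas as pd

# Assumption: only the six sample rows from the question, not the full 50-column frame.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "period_start_time": pd.to_datetime([
        "2017-06-14 20:00:00", "2017-06-14 20:00:00",
        "2017-06-14 21:00:00", "2017-06-14 21:00:00",
        "2017-06-14 20:00:00", "2017-06-14 20:00:00",
    ]),
    "speed_througput": [6, 10, 2, 5, 8, 12],
})

# Same aggregation as in the question's second attempt.
grouped_min = df.groupby(["period_start_time", "id"])[["speed_througput"]].min()

print(df.index)           # integer index, one entry per original row
print(grouped_min.index)  # MultiIndex, one entry per (period_start_time, id) group
print(len(df), len(grouped_min))
```

The aggregated frame is shorter than the original and keyed differently, which is why it has to be joined or broadcast back rather than assigned directly.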

Option 1
Use agg in a groupby, then join the result back to the main DataFrame:

df.join(
    df.groupby(['id', 'period_start_time']).speed_througput.agg(
        ['mean', 'min']
    ).rename(columns={'mean': 'avg'}).add_prefix('Throughput_'),
    on=['id', 'period_start_time']
)

   id    period_start_time  speed_througput  Throughput_avg  Throughput_min
0   1  2017-06-14 20:00:00                6             8.0               6
1   1  2017-06-14 20:00:00               10             8.0               6
2   1  2017-06-14 21:00:00                2             3.5               2
3   1  2017-06-14 21:00:00                5             3.5               2
4   2  2017-06-14 20:00:00                8            10.0               8
5   2  2017-06-14 20:00:00               12            10.0               8
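For anyone who wants to run Option 1 end to end, here is a self-contained sketch. The frame is rebuilt from the six sample rows only (an assumption), and the names stats and result are introduced here for illustration. Note that join returns a new DataFrame, so the result has to be assigned to keep it.

```python
import pandas as pd

# Assumption: only the six sample rows from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "period_start_time": pd.to_datetime([
        "2017-06-14 20:00:00", "2017-06-14 20:00:00",
        "2017-06-14 21:00:00", "2017-06-14 21:00:00",
        "2017-06-14 20:00:00", "2017-06-14 20:00:00",
    ]),
    "speed_througput": [6, 10, 2, 5, 8, 12],
})

# One row per (id, period_start_time) group with the mean and min of the speed column.
stats = (
    df.groupby(["id", "period_start_time"])["speed_througput"]
    .agg(["mean", "min"])
    .rename(columns={"mean": "avg"})
    .add_prefix("Throughput_")
)

# Match the group keys in df against the MultiIndex of stats.
result = df.join(stats, on=["id", "period_start_time"])
print(result)
```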

Option 2
Use transform in a groupby context and add the new columns with assign:

g = df.groupby(['id', 'period_start_time']).speed_througput.transform
df.assign(Throughput_avg=g('mean'), Throughput_min=g('min'))

   id    period_start_time  speed_througput  Throughput_avg  Throughput_min
0   1  2017-06-14 20:00:00                6             8.0               6
1   1  2017-06-14 20:00:00               10             8.0               6
2   1  2017-06-14 21:00:00                2             3.5               2
3   1  2017-06-14 21:00:00                5             3.5               2
4   2  2017-06-14 20:00:00                8            10.0               8
5   2  2017-06-14 20:00:00               12            10.0               8
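A runnable sketch of Option 2, again assuming only the six sample rows. It uses plain column assignment instead of assign, which modifies df in place; that is a small variation on the answer, not a different technique. transform returns a Series aligned to the original row index, so direct assignment works here where the aggregated result in the question did not.

```python
import pandas as pd

# Assumption: only the six sample rows from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "period_start_time": pd.to_datetime([
        "2017-06-14 20:00:00", "2017-06-14 20:00:00",
        "2017-06-14 21:00:00", "2017-06-14 21:00:00",
        "2017-06-14 20:00:00", "2017-06-14 20:00:00",
    ]),
    "speed_througput": [6, 10, 2, 5, 8, 12],
})

grouped = df.groupby(["id", "period_start_time"])["speed_througput"]

# transform broadcasts each group's statistic back to every row of that group.
df["Throughput_avg"] = grouped.transform("mean")
df["Throughput_min"] = grouped.transform("min")
print(df)
```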