Pandas: create a new column from groupby values


I have a DataFrame:

Col1   Col2    Label
0      0        5345
1      0        7574
2      0        3445
0      1        2126
1      1        4653
2      1        9566

So I group by Col1 and Col2 to compute an index value based on the Label column, like this:

df_gb = df.groupby(['Col1','Col2'])['Label'].agg(['sum', 'count']) 
df_gb['sum_count'] = df_gb['sum'] / df_gb['count']
sum_count_total = df_gb['sum_count'].sum() 
index = df_gb['sum_count'] / 10 

Col2  Col1       
0     0          2.996036
      1          3.030063
      2          3.038579

1     0          2.925314
      1          2.951295
      2          2.956083

2     0          2.875549
      1          2.899254
      2          2.905063

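So far, everything works as I expected. But now I want to assign this 'index' groupby result back to my original 'df', based on the two groupby columns. If it were a single column, map() would do the job, but it does not work when I want to assign the index values based on two columns: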

df_index = df.copy()
df_index['index'] = df.groupby([]).apply(index)
TypeError: 'Series' objects are mutable, thus they cannot be hashed

I tried agg() and transform() without success. Is there a way to do this?

Thanks in advance. HR.

I believe you need join, to attach the 'index' Series to 'df' on Col1 and Col2.

Or use GroupBy.transform:

df['new']=df.groupby(['Col1','Col2'])['Label'].transform(lambda x: x.sum() / x.count()) / 10
print (df)
   Col1  Col2  Label    new
0     0     0   5345  534.5
1     1     0   7574  757.4
2     2     0   3445  344.5
3     0     1   2126  212.6
4     1     1   4653  465.3
5     2     1   9566  956.6

If there are no NaNs in the Label column, use the solution from Zero's suggestion, thanks:

df.groupby(['Col1','Col2'])['Label'].transform('mean') / 10

If you need to count only the non-NaN values, use the solution with transform.

The solution with join works well. I will try GroupBy.transform() too. Thanks, mate! :)

Yes, the second solution should be faster. You're welcome!

Could the second one be df.groupby(['Col1','Col2'])['Label'].transform('mean') / 10?

@Zero - Thanks, added to the answer.
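For reference, here is a minimal, self-contained sketch of the two approaches discussed in this thread (GroupBy.transform, and joining the aggregated Series back onto the frame), using the sample data from the question. The names `keys`, `via_transform` and `via_join` are illustrative only and do not come from the original posts; the join line is one plausible way to write that variant, not necessarily the exact code the answer used.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Col1": [0, 1, 2, 0, 1, 2],
    "Col2": [0, 0, 0, 1, 1, 1],
    "Label": [5345, 7574, 3445, 2126, 4653, 9566],
})

keys = ["Col1", "Col2"]  # illustrative name for the two grouping columns

# Approach 1: transform broadcasts each group's mean back to every row
# of that group, so the result is aligned with df's original index.
via_transform = df.groupby(keys)["Label"].transform("mean") / 10

# Approach 2: aggregate once per group (a Series with a two-level
# MultiIndex), then join it back onto df using both key columns.
index = df.groupby(keys)["Label"].mean() / 10
via_join = df.join(index.rename("new"), on=keys)["new"]

print(pd.DataFrame({"via_transform": via_transform, "via_join": via_join}))
```

Both results keep the row index of `df`, so either one can be assigned directly with `df['new'] = ...`. The transform version avoids building the intermediate aggregated Series, which is consistent with the comment above that it should be the faster of the two.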