Python pandas: group by and transform on a condition, and apply the result to the whole column
I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({'Value': [0, 1, 2,3, 4,5,6,7,8,9],'Name': ['John', 'John', 'John','John', 'John','John','John','John','John','John']
,'City': ['A', 'B', 'A','B', 'A','B','B','A','B','A'],'City2': ['C', 'D', 'C','D', 'C','D','D','C','D','C']})
df
Value Name City City2
0 0 John A C
1 1 John B D
2 2 John A C
3 3 John B D
4 4 John A C
5 5 John B D
6 6 John B D
7 7 John A C
8 8 John B D
9 9 John A C
I am trying to take the mean of Value over the rows where City2 equals 'C', but have the result fill the entire new column.
I tried:
df['C_Average'] = df[df['City2'] == 'C'].groupby(['Name','City'])['Value'].transform(lambda v: v.nsmallest(11).mean())
df
Value Name City City2 C_Average
0 0 John A C 4.4
1 1 John B D NaN
2 2 John A C 4.4
3 3 John B D NaN
4 4 John A C 4.4
5 5 John B D NaN
6 6 John B D NaN
7 7 John A C 4.4
8 8 John B D NaN
9 9 John A C 4.4
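As you can see, the new column is added, but I would like the result applied to the whole column, not only to the rows where City2 equals C, i.e. the whole column should show 4.4. Any ideas?
Thanks

One trick is to replace the non-matching values with missing values instead of filtering out the rows: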
print (df.assign(Value = df['Value'].where(df['City2']== 'C')))
Value Name City City2
0 0.0 John A C
1 NaN John B D
2 2.0 John A C
3 NaN John B D
4 4.0 John A C
5 NaN John B D
6 NaN John B D
7 7.0 John A C
8 NaN John B D
9 9.0 John A C
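To see why masking differs from filtering, it may help to compare the shapes involved. This is only a minimal sketch on the sample data from the question; the names filtered and masked are illustrative and not part of the original code. Filtering drops rows, so anything computed from it can only align back to the surviving index labels, while where keeps all ten rows and just blanks out the values.

```python
import pandas as pd

df = pd.DataFrame({
    'Value': list(range(10)),
    'Name': ['John'] * 10,
    'City': ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A'],
    'City2': ['C', 'D', 'C', 'D', 'C', 'D', 'D', 'C', 'D', 'C'],
})

# Filtering removes the non-matching rows entirely.
filtered = df[df['City2'] == 'C']

# Masking keeps every row and replaces non-matching values with NaN.
masked = df.assign(Value=df['Value'].where(df['City2'] == 'C'))

print(len(filtered))            # 5
print(len(masked))              # 10
print(filtered.index.tolist())  # [0, 2, 4, 7, 9]
```

When a result indexed like filtered is assigned to a column of df, pandas aligns on the index, so rows 1, 3, 5, 6 and 8 have nothing to align with and end up as NaN.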
However, in the sample data the group John, B contains no rows where City2 is C, so the output is the same:
df['C_Average'] = (df.assign(Value = df['Value'].where(df['City2']== 'C'))
.groupby(['Name','City'])['Value']
.transform(lambda v: v.nsmallest(11).mean()))
print (df)
Value Name City City2 C_Average
0 0 John A C 4.4
1 1 John B D NaN
2 2 John A C 4.4
3 3 John B D NaN
4 4 John A C 4.4
5 5 John B D NaN
6 6 John B D NaN
7 7 John A C 4.4
8 8 John B D NaN
9 9 John A C 4.4
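A quick way to confirm this, sketched here on the same sample data (the names c_rows and group_means are illustrative), is to count how many non-missing values each group still has after masking. A plain mean is used instead of the nsmallest(11).mean() lambda just to keep the check short.

```python
import pandas as pd

df = pd.DataFrame({
    'Value': list(range(10)),
    'Name': ['John'] * 10,
    'City': ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A'],
    'City2': ['C', 'D', 'C', 'D', 'C', 'D', 'D', 'C', 'D', 'C'],
})

masked = df.assign(Value=df['Value'].where(df['City2'] == 'C'))

# count() ignores NaN, so this is the number of City2 == 'C' rows per group.
c_rows = masked.groupby(['Name', 'City'])['Value'].count()
print(c_rows)

# mean() also skips NaN; a group with no values left has no mean, hence NaN.
group_means = masked.groupby(['Name', 'City'])['Value'].mean()
print(group_means)
```

Group John, A keeps five values and averages to 4.4, while group John, B keeps none, which is why its rows show NaN in C_Average.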
If the data is changed so that group John, B has a row with C, it works as expected:
df = pd.DataFrame({'Value': [0, 1, 2,3, 4,5,6,7,8,9],'Name': ['John', 'John', 'John','John', 'John','John','John','John','John','John']
,'City': ['A', 'B', 'A','B', 'A','B','B','A','B','A'],'City2': ['C', 'C', 'C','D', 'C','D','D','C','D','C']})
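@SOK - I was trying to explain why you get NaN with the sample data. Or do you need to replace the values of group John, B with those of group John, A?
Ah yes, thank you very much! That makes sense now. The example I gave was a bit off, haha.
@SOK - Yes, at first I was surprised that it was not working, but the sample data is small, so it was no problem to find out what was happening here.
Thanks again, much appreciated.
@SOK - I think upgrading, if possible, should solve this.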
print (df)
Value Name City City2
0 0 John A C
1 1 John B C <- one row for C for group John, B
2 2 John A C
3 3 John B D
4 4 John A C
5 5 John B D
6 6 John B D
7 7 John A C
8 8 John B D
9 9 John A C
df['C_Average'] = (df.assign(Value = df['Value'].where(df['City2']== 'C'))
.groupby(['Name','City'])['Value']
.transform(lambda v: v.nsmallest(11).mean()))
print (df)
Value Name City City2 C_Average
0 0 John A C 4.4
1 1 John B C 1.0
2 2 John A C 4.4
3 3 John B D 1.0
4 4 John A C 4.4
5 5 John B D 1.0
6 6 John B D 1.0
7 7 John A C 4.4
8 8 John B D 1.0
9 9 John A C 4.4
For comparison, filtering the rows first, as in the question, still leaves NaN wherever City2 is not C:
df['C_Average'] = df[df['City2'] == 'C'].groupby(['Name','City'])['Value'].transform(lambda v: v.nsmallest(11).mean())
print (df)
Value Name City City2 C_Average
0 0 John A C 4.4
1 1 John B C 1.0
2 2 John A C 4.4
3 3 John B D NaN
4 4 John A C 4.4
5 5 John B D NaN
6 6 John B D NaN
7 7 John A C 4.4
8 8 John B D NaN
9 9 John A C 4.4
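If the goal really is for every row to show 4.4 on the original sample data, as the question asks, one possible approach is to group by Name only, so that group John, B borrows the C rows of John, A. This is a sketch under that assumption, not something confirmed in the thread; it also uses a plain mean in place of nsmallest(11).mean(), which gives the same result here because the group has no more than 11 rows.

```python
import pandas as pd

df = pd.DataFrame({
    'Value': list(range(10)),
    'Name': ['John'] * 10,
    'City': ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A'],
    'City2': ['C', 'D', 'C', 'D', 'C', 'D', 'D', 'C', 'D', 'C'],
})

# Keep only the values where City2 == 'C'; everything else becomes NaN.
masked_value = df['Value'].where(df['City2'] == 'C')

# Assumption: the average is wanted per Name, ignoring City.
# transform('mean') skips NaN and broadcasts the result to every row.
df['C_Average'] = masked_value.groupby(df['Name']).transform('mean')

print(df)
```

Whether dropping City from the grouping keys is appropriate depends on what the average is supposed to mean for rows in a different city.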