
Python pandas: group by with a condition and transform, applied to the whole column


I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'Value': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'Name': ['John', 'John', 'John', 'John', 'John', 'John', 'John', 'John', 'John', 'John'],
                   'City': ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A'],
                   'City2': ['C', 'D', 'C', 'D', 'C', 'D', 'D', 'C', 'D', 'C']})
df



   Value  Name City City2
0      0  John    A     C
1      1  John    B     D
2      2  John    A     C
3      3  John    B     D
4      4  John    A     C
5      5  John    B     D
6      6  John    B     D
7      7  John    A     C
8      8  John    B     D
9      9  John    A     C
I am trying to take the average where City2 equals 'C', but apply it to the whole new column:

I tried:

df['C_Average'] = df[df['City2'] == 'C'].groupby(['Name','City'])['Value'].transform(lambda v: v.nsmallest(11).mean())
df
   Value  Name City City2  C_Average
0      0  John    A     C        4.4
1      1  John    B     D        NaN
2      2  John    A     C        4.4
3      3  John    B     D        NaN
4      4  John    A     C        4.4
5      5  John    B     D        NaN
6      6  John    B     D        NaN
7      7  John    A     C        4.4
8      8  John    B     D        NaN
9      9  John    A     C        4.4
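As you can see, the new column is added, but I want the value applied to the whole column, not only to the rows where City2 equals C, i.e. the whole column should show 4.4. Any ideas?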


Thanks
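If the goal is literally the same value (4.4) on every row, one possible approach, which is a sketch and not taken from the answer below, is to blank out the non-'C' values and group by Name only, so the mean is not split by City. This assumes the per-City split is not actually needed:

```python
import pandas as pd

df = pd.DataFrame({
    'Value': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Name': ['John'] * 10,
    'City': ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A'],
    'City2': ['C', 'D', 'C', 'D', 'C', 'D', 'D', 'C', 'D', 'C'],
})

# Keep Value only where City2 == 'C'; the other rows become NaN,
# but the Series keeps all 10 rows (no filtering)
masked = df['Value'].where(df['City2'] == 'C')

# Group the masked values by Name only; mean() skips NaN and
# transform() broadcasts each group's mean back to every row
df['C_Average'] = masked.groupby(df['Name']).transform('mean')

print(df['C_Average'].tolist())
```

Every row gets the mean of 0, 2, 4, 7 and 9, i.e. 4.4. Grouping by Name (rather than computing one scalar) keeps the result per person if more names are added.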

One trick is to replace the non-matching values with missing values instead of filtering:

print (df.assign(Value = df['Value'].where(df['City2']== 'C')))
   Value  Name City City2
0    0.0  John    A     C
1    NaN  John    B     D
2    2.0  John    A     C
3    NaN  John    B     D
4    4.0  John    A     C
5    NaN  John    B     D
6    NaN  John    B     D
7    7.0  John    A     C
8    NaN  John    B     D
9    9.0  John    A     C
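A small sketch of why the filtering version leaves NaN: boolean filtering shrinks the frame, and assigning the shorter result back to a column aligns on the index. It uses transform('mean') in place of the nsmallest(11) lambda, which gives the same result here because no group has more than 11 rows:

```python
import pandas as pd

df = pd.DataFrame({
    'Value': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Name': ['John'] * 10,
    'City': ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A'],
    'City2': ['C', 'D', 'C', 'D', 'C', 'D', 'D', 'C', 'D', 'C'],
})

# Boolean filtering keeps only the 5 'C' rows, so the transformed
# result only has those index labels
filtered = df[df['City2'] == 'C'].groupby(['Name', 'City'])['Value'].transform('mean')
print(filtered.index.tolist())  # [0, 2, 4, 7, 9]

# Assigning a shorter Series aligns on the index:
# rows that were filtered out receive NaN
df['C_Average'] = filtered
print(df['C_Average'].isna().sum())  # 5

# where() keeps all 10 rows and only blanks out the non-'C' values
masked = df['Value'].where(df['City2'] == 'C')
print(len(masked), masked.isna().sum())  # 10 5
```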
The problem with the sample data, however, is that the group John, B contains no rows where City2 equals C, so you get the same output:

df['C_Average'] = (df.assign(Value = df['Value'].where(df['City2']== 'C'))
                     .groupby(['Name','City'])['Value']
                     .transform(lambda v: v.nsmallest(11).mean()))

print (df)
   Value  Name City City2  C_Average
0      0  John    A     C        4.4
1      1  John    B     D        NaN
2      2  John    A     C        4.4
3      3  John    B     D        NaN
4      4  John    A     C        4.4
5      5  John    B     D        NaN
6      6  John    B     D        NaN
7      7  John    A     C        4.4
8      8  John    B     D        NaN
9      9  John    A     C        4.4
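To check which groups are affected before transforming, a quick diagnostic sketch (not part of the answer) counts the 'C' rows per (Name, City) group. Any group with a count of 0 can only get NaN from the masked mean:

```python
import pandas as pd

df = pd.DataFrame({
    'Value': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Name': ['John'] * 10,
    'City': ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A'],
    'City2': ['C', 'D', 'C', 'D', 'C', 'D', 'D', 'C', 'D', 'C'],
})

# 1 where City2 == 'C', else 0; summed per (Name, City) group
c_counts = (df['City2'].eq('C').astype(int)
            .groupby([df['Name'], df['City']])
            .sum())

# Groups with no 'C' rows at all
empty_groups = c_counts[c_counts == 0].index.tolist()
print(empty_groups)  # [('John', 'B')]
```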
If the data is changed so that group John, B has a C row, it works:

df = pd.DataFrame({'Value': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'Name': ['John', 'John', 'John', 'John', 'John', 'John', 'John', 'John', 'John', 'John'],
                   'City': ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A'],
                   'City2': ['C', 'C', 'C', 'D', 'C', 'D', 'D', 'C', 'D', 'C']})


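Comments:
@SOK - I tried to explain why you get NaN with the sample data. Or do you need to replace the values of group John, B with the values of group John, A?
Ah yes, thanks a lot! Makes sense now, my example was a bit off haha.
@SOK - Yes, at first I was also surprised why it did not work, but the sample data is small, so it was no problem to find out what happens here.
Thanks again, much appreciated.
@SOK - I think that, if possible, upgrading should solve this problem.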
print (df)
   Value  Name City City2
0      0  John    A     C
1      1  John    B     C <- one row for C for group John, B
2      2  John    A     C
3      3  John    B     D
4      4  John    A     C
5      5  John    B     D
6      6  John    B     D
7      7  John    A     C
8      8  John    B     D
9      9  John    A     C

df['C_Average'] = (df.assign(Value = df['Value'].where(df['City2']== 'C'))
                     .groupby(['Name','City'])['Value']
                     .transform(lambda v: v.nsmallest(11).mean()))

print (df)
   Value  Name City City2  C_Average
0      0  John    A     C        4.4
1      1  John    B     C        1.0
2      2  John    A     C        4.4
3      3  John    B     D        1.0
4      4  John    A     C        4.4
5      5  John    B     D        1.0
6      6  John    B     D        1.0
7      7  John    A     C        4.4
8      8  John    B     D        1.0
9      9  John    A     C        4.4
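Because mean() skips NaN and no group here has more than 11 rows, the nsmallest(11) lambda reduces to a plain group mean. A sketch comparing it with the built-in, vectorized transform('mean') on the changed data, assuming groups never exceed 11 rows (if they can, the two differ, since the lambda then averages only the 11 smallest values):

```python
import pandas as pd

df = pd.DataFrame({
    'Value': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Name': ['John'] * 10,
    'City': ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A'],
    'City2': ['C', 'C', 'C', 'D', 'C', 'D', 'D', 'C', 'D', 'C'],  # row 1 is now 'C'
})

masked = df.assign(Value=df['Value'].where(df['City2'] == 'C'))
grouped = masked.groupby(['Name', 'City'])['Value']

# Version from the answer: mean of (up to) the 11 smallest values per group
via_lambda = grouped.transform(lambda v: v.nsmallest(11).mean())

# Built-in aggregation by name; NaN values are skipped by mean
via_builtin = grouped.transform('mean')

pd.testing.assert_series_equal(via_lambda, via_builtin)
print(via_builtin.tolist())  # [4.4, 1.0, 4.4, 1.0, 4.4, 1.0, 1.0, 4.4, 1.0, 4.4]
```

The string form avoids calling a Python function once per group, which matters mainly on large frames.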
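For comparison, the original filtering approach on the changed data still leaves NaN in the D rows of group John, B:

df['C_Average'] = df[df['City2'] == 'C'].groupby(['Name','City'])['Value'].transform(lambda v: v.nsmallest(11).mean())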

print (df)
   Value  Name City City2  C_Average
0      0  John    A     C        4.4
1      1  John    B     C        1.0
2      2  John    A     C        4.4
3      3  John    B     D        NaN
4      4  John    A     C        4.4
5      5  John    B     D        NaN
6      6  John    B     D        NaN
7      7  John    A     C        4.4
8      8  John    B     D        NaN
9      9  John    A     C        4.4
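The comment thread raises the alternative of giving group John, B the value from John, A. One possible reading of that, sketched here as an assumption and not taken from the answer, is to fall back to the masked mean over the whole Name whenever a (Name, City) group has no 'C' rows. On the original question data this gives 4.4 on every row:

```python
import pandas as pd

df = pd.DataFrame({
    'Value': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Name': ['John'] * 10,
    'City': ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A'],
    'City2': ['C', 'D', 'C', 'D', 'C', 'D', 'D', 'C', 'D', 'C'],
})

masked = df['Value'].where(df['City2'] == 'C')

# Per (Name, City) mean of the 'C' values; NaN when a group has no 'C' rows
city_mean = masked.groupby([df['Name'], df['City']]).transform('mean')

# Fallback: per-Name mean of the 'C' values
name_mean = masked.groupby(df['Name']).transform('mean')

# Use the per-City mean where it exists, otherwise the per-Name mean
df['C_Average'] = city_mean.fillna(name_mean)
print(df['C_Average'].tolist())
```

With the changed data (row 1 set to 'C'), the same code keeps 1.0 for group John, B, because that group then has its own 'C' value.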