How to handle NaN values when computing a weighted average in Python

python pandas

I have a Series of weights like this:

a  0.2
b  0.3
c  0.5

and a DataFrame:

   a    b   c
1  1    2   2
2  NaN  2   2
3  NaN  1   NaN
...

I want to compute the weighted average of the DataFrame row by row, like this:

(dataframe * weights).sum(axis=1)

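The problem is that when a value in the dataframe is NaN, I want its weight to be split equally among the other weights. For example, in the second row the weight of b should be 0.4 and the weight of c should be 0.6. In the third row, the weight of b should be 1.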
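To make the problem concrete, here is a minimal sketch that rebuilds the sample data, shows what the naive expression returns, and then applies the equal split described above. The names `present`, `share`, `adjusted` and `equal_split` are made up for illustration, and the sketch assumes every row has at least one non-missing value.

```python
import numpy as np
import pandas as pd

weights = pd.Series({"a": 0.2, "b": 0.3, "c": 0.5})
dataframe = pd.DataFrame(
    {"a": [1.0, np.nan, np.nan], "b": [2.0, 2.0, 1.0], "c": [2.0, 2.0, np.nan]},
    index=[1, 2, 3],
)

# Naive version: sum() skips the NaN products, so the weights that are
# actually used no longer add up to 1 in rows 2 and 3.
naive = (dataframe * weights).sum(axis=1)

# Equal split: the total weight of the missing columns in a row is divided
# evenly among the columns that are present in that row.
present = dataframe.notna()
missing_weight = (~present).astype(float).mul(weights, axis=1).sum(axis=1)
share = missing_weight / present.sum(axis=1)

adjusted = (
    present.astype(float)
    .mul(weights, axis=1)   # original weight per column
    .add(share, axis=0)     # plus the per-row share of the missing weight
    .where(present)         # no weight where the value itself is missing
)
equal_split = (adjusted * dataframe).sum(axis=1)

print(naive)
print(adjusted)
print(equal_split)
```

With this data the naive result for rows 2 and 3 is too small, while the adjusted weights for row 2 come out as 0.4 for b and 0.6 for c.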

First, you can build an adjusted weights matrix:

df2 = dataframe.copy()
df2[pd.notnull(df2)] = 1
df2 = df2 * weight
df2 = df2.multiply(1/df2.sum(axis=1), axis=0)
df2

This results in the following weights matrix:

       a      b      c
row
1    0.2  0.300  0.500
2    NaN  0.375  0.625
3    NaN  1.000    NaN

Then

(df2 * dataframe).sum(axis=1)

produces

row
1    1.8
2    2.0
3    1.0
dtype: float64
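For reference, the steps of this answer can be run end to end as below. The sample data is rebuilt from the question. Note that this rescales the remaining weights proportionally (0.375 and 0.625 in row 2) rather than splitting the missing weight equally (0.4 and 0.6), although both give the same averages for these three rows.

```python
import numpy as np
import pandas as pd

weight = pd.Series({"a": 0.2, "b": 0.3, "c": 0.5})
dataframe = pd.DataFrame(
    {"a": [1.0, np.nan, np.nan], "b": [2.0, 2.0, 1.0], "c": [2.0, 2.0, np.nan]},
    index=[1, 2, 3],
)

# 1 where a value is present, NaN where it is missing
df2 = dataframe.copy()
df2[pd.notnull(df2)] = 1

# Put the weight of each column into the cells that are present
df2 = df2 * weight

# Rescale every row so that its remaining weights sum to 1
df2 = df2.multiply(1 / df2.sum(axis=1), axis=0)

# Weighted average per row; NaN products are skipped by sum()
result = (df2 * dataframe).sum(axis=1)
print(df2)
print(result)
```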

This can be shortened by using dataframe.where.
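One way `DataFrame.where` might shorten the weight construction is sketched below. This is an assumption about the approach, not necessarily the exact code the answer used, and the name `adjusted` is made up for illustration.

```python
import numpy as np
import pandas as pd

weight = pd.Series({"a": 0.2, "b": 0.3, "c": 0.5})
dataframe = pd.DataFrame(
    {"a": [1.0, np.nan, np.nan], "b": [2.0, 2.0, 1.0], "c": [2.0, 2.0, np.nan]},
    index=[1, 2, 3],
)

# where() keeps a cell when the condition is True and replaces it otherwise,
# so NaN cells stay NaN and every present value becomes 1, then gets its weight.
adjusted = dataframe.where(dataframe.isnull(), 1) * weight

# Rescale each row so the remaining weights sum to 1
adjusted = adjusted.div(adjusted.sum(axis=1), axis=0)

result = (adjusted * dataframe).sum(axis=1)
print(result)
```

The copy and the boolean assignment from the longer version collapse into a single expression; the normalisation and the final sum are unchanged.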

You can use numpy's masked arrays, which are designed for exactly this use case. Let s be the weights Series and df be the DataFrame:

   a    b   c
1  1    2   2
2  NaN  2   2
3  NaN  1   NaN
...

np.ma.average(np.ma.array(df.values, mask=df.isnull().values), 
              weights=s.values, axis=1)

The .data attribute of the returned masked array contains the result:

array([ 1.8,  2. ,  1. ])

Edit: as suggested in the comments, you can convert the result to a Series:

pd.Series(np.ma.average(np.ma.array(df.values, mask=df.isnull().values), 
                        weights=s.values, axis=1).data, index=df.index)
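A self-contained version of the masked-array approach, with the sample data rebuilt from the question, might look like this. As a small deviation from the snippet above, the sketch uses `.filled(np.nan)` instead of `.data`, on the assumption that a row with no valid values should then show up as NaN rather than as an arbitrary number.

```python
import numpy as np
import pandas as pd

s = pd.Series({"a": 0.2, "b": 0.3, "c": 0.5})
df = pd.DataFrame(
    {"a": [1.0, np.nan, np.nan], "b": [2.0, 2.0, 1.0], "c": [2.0, 2.0, np.nan]},
    index=[1, 2, 3],
)

# Mask the missing cells; np.ma.average ignores masked entries and divides
# by the sum of the weights that remain in each row.
masked = np.ma.array(df.values, mask=df.isnull().values)
avg = np.ma.average(masked, weights=s.values, axis=1)

# Convert back to a Series aligned with the original row labels
result = pd.Series(avg.filled(np.nan), index=df.index)
print(result)
```

Because the division uses only the unmasked weights, this gives the same proportional rescaling as the pandas approach in the other answer.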

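In the second row, since a=NaN, did you take weight(a)/2=0.1 and add it to the other two weights?

@ChuckM nice shot, +1 ;)

Thanks @jezrael :) Thanks for your effort. I accepted the other answer only because it was first and well answered.

Maybe clarify that df is dataframe and s is weight in the original post.

I added a shorter method using dataframe.where.

Thanks for your answer.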