Python: How do I compute a custom weighted average that handles NaN values in pandas?
I have a DataFrame df_ss_g as
ent_id,WA,WB,WC,WD
123,0.045251836,0.614582906,0.225930615,0.559766482
124,0.722324239,0.057781167,,0.123603561
125,,0.361074325,0.768542766,0.080434134
126,0.085781742,0.698045853,0.763116684,0.029084545
127,0.909758657,,0.760993759,0.998406211
128,,0.32961283,,0.90038336
129,0.714585519,,0.671905291,
130,0.151888772,0.279261613,0.641133263,0.188231227
Now I need to compute a weighted average (AVG_WEIGHTAGE), i.e. (WA*0.5 + WB*1 + WC*0.5 + WD*1) / (0.5 + 1 + 0.5 + 1)
But when I calculate it with the following approach
df_ss_g['AVG_WEIGHTAGE']= df_ss_g.apply(lambda x:((x['WA']*0.5)+(x['WB']*1)+(x['WC']*0.5)+(x['WD']*1))/(0.5+1+0.5+1) , axis=1)
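the result is NaN for every row that contains a NaN value, which is wrong.
What I want is for nulls to be left out of both the numerator and the denominator,
e.g. for ent_id 124, where WC is missing, the average should be (WA*0.5 + WB*1 + WD*1) / (0.5 + 1 + 1).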
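To see why the original expression returns NaN, note that any arithmetic involving NaN yields NaN, so a single missing column affects the whole row. The following is a minimal sketch using row 124 from the sample data; the names `row`, `weights`, `naive` and `wanted` are illustrative and not from the question.

```python
import numpy as np

# Row 124 from the sample data: WC is missing
row = {"WA": 0.722324239, "WB": 0.057781167, "WC": np.nan, "WD": 0.123603561}
weights = {"WA": 0.5, "WB": 1.0, "WC": 0.5, "WD": 1.0}

# Naive formula: the NaN in WC propagates through the sum
naive = sum(row[c] * weights[c] for c in row) / sum(weights.values())

# Desired formula: skip missing columns in both numerator and denominator
present = [c for c in row if not np.isnan(row[c])]
wanted = sum(row[c] * weights[c] for c in present) / sum(weights[c] for c in present)

print(naive)   # nan
print(wanted)  # roughly 0.217
```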
IIUC:
Try this approach using the dot product -
import numpy as np

def av(t):
    # Define weights for WA, WB, WC, WD
    wt = [0.5, 1, 0.5, 1]
    # Create a vector with 0 for null and 1 for non null
    nulls = [int(i) for i in ~t.isna()]
    # Sum of the weights that belong to non-null entries
    wt_new = np.dot(nulls, wt)
    # Weighted sum of the values; nulls are filled with 0 so they add nothing
    t_new = np.dot(wt, t.fillna(0))
    # Return the division
    return np.divide(t_new, wt_new)

# Assumes ent_id is the index, so each row holds only WA, WB, WC, WD
df['WEIGHTED AVG'] = df.apply(av, axis=1)
df = df.reset_index()
print(df)
It boils down to masking the nan values with 0 so that they contribute to neither the weights nor the sum:
# this is the weights
weights = np.array([0.5,1,0.5,1])
# the columns of interest
s = df.iloc[:,1:]
# where the valid values are
mask = s.notnull()
# use `fillna` and then `@` for matrix multiplication
df['AVG_WEIGHTAGE'] = (s.fillna(0) @ weights) / (mask@weights)
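For reference, the same masking idea can be written out end to end against the sample data from the question. This is a sketch rather than the answerer's code: it selects the weight columns by name instead of by position, and uses `DataFrame.mul` with `sum` in place of `@`. The names `csv_text`, `cols`, `num` and `den` are assumptions introduced here.

```python
import io

import pandas as pd

csv_text = """ent_id,WA,WB,WC,WD
123,0.045251836,0.614582906,0.225930615,0.559766482
124,0.722324239,0.057781167,,0.123603561
125,,0.361074325,0.768542766,0.080434134
126,0.085781742,0.698045853,0.763116684,0.029084545
127,0.909758657,,0.760993759,0.998406211
128,,0.32961283,,0.90038336
129,0.714585519,,0.671905291,
130,0.151888772,0.279261613,0.641133263,0.188231227
"""
df = pd.read_csv(io.StringIO(csv_text))

cols = ["WA", "WB", "WC", "WD"]
weights = pd.Series([0.5, 1.0, 0.5, 1.0], index=cols)
values = df[cols]

# Numerator: weighted sum per row; sum() skips NaN by default
num = values.mul(weights, axis=1).sum(axis=1)
# Denominator: sum of the weights whose value is present in that row
den = values.notna().astype(float).mul(weights, axis=1).sum(axis=1)

df["AVG_WEIGHTAGE"] = num / den
print(df)
```

Selecting columns by name keeps the result stable if the frame later gains extra columns, whereas `df.iloc[:, 1:]` would pick those up as well.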
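What if you use fillna() and fill all the NaN values with 0?
@user13802115 It won't work, because with fillna() the filled entries are still counted in the denominator, which makes the average wrong.
When I implemented your logic, I got the following error: ValueError: invalid literal for int() with base 10: 'WA'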
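The ValueError reported in the last comment is consistent with the list comprehension receiving a whole DataFrame rather than a single row, since iterating over a DataFrame yields its column labels. This is only a guess at the cause, as the comment does not show the call that failed. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Made-up two-column frame, for illustration only
df = pd.DataFrame({"WA": [0.1, np.nan], "WB": [0.2, 0.3]})

# Iterating over a row (a Series) yields its boolean values
print([int(i) for i in ~df.iloc[0].isna()])

# Iterating over a DataFrame yields the column labels, so int("WA") fails
try:
    [int(i) for i in ~df.isna()]
except ValueError as exc:
    print(exc)
```

If that is what happened, calling the function through `df.apply(av, axis=1)`, with ent_id moved to the index first, passes one row at a time.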
Weighted averages for the sample data, with NaN entries excluded from both the numerator and the denominator:
   ent_id        WA        WB        WC        WD  WEIGHTED AVG
0     123  0.045252  0.614583  0.225931  0.559766      0.436647
1     124  0.722324  0.057781       NaN  0.123604      0.217019
2     125       NaN  0.361074  0.768543  0.080434      0.330312
3     126  0.085782  0.698046  0.763117  0.029085      0.383860
4     127  0.909759       NaN  0.760994  0.998406      0.916891
5     128       NaN  0.329613       NaN  0.900383      0.614998
6     129  0.714586       NaN  0.671905       NaN      0.693245
7     130  0.151889  0.279262  0.641133  0.188231      0.288001