Python: creating a column from other columns that may contain NaN values

python pandas numpy dataframe

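I have 3 columns that should be weighted and summed. However, these columns sometimes contain NaN values, and that changes which set of columns ends up being weighted and summed. An example df to illustrate: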

import numpy as np
import pandas as pd

f = { 'A': [1, np.nan, 2, np.nan, 5, 6, np.nan],
'B': [np.nan, np.nan, 1, 1, 1, np.nan, 7], 
'C': [np.nan, 2, 3, 6, np.nan, 5, np.nan]}
fd = pd.DataFrame(data = f)
fd.head(10)

      A  B   C
0   1.0 NaN NaN
1   NaN NaN 2.0
2   2.0 1.0 3.0
3   NaN 1.0 6.0
4   5.0 1.0 NaN
5   6.0 NaN 5.0
6   NaN 7.0 NaN

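This example demonstrates all possible combinations of NaN in the columns. I then want to create column F, which is the weighted sum of columns A, B and C wherever they are not NaN. Here is my code: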

def scaler(df):
    "Scaling and summing"
    if (pd.notnull(df['A']) == True & pd.notnull(df['B']) == True & pd.notnull(df['C']) == True):
        return df['A']*0.5+df['B']*0.25+df['C']*0.25
    elif (pd.notnull(df['A']) == True & pd.notnull(df['B']) == False & pd.notnull(df['C']) == False):
        return df['A']*1
    elif (pd.notnull(df['A']) == True & pd.notnull(df['B']) == True & pd.notnull(df['C']) == False):
        return df['A']*0.75+df['B']*0.25
    elif (pd.notnull(df['A']) == True & pd.notnull(df['B']) == False & pd.notnull(df['C']) == True):
        return df['A']*0.75+df['C']*0.25
    elif (pd.notnull(df['A']) == False & pd.notnull(df['B']) == True & pd.notnull(df['C']) == True):
        return df['B']*0.5+df['C']*0.5
    elif (pd.notnull(df['A']) == False & pd.notnull(df['B']) == True & pd.notnull(df['C']) == False):
        return df['B']*1
    else:
        return df['C']*1

fd['F'] =fd.apply(scaler, axis = 'columns')
fd.head(10)

     A   B   C   F
0   1.0 NaN NaN NaN
1   NaN NaN 2.0 NaN
2   2.0 1.0 3.0 2.0
3   NaN 1.0 6.0 6.0
4   5.0 1.0 NaN NaN
5   6.0 NaN 5.0 5.0
6   NaN 7.0 NaN 7.0

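So I get a df in which the rows where all three columns are non-NaN are weighted and summed correctly. If at least one of the columns contains NaN, I get either NaN or an incorrect result in column F.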

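To work around this, I replaced all NaN values in the original df with a float value that lies outside the range of every column, and then applied the code logic above. My questions are: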

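1) Why does this happen? The NaN values throw off the result even though the columns that contain them do not take part directly in the rescaling formula.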

2) My workaround is a bit sloppy. Is there a more elegant solution?

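You misunderstand how pd.DataFrame.apply works. Along axis=1, each row is passed to the function, not the entire dataframe, so it helps to name the function argument accordingly. Inside the function you are working with scalars, not series, so you should use the regular and keyword rather than the & operator. Note also that pd.isnull and pd.notnull both exist. You can therefore rewrite the function as follows: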

def scaler(row):
    "Scaling and summing"
    if pd.notnull(row['A']) and pd.notnull(row['B']) and pd.notnull(row['C']):
        return row['A']*0.5 + row['B']*0.25 + row['C']*0.25
    elif pd.notnull(row['A']) and pd.isnull(row['B']) and pd.isnull(row['C']):
        return row['A']
    elif pd.notnull(row['A']) and pd.notnull(row['B']) and pd.isnull(row['C']):
        return row['A']*0.75 + row['B']*0.25
    elif pd.notnull(row['A']) and pd.isnull(row['B']) and pd.notnull(row['C']):
        return row['A']*0.75 + row['C']*0.25
    elif pd.isnull(row['A']) and pd.notnull(row['B']) and pd.notnull(row['C']):
        return row['B']*0.5 + row['C']*0.5
    elif pd.isnull(row['A']) and pd.notnull(row['B']) and pd.isnull(row['C']):
        return row['B']
    else:
        return row['C']

df['F'] = df.apply(scaler, axis=1)
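As a side note on question 1, the sketch below is one way to see why the original conditions misfire. In Python, & binds more tightly than ==, so an expression like x == True & y == False is read as the chained comparison x == (True & y) == False rather than as two separate tests. The names a_ok and b_ok are illustrative only and do not come from the question.

```python
import numpy as np
import pandas as pd

# Scalar null checks, as they occur inside a row-wise apply.
a_ok = pd.notnull(1.0)      # A is present
b_ok = pd.notnull(np.nan)   # B is missing

# The pattern used in the question's conditions.
as_written = a_ok == True & b_ok == False

# How Python actually groups it: & is applied first, then the
# comparisons are chained.
how_parsed = (a_ok == (True & b_ok)) and ((True & b_ok) == False)

# What was presumably intended: A present and B missing.
intended = a_ok and not b_ok

print(as_written, how_parsed, intended)
```

With A present and B missing, the condition as written comes out false even though the intended test is true, which is consistent with rows falling through to the wrong branch.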
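However, this row-wise apply is inefficient for a large number of rows.

A more efficient and more readable solution uses np.select, which relies only on vectorized operations. Note that the check for whether the values in each series are null is computed only once:

a_null = df['A'].isnull()
b_null = df['B'].isnull()
c_null = df['C'].isnull()

conds = [~a_null & b_null & c_null,
         a_null & ~b_null & c_null,
         a_null & b_null & ~c_null,
         ~a_null & ~b_null & c_null,
         ~a_null & b_null & ~c_null,
         a_null & ~b_null & ~c_null,
         ~a_null & ~b_null & ~c_null]

choices = [df['A'], df['B'], df['C'],
           0.75 * df['A'] + 0.25 * df['B'],
           0.75 * df['A'] + 0.25 * df['C'],
           0.5 * df['B'] + 0.5 * df['C'],
           0.5 * df['A'] + 0.25 * df['B'] + 0.25 * df['C']]

df['F'] = np.select(conds, choices)

Result:

     A    B    C     F
0  1.0  NaN  NaN  1.00
1  NaN  NaN  2.0  2.00
2  2.0  1.0  3.0  2.00
3  NaN  1.0  6.0  3.50
4  5.0  1.0  NaN  4.00
5  6.0  NaN  5.0  5.75
6  NaN  7.0  NaN  7.00

Hi, and welcome. I suggest you read a short introduction to pandas, since what you want is easily obtained with df["F"] = df.mean(axis=1). Also, there is no need to be original when naming variables: dataframes are usually named df.

@user32185 df.mean has no provision for a weighted average, which is what the OP wants.

@bhaskarc You are right. In any case, these are not all the possible combinations, since there should be 8. This gives you all of them: pd.DataFrame(list(itertools.product([True, False], repeat=3)))

@user32185 That is true: there are 8 possible combinations, but I am only interested in the ones I mentioned. Thanks!

This is exactly what I wanted to see: elegant and readable! Thanks!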
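On the eighth combination raised in the comments (all three columns NaN): np.select falls back to its default argument, which is 0 unless specified, for rows that match none of the conditions. The sketch below builds all 8 patterns with itertools.product, as suggested in the comments, and compares the default behaviour with default=np.nan. The fill values 4.0, 2.0 and 8.0 and the column names F_default and F_nan are arbitrary choices for illustration, not part of the original thread.

```python
import itertools
import numpy as np
import pandas as pd

# All 8 missing/present patterns; True marks a missing value here.
patterns = list(itertools.product([True, False], repeat=3))
values = {'A': 4.0, 'B': 2.0, 'C': 8.0}  # arbitrary illustrative values

df = pd.DataFrame(
    [[np.nan if missing else values[col] for col, missing in zip('ABC', p)]
     for p in patterns],
    columns=list('ABC'),
)

a_null, b_null, c_null = (df[c].isnull() for c in 'ABC')

# Same seven conditions and weights as in the answer.
conds = [~a_null & b_null & c_null,
         a_null & ~b_null & c_null,
         a_null & b_null & ~c_null,
         ~a_null & ~b_null & c_null,
         ~a_null & b_null & ~c_null,
         a_null & ~b_null & ~c_null,
         ~a_null & ~b_null & ~c_null]
choices = [df['A'], df['B'], df['C'],
           0.75 * df['A'] + 0.25 * df['B'],
           0.75 * df['A'] + 0.25 * df['C'],
           0.5 * df['B'] + 0.5 * df['C'],
           0.5 * df['A'] + 0.25 * df['B'] + 0.25 * df['C']]

df['F_default'] = np.select(conds, choices)              # all-NaN row becomes 0
df['F_nan'] = np.select(conds, choices, default=np.nan)  # all-NaN row stays NaN
print(df)
```

Whether 0 or NaN is the right result for an all-NaN row depends on the data; the OP states that this case is not of interest, so this matters only if such rows can occur.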