Python: apply a function only to NaN values

python pandas scikit-learn


I have a DataFrame with the following structure.

It has three columns, A, B, and C:

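import pandas as pd

A = [1, 2, 5, 4, 3, 1]
B = ["yes", "No", "hello", "yes", "no", "why"]
C = [1, 0, 1, 'NaN', 0, 0]  # note: 'NaN' here is a string, not a real missing value
test_df = pd.DataFrame({'A': A, 'B': B, 'C': C})

def def_c(inB):
    # return 0 when B is "no" (case-insensitive), otherwise 1
    if inB.lower() == 'no':
        cis = 0
    else:
        cis = 1

    return cis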

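The general rule is that if B equals "no", then cis, as computed by the function defined above, equals 0. However, this should only be applied when C is NaN, because the rule is sometimes violated and the existing value of C should then be treated as correct.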

The expected output is the same DataFrame with the NaN in column C filled in according to this rule.

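What is the best way to iterate over the DataFrame and apply the function only where C is NaN? Is it better to use a pandas function or sklearn's imputation functionality?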

You can use isna and np.where:
import numpy as np

# assuming `NaN` is an actual NA value
isna = test_df['C'].isna()
# or, if it is the string `'NaN'` as in the sample
# isna = test_df['C'].eq('NaN')

# fill only the selected rows: 0 where B is "no" (case-insensitive), otherwise 1
test_df.loc[isna, 'C'] = np.where(test_df.loc[isna, 'B'].str.lower() == 'no', 0, 1)

# or, using the function from the question
# test_df.loc[isna, 'C'] = [def_c(inB) for inB in test_df.loc[isna, 'B']]

Output:
   A      B  C
0  1    yes  1
1  2     No  0
2  5  hello  1
3  4    yes  1
4  3     no  0
5  1    why  0
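
As a possible variation on the approach above, the following sketch fills the missing values with fillna instead of a boolean mask. It assumes that C holds a real missing value (np.nan) rather than the string 'NaN', and it rewrites def_c in a compact form for brevity. It is one way to do this, not the only one.

```python
import numpy as np
import pandas as pd

def def_c(inB):
    # 0 when B is "no" (case-insensitive), otherwise 1
    return 0 if inB.lower() == 'no' else 1

test_df = pd.DataFrame({
    'A': [1, 2, 5, 4, 3, 1],
    'B': ["yes", "No", "hello", "yes", "no", "why"],
    'C': [1, 0, 1, np.nan, 0, 0],  # assumption: a real NaN, not the string 'NaN'
})

# Compute a candidate value for every row from B, then let fillna
# use it only in the rows where C is missing (alignment is by index)
candidates = test_df['B'].map(def_c)
test_df['C'] = test_df['C'].fillna(candidates).astype(int)

print(test_df)
```

Rows where C already has a value are left untouched, even when they disagree with the rule (for example the last row, where B is "why" and C stays 0).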

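Please post your expected output DataFrame.

Is this possible using the defined def_c() function? When I run the code and then display test_df again, this is the output:

   A      B    C
0  1    yes    1
1  2     No    0
2  5  hello    1
3  4    yes  NaN
4  3     no    0
5  1    why    0

The NaN value in row 3 is still unchanged.

Did you try isna = test_df['C'].eq('NaN'), as shown in the comment?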
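
The behaviour described in the comments can be checked with a short sketch. It uses the sample data from the question, where 'NaN' is a string, and shows why isna() finds nothing to fill. The variable names is_nan_string and c_numeric are illustrative only, and converting with pd.to_numeric is one possible option, not something suggested in the original thread.

```python
import pandas as pd

test_df = pd.DataFrame({
    'A': [1, 2, 5, 4, 3, 1],
    'B': ["yes", "No", "hello", "yes", "no", "why"],
    'C': [1, 0, 1, 'NaN', 0, 0],  # 'NaN' is a string, as in the question's sample
})

# isna() does not treat the string 'NaN' as missing, so no row is selected
print(test_df['C'].isna().any())

# Option 1: match the string directly, as in the commented-out line of the answer
is_nan_string = test_df['C'].eq('NaN')
print(is_nan_string.tolist())

# Option 2: convert the column to numeric so that 'NaN' becomes a real NaN,
# after which isna() selects the row
c_numeric = pd.to_numeric(test_df['C'], errors='coerce')
print(c_numeric.isna().tolist())
```

Either mask can then be used in place of isna in the assignment with test_df.loc.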