Is there a Python function to automatically fill null values in numeric and object columns?


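What is an efficient way to automatically fill null values in both the categorical and the numeric columns of a DataFrame using Python?

So far, I have been using the function below to fill null values in categorical and numeric columns.

Sample data: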

import pandas as pd
import numpy as np


dictdf = {"Num1": [111, 222, 444, np.nan, 666, 222],
          "Obj1": ["a", np.nan, "b", "c", "d", np.nan],
          "Num2": [np.nan, 247, 321, np.nan, 654, 212],
          "Obj2": ["cdb", np.nan, np.nan, "kbc", "kdd", "np"]}

df = pd.DataFrame(dictdf)

df
Output after filling:

    Num1     Obj1   Num2     Obj2
0  111.0        a -999.0      cdb
1  222.0  Missing  247.0  Missing
2  444.0        b  321.0  Missing
3 -999.0        c -999.0      kbc
4  666.0        d  654.0      kdd
5  222.0  Missing  212.0       np

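I am looking for a better, more efficient way to do this that works for both this sample and large datasets. Any suggestions or references would be appreciated.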

You can build a dictionary that maps every column to its replacement value:

d1 = dict.fromkeys(df.select_dtypes(np.number).columns, -999)
d2 = dict.fromkeys(df.columns.difference(d1.keys()), 'Missing')
#merge both dicts
d = {**d1, **d2}

df = df.fillna(d)
#or with one dict
#df = df.fillna(d1).fillna('Missing')
print (df)
    Num1     Obj1   Num2     Obj2
0  111.0        a -999.0      cdb
1  222.0  Missing  247.0  Missing
2  444.0        b  321.0  Missing
3 -999.0        c -999.0      kbc
4  666.0        d  654.0      kdd
5  222.0  Missing  212.0       np
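
If this is needed in more than one place, the dictionary approach above can be wrapped in a small helper. This is only a sketch: the name fill_missing_by_dtype and its parameters num_fill and obj_fill are made up for illustration, and it assumes that every non-numeric column should receive the string placeholder (which would also apply to datetime or boolean columns, if any).

```python
import numpy as np
import pandas as pd


def fill_missing_by_dtype(df, num_fill=-999, obj_fill="Missing"):
    """Return a new DataFrame with NaNs filled per column type (hypothetical helper)."""
    # Numeric columns get the numeric sentinel.
    num_cols = df.select_dtypes(include=np.number).columns
    fill_map = dict.fromkeys(num_cols, num_fill)
    # Every remaining column gets the string placeholder (assumption).
    fill_map.update(dict.fromkeys(df.columns.difference(num_cols), obj_fill))
    # fillna with a dict fills each column with its own value and returns a new frame.
    return df.fillna(fill_map)


sample = pd.DataFrame({
    "Num1": [111, 222, 444, np.nan, 666, 222],
    "Obj1": ["a", np.nan, "b", "c", "d", np.nan],
    "Num2": [np.nan, 247, 321, np.nan, 654, 212],
    "Obj2": ["cdb", np.nan, np.nan, "kbc", "kdd", "np"],
})
filled = fill_missing_by_dtype(sample)
print(filled)
```

The helper leaves the input frame untouched, so the original data with its NaNs is still available for comparison.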
Also, if you want to consider only the columns that actually contain missing values:

df1 = df.loc[:, df.isna().sum().gt(0)]
d1 = dict.fromkeys(df1.select_dtypes(np.number).columns, -999)
d2 = dict.fromkeys(df1.columns.difference(d1.keys()), 'Missing')
d = {**d1, **d2}

df = df.fillna(d)
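
To see what the column filter selects, here is a small sketch on a variant of the sample data in which two columns have no missing values (the variant data and the names sample, has_na and fill_map are assumptions for illustration). It uses isna().any(), which gives the same mask as isna().sum().gt(0). The mask has to be computed before anything is filled, otherwise no column qualifies.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "Num1": [111, 222, 444, np.nan, 666, 222],
    "Obj1": ["a", "e", "b", "c", "d", "f"],   # no missing values here
    "Num2": [1, 2, 3, 4, 5, 6],               # no missing values here
    "Obj2": ["cdb", np.nan, np.nan, "kbc", "kdd", "np"],
})

# Boolean mask per column: True where at least one value is missing.
has_na = sample.isna().any()
cols_with_na = sample.columns[has_na.to_numpy()].tolist()
print(cols_with_na)

# Build the fill dictionary only for the columns that need it.
subset = sample.loc[:, has_na]
num_cols = subset.select_dtypes(include=np.number).columns
fill_map = {
    **dict.fromkeys(num_cols, -999),
    **dict.fromkeys(subset.columns.difference(num_cols), "Missing"),
}
filled = sample.fillna(fill_map)
print(filled)
```

Columns that are absent from the dictionary are simply skipped by fillna, so complete columns such as Num2 keep their original dtype.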

An idea similar to your solution:

c = df.select_dtypes(np.number).columns

df[c] = df[c].fillna(-999)
df = df.fillna('Missing')
print (df)
    Num1     Obj1   Num2     Obj2
0  111.0        a -999.0      cdb
1  222.0  Missing  247.0  Missing
2  444.0        b  321.0  Missing
3 -999.0        c -999.0      kbc
4  666.0        d  654.0      kdd
5  222.0  Missing  212.0       np

Try calling fillna on each column:

df['Num1'] = df['Num1'].fillna(-999)
df['Num2'] = df['Num2'].fillna(-999)
df['Obj1'] = df['Obj1'].fillna("Missing")
df['Obj2'] = df['Obj2'].fillna("Missing")

First of all, a quick, simple, and accurate answer! Amazing. But what if I have 100 columns?
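
With many columns, selecting by dtype avoids typing out column names one by one. A minimal sketch, assuming a synthetic frame with 50 numeric and 50 string columns (the names num_0 ... obj_49, the row count, and the roughly 10% share of NaNs are all made up for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 1_000

# Hypothetical wide frame: 50 numeric and 50 string columns, about 10% NaNs each.
data = {}
for i in range(50):
    values = rng.normal(size=n_rows)
    values[rng.random(n_rows) < 0.1] = np.nan
    data[f"num_{i}"] = values
for i in range(50):
    values = rng.choice(["a", "b", "c"], size=n_rows).astype(object)
    values[rng.random(n_rows) < 0.1] = np.nan
    data[f"obj_{i}"] = values
wide = pd.DataFrame(data)

# Same two steps as for four columns; no column names are written out.
num_cols = wide.select_dtypes(include=np.number).columns
wide[num_cols] = wide[num_cols].fillna(-999)
wide = wide.fillna("Missing")

print(wide.shape, int(wide.isna().sum().sum()))
```

The code is the same regardless of how many columns there are, because the column lists come from select_dtypes rather than from hand-written names.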