Python: index out of bounds when replacing NaN via a function

python function pandas

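I have created a function that replaces the NaNs in a Pandas dataframe with the means of the respective columns. I tested the function with a small dataframe and it worked. When I applied it to a much larger dataframe (30,000 rows, 9 columns), I got the error message: IndexError: index out of bounds

The function is the following: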

# The 'update' function will replace all the NaNs in a dataframe with the mean of the respective columns

def update(df):  # the function takes one argument, the dataframe that will be updated
    ncol = df.shape[1]  # number of columns in the dataframe
    for i in range(0, ncol):  # loops over all the columns
        # subsets column i using the isnull() method and assigns the column mean at those positions
        df.iloc[:, i][df.isnull().iloc[:, i]] = df.mean()[i]
    return df

The small dataframe I used to test the function is the following:

     0   1   2  3
0   NaN NaN  3  4
1   NaN NaN  7  8
2   9.0 10.0 11 12

Can you explain the error? Your advice would be much appreciated.

I would use this method in conjunction with the following method:

Mean values:

In [138]: df.mean()
Out[138]:
0     9.0
1    10.0
2     7.0
3     8.0
dtype: float64
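
The method names in the sentence above did not survive on this page, so the following is only a sketch under the assumption that the approach meant is DataFrame.fillna combined with DataFrame.mean. The frame rebuilds the small test data from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the small test frame from the question.
df = pd.DataFrame([[np.nan, np.nan, 3, 4],
                   [np.nan, np.nan, 7, 8],
                   [9.0, 10.0, 11, 12]])

# Assumption: this is the combination the answer had in mind.
# fillna matches the Series of means to the columns by label,
# so each NaN receives the mean of its own column.
filled = df.fillna(df.mean())
print(filled)
```

Because the means are matched by column label, no positional index is involved, and df itself is left unchanged.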

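The reason you are getting "index out of bounds" is that you are assigning the value df.mean()[i], where i is an iteration over what are supposed to be ordinal positions. df.mean() is a Series whose index is the columns of df, so indexing it with [i] expects i to be a column label. But that is not the case here, which is why you get the error.

Your code, fixed: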
def update(df):  # the function takes one argument, the dataframe that will be updated
    ncol = df.shape[1]  # number of columns in the dataframe
    for i in range(0, ncol):  # loops over all the columns
        # subsets column i using the isnull() method and assigns the column mean,
        # now looked up by position with .iloc[i]
        df.iloc[:, i][df.isnull().iloc[:, i]] = df.mean().iloc[i]
    return df
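
To illustrate the difference between label-based and position-based access on the Series of means, here is a minimal sketch. The frame and its column names "a" and "b" are hypothetical and are not taken from the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string column labels.
df = pd.DataFrame({"a": [np.nan, 2.0, 4.0],
                   "b": [1.0, np.nan, 7.0]})

means = df.mean()
print(means.index.tolist())  # the index of the means is the column labels of df

# Label-based access uses the column name ...
mean_b_by_label = means.loc["b"]
# ... while position-based access goes through .iloc with an integer.
mean_b_by_position = means.iloc[1]
```

Both lookups return the same value here; they only differ in whether the key is read as a label or as a position.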

Also, your function is altering df directly. You may want to be careful. I'm not sure that is what you intended.
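
A minimal sketch of that point, assuming the caller wants to keep the original frame intact. The helper names fill_in_place and fill_copy and the sample frame are hypothetical.

```python
import numpy as np
import pandas as pd

def fill_in_place(df):
    # Assigning into df[col] changes the object that the caller passed in.
    for col in df.columns:
        df[col] = df[col].fillna(df[col].mean())
    return df

def fill_copy(df):
    # Working on a copy leaves the caller's frame untouched.
    return fill_in_place(df.copy())

original = pd.DataFrame({"a": [np.nan, 1.0, 3.0]})
result = fill_copy(original)

print(original)  # still contains the NaN
print(result)    # NaN replaced by the column mean
```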

All that said, I'd recommend another approach:
def update(df):
    return df.where(df.notnull(), df.mean(), axis=1)


You could use any number of methods to fill the missing values with the mean. I'd suggest using @MaxU's answer.
df.where

Takes the value from df where the first argument is True, otherwise takes it from the second argument.
df.where(df.notnull(), df.mean(), axis=1)
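
As a rough check of the df.where approach, the sketch below applies it to the small test frame from the question. The expected values follow from the column means shown earlier (9.0 and 10.0 for the first two columns).

```python
import numpy as np
import pandas as pd

# Rebuild the small test frame from the question.
df = pd.DataFrame([[np.nan, np.nan, 3, 4],
                   [np.nan, np.nan, 7, 8],
                   [9.0, 10.0, 11, 12]])

def update(df):
    # Keep each value where it is not null; elsewhere take the column mean.
    # axis=1 aligns the Series of means with the columns of df.
    return df.where(df.notnull(), df.mean(), axis=1)

result = update(df)
print(result)
```

This version returns a new frame rather than modifying the one passed in.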

df.combine_first with awkward pandas broadcasting
df.combine_first(pd.DataFrame([df.mean()], df.index))

np.where
pd.DataFrame(
    np.where(
        df.notnull(), df.values,
        np.nanmean(df.values, 0, keepdims=1)),
    df.index, df.columns)
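
The np.where variant above can be run end to end as follows. This is a sketch on the small test frame from the question, with the imports the snippet needs; the intermediate name col_means is introduced here for readability.

```python
import numpy as np
import pandas as pd

# Rebuild the small test frame from the question.
df = pd.DataFrame([[np.nan, np.nan, 3, 4],
                   [np.nan, np.nan, 7, 8],
                   [9.0, 10.0, 11, 12]])

# Column means that ignore NaN; keepdims gives shape (1, ncol),
# which broadcasts against the (nrow, ncol) array of values.
col_means = np.nanmean(df.values, axis=0, keepdims=True)

filled = pd.DataFrame(
    np.where(df.notnull().values, df.values, col_means),
    index=df.index, columns=df.columns)
print(filled)
```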

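I changed the code in the function as you suggested, but I still get an error: IndexError: single positional indexer is out-of-bounds
I ran the exact code using the sample df and it ran. I got a SettingWithCopyWarning, but it ran.
Yes, I understand. In fact, as I mentioned in my post, the original function already worked on the test dataframe before the correction. However, it fails on the target dataframe, which can be found here: .. Can you explain this?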
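
The target dataframe is not shown on this page, so the following is only a guess at one possible cause, not a diagnosis. If the frame contains a non-numeric column, the Series of means has fewer entries than the frame has columns, and a positional lookup can run past its end. The frame below is hypothetical; numeric_only=True is passed explicitly to get the means of the numeric columns only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "b" holds strings, so it has no mean.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                   "b": ["x", "y", "z"],
                   "c": [np.nan, 5.0, 7.0]})

ncol = df.shape[1]                  # 3 columns
means = df.mean(numeric_only=True)  # only 2 entries: "a" and "c"

try:
    means.iloc[ncol - 1]            # position 2 does not exist in means
    failed = False
except IndexError as exc:
    failed = True
    print(exc)

# Filling by column label sidesteps the mismatch between positions.
filled = df.fillna(means)
print(filled)
```

If this assumption holds for the target data, matching the means to columns by label, as fillna does, would avoid the positional lookup altogether.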