Python: Missing data, insert rows in Pandas and fill with NaN

python numpy pandas

I'm new to Python and Pandas, so there may be a simple solution that I'm not seeing.

I have a number of discontinuous datasets that look like this:

ind A    B  C  
0   0.0  1  3  
1   0.5  4  2  
2   1.0  6  1  
3   3.5  2  0  
4   4.0  4  5  
5   4.5  3  3

I am now looking for a solution to get the following result:

ind A    B  C  
0   0.0  1  3  
1   0.5  4  2  
2   1.0  6  1  
3   1.5  NAN NAN  
4   2.0  NAN NAN  
5   2.5  NAN NAN  
6   3.0  NAN NAN  
7   3.5  2  0  
8   4.0  4  5  
9   4.5  3  3

The problem is that the gap in A varies in position and length from dataset to dataset...

In this case I generate a new DataFrame holding the complete range of A values, merge it with the original df, and then re-sort the result:

In [177]:

df.merge(how='right', on='A', right = pd.DataFrame({'A':np.arange(df.iloc[0]['A'], df.iloc[-1]['A'] + 0.5, 0.5)})).sort_values(by='A').reset_index(drop=True)
Out[177]:
     A   B   C
0  0.0   1   3
1  0.5   4   2
2  1.0   6   1
3  1.5 NaN NaN
4  2.0 NaN NaN
5  2.5 NaN NaN
6  3.0 NaN NaN
7  3.5   2   0
8  4.0   4   5
9  4.5   3   3

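So in the general case you can adjust the arange function, which takes a start value, an end value, and a step value. Note that I added 0.5 to the end value because the range is half-open: the end value itself is excluded.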
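To make the half-open behaviour concrete, here is a small sketch using the start, end, and step values from the question. The variable names are only illustrative.

```python
import numpy as np

# The stop value is excluded, so stopping at 4.5 ends the range at 4.0.
without_end = np.arange(0.0, 4.5, 0.5)

# Adding one step to the stop value keeps 4.5 in the range.
with_end = np.arange(0.0, 4.5 + 0.5, 0.5)

print(without_end[-1], len(without_end))
print(with_end[-1], len(with_end))
```

With step sizes that are not exactly representable in binary floating point (0.1, for example), the number of elements returned by arange can be less predictable, so it may be safer to compute the number of steps and build the range from integers.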

A more general approach could look like this:

In [197]:

df = df.set_index(keys='A', drop=False).reindex(np.arange(df.iloc[0]['A'], df.iloc[-1]['A'] + 0.5, 0.5))
df['A'] = df.index  # refill A from the new index so the inserted rows get their A values
df = df.reset_index(drop=True)
df
Out[197]:
       A   B   C
0    0.0   1   3
1    0.5   4   2
2    1.0   6   1
3    1.5 NaN NaN
4    2.0 NaN NaN
5    2.5 NaN NaN
6    3.0 NaN NaN
7    3.5   2   0
8    4.0   4   5
9    4.5   3   3

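Here we set the index to column A but don't drop the column, and then reindex the df using the arange function. Any A value that was not in the original data gets a new row filled with NaN.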

set_index and reset_index are your friends.

from numpy import arange
from pandas import DataFrame, Index

df = DataFrame({"A":[0,0.5,1.0,3.5,4.0,4.5], "B":[1,4,6,2,4,3], "C":[3,2,1,0,5,3]})

First, move column A to the index:

In [64]: df.set_index("A")
Out[64]: 
     B  C
 A        
0.0  1  3
0.5  4  2
1.0  6  1
3.5  2  0
4.0  4  5
4.5  3  3

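Then reindex with a new index; the missing data is filled in with NaN here. We use an Index object because we can give it a name, which will be used in the next step.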

In [66]: new_index = Index(arange(0,5,0.5), name="A")
In [67]: df.set_index("A").reindex(new_index)
Out[67]: 
      B   C
0.0   1   3
0.5   4   2
1.0   6   1
1.5 NaN NaN
2.0 NaN NaN
2.5 NaN NaN
3.0 NaN NaN
3.5   2   0
4.0   4   5
4.5   3   3

Finally, move the index back to a column with reset_index. Because we named the index, it all works magically:

In [69]: df.set_index("A").reindex(new_index).reset_index()
Out[69]: 
       A   B   C
0    0.0   1   3
1    0.5   4   2
2    1.0   6   1
3    1.5 NaN NaN
4    2.0 NaN NaN
5    2.5 NaN NaN
6    3.0 NaN NaN
7    3.5   2   0
8    4.0   4   5
9    4.5   3   3
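The three steps above can be wrapped into one function. This is only a sketch: insert_missing_rows is a hypothetical helper, not part of pandas, and it assumes the column is sorted, has no duplicates, and that every value is an exact multiple of the step away from the first value (true for the 0.5 spacing in the question).

```python
import numpy as np
import pandas as pd


def insert_missing_rows(df, column, step):
    """Return df with one row per `step` between the first and last value of `column`.

    Hypothetical helper. Assumes `column` is sorted, has no duplicates, and
    that its values are exact multiples of `step` away from the first value.
    """
    start = df[column].iloc[0]
    stop = df[column].iloc[-1]
    # Count the steps rather than passing a float stop value to arange,
    # so the last value is always included.
    n_steps = int(round((stop - start) / step))
    new_index = pd.Index(start + step * np.arange(n_steps + 1), name=column)
    # Values of `column` that are missing from df come back as rows of NaN.
    return df.set_index(column).reindex(new_index).reset_index()


df = pd.DataFrame({"A": [0.0, 0.5, 1.0, 3.5, 4.0, 4.5],
                   "B": [1, 4, 6, 2, 4, 3],
                   "C": [3, 2, 1, 0, 5, 3]})
result = insert_missing_rows(df, "A", 0.5)
print(result)
```

Naming the new Index after the column is what lets reset_index restore it under the original name. If the values are not exact multiples of the step, rounding both the column and the new index to a fixed number of decimals before reindexing is one possible workaround.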

Using EdChum's answer above, I created the following function:

import numpy as np
import pandas as pd

def fill_missing_range(df, field, range_from, range_to, range_step=1, fill_with=0):
    # np.arange excludes range_to, so pass an end value one step past the last value you want
    full_range = pd.DataFrame({field: np.arange(range_from, range_to, range_step)})
    return df\
      .merge(how='right', on=field, right=full_range)\
      .sort_values(by=field).reset_index(drop=True).fillna(fill_with)

Usage example:

fill_missing_range(df, 'A', 0.0, 5.0, 0.5, np.nan)  # range_to is exclusive, so 5.0 keeps the row at A = 4.5
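When the gaps are filled with a value such as the default 0 rather than NaN, the inserted rows can no longer be told apart from real data. One possible way to keep track of them, sketched here with the example data from the question, is the indicator=True option of merge, which adds a _merge column. The names full_range and inserted are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.0, 0.5, 1.0, 3.5, 4.0, 4.5],
                   "B": [1, 4, 6, 2, 4, 3],
                   "C": [3, 2, 1, 0, 5, 3]})

# Complete range of A values; 0.5 is added to the stop because arange excludes it.
full_range = pd.DataFrame(
    {"A": np.arange(df["A"].iloc[0], df["A"].iloc[-1] + 0.5, 0.5)}
)

# indicator=True adds a "_merge" column: "both" for rows present in df,
# "right_only" for rows that exist only in full_range (the inserted ones).
merged = df.merge(full_range, how="right", on="A", indicator=True)
merged = merged.sort_values(by="A").reset_index(drop=True)

# A values of the rows that were inserted by the merge.
inserted = merged.loc[merged["_merge"] == "right_only", "A"].tolist()
print(inserted)
```

The _merge column can be dropped once the inserted rows have been recorded or filled.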

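This question was asked a long time ago, but I have a simple solution worth mentioning: you can simply use NumPy's NaN. For example:

import numpy as np
df.iloc[i, j] = np.nan  # i, j are integer row and column positions; the column needs a float dtype to hold NaN

That will do it.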

Welcome to Stack Overflow. Please be sure to show your code (your work) so that other users can understand your problem and help debug it.