How to check if data is missing after two or more repeating values in pandas, and replace the missing values with the previous value?
Tags: python, python-2.7, python-3.x, pandas, numpy


I am trying to fill missing values with the previous value, but only if the previous value is repeated. Sample DF:

Index Columns
0    1978.0
1    1918.0
2    1918.0
3    1918.0
4       NaN
5       NaN
6       NaN
7    1853.0
8    1831.0
9       NaN
For the above dataframe, replace the NaN at indices 4, 5 and 6 with 1918.0, and keep the NaN at index 9 as NaN.

Desired output 1:

Index Columns
0    1978.0
1    1918.0
2    1918.0
3    1918.0
4    1918.0
5    1918.0
6    1918.0
7    1853.0
8    1831.0
9       NaN
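It would also be great if I could get the number of such occurrences out of all the NaN values. For example, the sample DF has 4 NaN values, of which 3 follow a repeated value.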

Desired output 2:

Column_Name  : Columns
Total_NaN_count : 4
NaN_values_with_previous_elements_repeating : 3
Please let me know if there is any way to achieve this.

Thanks

You can select the rows of the dataframe that meet a condition and apply ffill to that selection:

cond = df['Columns'].shift(1) == df['Columns'].shift(2)
df.loc[cond] = df.loc[cond].ffill()

    Columns
0   1978.0
1   1918.0
2   1918.0
3   1918.0
4   1918.0
6   1853.0
7   1831.0
8   NaN
Update: this handles the new test case

cond = (df.Columns.shift(1) == df.Columns.shift(2)) | (df.Columns.shift(-2).notnull())
df.loc[:] = df.fillna(df.loc[cond].ffill())
You get

    Columns
0   1978.0
1   1918.0
2   1918.0
3   1918.0
4   1918.0
5   1918.0
6   1918.0
7   1853.0
8   1831.0
9   NaN
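
The condition above is tuned to the sample data. A more general sketch is given here, assuming "repeated" means that the last non-missing value before a NaN closes a run of at least two equal consecutive values. It labels the runs first, then forward-fills only the NaNs that follow such a run, and also returns the two counts asked for in the question. The function name `fill_after_repeats` and the `min_repeats` parameter are illustrative and not part of pandas.

```python
import numpy as np
import pandas as pd


def fill_after_repeats(s, min_repeats=2):
    """Forward-fill NaNs only when they follow a run of repeated values.

    Hypothetical helper: returns (filled_series, total_nan, nan_after_repeats).
    """
    # A new run starts wherever the value differs from the previous row.
    # NaN != NaN is True, so every NaN forms its own run and breaks runs.
    run_id = (s != s.shift()).cumsum()
    # Number of non-null values in the run each row belongs to (0 for NaN rows).
    run_len = s.groupby(run_id).transform("count")
    # 1.0 where a value sits in a long-enough run, 0.0 otherwise; blank out
    # the NaN rows and carry the flag of the last valid value onto them.
    flag = (run_len >= min_repeats).astype(float).where(s.notna()).ffill()
    mask = s.isna() & flag.eq(1.0)
    filled = s.where(~mask, s.ffill())
    return filled, int(s.isna().sum()), int(mask.sum())


df = pd.DataFrame({"Columns": [1978.0, 1918.0, 1918.0, 1918.0, np.nan,
                               np.nan, np.nan, 1853.0, 1831.0, np.nan]})
filled, total_nan, nan_after_repeats = fill_after_repeats(df["Columns"])
print(filled)
print("Total_NaN_count :", total_nan)
print("NaN_values_with_previous_elements_repeating :", nan_after_repeats)
```

On the sample data this fills indices 4, 5 and 6 with 1918.0, leaves index 9 as NaN, and reports 4 NaN values in total, 3 of which follow a repeated value.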

For performance and convenience, here is a simple approach that works on the underlying array data:

import numpy as np

# Extract the underlying array. When it is a writable view, modifying it
# also modifies the original dataframe.
a = df.Columns.values

# Indices of NaN positions that are preceded by two equal values
idx = np.flatnonzero(np.r_[False, False, a[1:-1] == a[:-2]] & np.isnan(a))

# Assign the previous value at all those positions
a[idx] = a[idx - 1]
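
The snippet above fills only the first NaN after a repeated value. A possible NumPy variant that fills every consecutive NaN is sketched here, assuming the same reading of "repeated" (the last valid value equals the element directly before it). It finds the last valid position for each element with `np.maximum.accumulate`. The function name `fill_nan_after_repeats` is made up for this sketch, and it works on a copy rather than modifying the dataframe in place.

```python
import numpy as np


def fill_nan_after_repeats(values):
    """Hypothetical helper: fill NaNs that follow a repeated value."""
    a = np.asarray(values, dtype=float).copy()
    isnan = np.isnan(a)
    # Position of the last non-NaN element at or before each index
    # (-1 where no valid element has been seen yet).
    last_valid = np.where(~isnan, np.arange(a.size), -1)
    last_valid = np.maximum.accumulate(last_valid)
    # True where an element equals its immediate predecessor.
    # Comparisons involving NaN are False, so NaNs never count as repeats.
    rep = np.r_[False, a[1:] == a[:-1]]
    # Fill a NaN only if its last valid value was itself a repeat.
    can_fill = isnan & (last_valid >= 0) & rep[last_valid]
    a[can_fill] = a[last_valid[can_fill]]
    return a


sample = [1978.0, 1918.0, 1918.0, 1918.0, np.nan,
          np.nan, np.nan, 1853.0, 1831.0, np.nan]
result = fill_nan_after_repeats(sample)
print(result)
```

With a dataframe, the result could be assigned back with something like `df["Columns"] = fill_nan_after_repeats(df["Columns"].to_numpy())`.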

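@jeremycg Thanks for the edit, this is my first post :)

Something like:
x.Columns[(x.Columns.shift(1) == x.Columns.shift(2)) & np.isnan(x.Columns)] = x.Columns.shift(1)

@Vaishali and jeremycg, thanks for your answers, but I want to fill all the following NaN values, not just one NaN value. I have modified the sample dataframe and the output dataframe to make this clear (sorry, I should have asked more clearly earlier).

@Vaishali But will it work only if there are 3 NaN values? I want it to work regardless of how many NaN values there are in between. I tried a df with more than 3 NaN values, and it missed a few of them.

I think all pandas questions should have a numpy tag... just to see your magic, @Divakar

@Divakar, thanks for the answer, but I want to fill all the following NaN values, not just one NaN value. I have modified the sample dataframe and the output dataframe to make this clearer (sorry, I should have asked more clearly earlier).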