Python: zero-fill a DataFrame forward, leaving the leading NaNs alone

I am trying to fill a DataFrame with zeros, but I don't want to touch the leading NaNs:
rng = pd.date_range('2016-06-01', periods=9, freq='D')
df = pd.DataFrame({'data': pd.Series([np.nan]*3 + [20, 30, 40] + [np.nan]*3, rng)})
2016-06-01 NaN
2016-06-02 NaN
2016-06-03 NaN
2016-06-04 20.0
2016-06-05 30.0
2016-06-06 40.0
2016-06-07 NaN
2016-06-08 NaN
2016-06-09 NaN
After the fill/replace, the df I want looks like this:
pd.DataFrame({'data': pd.Series([np.nan]*3 + [20, 30, 40] + [0.]*3, rng)})
2016-06-01 NaN
2016-06-02 NaN
2016-06-03 NaN
2016-06-04 20.0
2016-06-05 30.0
2016-06-06 40.0
2016-06-07 0.0
2016-06-08 0.0
2016-06-09 0.0
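To see why the naive approach falls short, here is a minimal reproduction (reusing the rng/df definitions from above) showing that a plain fillna(0) overwrites the leading NaNs as well:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2016-06-01', periods=9, freq='D')
df = pd.DataFrame({'data': pd.Series([np.nan]*3 + [20, 30, 40] + [np.nan]*3, rng)})

# fillna(0) replaces *every* NaN, so the leading NaN block is lost too:
filled = df.fillna(0)
print(filled['data'].iloc[0])  # 0.0, but we wanted it to stay NaN
```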
Since fillna() only accepts a value or a method, and fillna(0) replaces all NaNs, including the leading ones, I hoped replace could skip over them, but

df.replace([np.nan], 0, method='ffill')

also replaces every NaN.

How can I zero-fill only after the first non-NaN value, with multiple data columns?

You can use the first_valid_index() function:
In [80]: df
Out[80]:
data data1 data2
2016-06-01 NaN NaN NaN
2016-06-02 NaN NaN 10.0
2016-06-03 NaN 20.0 20.0
2016-06-04 20.0 30.0 20.0
2016-06-05 NaN 40.0 NaN
2016-06-06 40.0 30.0 40.0
2016-06-07 NaN NaN NaN
2016-06-08 NaN NaN NaN
2016-06-09 NaN NaN NaN
In [81]: %paste
first_valid_idx = df.apply(lambda x: x.first_valid_index()).to_frame()
df = df.fillna(0)
for ix, r in first_valid_idx.iterrows():
    df.loc[df.index < r[0], ix] = np.nan
## -- End pasted text --
In [82]: df
Out[82]:
data data1 data2
2016-06-01 NaN NaN NaN
2016-06-02 NaN NaN 10.0
2016-06-03 NaN 20.0 20.0
2016-06-04 20.0 30.0 20.0
2016-06-05 0.0 40.0 0.0
2016-06-06 40.0 30.0 40.0
2016-06-07 0.0 0.0 0.0
2016-06-08 0.0 0.0 0.0
2016-06-09 0.0 0.0 0.0
In [83]: first_valid_idx
Out[83]:
0
data 2016-06-04
data1 2016-06-03
data2 2016-06-02
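The loop above can also be expressed per column with apply, which avoids mutating df in place; a sketch under the same data, where fill_after_first_valid is a helper name of my own:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2016-06-01', periods=9, freq='D')
df = pd.DataFrame({'data': pd.Series([np.nan]*3 + [20, 30, 40] + [np.nan]*3, rng),
                   'data1': pd.Series([np.nan]*2 + [20, 30, 40, 30] + [np.nan]*3, rng),
                   'data2': pd.Series([np.nan]*1 + [10, 20, 20, 30, 40] + [np.nan]*3, rng)})

def fill_after_first_valid(s):
    """Fill a column with 0 from its first valid value onward."""
    first = s.first_valid_index()
    if first is None:                  # all-NaN column: leave it untouched
        return s
    out = s.fillna(0.0)
    out[out.index < first] = np.nan    # restore the leading NaN block
    return out

filled = df.apply(fill_after_first_valid)
```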
I think you can first locate the group of leading NaNs with notnull() and cumsum(), and then fill all the other values:
print (df.data.notnull().cumsum())
2016-06-01 0
2016-06-02 0
2016-06-03 0
2016-06-04 1
2016-06-05 2
2016-06-06 3
2016-06-07 3
2016-06-08 3
2016-06-09 3
Freq: D, Name: data, dtype: int32
print (df.data.mask(df.data.notnull().cumsum() != 0, df.data.fillna(0)))
2016-06-01 NaN
2016-06-02 NaN
2016-06-03 NaN
2016-06-04 20.0
2016-06-05 30.0
2016-06-06 40.0
2016-06-07 0.0
2016-06-08 0.0
2016-06-09 0.0
Freq: D, Name: data, dtype: float64
EDIT:

It also works fine with multiple columns:
df = pd.DataFrame({'data': pd.Series([np.nan]*3 + [20, 30, 40] + [np.nan]*3, rng),
'data1': pd.Series([np.nan]*2 + [20, 30, 40,30] + [np.nan]*3, rng),
'data2': pd.Series([np.nan]*1 + [10,20, 20, 30, 40] + [np.nan]*3, rng)})
print (df.mask(df.notnull().cumsum() != 0, df.fillna(0)))
data data1 data2
2016-06-01 NaN NaN NaN
2016-06-02 NaN NaN 10.0
2016-06-03 NaN 20.0 20.0
2016-06-04 20.0 30.0 20.0
2016-06-05 30.0 40.0 30.0
2016-06-06 40.0 30.0 40.0
2016-06-07 0.0 0.0 0.0
2016-06-08 0.0 0.0 0.0
2016-06-09 0.0 0.0 0.0
EDIT2 (suggested in the comments) - better is to use cummax():
Comments:
- Can you define "leading NaNs" more precisely? I assume it means all the NaNs in a column before the first non-NaN data point? So, for example, if we modified your example so the first row contained 10, you would want to replace all the NaNs?
- In my actual data I have more than one column (hundreds, actually). Can you think of a way to fill multiple columns?
- @MaxU That solution won't work if there are several gaps of NaN values in the same column, since only the last group of NaNs would be filled with 0.
- @joris Good point, thanks! I noticed that and have already found another approach - I'm working on the multi-column solution now, give me a few minutes...
- @MaxU This could be your multi-column solution: def zerofill(s): s.ix[s.index > s.last_valid_index()] = 0; return s; df_full.apply(zerofill)
- @tom101 Yes, something like that. But I think jezrael's and DSM's solution is much better: df.mask(df.notnull().cummax(), df.fillna(0))
- Instead of df.notnull().cumsum() != 0 you could use df.notnull().cummax(), I think.
- Thanks for the suggestion, I added it to the answer. It is a very nice suggestion!
print (df.mask(df.notnull().cummax(), df.fillna(0)))
data data1 data2
2016-06-01 NaN NaN NaN
2016-06-02 NaN NaN 10.0
2016-06-03 NaN 20.0 20.0
2016-06-04 20.0 30.0 20.0
2016-06-05 30.0 40.0 30.0
2016-06-06 40.0 30.0 40.0
2016-06-07 0.0 0.0 0.0
2016-06-08 0.0 0.0 0.0
2016-06-09 0.0 0.0 0.0
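For reference, the cummax() version put together as a self-contained snippet (single column for brevity): notnull().cummax() is False over the leading NaN block and True from the first non-NaN value onward, so mask keeps the leading NaNs and substitutes fillna(0) everywhere else.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2016-06-01', periods=9, freq='D')
df = pd.DataFrame({'data': pd.Series([np.nan]*3 + [20, 30, 40] + [np.nan]*3, rng)})

# Where the cumulative max of notnull() is True (first valid value onward),
# take the zero-filled values; elsewhere keep the original leading NaNs.
result = df.mask(df.notnull().cummax(), df.fillna(0))
```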