Python: filling an array with the subsequent value
In my dataframe I will get a column with only a few non-NaN values. I want to use each non-NaN value as a grouping variable for all the preceding rows that contain NaN. To simulate it, I made the following array:
import numpy as np
from pandas import Series

count = np.array([np.nan,np.nan,np.nan,3,np.nan,np.nan,6,np.nan,np.nan,np.nan,np.nan,np.nan,12])
count = Series(count)
For this array I was able to write a padding function:
def pad_expsamp_time(array):
    sect = np.zeros(array.size)  # create array filled with zeros
    inds = array.index[array.notnull()]  # select the indices of the non-NaN values
    rev_inds = inds[::-1]  # sort high to low
    # fill array with value until index of value. Repeat for lower values.
    for i in rev_inds:
        sect[:i] = i
    return Series(sect)
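This function works when the index of each non-NaN value can be assumed to equal the value itself. But how do I fill the array when the index does not equal the content?
For example, if the array count is:
count = np.array([np.nan,np.nan,np.nan,1,np.nan,np.nan,2,np.nan,np.nan,np.nan,np.nan,np.nan,3])
and the desired output is:
count = np.array([1,1,1,1,2,2,2,3,3,3,3,3,3])
There may be NaNs at the end of the array. I would like those to stay NaN so that the dataframe ignores them:
count = np.array([np.nan,np.nan,np.nan,1,np.nan,np.nan,2,np.nan,np.nan,3,np.nan,np.nan])
# Will become:
count = np.array([1,1,1,1,2,2,2,3,3,3,np.nan,np.nan])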
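For reference, the loop in the question could probably be adapted by writing the stored value instead of the index and by starting from an all-NaN array. This is only a minimal sketch: the name pad_with_next_value is hypothetical, and the input is assumed to be a float Series.

```python
import numpy as np
import pandas as pd


def pad_with_next_value(series):
    """Backward-fill NaNs with a loop (hypothetical helper, for illustration)."""
    values = series.to_numpy(dtype=float)
    # Start from all-NaN so that trailing NaNs stay NaN
    filled = np.full(values.size, np.nan)
    # Positions of the non-NaN entries, visited from last to first
    positions = np.flatnonzero(~np.isnan(values))[::-1]
    for pos in positions:
        # Everything up to and including pos takes the value stored at pos;
        # the lower part is overwritten again by later loop iterations
        filled[:pos + 1] = values[pos]
    return pd.Series(filled, index=series.index)


count = pd.Series([np.nan, np.nan, np.nan, 1, np.nan, np.nan, 2,
                   np.nan, np.nan, 3, np.nan, np.nan])
result = pad_with_next_value(count)
print(result.tolist())
```

The loop runs once per non-NaN value, so it stays cheap when such values are rare, as described in the question.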
Here is a vectorized approach:
# Append False at either sides of NaN mask as we try to find start &
# stop of each NaN interval by looking for rising and falling edges
mask = np.hstack((False,np.isnan(count),False))
start = np.flatnonzero(mask[1:] > mask[:-1])
stop = np.flatnonzero(mask[1:] < mask[:-1])
lens = stop - start
# Account for NaNs if any at the end of input that might throw off stop values
stop = stop.clip(max=count.size-1)
# Assign values
count[mask[1:-1]] = count[stop].repeat(lens)
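To try the listed steps without modifying the input in place, they can be wrapped in a function. The sketch below does that; the name backfill_nan and the copy step are additions for illustration, and the two inputs are the arrays from the question.

```python
import numpy as np


def backfill_nan(values):
    """Return a copy of values with each NaN run replaced by the next non-NaN value.

    Hypothetical wrapper name; the body follows the steps listed above.
    """
    out = np.asarray(values, dtype=float).copy()
    # Pad the NaN mask with False so every NaN run has a rising and a falling edge
    mask = np.hstack((False, np.isnan(out), False))
    start = np.flatnonzero(mask[1:] > mask[:-1])  # first index of each NaN run
    stop = np.flatnonzero(mask[1:] < mask[:-1])   # index just past each NaN run
    lens = stop - start
    # A trailing NaN run has stop == out.size; clipping makes it point at the
    # last element, which is NaN, so that run stays NaN
    stop = stop.clip(max=out.size - 1)
    out[mask[1:-1]] = out[stop].repeat(lens)
    return out


nan = np.nan
case_a = backfill_nan([nan, nan, nan, 1, nan, nan, 2, nan, nan, nan, nan, nan, 3])
case_b = backfill_nan([nan, nan, nan, 1, nan, nan, 2, nan, nan, 3, nan, nan])
print(case_a)
print(case_b)
```

The first call has no trailing NaNs and comes back fully filled; the second keeps its last two entries as NaN.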
IIUC, you can simply use the Series.bfill() method.
Your sample:
In [89]: s = pd.Series(np.array([np.nan,np.nan,np.nan,1,np.nan,np.nan,2,np.nan,np.nan,3,np.nan,np.nan]))
In [90]: s
Out[90]:
0 NaN
1 NaN
2 NaN
3 1.0
4 NaN
5 NaN
6 2.0
7 NaN
8 NaN
9 3.0
10 NaN
11 NaN
dtype: float64
In [91]: s.bfill()
Out[91]:
0 1.0
1 1.0
2 1.0
3 1.0
4 2.0
5 2.0
6 2.0
7 3.0
8 3.0
9 3.0
10 NaN
11 NaN
dtype: float64
Divakar's samples:
In [81]: s = pd.Series(np.array([np.nan, np.nan, np.nan, 6., np.nan, np.nan, 5., np.nan, np.nan, np.nan, np.nan, np.nan, 2.]))
In [82]: s
Out[82]:
0 NaN
1 NaN
2 NaN
3 6.0
4 NaN
5 NaN
6 5.0
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 2.0
dtype: float64
In [83]: s.bfill()
Out[83]:
0 6.0
1 6.0
2 6.0
3 6.0
4 5.0
5 5.0
6 5.0
7 2.0
8 2.0
9 2.0
10 2.0
11 2.0
12 2.0
dtype: float64
In [84]: s = pd.Series(np.array([np.nan, np.nan, np.nan, 1., np.nan, np.nan, 2., np.nan, np.nan, np.nan, np.nan, np.nan, 3.]))
In [85]: s.bfill()
Out[85]:
0 1.0
1 1.0
2 1.0
3 1.0
4 2.0
5 2.0
6 2.0
7 3.0
8 3.0
9 3.0
10 3.0
11 3.0
12 3.0
dtype: float64
In [86]: s = pd.Series(np.array([np.nan, np.nan, np.nan, 1., np.nan, np.nan, 2., np.nan, np.nan, 3., np.nan, np.nan]))
In [87]: s.bfill()
Out[87]:
0 1.0
1 1.0
2 1.0
3 1.0
4 2.0
5 2.0
6 2.0
7 3.0
8 3.0
9 3.0
10 NaN
11 NaN
dtype: float64
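Since the question is about using the filled column as a grouping variable, here is a small sketch of how bfill() might be combined with groupby. The frame, its column names ("value", "count", "group") and the numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "value" holds measurements, "count" is the sparse marker column
df = pd.DataFrame({
    "value": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
    "count": [np.nan, np.nan, np.nan, 1, np.nan, np.nan, 2,
              np.nan, np.nan, 3, np.nan, np.nan],
})

# Backward-fill the marker column to get a group label for every row
df["group"] = df["count"].bfill()

# groupby drops rows whose key is NaN by default, so the trailing rows are ignored
sums = df.groupby("group")["value"].sum()
print(sums)
```

The two trailing rows keep NaN as their group label, so they are left out of the aggregation, which matches the behaviour asked for in the question.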
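Can the last element be NaN?
@Divakar Yes, it can be.
So, do you need it filled with something? If so, what should we fill it with? Maybe add a sample case?
Awesome! Thank you very much!
@Divakar, IMO it's a pretty standard pandas operation, so there's no magic there ;)
Wow! Really, all that trouble was for nothing, haha. If only I had known earlier. Now I can also stop writing my forward-fill function...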
Sample run of the vectorized approach:
In [114]: count
Out[114]:
array([ nan,  nan,  nan,   1.,  nan,  nan,   2.,  nan,  nan,   3.,  nan,
        nan])
In [115]: # Listed code ...
In [116]: count
Out[116]:
array([  1.,   1.,   1.,   1.,   2.,   2.,   2.,   3.,   3.,   3.,  nan,
        nan])