Python: How to select from subsequent numpy arrays while handling potential np.nan values
I have a Series like this:
import numpy as np
import pandas as pd

s = pd.Series({10: np.array([[0.72260683, 0.27739317, 0. ],
[0.7187053 , 0.2812947 , 0. ],
[0.71435467, 0.28564533, 1. ],
[0.3268072 , 0.6731928 , 0. ],
[0.31941951, 0.68058049, 1. ],
[0.31260015, 0.68739985, 0. ]]),
20: np.array([[0.7022099 , 0.2977901 , 0. ],
[0.6983866 , 0.3016134 , 0. ],
[0.69411673, 0.30588327, 1. ],
[0.33857735, 0.66142265, 0. ],
[0.33244109, 0.66755891, 1. ],
[0.32675582, 0.67324418, 0. ]]),
20: np.array([[0.68811957, 0.34188043, 0. ],
[0.68425783, 0.31574217, 0. ],
[0.67994496, 0.32005504, 1. ],
[0.34872593, 0.66127407, 1. ],
[0.34276171, 0.65723829, 1. ],
[0.33722803, 0.66277197, 0. ]]),
38: np.array([[0.68811957, 0.31188043, 0. ],
[0.68425783, 0.31574217, 0. ],
[0.67994496, 0.32005504, 1. ],
[0.34872593, 0.65127407, 0. ],
[0.34276171, 0.65723829, 1. ],
[0.33722803, 0.66277197, 0. ]]),
np.nan: np.nan}
)
I want to subset it with np.array([1,4,1,5])
or np.array([1,4,1,np.nan])
and get np.nan back for the missing entry, whatever value the last element of the array holds. How can I do that?
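Please note that I cannot simply drop the last element of the Series.

You can adapt the previous answer: drop the missing values of the Series, do the lookup on what remains, and add the missing entries back at the end with Series.reindex (this requires a unique index on the Series):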
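The drop, look up, reindex idea can be sketched on a small example. This is only an illustration under stated assumptions: the names toy, rows, valid, stacked, picked and result, and the toy values, are hypothetical and not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two 2x3 arrays plus one missing entry.
toy = pd.Series({
    10: np.array([[0.7, 0.3, 0.0], [0.4, 0.6, 1.0]]),
    20: np.array([[0.6, 0.4, 0.0], [0.2, 0.8, 1.0]]),
    30: np.nan,
})
# Row to pick from each array; NaN where there is no array to pick from.
rows = np.array([1, 0, np.nan])

valid = toy.notna().to_numpy()            # True where an array is present
stacked = np.array(toy[valid].tolist())   # shape (2, 2, 3)
# From the i-th array take row rows[i] and the last column (index 2).
picked = stacked[np.arange(valid.sum()), rows[valid].astype(int), 2]

# Reindexing against the original (unique) index restores the dropped label as NaN.
result = pd.Series(picked, index=toy.index[valid]).reindex(toy.index)
print(result)
```

Because the NaN entry is filtered out before the integer cast, the value stored at its position in rows never matters.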
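EDIT: If the index values are not necessarily unique, first create a unique MultiIndex by appending a GroupBy.cumcount counter as a helper level: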
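A minimal sketch of the helper-level trick, assuming a hypothetical Series dup with a repeated label (the names dup, counter and unique are illustrative only):

```python
import pandas as pd

# Hypothetical data: the label 20 appears twice.
dup = pd.Series([1.5, 2.5, 3.5], index=[10, 20, 20])

# cumcount numbers the rows within each label: 0, 0, 1.
counter = dup.groupby(dup.index).cumcount()

# Appending the counter as a second index level makes every (label, counter) pair unique.
unique = dup.to_frame('a').set_index(counter, append=True)['a']
print(unique)
```

With a unique index in place, reindex can be applied safely; the helper level is dropped again once the lookup is done.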
In the last step, remove the helper level of the MultiIndex:
c = c.reset_index(level=-1, drop=True)
print (c)
10.0 0.0
20.0 1.0
20.0 0.0
38.0 1.0
NaN NaN
dtype: float64
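This works great, thank you very much! As a side question, could I use np.stack(s[mask])
instead of np.array(s[mask].tolist())
? Both seem to return the same thing, but is there any particularity to be aware of?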
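A quick way to check this is a small comparison. This is only a sketch on hypothetical data (arrs, via_stack, via_list, same and stack_failed are illustrative names), and it covers just the case where every array has the same shape, as in the question:

```python
import numpy as np
import pandas as pd

# Hypothetical Series holding two equally shaped arrays.
arrs = pd.Series([np.zeros((2, 3)), np.ones((2, 3))])

via_stack = np.stack(arrs)            # joins the arrays along a new first axis
via_list = np.array(arrs.tolist())    # builds the same 3-D array from a list

same = np.array_equal(via_stack, via_list) and via_stack.shape == (2, 2, 3)
print(same)

# np.stack insists on identical shapes and raises ValueError otherwise.
try:
    np.stack([np.zeros((2, 3)), np.zeros((3, 3))])
    stack_failed = False
except ValueError:
    stack_failed = True
print(stack_failed)
```

For equally shaped arrays the two calls agree, so the choice is mostly a matter of taste; np.stack states the intent more directly and fails loudly when the shapes differ.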
import numpy as np
import pandas as pd

s = pd.Series({10: np.array([[0.72260683, 0.27739317, 0. ],
[0.7187053 , 0.2812947 , 0. ],
[0.71435467, 0.28564533, 1. ],
[0.3268072 , 0.6731928 , 0. ],
[0.31941951, 0.68058049, 1. ],
[0.31260015, 0.68739985, 0. ]]),
20: np.array([[0.7022099 , 0.2977901 , 0. ],
[0.6983866 , 0.3016134 , 0. ],
[0.69411673, 0.30588327, 1. ],
[0.33857735, 0.66142265, 0. ],
[0.33244109, 0.66755891, 1. ],
[0.32675582, 0.67324418, 0. ]]),
23: np.array([[0.68811957, 0.34188043, 0. ],
[0.68425783, 0.31574217, 0. ],
[0.67994496, 0.32005504, 1. ],
[0.34872593, 0.66127407, 1. ],
[0.34276171, 0.65723829, 1. ],
[0.33722803, 0.66277197, 0. ]]),
38: np.array([[0.68811957, 0.31188043, 0. ],
[0.68425783, 0.31574217, 0. ],
[0.67994496, 0.32005504, 1. ],
[0.34872593, 0.65127407, 0. ],
[0.34276171, 0.65723829, 1. ],
[0.33722803, 0.66277197, 0. ]]),
np.nan: np.nan}
).rename({23:20})
print (s)
10.0 [[0.72260683, 0.27739317, 0.0], [0.7187053, 0....
20.0 [[0.7022099, 0.2977901, 0.0], [0.6983866, 0.30...
20.0 [[0.68811957, 0.34188043, 0.0], [0.68425783, 0...
38.0 [[0.68811957, 0.31188043, 0.0], [0.68425783, 0...
NaN NaN
dtype: object
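# Row to pick from each array; NaN marks the entry that holds no array
a = np.array([1, 4, 1, 2, np.nan])
# Append a per-label counter (cumcount) as a helper level so the index becomes unique
s = s.to_frame('a').set_index(s.groupby(s.index).cumcount(), append=True)['a']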
print (s)
10.0 0 [[0.72260683, 0.27739317, 0.0], [0.7187053, 0....
20.0 0 [[0.7022099, 0.2977901, 0.0], [0.6983866, 0.30...
1 [[0.68811957, 0.34188043, 0.0], [0.68425783, 0...
38.0 0 [[0.68811957, 0.31188043, 0.0], [0.68425783, 0...
NaN 0 NaN
Name: a, dtype: object
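# True where the Series holds an array, False for the missing value
mask = s.notna()
# Stack the arrays to shape (4, 6, 3), then take row a[i] and column 2 from the i-th array
b = np.array(s[mask].tolist())[np.arange(mask.sum()), a[mask].astype(int), 2]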
print (b)
[0. 1. 0. 1.]
# Reindex by the full (unique) index so the dropped entry comes back as NaN
c = pd.Series(b, index=s[mask].index).reindex(s.index)
print (c)
10.0 0 0.0
20.0 0 1.0
1 0.0
38.0 0 1.0
NaN 0 NaN
dtype: float64
# Drop the helper level to restore the original index labels
c = c.reset_index(level=-1, drop=True)
print (c)
10.0 0.0
20.0 1.0
20.0 0.0
38.0 1.0
NaN NaN
dtype: float64
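The steps above can also be gathered into one function. The sketch below is a variation rather than the answer's exact method: instead of reindexing, it fills a NaN-initialised result by position, which sidesteps the unique-index requirement. The function name pick_last_column and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def pick_last_column(series, rows):
    """For each array in `series`, take row rows[i] and column 2; missing entries stay NaN."""
    valid = series.notna().to_numpy()       # True where an array is present
    out = np.full(len(series), np.nan)      # start with NaN everywhere
    if valid.any():
        stacked = np.array(series[valid].tolist())
        out[valid] = stacked[np.arange(valid.sum()), rows[valid].astype(int), 2]
    return pd.Series(out, index=series.index)

# Hypothetical data with a repeated label and a missing entry.
toy = pd.Series(
    [np.array([[0.7, 0.3, 0.0], [0.4, 0.6, 1.0]]),
     np.array([[0.6, 0.4, 1.0], [0.2, 0.8, 0.0]]),
     np.array([[0.5, 0.5, 0.0], [0.1, 0.9, 1.0]]),
     np.nan],
    index=[10, 20, 20, np.nan],
)
res = pick_last_column(toy, np.array([1, 1, 0, np.nan]))
print(res)
```

Positional filling keeps duplicate and NaN labels untouched, at the cost of assuming that rows is aligned with the Series by position.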