Counting consecutive NaN values in a pandas time series (Python)
I am currently working with time series in Python 3 and pandas, and I want to summarize the periods of contiguous missing values, but so far I can only find the indices of the NaN values.
Sample data :
Valeurs
2018-01-01 00:00:00 1.0
2018-01-01 04:00:00 NaN
2018-01-01 08:00:00 2.0
2018-01-01 12:00:00 NaN
2018-01-01 16:00:00 NaN
2018-01-01 20:00:00 5.0
2018-01-02 00:00:00 6.0
2018-01-02 04:00:00 7.0
2018-01-02 08:00:00 8.0
2018-01-02 12:00:00 9.0
2018-01-02 16:00:00 5.0
2018-01-02 20:00:00 NaN
2018-01-03 00:00:00 NaN
2018-01-03 04:00:00 NaN
2018-01-03 08:00:00 1.0
2018-01-03 12:00:00 2.0
2018-01-03 16:00:00 NaN
Expected results :
Start_Date number of contiguous missing values
2018-01-01 04:00:00 1
2018-01-01 12:00:00 2
2018-01-02 20:00:00 3
2018-01-03 16:00:00 1
How can I get this kind of result with pandas (shift(), cumsum(), groupby())?
Thanks for your advice.
Sylvain
If you have the indices where the NaN values occur, you can use itertools to find the contiguous chunks.
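The answer above does not include code, so here is a minimal sketch of the itertools idea rather than the answerer's own implementation. It assumes the sample data sits in a DataFrame named df with a Valeurs column; the names nan_pos, rows and result are illustrative only. Within a run of consecutive positions, the difference between a position and its rank in the list is constant, so that difference can serve as the grouping key for itertools.groupby.

```python
from itertools import groupby

import numpy as np
import pandas as pd

# Rebuild the sample data from the question (one row every 4 hours)
nan = np.nan
index = pd.date_range("2018-01-01 00:00:00", periods=17, freq=pd.Timedelta(hours=4))
df = pd.DataFrame(
    {"Valeurs": [1.0, nan, 2.0, nan, nan, 5.0, 6.0, 7.0, 8.0, 9.0, 5.0,
                 nan, nan, nan, 1.0, 2.0, nan]},
    index=index,
)

# Integer positions of the NaN rows
nan_pos = np.flatnonzero(df["Valeurs"].isna().to_numpy())

# (position - rank) stays constant inside a run of consecutive positions
rows = []
for _, run in groupby(enumerate(nan_pos), key=lambda pair: pair[1] - pair[0]):
    positions = [pos for _, pos in run]
    rows.append((df.index[positions[0]], len(positions)))

result = pd.DataFrame(rows, columns=["Start_Date", "contiguous"])
print(result)
```

This loops in Python, so it is likely slower than a vectorized approach on long series, but it makes the run detection explicit.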
Another option is to use groupby together with agg.
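The code for this answer did not survive on this page, so what follows is one possible sketch of a groupby/agg solution, not necessarily what the answerer wrote. It assumes a DataFrame named df with a Valeurs column; is_nan, run_id and result are illustrative names. The idea is to label each run of identical NaN/non-NaN flags with shift() and cumsum(), keep only the NaN rows, and aggregate each run to its first timestamp and its row count.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question (one row every 4 hours)
nan = np.nan
index = pd.date_range("2018-01-01 00:00:00", periods=17, freq=pd.Timedelta(hours=4))
df = pd.DataFrame(
    {"Valeurs": [1.0, nan, 2.0, nan, nan, 5.0, 6.0, 7.0, 8.0, 9.0, 5.0,
                 nan, nan, nan, 1.0, 2.0, nan]},
    index=index,
)

is_nan = df["Valeurs"].isna()

# The run id increases by one each time the NaN flag differs from the previous row
run_id = (is_nan != is_nan.shift()).cumsum()

# Timestamps of the NaN rows only, grouped by the run they belong to
nan_rows = df.index.to_series()[is_nan]
result = (
    nan_rows.groupby(run_id[is_nan])
    .agg(Start_Date="first", contiguous="count")
    .reset_index(drop=True)
)
print(result)
```

Because everything stays inside pandas, this version also works when the index is not evenly spaced; it counts rows, not elapsed time.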
Working on the underlying numpy array:
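import numpy as np
import pandas as pd
a = df.Valeurs.values
# Pad the NaN mask with False on both sides so every run has a start and an end
m = np.concatenate(([False], np.isnan(a), [False]))
# Positions where the mask flips: even entries are run starts, odd entries are run ends
idx = np.nonzero(m[1:] != m[:-1])[0]
# Timestamps of NaN rows whose previous row is not NaN, i.e. the first row of each run
out = df[df.Valeurs.isnull() & ~df.Valeurs.shift().isnull()].index
pd.DataFrame({'Start date': out, 'contiguous': (idx[1::2] - idx[::2])})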
Start date contiguous
0 2018-01-01 04:00:00 1
1 2018-01-01 12:00:00 2
2 2018-01-02 20:00:00 3
3 2018-01-03 16:00:00 1