Python pandas: take the nearest value to the second and interpolate
Tags: python, pandas, dataframe, interpolation, resampling
I would like to convert a DataFrame in the following format:
>>>df
vals
2019-08-10 12:03:05 1.0
2019-08-10 12:03:06 NaN
2019-08-10 12:03:07 NaN
2019-08-10 12:03:08 3.0
2019-08-10 12:03:09 4.0
2019-08-10 12:03:10 NaN
2019-08-10 12:03:11 NaN
2019-08-10 12:03:12 5.0
2019-08-10 12:03:13 NaN
2019-08-10 12:03:14 1.0
2019-08-10 12:03:15 NaN
2019-08-10 12:03:16 NaN
2019-08-10 12:03:17 6.0
into, for example:
>>>df
vals
2019-08-10 12:03:05 1.0
2019-08-10 12:03:06 1.667
2019-08-10 12:03:07 2.333
2019-08-10 12:03:08 3.0
2019-08-10 12:03:09 3.667
2019-08-10 12:03:10 4.333
2019-08-10 12:03:11 5.0
2019-08-10 12:03:12 3.667
2019-08-10 12:03:13 2.333
2019-08-10 12:03:14 1.0
2019-08-10 12:03:15 2.667
2019-08-10 12:03:16 4.333
2019-08-10 12:03:17 6.0
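where the DataFrame is first aligned as follows (taking the nearest value at every third second):
and then linearly interpolated between each pair of values to produce the final DataFrame. If the gap is longer than 2 seconds, I do not want to interpolate between those two values.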
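The gap condition can be read in more than one way. A minimal sketch of one reading, assuming regular 1-second sampling so that a "gap" is a run of consecutive NaN rows: interpolate everything, then blank out any run longer than a threshold. The toy series and the name `max_gap` are hypothetical, not taken from the question.

```python
import numpy as np
import pandas as pd

# Toy series (hypothetical): one 2-row gap and one 3-row gap.
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, np.nan, np.nan, 8.0])
max_gap = 2  # hypothetical threshold: longest NaN run that may be filled

is_nan = s.isna()
# Give every run of equal is_nan values its own id, then measure NaN runs.
run_id = (is_nan != is_nan.shift()).cumsum()
run_len = is_nan.groupby(run_id).transform("sum")

# Interpolate all interior gaps, then keep only original values and
# the filled values that belong to sufficiently short gaps.
filled = s.interpolate(limit_area="inside")
result = filled.where(~is_nan | (run_len <= max_gap))
print(result)
```

Passing `limit=2` to `interpolate` alone would not give this behaviour, because it partially fills long gaps instead of leaving them untouched.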
This is what I have tried so far:
df.resample('3s').nearest()
which produces:
>>> df.resample('3s').nearest()
vals
2019-08-10 12:03:03 1.0
2019-08-10 12:03:06 NaN
2019-08-10 12:03:09 4.0
2019-08-10 12:03:12 5.0
2019-08-10 12:03:15 NaN
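Also: clearly "nearest" is a complete lie, or at least a misnomer, since the value nearest to 10 is obviously 4. In addition, the final value, for 2019-08-10 12:03:16, should surely be 6.0.
This is only an attempt to align the values to the second; after that, a simple interpolate seems to work.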
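A likely explanation for the output above: by default `resample` anchors its bins to midnight of the first day, so 3-second bins fall on seconds divisible by 3 (12:03:03, 12:03:06, ...), and `nearest()` copies whatever sits at the closest original timestamp, NaN included. A minimal sketch that reproduces this, assuming the index and values shown in the question (the way `df` is built here is an assumption):

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (construction is an assumption).
idx = pd.date_range("2019-08-10 12:03:05", periods=13, freq="s")
vals = [1.0, np.nan, np.nan, 3.0, 4.0, np.nan, np.nan, 5.0,
        np.nan, 1.0, np.nan, np.nan, 6.0]
df = pd.DataFrame({"vals": vals}, index=idx)

# Bins are anchored to midnight, not to the first timestamp (12:03:05).
nearest = df.resample("3s").nearest()
print(nearest)
```

The row at 12:03:06 is NaN because the original row at that exact timestamp is NaN; `nearest()` looks for the nearest index label, not the nearest non-missing value.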
Any help would be much appreciated.
If you want to replace the NaN values with the nearest value, you can use interpolate:
data['value'] = data['value'].interpolate(method='nearest')
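I think you need the base parameter, which shifts the offset of the sampling period; set it to the seconds of the first index value modulo 3 (because the period is 3 seconds). Note that base was deprecated in pandas 1.1 and removed in 2.0:
Then interpolate: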
df['new'] = df['new'].interpolate()
print (df)
vals new
2019-08-10 12:03:05 1.0 1.000000
2019-08-10 12:03:06 NaN 1.666667
2019-08-10 12:03:07 NaN 2.333333
2019-08-10 12:03:08 3.0 3.000000
2019-08-10 12:03:09 4.0 3.666667
2019-08-10 12:03:10 NaN 4.333333
2019-08-10 12:03:11 NaN 5.000000
2019-08-10 12:03:12 5.0 3.666667
2019-08-10 12:03:13 NaN 2.333333
2019-08-10 12:03:14 1.0 1.000000
2019-08-10 12:03:15 NaN 2.666667
2019-08-10 12:03:16 NaN 4.333333
2019-08-10 12:03:17 6.0 6.000000
Test by adding 2 seconds to the index:
df.index += pd.Timedelta(2, 's')
df['new'] = df.resample('3s', base=df.index[0].second % 3).first()
print (df)
vals new
2019-08-10 12:03:07 1.0 1.0
2019-08-10 12:03:08 NaN NaN
2019-08-10 12:03:09 NaN NaN
2019-08-10 12:03:10 3.0 3.0
2019-08-10 12:03:11 4.0 NaN
2019-08-10 12:03:12 NaN NaN
2019-08-10 12:03:13 NaN 5.0
2019-08-10 12:03:14 5.0 NaN
2019-08-10 12:03:15 NaN NaN
2019-08-10 12:03:16 1.0 1.0
2019-08-10 12:03:17 NaN NaN
2019-08-10 12:03:18 NaN NaN
2019-08-10 12:03:19 6.0 6.0
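On pandas versions where `base` is no longer available, the same alignment can probably be obtained with `origin="start"` (added in pandas 1.1), which anchors the bins at the first timestamp. The following is a sketch under that assumption, rebuilding the question's frame by hand; it is not the answerer's own code.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (construction is an assumption).
idx = pd.date_range("2019-08-10 12:03:05", periods=13, freq="s")
vals = [1.0, np.nan, np.nan, 3.0, 4.0, np.nan, np.nan, 5.0,
        np.nan, 1.0, np.nan, np.nan, 6.0]
df = pd.DataFrame({"vals": vals}, index=idx)

# origin="start" makes the 3-second bins begin at 12:03:05.
# first() keeps the first non-missing value inside each bin.
aligned = df["vals"].resample("3s", origin="start").first()

# Assignment aligns on the index, leaving NaN between the anchors,
# which the default linear interpolation then fills.
df["new"] = aligned
df["new"] = df["new"].interpolate()
print(df)
```

Note that `first()` labels each bin with its start time and takes the first non-missing value in the following 3 seconds, which is not always the value nearest to the label.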
Output:
Time vals
0 2019-08-10 12:03:05 1.000000
1 2019-08-10 12:03:06 1.666667
2 2019-08-10 12:03:07 2.333333
3 2019-08-10 12:03:08 3.000000
4 2019-08-10 12:03:09 4.000000
5 2019-08-10 12:03:10 4.333333
6 2019-08-10 12:03:11 4.666667
7 2019-08-10 12:03:12 5.000000
8 2019-08-10 12:03:13 3.000000
9 2019-08-10 12:03:14 1.000000
10 2019-08-10 12:03:15 2.666667
11 2019-08-10 12:03:16 4.333333
12 2019-08-10 12:03:17 6.000000
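The code that produced this output is not shown. The numbers are consistent with plain linear interpolation of the original column, without the 3-second alignment step, so a sketch along the following lines would reproduce them. The frame construction and the `Time` column name taken from the output header are assumptions.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (construction is an assumption).
idx = pd.date_range("2019-08-10 12:03:05", periods=13, freq="s")
vals = [1.0, np.nan, np.nan, 3.0, 4.0, np.nan, np.nan, 5.0,
        np.nan, 1.0, np.nan, np.nan, 6.0]
df = pd.DataFrame({"vals": vals}, index=idx)

# Default interpolate() is linear and keeps every original value.
out = df.interpolate().rename_axis("Time").reset_index()
print(out)
```

Because every original value is kept, the rows at 12:03:09 and 12:03:12 stay at 4.0 and 5.0, which differs from the 3.667 the question asks for at both timestamps.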
You can choose the interpolation method by passing a parameter, depending on your requirements; see the link below for details.
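As an illustration of how the `method` parameter changes the result, here is a small sketch comparing the default linear method with `method="time"` on irregularly spaced timestamps. The toy series is hypothetical and not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical series: the missing row is 1 s after the first value
# and 3 s before the last one.
idx = pd.to_datetime(["2019-08-10 12:03:05",
                      "2019-08-10 12:03:06",
                      "2019-08-10 12:03:09"])
s = pd.Series([1.0, np.nan, 5.0], index=idx)

linear = s.interpolate()                 # treats rows as equally spaced
by_time = s.interpolate(method="time")   # weights by elapsed time
print(linear.iloc[1], by_time.iloc[1])
```

With regularly sampled data such as the question's 1-second index, the two methods give the same values.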