Python pandas: take the value nearest to each second and interpolate


I would like to convert a dataframe of the following format, for example:

>>>df
                      vals
2019-08-10 12:03:05   1.0
2019-08-10 12:03:06   NaN
2019-08-10 12:03:07   NaN
2019-08-10 12:03:08   3.0
2019-08-10 12:03:09   4.0
2019-08-10 12:03:10   NaN
2019-08-10 12:03:11   NaN
2019-08-10 12:03:12   5.0
2019-08-10 12:03:13   NaN
2019-08-10 12:03:14   1.0
2019-08-10 12:03:15   NaN
2019-08-10 12:03:16   NaN
2019-08-10 12:03:17   6.0
into, for example:

>>>df
                      vals
2019-08-10 12:03:05   1.0
2019-08-10 12:03:06   1.667
2019-08-10 12:03:07   2.333
2019-08-10 12:03:08   3.0
2019-08-10 12:03:09   3.667
2019-08-10 12:03:10   4.333
2019-08-10 12:03:11   5.0
2019-08-10 12:03:12   3.667
2019-08-10 12:03:13   2.333
2019-08-10 12:03:14   1.0
2019-08-10 12:03:15   2.667
2019-08-10 12:03:16   4.333
2019-08-10 12:03:17   6.0
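where the dataframe is first aligned as follows (taking the nearest value at every 3rd second):

and then linearly interpolated between those values to produce the final dataframe. If the gap between two values is more than 2 seconds, I do not want to interpolate between them.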

This is what I have tried so far:

df.resample('3s').nearest()
which produces:

>>> df.resample('3s').nearest()
                     vals
2019-08-10 12:03:03   1.0
2019-08-10 12:03:06   NaN
2019-08-10 12:03:09   4.0
2019-08-10 12:03:12   5.0
2019-08-10 12:03:15   NaN
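Also:

Clearly, nearest() is a complete lie, or at least a misnomer, since the nearest value to 10 is clearly 4. Also, the final value at 2019-08-10 12:03:16 should surely be 6.0.

This is only the attempt to align the values to the second; after that, a simple interpolate() seems to work.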


Any help would be much appreciated.

If you want to replace the NaN values with the nearest value, you can use interpolate:

data['value'] = data['value'].interpolate(method='nearest')
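
Note that interpolate(method='nearest') fills every NaN from its nearest known neighbour (and, as far as I know, needs SciPy installed); it does not align the data to a 3-second grid first. Below is a minimal sketch of the alignment the question describes, using reindex with method='nearest' instead. The 3-second grid anchored at the first timestamp and the 1-second tolerance are my assumptions, not something stated in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
idx = pd.date_range("2019-08-10 12:03:05", periods=13, freq="s")
vals = [1.0, np.nan, np.nan, 3.0, 4.0, np.nan, np.nan,
        5.0, np.nan, 1.0, np.nan, np.nan, 6.0]
df = pd.DataFrame({"vals": vals}, index=idx)

# Assumed grid: every 3 seconds, starting at the first timestamp.
grid = pd.date_range(df.index[0], df.index[-1], freq="3s")

# For each grid point take the nearest non-NaN observation, but only
# if it is at most 1 second away (assumed tolerance); otherwise NaN.
valid = df["vals"].dropna()
aligned = valid.reindex(grid, method="nearest", tolerance=pd.Timedelta("1s"))

# Put the aligned values back on the 1-second index and interpolate linearly.
df["new"] = aligned.reindex(df.index).interpolate()
print(df)
```

On the sample data this should reproduce the desired column from the question. With a tolerance, grid points that have no observation close enough simply stay NaN before interpolation.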

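I think you need the base parameter, which shifts the start of the sampling periods by the seconds of the first index value modulo 3 (because of the 3-second frequency):

df['new'] = df.resample('3s', base=df.index[0].second % 3).first()

Then interpolate:

df['new'] = df['new'].interpolate()
print(df)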
                     vals       new
2019-08-10 12:03:05   1.0  1.000000
2019-08-10 12:03:06   NaN  1.666667
2019-08-10 12:03:07   NaN  2.333333
2019-08-10 12:03:08   3.0  3.000000
2019-08-10 12:03:09   4.0  3.666667
2019-08-10 12:03:10   NaN  4.333333
2019-08-10 12:03:11   NaN  5.000000
2019-08-10 12:03:12   5.0  3.666667
2019-08-10 12:03:13   NaN  2.333333
2019-08-10 12:03:14   1.0  1.000000
2019-08-10 12:03:15   NaN  2.666667
2019-08-10 12:03:16   NaN  4.333333
2019-08-10 12:03:17   6.0  6.000000
Test with 2 seconds added to the index:

df.index += pd.Timedelta(2, 's')
df['new'] = df.resample('3s', base=df.index[0].second % 3).first()
print(df)

                     vals  new
2019-08-10 12:03:07   1.0  1.0
2019-08-10 12:03:08   NaN  NaN
2019-08-10 12:03:09   NaN  NaN
2019-08-10 12:03:10   3.0  3.0
2019-08-10 12:03:11   4.0  NaN
2019-08-10 12:03:12   NaN  NaN
2019-08-10 12:03:13   NaN  5.0
2019-08-10 12:03:14   5.0  NaN
2019-08-10 12:03:15   NaN  NaN
2019-08-10 12:03:16   1.0  1.0
2019-08-10 12:03:17   NaN  NaN
2019-08-10 12:03:18   NaN  NaN
2019-08-10 12:03:19   6.0  6.0
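
One caveat for newer pandas: the base argument was deprecated in pandas 1.1 and removed in 2.0. A rough equivalent, as far as I can tell, is origin='start', which anchors the bins at the first timestamp of the series. The sketch below rebuilds the shifted test data from above; it is an illustration under that assumption, not the original answer's code.

```python
import numpy as np
import pandas as pd

# Same values as in the question, with the index shifted by 2 seconds.
idx = pd.date_range("2019-08-10 12:03:07", periods=13, freq="s")
vals = [1.0, np.nan, np.nan, 3.0, 4.0, np.nan, np.nan,
        5.0, np.nan, 1.0, np.nan, np.nan, 6.0]
df = pd.DataFrame({"vals": vals}, index=idx)

# origin="start" makes the 3-second bins begin at the first timestamp,
# which is what base=df.index[0].second % 3 achieved in older versions.
# first() returns the first non-NaN value in each bin.
df["new"] = df["vals"].resample("3s", origin="start").first()

# Linear interpolation between the aligned values.
df["interp"] = df["new"].interpolate()
print(df)
```

Keep in mind that first() picks the first non-NaN value inside each 3-second bin, which happens to coincide with the nearest value for this sample but is not guaranteed to in general.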

Output:

                   Time     vals
0   2019-08-10 12:03:05     1.000000
1   2019-08-10 12:03:06     1.666667
2   2019-08-10 12:03:07     2.333333
3   2019-08-10 12:03:08     3.000000
4   2019-08-10 12:03:09     4.000000
5   2019-08-10 12:03:10     4.333333
6   2019-08-10 12:03:11     4.666667
7   2019-08-10 12:03:12     5.000000
8   2019-08-10 12:03:13     3.000000
9   2019-08-10 12:03:14     1.000000
10  2019-08-10 12:03:15     2.666667
11  2019-08-10 12:03:16     4.333333
12  2019-08-10 12:03:17     6.000000
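
Plain interpolation like this fills every gap, however long. The question also asks that long gaps be left alone, and interpolate(limit=...) only caps how many NaNs are filled rather than skipping the whole gap. Here is a minimal sketch of one way to do it; the helper name interpolate_short_gaps is made up for illustration, and I am assuming "gap" means the time between the two known values on either side of a run of NaNs.

```python
import numpy as np
import pandas as pd

def interpolate_short_gaps(s: pd.Series, max_gap: pd.Timedelta) -> pd.Series:
    """Interpolate NaNs only where the surrounding known values
    are at most max_gap apart (hypothetical helper)."""
    # Timestamp of each known value, NaT where the value is missing.
    stamps = s.index.to_series().where(s.notna())
    # For every row: time between the next and the previous known value.
    gap = stamps.bfill() - stamps.ffill()
    # Time-weighted linear interpolation, never extrapolating past the ends.
    filled = s.interpolate(method="time", limit_area="inside")
    # Keep interpolated values only inside sufficiently short gaps.
    return filled.where(gap <= max_gap)

idx = pd.date_range("2019-08-10 12:03:05", periods=8, freq="s")
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 7.0, np.nan], index=idx)

result = interpolate_short_gaps(s, pd.Timedelta("2s"))
print(result)
```

In this small example the single NaN between 1.0 and 3.0 is filled, while the three NaNs between 3.0 and 7.0 (known values 4 seconds apart) and the trailing NaN stay missing. For the data in the question the threshold would need to be at least 3 seconds, since the aligned values are 3 seconds apart.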

You can pass the method parameter that suits your requirements; the available interpolation methods are described in detail at the link below.