Interpolation giving strange results (Python 3.x, pandas)

python-3.x pandas

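I am using Pandas to interpolate data points in time. When I resample and then interpolate, I get different results for the same interpolated timestamp depending on the resampling rate.

Here is a test example: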

import pandas as pd
import datetime

data = pd.DataFrame({'time': list(map(lambda a: datetime.datetime.strptime(a, '%Y-%m-%d %H:%M:%S'),
                                                ['2021-03-28 12:00:00', '2021-03-28 12:01:40',
                                                 '2021-03-28 12:03:20', '2021-03-28 12:05:00',
                                                 '2021-03-28 12:06:40', '2021-03-28 12:08:20',
                                                 '2021-03-28 12:10:00', '2021-03-28 12:11:40',
                                                 '2021-03-28 12:13:20', '2021-03-28 12:15:00'])),
                     'latitude': [44.0, 44.00463175663968, 44.00919766508212,
                                  44.01357245844425, 44.0176360866699, 44.02127701531401,
                                  44.02439529286458, 44.02690530159084, 44.02873811544965,
                                  44.02984339933479],
                     'longitude': [-62.75, -62.74998054893869, -62.748902164559304,
                                   -62.74679419470262, -62.7437142666763, -62.739746727555016,
                                   -62.735000345048086, -62.72960533041183, -62.72370976436673,
                                   -62.717475524320704]})

data.set_index('time', inplace=True)

a = data.resample('20s').interpolate(method='time')
b = data.resample('60s').interpolate(method='time')

print(a.iloc[:18:3])
print(b.iloc[:6])

# --- OUTPUT --- #

                      latitude  longitude
time                                     
2021-03-28 12:00:00  44.000000 -62.750000
2021-03-28 12:01:00  44.002779 -62.749988  # <-- Different Values
2021-03-28 12:02:00  44.005545 -62.749765  # <-- Different Values
2021-03-28 12:03:00  44.008284 -62.749118  # <-- Different Values
2021-03-28 12:04:00  44.010948 -62.748059  # <-- Different Values
2021-03-28 12:05:00  44.013572 -62.746794
                      latitude  longitude
time                                     
2021-03-28 12:00:00  44.000000 -62.750000
2021-03-28 12:01:00  44.002714 -62.749359  # <-- Different Values
2021-03-28 12:02:00  44.005429 -62.748718  # <-- Different Values
2021-03-28 12:03:00  44.008143 -62.748077  # <-- Different Values
2021-03-28 12:04:00  44.010858 -62.747435  # <-- Different Values
2021-03-28 12:05:00  44.013572 -62.746794

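Summarizing my comments with some explanation:
If you compare data.resample('60s').asfreq() with data.resample('20s').asfreq(), you can see what is going on. All of your sample data falls on the 20s grid, but only a few values remain on the 60s grid. That, in essence, is the problem.
The point is that pandas resamples first and then interpolates. If resampling drops data, that data is not available for interpolation. To use all of the data you started with, you need to interpolate first and then re-index. You could do it like this: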
# let's create new indices, the desired index...
new_index_20s = pd.date_range(data.index.min(), data.index.max(), freq='20s')
# and a helper for interpolation; the combination of existing and desired index
tmp_index_20s = data.index.union(new_index_20s)

new_index_60s = pd.date_range(data.index.min(), data.index.max(), freq='60s')
tmp_index_60s = data.index.union(new_index_60s)

# re-index to the helper index,
# interpolate,
# and re-index to desired index 
a1 = data.reindex(tmp_index_20s).interpolate('index').reindex(new_index_20s)
b1 = data.reindex(tmp_index_60s).interpolate('index').reindex(new_index_60s)

Now the resulting time series agree:
print(a1.iloc[:18:3])
print(b1.iloc[:6])
                      latitude  longitude
2021-03-28 12:00:00  44.000000 -62.750000
2021-03-28 12:01:00  44.002779 -62.749988
2021-03-28 12:02:00  44.005545 -62.749765
2021-03-28 12:03:00  44.008284 -62.749118
2021-03-28 12:04:00  44.010948 -62.748059
2021-03-28 12:05:00  44.013572 -62.746794
                      latitude  longitude
2021-03-28 12:00:00  44.000000 -62.750000
2021-03-28 12:01:00  44.002779 -62.749988
2021-03-28 12:02:00  44.005545 -62.749765
2021-03-28 12:03:00  44.008284 -62.749118
2021-03-28 12:04:00  44.010948 -62.748059
2021-03-28 12:05:00  44.013572 -62.746794
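
The data loss behind the original mismatch can be checked directly. The following is a minimal sketch, assuming simplified stand-in values (0.0 to 9.0) on the same 100-second timestamps as the question. It counts how many of the original samples land on each resampled grid; the variable names are illustrative only.

```python
import pandas as pd

# Same timestamps as the question: 10 samples, 100 s apart.
# The values are stand-ins (assumption), not the original coordinates.
idx = pd.date_range('2021-03-28 12:00:00', periods=10, freq='100s')
data = pd.DataFrame({'latitude': [float(i) for i in range(10)]}, index=idx)

# asfreq() exposes the grid that interpolate() sees after resampling:
# original samples that do not fall on a grid point are dropped.
on_20s = data.resample('20s').asfreq()
on_60s = data.resample('60s').asfreq()

kept_20s = int(on_20s['latitude'].notna().sum())
kept_60s = int(on_60s['latitude'].notna().sum())
print(kept_20s, kept_60s)
```

Every 100-second timestamp is a multiple of 20 seconds, so all samples survive on the 20s grid. Only the timestamps that are also multiples of 60 seconds (12:00, 12:05, 12:10, 12:15) survive on the 60s grid, so the 60s interpolation runs between those few points only.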

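Compare data.resample('60s').asfreq() with data.resample('20s').asfreq() to see which points are used for the interpolation. All of your input samples fall on the 20s grid, but only a few fall on the 60s grid.

Maybe I am just using the tool incorrectly. I expected it to use all of the data and upsample to 20s and 60s respectively. Is there a way to tell pandas to resample at a specific interval while still interpolating from the original dataframe? Linear interpolation is good enough for me for now.

I wouldn't blame you for using the tools incorrectly; what is going on is not obvious, if you ask me! I think the trick is to re-index and interpolate instead of resampling. numpy.interp could also be used.

That did the trick! Thanks so much for the help :) If you post this as an answer, I'll accept it so others can find it quickly.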
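
The numpy.interp route mentioned in the comments could look roughly like the sketch below. It is not code from the thread: interp_on_grid is a hypothetical helper, the sample values are stand-ins (squares, so the curve is not a straight line), and it assumes the index is sorted in increasing order, which np.interp requires.

```python
import numpy as np
import pandas as pd


def interp_on_grid(df, freq):
    """Linearly interpolate every column of df onto a regular time grid.

    Hypothetical helper; assumes df has a sorted DatetimeIndex.
    """
    grid = pd.date_range(df.index.min(), df.index.max(), freq=freq)
    # Convert timestamps to seconds since the first sample for np.interp.
    x_old = (df.index - df.index[0]).total_seconds()
    x_new = (grid - df.index[0]).total_seconds()
    cols = {c: np.interp(x_new, x_old, df[c].to_numpy()) for c in df.columns}
    return pd.DataFrame(cols, index=grid)


# Stand-in data on the question's 100-second timestamps (assumption).
idx = pd.date_range('2021-03-28 12:00:00', periods=10, freq='100s')
data = pd.DataFrame({'latitude': [float(i * i) for i in range(10)]}, index=idx)

a = interp_on_grid(data, '20s')
b = interp_on_grid(data, '60s')

# Both grids are interpolated from all original samples,
# so they agree wherever their timestamps coincide.
print(np.allclose(a.loc[b.index, 'latitude'], b['latitude']))
```

Because the interpolation always runs against the full set of original samples, the choice of output grid no longer affects the value at a shared timestamp.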