Python 3.x: Time series with missing data

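A data set is given below. The time series contains 20 missing values, represented by NaN. Can you tell me how to write a Python 3 script that produces the best estimate for each NaN value?

Note that the dates and times are not equally spaced, so you cannot simply take the average of the previous and next values. (Here the time is almost always 16:00:00, but that will not necessarily hold for other data sets, so I would like a general solution that also handles unevenly spaced timestamps.)

Can you show me general code that uses pandas and solves the problem described above? Please assume in your solution that the input is a list of strings, e.g. ['1/3/2012 16:00:00 26.96', '1/4/2012 16:00:00 27.47', …]. There is a space between the date and the time, and a tab between the time and the value.

For the data below, the ideal values for the 20 NaNs would be: [26.96, 32.15, 32.61, 29.3, 28.96, 28.78, 31.05, 29.58, 29.5, 30.9, 31.26, 31.48, 29.74, 29.31, 29.72, 28.88, 30.2, 27.3, 26.7, 27.52]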

1/3/2012 16:00:00   NaN
1/4/2012 16:00:00   27.47
1/5/2012 16:00:00   27.728
1/6/2012 16:00:00   28.19
1/9/2012 16:00:00   28.1
1/10/2012 16:00:00  28.15
1/11/2012 16:00:00  27.98
1/12/2012 16:00:00  28.02
1/13/2012 16:00:00  28.25
1/17/2012 16:00:00  28.65
1/18/2012 16:00:00  28.4
1/19/2012 16:00:00  28.435
1/20/2012 16:00:00  29.74
1/23/2012 16:00:00  29.95
1/24/2012 16:00:00  29.5703
1/25/2012 16:00:00  29.65
1/26/2012 16:00:00  29.7
1/27/2012 16:00:00  29.53
1/30/2012 16:00:00  29.62
1/31/2012 16:00:00  29.7
2/1/2012 16:00:00   30.05
2/2/2012 16:00:00   30.17
2/3/2012 16:00:00   30.4
2/6/2012 16:00:00   30.22
2/7/2012 16:00:00   30.485
2/8/2012 16:00:00   30.67
2/9/2012 16:00:00   30.8
2/10/2012 16:00:00  30.8
2/13/2012 16:00:00  30.77
2/14/2012 16:00:00  30.46
2/15/2012 16:00:00  30.39
2/16/2012 16:00:00  31.55
2/17/2012 16:00:00  31.32
2/21/2012 16:00:00  31.61
2/22/2012 16:00:00  31.68
2/23/2012 16:00:00  31.59
2/24/2012 16:00:00  31.5
2/27/2012 16:00:00  31.5
2/28/2012 16:00:00  31.93
2/29/2012 16:00:00  32
3/1/2012 16:00:00   32.39
3/2/2012 16:00:00   32.44
3/5/2012 16:00:00   32.05
3/6/2012 16:00:00   31.98
3/7/2012 16:00:00   31.92
3/8/2012 16:00:00   32.21
3/9/2012 16:00:00   32.16
3/12/2012 16:00:00  32.2
3/13/2012 16:00:00  32.69
3/14/2012 16:00:00  32.88
3/15/2012 16:00:00  32.94
3/16/2012 16:00:00  32.95
3/19/2012 16:00:00  32.61
3/20/2012 16:00:00  32.15
3/21/2012 16:00:00  NaN
3/22/2012 16:00:00  32.09
3/23/2012 16:00:00  32.11
3/26/2012 16:00:00  NaN
3/27/2012 16:00:00  32.7
3/28/2012 16:00:00  32.7
3/29/2012 16:00:00  32.19
3/30/2012 16:00:00  32.41
4/2/2012 16:00:00   32.46
4/3/2012 16:00:00   32.19
4/4/2012 16:00:00   31.69
4/5/2012 16:00:00   31.63
4/9/2012 16:00:00   31.4
4/10/2012 16:00:00  31.19
4/11/2012 16:00:00  30.53
4/12/2012 16:00:00  31.04
4/13/2012 16:00:00  31.16
4/16/2012 16:00:00  31.19
4/17/2012 16:00:00  31.61
4/18/2012 16:00:00  31.31
4/19/2012 16:00:00  31.68
4/20/2012 16:00:00  32.89
4/23/2012 16:00:00  32.5
4/24/2012 16:00:00  32.52
4/25/2012 16:00:00  32.32
4/26/2012 16:00:00  32.23
4/27/2012 16:00:00  32.22
4/30/2012 16:00:00  32.11
5/1/2012 16:00:00   32.335
5/2/2012 16:00:00   31.925
5/3/2012 16:00:00   31.9
5/4/2012 16:00:00   31.57
5/7/2012 16:00:00   30.86
5/8/2012 16:00:00   30.78
5/9/2012 16:00:00   30.83
5/10/2012 16:00:00  31.02
5/11/2012 16:00:00  31.54
5/14/2012 16:00:00  31.04
5/15/2012 16:00:00  30.795
5/16/2012 16:00:00  30.32
5/17/2012 16:00:00  30.2084
5/18/2012 16:00:00  29.81
5/21/2012 16:00:00  29.79
5/22/2012 16:00:00  29.88
5/23/2012 16:00:00  29.4
5/24/2012 16:00:00  NaN
5/25/2012 16:00:00  29.36
5/29/2012 16:00:00  29.72
5/30/2012 16:00:00  29.479
5/31/2012 16:00:00  29.42
6/1/2012 16:00:00   NaN
6/4/2012 16:00:00   NaN
6/5/2012 16:00:00   28.75
6/6/2012 16:00:00   29.37
6/7/2012 16:00:00   29.7
6/8/2012 16:00:00   29.68
6/11/2012 16:00:00  29.81
6/12/2012 16:00:00  29.3
6/13/2012 16:00:00  29.44
6/14/2012 16:00:00  29.46
6/15/2012 16:00:00  30.08
6/18/2012 16:00:00  30.03
6/19/2012 16:00:00  31.11
6/20/2012 16:00:00  31.05
6/21/2012 16:00:00  31.14
6/22/2012 16:00:00  30.73
6/25/2012 16:00:00  30.32
6/26/2012 16:00:00  30.27
6/27/2012 16:00:00  30.5
6/28/2012 16:00:00  30.05
6/29/2012 16:00:00  30.69
7/2/2012 16:00:00   30.62
7/3/2012 16:00:00   30.76
7/5/2012 16:00:00   30.78
7/6/2012 16:00:00   30.7
7/9/2012 16:00:00   30.23
7/10/2012 16:00:00  30.22
7/11/2012 16:00:00  29.735
7/12/2012 16:00:00  29.18
7/13/2012 16:00:00  29.48
7/16/2012 16:00:00  29.53
7/17/2012 16:00:00  29.86
7/18/2012 16:00:00  30.45
7/19/2012 16:00:00  30.8
7/20/2012 16:00:00  NaN
7/23/2012 16:00:00  NaN
7/24/2012 16:00:00  29.36
7/25/2012 16:00:00  29.33
7/26/2012 16:00:00  NaN
7/27/2012 16:00:00  29.85
7/30/2012 16:00:00  29.82
7/31/2012 16:00:00  29.71
8/1/2012 16:00:00   29.65
8/2/2012 16:00:00   29.525
8/3/2012 16:00:00   29.94
8/6/2012 16:00:00   30.11
8/7/2012 16:00:00   30.35
8/8/2012 16:00:00   30.47
8/9/2012 16:00:00   30.65
8/10/2012 16:00:00  30.62
8/13/2012 16:00:00  30.46
8/14/2012 16:00:00  30.39
8/15/2012 16:00:00  30.28
8/16/2012 16:00:00  30.94
8/17/2012 16:00:00  30.92
8/20/2012 16:00:00  30.85
8/21/2012 16:00:00  30.96
8/22/2012 16:00:00  30.76
8/23/2012 16:00:00  30.4
8/24/2012 16:00:00  30.63
8/27/2012 16:00:00  30.96
8/28/2012 16:00:00  30.8
8/29/2012 16:00:00  30.75
8/30/2012 16:00:00  30.61
8/31/2012 16:00:00  30.96
9/4/2012 16:00:00   30.66
9/5/2012 16:00:00   30.53
9/6/2012 16:00:00   31.36
9/7/2012 16:00:00   31.07
9/10/2012 16:00:00  NaN
9/11/2012 16:00:00  30.91
9/12/2012 16:00:00  31.18
9/13/2012 16:00:00  31.18
9/14/2012 16:00:00  31.25
9/17/2012 16:00:00  NaN
9/18/2012 16:00:00  31.21
9/19/2012 16:00:00  31.19
9/20/2012 16:00:00  NaN
9/21/2012 16:00:00  31.61
9/24/2012 16:00:00  31.07
9/25/2012 16:00:00  31
9/26/2012 16:00:00  30.6
9/27/2012 16:00:00  30.4
9/28/2012 16:00:00  30.26
10/1/2012 16:00:00  29.98
10/2/2012 16:00:00  29.89
10/3/2012 16:00:00  29.99
10/4/2012 16:00:00  30.03
10/5/2012 16:00:00  30.25
10/8/2012 16:00:00  29.92
10/9/2012 16:00:00  NaN
10/10/2012 16:00:00 NaN
10/11/2012 16:00:00 29.25
10/12/2012 16:00:00 29.32
10/15/2012 16:00:00 NaN
10/16/2012 16:00:00 29.74
10/17/2012 16:00:00 29.64
10/18/2012 16:00:00 29.73
10/19/2012 16:00:00 29.08
10/22/2012 16:00:00 28.83
10/23/2012 16:00:00 28.2
10/24/2012 16:00:00 28.2
10/25/2012 16:00:00 28.2
10/26/2012 16:00:00 28.34
10/31/2012 16:00:00 NaN
11/1/2012 16:00:00  29.56
11/2/2012 16:00:00  29.77
11/5/2012 16:00:00  29.74
11/6/2012 16:00:00  NaN
11/7/2012 16:00:00  29.825
11/8/2012 16:00:00  29.37
11/9/2012 16:00:00  29.19
11/12/2012 16:00:00 29.01
11/13/2012 16:00:00 NaN
11/14/2012 16:00:00 27.29
11/15/2012 16:00:00 26.97
11/16/2012 16:00:00 NaN
11/19/2012 16:00:00 26.8
11/20/2012 16:00:00 26.8
11/21/2012 16:00:00 27.1666
11/23/2012 13:00:00 27.77
11/26/2012 16:00:00 27.58
11/27/2012 16:00:00 27.38
11/28/2012 16:00:00 27.39
11/29/2012 16:00:00 27.36
11/30/2012 16:00:00 27.13
12/3/2012 16:00:00  26.82
12/4/2012 16:00:00  26.63
12/5/2012 16:00:00  26.93
12/6/2012 16:00:00  26.98
12/7/2012 16:00:00  26.82
12/10/2012 16:00:00 26.97
12/11/2012 16:00:00 27.49
12/12/2012 16:00:00 27.62
12/13/2012 16:00:00 NaN
12/14/2012 16:00:00 27.13
12/17/2012 16:00:00 27.215
12/18/2012 16:00:00 27.63
12/19/2012 16:00:00 27.73
12/20/2012 16:00:00 27.68
12/21/2012 16:00:00 27.49
12/24/2012 13:00:00 27.25
12/26/2012 16:00:00 27.2
12/27/2012 16:00:00 27.09
12/28/2012 16:00:00 26.9
12/31/2012 16:00:00 26.77

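By default, the interpolate method fills NaNs with a linear estimate that treats the rows as equally spaced and ignores the index. Setting method="time" makes it weight the estimate by the actual datetime index instead. Here is an example using a few rows of the data: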

import pandas as pd
import numpy as np

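# Use np.nan: the np.NaN alias was removed in NumPy 2.0
data = {"val": [32.15, np.nan, 32.09, 32.11, np.nan, 32.7]}
df = pd.DataFrame(data, index=["3/20/2012 16:00:00", "3/21/2012 16:00:00", "3/22/2012 16:00:00", "3/23/2012 16:00:00", "3/26/2012 16:00:00", "3/27/2012 16:00:00"])
# method="time" requires a DatetimeIndex, so convert the string index first
df.index = pd.to_datetime(df.index)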
print(df)
                       val
2012-03-20 16:00:00  32.15
2012-03-21 16:00:00    NaN
2012-03-22 16:00:00  32.09
2012-03-23 16:00:00  32.11
2012-03-26 16:00:00    NaN
2012-03-27 16:00:00  32.70

df.interpolate(method="time", inplace=True)

print(df)
                         val
2012-03-20 16:00:00  32.1500
2012-03-21 16:00:00  32.1200
2012-03-22 16:00:00  32.0900
2012-03-23 16:00:00  32.1100
2012-03-26 16:00:00  32.5525
2012-03-27 16:00:00  32.7000
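
The question asks for a version that starts from a list of strings, so here is a minimal sketch of how the same approach might be wrapped up. The helper name `fill_missing` is hypothetical (it is not part of pandas), and the sketch assumes each string ends with the value, separated from the timestamp by a tab or other whitespace, with timestamps in month/day/year order.

```python
import pandas as pd


def fill_missing(lines):
    """Parse 'M/D/YYYY HH:MM:SS<TAB>value' strings and fill NaNs by time.

    Hypothetical helper for illustration; not a pandas function.
    """
    # Split on the last run of whitespace, so a tab or spaces both work.
    stamps, values = zip(*(line.strip().rsplit(maxsplit=1) for line in lines))
    series = pd.Series(
        # "NaN" and anything unparsable become a real NaN
        pd.to_numeric(list(values), errors="coerce"),
        # Assumed timestamp layout: month/day/year, 24-hour time
        index=pd.to_datetime(list(stamps), format="%m/%d/%Y %H:%M:%S"),
        name="val",
    ).sort_index()
    # method="time" weights each estimate by the real gap between timestamps.
    return series.interpolate(method="time")


lines = [
    "3/20/2012 16:00:00\t32.15",
    "3/21/2012 16:00:00\tNaN",
    "3/22/2012 16:00:00\t32.09",
    "3/23/2012 16:00:00\t32.11",
    "3/26/2012 16:00:00\tNaN",
    "3/27/2012 16:00:00\t32.7",
]
filled = fill_missing(lines)
print(filled)
```

On these six rows the result should match the output above: 32.12 for 3/21 and 32.5525 for 3/26, the latter sitting three quarters of the way across the four-day gap rather than at the midpoint. One caveat for the full data set: its first row (1/3/2012) is NaN and has no earlier neighbour, and as far as I know interpolate leaves such a leading NaN unfilled by default, so it would need a separate choice such as back-filling.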

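Use interpolate() with a datetime index.

Welcome to Stack Overflow. Please take the time to read this post, as well as the guidance on how to provide answers, and revise your question accordingly. These tips may also be useful. Specifically, please show us what you have tried so far and what problems you ran into.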