Python 3.x: time series with missing data
A dataset is given below. The time series has 20 missing values, marked as NaN. Can you show me how to write a Python 3 script that produces the best estimate for the NaN values?
Note that the dates and times are not equally spaced, so you cannot simply take the mean of the previous and next values. (Here the time is almost always 16:00:00, but that need not hold for other datasets, so I would like a general solution that also handles unevenly spaced times.)
Can you show general code that uses pandas and solves the problem above? Please assume the input is a list of strings, e.g. ['1/3/2012 16:00:00 26.96', '1/4/2012 16:00:00 27.47', …]. There is a space between the date and the time, and a tab between the time and the value.
For the data below, the ideal 20 NaN values would be: [26.96, 32.15, 32.61, 29.3, 28.96, 28.78, 31.05, 29.58, 29.5, 30.9, 31.26, 31.48, 29.74, 29.31, 29.72, 28.88, 30.2, 27.3, 26.7, 27.52]
1/3/2012 16:00:00 NaN
1/4/2012 16:00:00 27.47
1/5/2012 16:00:00 27.728
1/6/2012 16:00:00 28.19
1/9/2012 16:00:00 28.1
1/10/2012 16:00:00 28.15
1/11/2012 16:00:00 27.98
1/12/2012 16:00:00 28.02
1/13/2012 16:00:00 28.25
1/17/2012 16:00:00 28.65
1/18/2012 16:00:00 28.4
1/19/2012 16:00:00 28.435
1/20/2012 16:00:00 29.74
1/23/2012 16:00:00 29.95
1/24/2012 16:00:00 29.5703
1/25/2012 16:00:00 29.65
1/26/2012 16:00:00 29.7
1/27/2012 16:00:00 29.53
1/30/2012 16:00:00 29.62
1/31/2012 16:00:00 29.7
2/1/2012 16:00:00 30.05
2/2/2012 16:00:00 30.17
2/3/2012 16:00:00 30.4
2/6/2012 16:00:00 30.22
2/7/2012 16:00:00 30.485
2/8/2012 16:00:00 30.67
2/9/2012 16:00:00 30.8
2/10/2012 16:00:00 30.8
2/13/2012 16:00:00 30.77
2/14/2012 16:00:00 30.46
2/15/2012 16:00:00 30.39
2/16/2012 16:00:00 31.55
2/17/2012 16:00:00 31.32
2/21/2012 16:00:00 31.61
2/22/2012 16:00:00 31.68
2/23/2012 16:00:00 31.59
2/24/2012 16:00:00 31.5
2/27/2012 16:00:00 31.5
2/28/2012 16:00:00 31.93
2/29/2012 16:00:00 32
3/1/2012 16:00:00 32.39
3/2/2012 16:00:00 32.44
3/5/2012 16:00:00 32.05
3/6/2012 16:00:00 31.98
3/7/2012 16:00:00 31.92
3/8/2012 16:00:00 32.21
3/9/2012 16:00:00 32.16
3/12/2012 16:00:00 32.2
3/13/2012 16:00:00 32.69
3/14/2012 16:00:00 32.88
3/15/2012 16:00:00 32.94
3/16/2012 16:00:00 32.95
3/19/2012 16:00:00 32.61
3/20/2012 16:00:00 32.15
3/21/2012 16:00:00 NaN
3/22/2012 16:00:00 32.09
3/23/2012 16:00:00 32.11
3/26/2012 16:00:00 NaN
3/27/2012 16:00:00 32.7
3/28/2012 16:00:00 32.7
3/29/2012 16:00:00 32.19
3/30/2012 16:00:00 32.41
4/2/2012 16:00:00 32.46
4/3/2012 16:00:00 32.19
4/4/2012 16:00:00 31.69
4/5/2012 16:00:00 31.63
4/9/2012 16:00:00 31.4
4/10/2012 16:00:00 31.19
4/11/2012 16:00:00 30.53
4/12/2012 16:00:00 31.04
4/13/2012 16:00:00 31.16
4/16/2012 16:00:00 31.19
4/17/2012 16:00:00 31.61
4/18/2012 16:00:00 31.31
4/19/2012 16:00:00 31.68
4/20/2012 16:00:00 32.89
4/23/2012 16:00:00 32.5
4/24/2012 16:00:00 32.52
4/25/2012 16:00:00 32.32
4/26/2012 16:00:00 32.23
4/27/2012 16:00:00 32.22
4/30/2012 16:00:00 32.11
5/1/2012 16:00:00 32.335
5/2/2012 16:00:00 31.925
5/3/2012 16:00:00 31.9
5/4/2012 16:00:00 31.57
5/7/2012 16:00:00 30.86
5/8/2012 16:00:00 30.78
5/9/2012 16:00:00 30.83
5/10/2012 16:00:00 31.02
5/11/2012 16:00:00 31.54
5/14/2012 16:00:00 31.04
5/15/2012 16:00:00 30.795
5/16/2012 16:00:00 30.32
5/17/2012 16:00:00 30.2084
5/18/2012 16:00:00 29.81
5/21/2012 16:00:00 29.79
5/22/2012 16:00:00 29.88
5/23/2012 16:00:00 29.4
5/24/2012 16:00:00 NaN
5/25/2012 16:00:00 29.36
5/29/2012 16:00:00 29.72
5/30/2012 16:00:00 29.479
5/31/2012 16:00:00 29.42
6/1/2012 16:00:00 NaN
6/4/2012 16:00:00 NaN
6/5/2012 16:00:00 28.75
6/6/2012 16:00:00 29.37
6/7/2012 16:00:00 29.7
6/8/2012 16:00:00 29.68
6/11/2012 16:00:00 29.81
6/12/2012 16:00:00 29.3
6/13/2012 16:00:00 29.44
6/14/2012 16:00:00 29.46
6/15/2012 16:00:00 30.08
6/18/2012 16:00:00 30.03
6/19/2012 16:00:00 31.11
6/20/2012 16:00:00 31.05
6/21/2012 16:00:00 31.14
6/22/2012 16:00:00 30.73
6/25/2012 16:00:00 30.32
6/26/2012 16:00:00 30.27
6/27/2012 16:00:00 30.5
6/28/2012 16:00:00 30.05
6/29/2012 16:00:00 30.69
7/2/2012 16:00:00 30.62
7/3/2012 16:00:00 30.76
7/5/2012 16:00:00 30.78
7/6/2012 16:00:00 30.7
7/9/2012 16:00:00 30.23
7/10/2012 16:00:00 30.22
7/11/2012 16:00:00 29.735
7/12/2012 16:00:00 29.18
7/13/2012 16:00:00 29.48
7/16/2012 16:00:00 29.53
7/17/2012 16:00:00 29.86
7/18/2012 16:00:00 30.45
7/19/2012 16:00:00 30.8
7/20/2012 16:00:00 NaN
7/23/2012 16:00:00 NaN
7/24/2012 16:00:00 29.36
7/25/2012 16:00:00 29.33
7/26/2012 16:00:00 NaN
7/27/2012 16:00:00 29.85
7/30/2012 16:00:00 29.82
7/31/2012 16:00:00 29.71
8/1/2012 16:00:00 29.65
8/2/2012 16:00:00 29.525
8/3/2012 16:00:00 29.94
8/6/2012 16:00:00 30.11
8/7/2012 16:00:00 30.35
8/8/2012 16:00:00 30.47
8/9/2012 16:00:00 30.65
8/10/2012 16:00:00 30.62
8/13/2012 16:00:00 30.46
8/14/2012 16:00:00 30.39
8/15/2012 16:00:00 30.28
8/16/2012 16:00:00 30.94
8/17/2012 16:00:00 30.92
8/20/2012 16:00:00 30.85
8/21/2012 16:00:00 30.96
8/22/2012 16:00:00 30.76
8/23/2012 16:00:00 30.4
8/24/2012 16:00:00 30.63
8/27/2012 16:00:00 30.96
8/28/2012 16:00:00 30.8
8/29/2012 16:00:00 30.75
8/30/2012 16:00:00 30.61
8/31/2012 16:00:00 30.96
9/4/2012 16:00:00 30.66
9/5/2012 16:00:00 30.53
9/6/2012 16:00:00 31.36
9/7/2012 16:00:00 31.07
9/10/2012 16:00:00 NaN
9/11/2012 16:00:00 30.91
9/12/2012 16:00:00 31.18
9/13/2012 16:00:00 31.18
9/14/2012 16:00:00 31.25
9/17/2012 16:00:00 NaN
9/18/2012 16:00:00 31.21
9/19/2012 16:00:00 31.19
9/20/2012 16:00:00 NaN
9/21/2012 16:00:00 31.61
9/24/2012 16:00:00 31.07
9/25/2012 16:00:00 31
9/26/2012 16:00:00 30.6
9/27/2012 16:00:00 30.4
9/28/2012 16:00:00 30.26
10/1/2012 16:00:00 29.98
10/2/2012 16:00:00 29.89
10/3/2012 16:00:00 29.99
10/4/2012 16:00:00 30.03
10/5/2012 16:00:00 30.25
10/8/2012 16:00:00 29.92
10/9/2012 16:00:00 NaN
10/10/2012 16:00:00 NaN
10/11/2012 16:00:00 29.25
10/12/2012 16:00:00 29.32
10/15/2012 16:00:00 NaN
10/16/2012 16:00:00 29.74
10/17/2012 16:00:00 29.64
10/18/2012 16:00:00 29.73
10/19/2012 16:00:00 29.08
10/22/2012 16:00:00 28.83
10/23/2012 16:00:00 28.2
10/24/2012 16:00:00 28.2
10/25/2012 16:00:00 28.2
10/26/2012 16:00:00 28.34
10/31/2012 16:00:00 NaN
11/1/2012 16:00:00 29.56
11/2/2012 16:00:00 29.77
11/5/2012 16:00:00 29.74
11/6/2012 16:00:00 NaN
11/7/2012 16:00:00 29.825
11/8/2012 16:00:00 29.37
11/9/2012 16:00:00 29.19
11/12/2012 16:00:00 29.01
11/13/2012 16:00:00 NaN
11/14/2012 16:00:00 27.29
11/15/2012 16:00:00 26.97
11/16/2012 16:00:00 NaN
11/19/2012 16:00:00 26.8
11/20/2012 16:00:00 26.8
11/21/2012 16:00:00 27.1666
11/23/2012 13:00:00 27.77
11/26/2012 16:00:00 27.58
11/27/2012 16:00:00 27.38
11/28/2012 16:00:00 27.39
11/29/2012 16:00:00 27.36
11/30/2012 16:00:00 27.13
12/3/2012 16:00:00 26.82
12/4/2012 16:00:00 26.63
12/5/2012 16:00:00 26.93
12/6/2012 16:00:00 26.98
12/7/2012 16:00:00 26.82
12/10/2012 16:00:00 26.97
12/11/2012 16:00:00 27.49
12/12/2012 16:00:00 27.62
12/13/2012 16:00:00 NaN
12/14/2012 16:00:00 27.13
12/17/2012 16:00:00 27.215
12/18/2012 16:00:00 27.63
12/19/2012 16:00:00 27.73
12/20/2012 16:00:00 27.68
12/21/2012 16:00:00 27.49
12/24/2012 13:00:00 27.25
12/26/2012 16:00:00 27.2
12/27/2012 16:00:00 27.09
12/28/2012 16:00:00 26.9
12/31/2012 16:00:00 26.77
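By default, the interpolate method fills NaNs with a linear estimate that treats the rows as equally spaced, but it can be told to use the datetime index instead by passing method="time". Here is an example using a few rows of the data: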
import pandas as pd
import numpy as np
# np.nan is the supported spelling; the np.NaN alias was removed in NumPy 2.0
data = {"val":[32.15, np.nan, 32.09, 32.11, np.nan, 32.7]}
df = pd.DataFrame(data, index=["3/20/2012 16:00:00", "3/21/2012 16:00:00", "3/22/2012 16:00:00", "3/23/2012 16:00:00", "3/26/2012 16:00:00", "3/27/2012 16:00:00" ])
# method="time" requires a DatetimeIndex, so convert the string labels
df.index = pd.to_datetime(df.index)
print(df)
val
2012-03-20 16:00:00 32.15
2012-03-21 16:00:00 NaN
2012-03-22 16:00:00 32.09
2012-03-23 16:00:00 32.11
2012-03-26 16:00:00 NaN
2012-03-27 16:00:00 32.70
df.interpolate(method="time", inplace=True)
print(df)
val
2012-03-20 16:00:00 32.1500
2012-03-21 16:00:00 32.1200
2012-03-22 16:00:00 32.0900
2012-03-23 16:00:00 32.1100
2012-03-26 16:00:00 32.5525
2012-03-27 16:00:00 32.7000
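The question asks for input given as a list of strings, so the same idea can be wrapped in a small helper. The following is only a sketch under stated assumptions: the function name fill_missing is made up for illustration, each string is assumed to end with the value (or the text NaN) separated from the timestamp by whitespace, and timestamps are assumed to follow the month/day/year layout shown in the question.

```python
import pandas as pd


def fill_missing(lines):
    """Parse 'date time<TAB>value' strings and fill NaNs by time-weighted interpolation.

    Hypothetical helper, not part of pandas.
    """
    stamps, values = [], []
    for line in lines:
        # Split off the last whitespace-separated field (works for tab or space).
        stamp, value = line.rsplit(maxsplit=1)
        stamps.append(stamp)
        values.append(float(value))  # float("NaN") yields a NaN
    index = pd.to_datetime(stamps, format="%m/%d/%Y %H:%M:%S")
    series = pd.Series(values, index=index).sort_index()
    # method="time" weights each estimate by the actual gaps between timestamps.
    return series.interpolate(method="time")


# The same six rows as the example above, now as raw strings.
sample = [
    "3/20/2012 16:00:00\t32.15",
    "3/21/2012 16:00:00\tNaN",
    "3/22/2012 16:00:00\t32.09",
    "3/23/2012 16:00:00\t32.11",
    "3/26/2012 16:00:00\tNaN",
    "3/27/2012 16:00:00\t32.7",
]
result = fill_missing(sample)
print(result)
```

One caveat: a NaN in the very first row (as in the full dataset) has no earlier neighbour, so interpolation with the default settings leaves it as NaN. How to fill it (for example with the next known value) is a separate choice. Interpolated values are estimates and will generally not match the "ideal" values listed in the question exactly.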
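Use interpolate() with a datetime index.
Welcome to Stack Overflow. Please take the time to read this post, as well as how to provide an answer, and revise your question accordingly. These tips may also be useful. Specifically, please show us what you have tried so far and where you ran into problems.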