Correlation between two variables in a time series in Python? - Python, Statistics - Fatal编程技术网

Correlation between two variables in a time series in Python?


If I have two different data sets in a time series, is there a simple way to find the correlation between the two sets in Python?

For example:

from datetime import time
# [ (timestamp, y, z), ... ]; only times of day are given (8:00am, 8:10am)
x = [(time(8, 0), 12, 8), (time(8, 10), 15, 10)]  # .... more rows

How can I get the correlation between y and z in Python?

SciPy's stats module has correlation functions, for example pearsonr:

from scipy import stats
# Y and Z are numpy arrays or lists of variables 
stats.pearsonr(Y, Z)
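Applied to the question's (timestamp, y, z) layout, a minimal sketch could unpack the two value columns and pass them to stats.pearsonr. The sample rows below are made up for illustration; pearsonr returns the correlation coefficient together with a two-sided p-value, which can be unpacked as a pair.

```python
from datetime import time

from scipy import stats

# Hypothetical sample rows in the question's (timestamp, y, z) layout
x = [
    (time(8, 0), 12, 8),
    (time(8, 10), 15, 10),
    (time(8, 20), 11, 7),
    (time(8, 30), 18, 13),
]

# Split the tuples into columns; the timestamps are not needed here
_, y, z = zip(*x)

# pearsonr returns the Pearson coefficient and a two-sided p-value
r, p_value = stats.pearsonr(y, z)
print(f"r = {r:.4f}, p = {p_value:.4f}")
```

Note that the timestamps play no role in the calculation: Pearson's r only pairs up y and z values row by row.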

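You can do this via the covariance matrix or the correlation coefficients. NumPy's cov and corrcoef functions are documented for this; the documentation for the former also includes an example of how to use it (corrcoef is used in much the same way).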
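To see how the two routes relate, a minimal sketch can derive the Pearson coefficient from the covariance matrix by dividing the off-diagonal covariance by the product of the two standard deviations; np.corrcoef performs that normalization for you. The sample values are made up.

```python
import numpy as np

# Hypothetical y and z samples
y = np.array([12.0, 15.0, 11.0, 18.0])
z = np.array([8.0, 10.0, 7.0, 13.0])

# 2x2 covariance matrix: [[var(y), cov(y, z)], [cov(z, y), var(z)]]
c = np.cov(y, z)

# Pearson r = cov(y, z) / (std(y) * std(z)); the ddof factor cancels out
r_from_cov = c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])

# corrcoef applies the same normalization directly
r_direct = np.corrcoef(y, z)[0, 1]

print(r_from_cov, r_direct)
```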

Using numpy:

import numpy as np
v = [('k', 1, 2), ('l', 2, 4), ('m', 13, 9)]
# corrcoef returns the 2x2 correlation matrix; [0, 1] is the coefficient between columns 1 and 2
np.corrcoef([a[1] for a in v], [a[2] for a in v])[0, 1]

The list comprehensions here are a bit slow. pandas (http://github.com/wesm/pandas and pandas.sourceforge.net) is probably your best bet. I'm biased because I wrote it, but:

In [7]: ts1
Out[7]: 
2000-01-03 00:00:00    -0.945653010936
2000-01-04 00:00:00    0.759529904445
2000-01-05 00:00:00    0.177646448683
2000-01-06 00:00:00    0.579750822716
2000-01-07 00:00:00    -0.0752734982291
2000-01-10 00:00:00    0.138730447557
2000-01-11 00:00:00    -0.506961851495

In [8]: ts2
Out[8]: 
2000-01-03 00:00:00    1.10436688823
2000-01-04 00:00:00    0.110075215713
2000-01-05 00:00:00    -0.372818939799
2000-01-06 00:00:00    -0.520443811368
2000-01-07 00:00:00    -0.455928700936
2000-01-10 00:00:00    1.49624355051
2000-01-11 00:00:00    -0.204383054598

In [9]: ts1.corr(ts2)
Out[9]: -0.34768587480980645
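For data shaped like the question's list of (timestamp, y, z) tuples, a minimal sketch is to load it into a DataFrame indexed by time and call Series.corr on the two columns. The timestamps and values below are hypothetical.

```python
import pandas as pd

# Hypothetical rows in the question's (timestamp, y, z) layout
rows = [
    (pd.Timestamp("2000-01-03 08:00"), 12, 8),
    (pd.Timestamp("2000-01-03 08:10"), 15, 10),
    (pd.Timestamp("2000-01-03 08:20"), 11, 7),
    (pd.Timestamp("2000-01-03 08:30"), 18, 13),
]

# Build a DataFrame and use the timestamp column as the index
df = pd.DataFrame(rows, columns=["time", "y", "z"]).set_index("time")

# Pearson correlation between the two columns (Pearson is the default method)
r = df["y"].corr(df["z"])

# df.corr() returns the full correlation matrix of the numeric columns
print(r)
print(df.corr())
```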
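Notably, if the two series cover different sets of dates, Series.corr computes the correlation pairwise, over the dates they share. It also automatically excludes NaN values.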
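The alignment behaviour can be checked with two short series whose dates only partly overlap, one of which contains a NaN; the dates and values here are made up. Series.corr aligns the two indexes and uses only the dates where both values are present, so the result matches np.corrcoef computed by hand on those dates.

```python
import numpy as np
import pandas as pd

# Hypothetical series over partly overlapping dates
ts1 = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05",
                          "2000-01-06", "2000-01-07"]),
)
ts2 = pd.Series(
    [2.0, np.nan, 5.0, 9.0, 7.0],
    index=pd.to_datetime(["2000-01-04", "2000-01-05", "2000-01-06",
                          "2000-01-07", "2000-01-10"]),
)

# Only dates present in both series with non-NaN values are used:
# 2000-01-04, 2000-01-06 and 2000-01-07
r = ts1.corr(ts2)

# The same coefficient computed by hand on the shared, non-NaN dates
shared = pd.to_datetime(["2000-01-04", "2000-01-06", "2000-01-07"])
r_manual = np.corrcoef(ts1.loc[shared], ts2.loc[shared])[0, 1]
print(r, r_manual)
```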
