Python: getting the covariance of returns from a DataFrame
I have a DataFrame like this:
                   YAU        OTBL         HLE
2009-03-08         nan         nan         nan
2009-03-09  1.59904743  1.66397210  1.67345829
2009-03-10 -0.37065629 -0.36541822 -0.36015840
2009-03-11 -0.41055669  0.60004777  0.00536958
This is my function:
def get_covariance_returns(returns):
    return np.cov(returns.values)
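The returns parameter is a DataFrame holding the returns for each ticker and date.
The output is a 2-D array representing the covariance of the returns.
When I run the code, I get: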
AssertionError: Wrong shape for output returns_covariance. Got (4, 4), expected (3, 3)
I then modified my function as follows:
def get_covariance_returns(returns):
    return np.cov(returns.values, rowvar=False)
My result is:
OUTPUT returns_covariance:
[[ nan nan nan]
[ nan nan nan]
[ nan nan nan]]
Note that the expected output is:
EXPECTED OUTPUT FOR returns_covariance:
[[ 0.89856076 0.7205586 0.8458721 ]
[ 0.7205586 0.78707297 0.76450378]
[ 0.8458721 0.76450378 0.83182775]]
I need some guidance on what is wrong with my implementation. I am programming in Python.

If you drop the NaNs:
>>> np.cov(df.dropna().values, rowvar=False)
array([[ 1.31997225, 1.01614032, 1.2238726 ],
[ 1.01614032, 1.0304141 , 1.04243784],
[ 1.2238726 , 1.04243784, 1.17528792]])
Or, more simply, use DataFrame.cov, which excludes NaN values automatically:
>>> df.cov()
           YAU      OTBL       HLE
YAU   1.319972  1.016140  1.223873
OTBL  1.016140  1.030414  1.042438
HLE   1.223873  1.042438  1.175288
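
To make the two failures in the question concrete, here is a minimal sketch that rebuilds the sample data (the name `df` and the date index are assumptions based on the table in the question) and compares the three calls. By default `np.cov` uses `rowvar=True`, so each of the 4 rows is treated as a variable, which explains the (4, 4) shape. With `rowvar=False` the shape is right, but the all-NaN first row makes every column mean NaN, so the whole matrix is NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the first row is entirely NaN.
df = pd.DataFrame(
    [
        [np.nan, np.nan, np.nan],
        [1.59904743, 1.66397210, 1.67345829],
        [-0.37065629, -0.36541822, -0.36015840],
        [-0.41055669, 0.60004777, 0.00536958],
    ],
    index=pd.to_datetime(["2009-03-08", "2009-03-09", "2009-03-10", "2009-03-11"]),
    columns=["YAU", "OTBL", "HLE"],
)

# Default rowvar=True: each of the 4 rows (dates) is treated as a variable.
by_rows = np.cov(df.values)
print(by_rows.shape)

# rowvar=False: each of the 3 columns (tickers) is a variable,
# but the NaN row propagates into every entry of the result.
by_cols = np.cov(df.values, rowvar=False)
print(by_cols.shape, np.isnan(by_cols).all())

# DataFrame.cov works column-wise and skips missing values.
pandas_cov = df.cov()
print(pandas_cov)
```

For this data, `df.cov()` matches `np.cov(df.dropna().values, rowvar=False)` because the only missing values sit in a single row that is NaN across all columns.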
[Edit]: Judging by your expected output, you actually want to replace the NaNs with zeros:
>>> np.cov(df.replace(np.nan, 0).values, rowvar=False)
array([[ 0.89856076, 0.7205586 , 0.8458721 ],
[ 0.7205586 , 0.78707297, 0.76450378],
[ 0.8458721 , 0.76450378, 0.83182775]])
>>> df.replace(np.nan, 0).cov()
           YAU      OTBL       HLE
YAU   0.898561  0.720559  0.845872
OTBL  0.720559  0.787073  0.764504
HLE   0.845872  0.764504  0.831828
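In any case, I will leave my original answer in place, since it shows the difference between the two cov functions. Note that df.fillna(0).cov() gives the same result.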
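
Putting this together, one possible version of the function from the question is sketched below. It assumes that missing returns should count as zero, which is what the expected output implies. Whether that is statistically appropriate for your data is a separate question. The sample frame `returns` is rebuilt here only for illustration.

```python
import numpy as np
import pandas as pd


def get_covariance_returns(returns: pd.DataFrame) -> np.ndarray:
    """Return the covariance matrix of the returns, one variable per column."""
    # Assumption: a missing return is treated as 0, matching the expected output.
    filled = returns.fillna(0)
    # rowvar=False: columns (tickers) are variables, rows (dates) are observations.
    return np.cov(filled.values, rowvar=False)


# Sample data from the question, rebuilt for illustration.
returns = pd.DataFrame(
    [
        [np.nan, np.nan, np.nan],
        [1.59904743, 1.66397210, 1.67345829],
        [-0.37065629, -0.36541822, -0.36015840],
        [-0.41055669, 0.60004777, 0.00536958],
    ],
    index=pd.to_datetime(["2009-03-08", "2009-03-09", "2009-03-10", "2009-03-11"]),
    columns=["YAU", "OTBL", "HLE"],
)

returns_covariance = get_covariance_returns(returns)
print(returns_covariance)
```

If you would rather ignore the missing row than count it as zero, swapping `fillna(0)` for `dropna()` reproduces the first result shown in this answer.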