Python: multiplying a multi-dimensional, multi-index DataFrame with a single-index DataFrame over time
Tags: python, pandas, dataframe, numpy
I am new to Python and am looking for help multiplying two DataFrames over time. Any help in understanding the error would be greatly appreciated.
First DataFrame (cov)
Second DataFrame (w)
Code:
std = np.dot(np.transpose(w) , np.matmul(cov , w))
Error:
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 12361 is different from 10)
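I am only showing a small snippet of the DataFrames. The original cov DataFrame has 123610 rows × 10 columns, and the w DataFrame has 12361 rows × 10 columns.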
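The shapes quoted above explain the error message: `np.matmul(cov, w)` needs the number of columns of `cov` (10) to equal the number of rows of `w` (12361). The sketch below reproduces the same mismatch on a small scale and then computes one value per date instead. The sizes, the random data, and the names `cov_stacked` and `weights` are assumptions made for illustration, not the original data.

```python
import numpy as np

n_dates, n_assets = 4, 3  # small stand-ins for 12361 dates and 10 columns
rng = np.random.default_rng(0)

# Stacked layout: one (n_assets x n_assets) block per date, like cov.values.
cov_stacked = rng.random((n_dates * n_assets, n_assets))
# One row of weights per date, like w.values.
weights = rng.random((n_dates, n_assets))

try:
    # (12, 3) @ (4, 3): the inner sizes 3 and 4 differ, so matmul refuses.
    np.matmul(cov_stacked, weights)
    matmul_failed = False
except ValueError as exc:
    matmul_failed = True
    print("matmul failed:", exc)

# What is wanted instead: one quadratic form w_t' @ cov_t @ w_t per date.
per_date = np.array([
    weights[t] @ cov_stacked[t * n_assets:(t + 1) * n_assets] @ weights[t]
    for t in range(n_dates)
])
print(per_date.shape)
```

The explicit loop is only there to make the intended per-date computation visible; it is not meant as the fastest way to do it.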
Expected output:
Date
2018-12-27 44.45574103083
2018-12-28 46.593367859
2018-12-31 45.282932300
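Thank you very much.
I think you can use groupby on the Date level and, for each group, multiply by the weights in w that correspond to that group's date: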
cov.groupby(level='Date').apply(lambda g: w.loc[g.name].dot(g.values@(w.loc[g.name])))
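A self-contained sketch of this groupby approach follows, using the first two dates from the sample data. It assumes that the first index level of `cov` is actually named `Date` (otherwise `groupby(level='Date')` cannot find it) and that the rows within each date follow the same order as the columns of `w`. The second level name `Industry` and the helper `quadratic_form` are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

dates = ["2018-12-27", "2018-12-28"]
assets = ["NoDur", "Durbl", "Manuf"]

# One covariance block per date (values taken from the sample data).
block_1 = np.array([[0.000109, 0.000112, 0.000118],
                    [0.000112, 0.000339, 0.000238],
                    [0.000118, 0.000238, 0.000246]])
block_2 = np.array([[0.000109, 0.000113, 0.000117],
                    [0.000113, 0.000339, 0.000239],
                    [0.000117, 0.000239, 0.000242]])

# Rows are (Date, Industry) pairs; the level name 'Date' is an assumption.
index = pd.MultiIndex.from_product([dates, assets], names=["Date", "Industry"])
cov = pd.DataFrame(np.vstack([block_1, block_2]), index=index, columns=assets)

w = pd.DataFrame([[-69.190732, -96.316224, -324.058486],
                  [-113.83175, 30.426696, -410.055587]],
                 index=pd.Index(dates, name="Date"), columns=assets)

def quadratic_form(group):
    # group.name is the date of this group; pick the matching weight row.
    weights = w.loc[group.name].to_numpy()
    return weights @ group.to_numpy() @ weights

result = cov.groupby(level="Date").apply(quadratic_form)
print(result)
```

The result is a Series indexed by date with one value per covariance block.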
Because a 3D array represents this data more naturally, you can also avoid the implicit loop over the groups in apply and use:
reshaped = cov.values.reshape(cov.index.levels[0].nunique(), cov.index.levels[1].nunique(), cov.shape[-1])
np.einsum('ik,ik->i', w.values, np.einsum('ijk,ik->ij', reshaped, w.values))
Performance-wise, the second solution appears to be considerably faster:
%timeit cov.groupby(level='Date').apply(lambda g: w.loc[g.name].dot(g.values@(w.loc[g.name])))
4.74 ms ± 614 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.einsum('ik,ik->i', w.values, np.einsum('ijk,ik->ij', reshaped, w.values))
35.6 µs ± 5.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Yes, this works very well, thank you very much!
Glad it helped, @BjarneTimm. Feel free to accept the answer.
Do you think you could solve this question as well?
Data:
cov = pd.DataFrame.from_dict({'NoDur': {('2018-12-27', 'NoDur'): 0.000109,
('2018-12-27', 'Durbl'): 0.000112,
('2018-12-27', 'Manuf'): 0.000118,
('2018-12-28', 'NoDur'): 0.000109,
('2018-12-28', 'Durbl'): 0.000113,
('2018-12-28', 'Manuf'): 0.000117,
('2018-12-31', 'NoDur'): 0.000109,
('2018-12-31', 'Durbl'): 0.000113,
('2018-12-31', 'Manuf'): 0.000118},
'Durbl': {('2018-12-27', 'NoDur'): 0.000112,
('2018-12-27', 'Durbl'): 0.000339,
('2018-12-27', 'Manuf'): 0.000238,
('2018-12-28', 'NoDur'): 0.000113,
('2018-12-28', 'Durbl'): 0.000339,
('2018-12-28', 'Manuf'): 0.000239,
('2018-12-31', 'NoDur'): 0.000113,
('2018-12-31', 'Durbl'): 0.000339,
('2018-12-31', 'Manuf'): 0.000239},
'Manuf': {('2018-12-27', 'NoDur'): 0.000118,
('2018-12-27', 'Durbl'): 0.000238,
('2018-12-27', 'Manuf'): 0.000246,
('2018-12-28', 'NoDur'): 0.000117,
('2018-12-28', 'Durbl'): 0.000239,
('2018-12-28', 'Manuf'): 0.000242,
('2018-12-31', 'NoDur'): 0.000118,
('2018-12-31', 'Durbl'): 0.000239,
('2018-12-31', 'Manuf'): 0.000245}})
w = pd.DataFrame.from_dict({'NoDur': {'2018-12-27': -69.190732,
'2018-12-28': -113.83175,
'2018-12-31': -101.365016},
'Durbl': {'2018-12-27': -96.316224,
'2018-12-28': 30.426696,
'2018-12-31': -16.613136},
'Manuf': {'2018-12-27': -324.058486,
'2018-12-28': -410.055587,
'2018-12-31': -362.232014}})
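As a sanity check on the einsum approach, the following sketch compares it with an explicit per-date loop. It uses random data and hypothetical names (`cov2d`, `weights`, `one_step`) rather than the DataFrames above, and it assumes that the stacked rows are ordered by date, that every date has a complete block, and that rows within a block follow the same order as the weight columns. The reshape only lines up correctly under those assumptions.

```python
import numpy as np

n_dates, n_assets = 5, 3
rng = np.random.default_rng(42)

# Build symmetric blocks, then flatten them into the stacked 2D layout
# that cov.values would have (n_dates * n_assets rows, n_assets columns).
a = rng.random((n_dates, n_assets, n_assets))
cov3d = a @ a.transpose(0, 2, 1)
cov2d = cov3d.reshape(n_dates * n_assets, n_assets)
weights = rng.random((n_dates, n_assets))

# Recover one (n_assets x n_assets) block per date.
reshaped = cov2d.reshape(n_dates, n_assets, n_assets)

# Two-step version, as in the answer: first cov_t @ w_t, then w_t . (cov_t @ w_t).
two_step = np.einsum('ik,ik->i', weights,
                     np.einsum('ijk,ik->ij', reshaped, weights))

# The same contraction written as a single einsum call.
one_step = np.einsum('ij,ijk,ik->i', weights, reshaped, weights)

# Reference: plain loop over dates.
loop = np.array([weights[t] @ reshaped[t] @ weights[t] for t in range(n_dates)])

print(np.allclose(two_step, loop), np.allclose(one_step, loop))
```

If `cov` holds covariances and `w` holds portfolio weights, each value is a variance, so taking `np.sqrt` of the result would be needed to obtain a standard deviation, as the variable name `std` in the question suggests.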