Python: rolling OLS error in Pandas version 0.12.0
I have sample data for performing a rolling OLS calculation (here I am running it through the debugger). When I try the rolling OLS:
(Pdb) pandas.ols(y=df[lhs], x=df[rhs], window=window, min_periods=min_periods, intercept=intercept)
*** TypeError: unsupported operand type(s) for +: 'slice' and 'int'
However, a regular OLS over the entire data range seems to work:
(Pdb) pandas.ols(y=df[lhs], x=df[rhs], intercept=intercept)
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <Yield> + <intercept>
Number of Observations: 38
Number of Degrees of Freedom: 2
R-squared: 0.0226
Adj R-squared: -0.0046
Rmse: 12.5182
F-stat (1, 36): 0.8321, p-value: 0.3677
Degrees of Freedom: model 1, resid 36
-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
         Yield   146.6702   160.7874       0.91     0.3677  -168.4732   461.8135
     intercept    -4.6083     6.0652      -0.76     0.4523   -16.4961     7.2795
---------------------------------End of Summary---------------------------------
Added:
The offending code appears to be in the function _cum_xx in ols.py in Pandas 0.12:
def _cum_xx(self, x):
    dates = self._index
    K = len(x.columns)
    valid = self._time_has_obs
    cum_xx = []

    slicer = lambda df, dt: df.truncate(dt, dt).values
    if not self._panel_model:
        _get_index = x.index.get_loc

        def slicer(df, dt):
            i = _get_index(dt)
            return df.values[i:i + 1, :]

    last = np.zeros((K, K))

    for i, date in enumerate(dates):
        if not valid[i]:
            cum_xx.append(last)
            continue

        x_slice = slicer(x, date)
        xx = last = last + np.dot(x_slice.T, x_slice)
        cum_xx.append(xx)

    return cum_xx
_get_index is a proxy for x.index.get_loc, which can return a slice object. But the code above assumes that the value i obtained this way is an integer, so that i + 1 makes sense.
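The failure mode described above can be reproduced without pandas. The following is a minimal sketch, not the pandas fix: row_block is a hypothetical helper that accepts either an integer or a slice position, and plain lists stand in for df.values.

```python
def row_block(rows, loc):
    """Return the rows selected by loc (hypothetical helper).

    loc may be an integer position or a slice, the two cases
    discussed above for the value returned by get_loc.
    """
    if isinstance(loc, slice):
        return rows[loc]
    return rows[loc:loc + 1]


# Stand-in for df.values (an assumption for illustration only).
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# What the slicer in _cum_xx effectively does when i is a slice:
try:
    i = slice(1, 3)
    rows[i:i + 1]          # i + 1 is evaluated first and fails
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(row_block(rows, 1))
print(row_block(rows, slice(1, 3)))
```

The captured message matches the one in the traceback, which supports the reading that i was a slice rather than an integer when the error occurred.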
I found the source code for get_loc. It turns out that x.index.get_loc is a proxy for x.index._engine.get_loc. In my case, the _engine_type of the index involved when the error occurs is ObjectEngine, and get_loc is defined here:
cpdef get_loc(self, object val):
    if is_definitely_invalid_key(val):
        raise TypeError

    if self.over_size_threshold and self.is_monotonic:
        if not self.is_unique:
            return self._get_loc_duplicates(val)
        values = self._get_index_values()
        loc = _bin_search(values, val) # .searchsorted(val, side='left')
        if util.get_value_at(values, loc) != val:
            raise KeyError(val)
        return loc

    self._ensure_mapping_populated()
    if not self.unique:
        return self._get_loc_duplicates(val)

    self._check_type(val)

    try:
        return self.mapping.get_item(val)
    except TypeError:
        raise KeyError(val)
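I am investigating when and why get_loc returns a slice for me (there are definitely no duplicates in the index, which is the only cause the documentation suggests). In the meantime, any suggestions along these lines would be very helpful.
Is your index non-numeric?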
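As a way to check whether the index is the culprit, the sketch below lists every label for which get_loc returns something other than an integer. labels_with_non_integer_loc is a hypothetical helper, and the two example indexes are made up for illustration; they are not the data from the question.

```python
import numbers

import pandas as pd


def labels_with_non_integer_loc(index):
    """Map each label whose get_loc result is not an integer to that result."""
    suspicious = {}
    for label in index.unique():
        loc = index.get_loc(label)
        if not isinstance(loc, numbers.Integral):
            suspicious[label] = loc
    return suspicious


# Illustrative indexes (assumptions, not the original data).
unique_idx = pd.Index(["a", "b", "c"])
duplicate_idx = pd.Index(["a", "b", "b", "c"])

print(unique_idx.is_unique, labels_with_non_integer_loc(unique_idx))
print(duplicate_idx.is_unique, labels_with_non_integer_loc(duplicate_idx))
```

Running the helper on df.index (or x.index inside the debugger) would show which labels, if any, come back as a slice, and is_unique gives a quick cross-check of the claim that the index has no duplicates.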