Python: fitting a DataFrame with KNeighborsRegressor (Python, Scikit Learn) - Fatal编程技术网

I create a DataFrame:

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor
data = pd.DataFrame(np.random.randint(0, 10, (10, 5)), columns=list('abcde'))
data['c'] = [0] * 5 + [1] * 5  # first five rows get c=0, last five get c=1 (avoids chained assignment)
data['a'] = np.arange(5).tolist() + np.arange(5).tolist()
data = data.set_index(list('ac'))
data = data.unstack('c')
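To see what the metric will operate on, it helps to look at the frame after unstack. The sketch below rebuilds it with a seeded generator (the seed and rng.integers are substitutions for np.random.randint, chosen only for reproducibility) and prints its shape and column labels:

```python
import numpy as np
import pandas as pd

# Same construction as above, but seeded so the layout is reproducible.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 10, (10, 5)), columns=list("abcde"))
data["c"] = np.repeat([0, 1], 5)      # rows 0-4 get c=0, rows 5-9 get c=1
data["a"] = np.tile(np.arange(5), 2)  # a = 0..4, twice
data = data.set_index(["a", "c"]).unstack("c")

print(data.shape)             # (5, 6): one row per value of a
print(data.columns.tolist())  # (original column, c) pairs
```

Each remaining column b, d and e becomes a level-0 label with two inner columns (c=0 and c=1); that two-level column index is what level=[0] in myfunc refers to.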
Then I define a metric function:

myfunc = lambda a, b: ((a - b)**2).sum(axis=1, level=[0]).apply(np.sqrt).sum(axis=1).values
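What I want is to sum the DataFrame over its columns grouped by level 0, apply sqrt, and finally sum over all columns. It works in custom code like this: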

b = data.iloc[-1]
myfunc(data,b)
# output: array([ 18.09035957,  12.62123278,  20.45561243,  14.29386508,   0.        ])
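On pandas 2.0 and later the level= keyword of DataFrame.sum no longer exists (it was deprecated in pandas 1.3), so the stand-alone version of myfunc needs groupby instead. A sketch, assuming the unstacked layout shown above; the seeded data is for illustration only:

```python
import numpy as np
import pandas as pd

# Rebuild the unstacked frame (seeded for reproducibility).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 10, (10, 5)), columns=list("abcde"))
data["c"] = np.repeat([0, 1], 5)
data["a"] = np.tile(np.arange(5), 2)
data = data.set_index(["a", "c"]).unstack("c")

def myfunc(a, b):
    """Distance from every row of DataFrame `a` to Series `b`."""
    sq = (a - b) ** 2                          # b is aligned on a's MultiIndex columns
    per_group = sq.T.groupby(level=0).sum().T  # sum the c=0 and c=1 columns of b, d, e
    return np.sqrt(per_group).sum(axis=1).to_numpy()

b = data.iloc[-1]
print(myfunc(data, b))  # the last value is 0.0: the last row compared with itself
```

This only reproduces the stand-alone computation. It still expects a DataFrame and a Series, so it cannot be handed to KNeighborsRegressor unchanged.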
But using myfunc as the metric in KNeighborsRegressor raises an error. Does this mean the KNeighborsRegressor class cannot work with a DataFrame? Can anyone help? Thanks.

knn = KNeighborsRegressor(n_neighbors=3, metric=myfunc)
knn.fit(data, np.arange(5))
knn.predict(b.to_frame().T)  # predict expects 2-D input: one row per sample
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-33f10a670994> in <module>()
      1 knn = KNeighborsRegressor(n_neighbors=3, metric=myfunc)
      2 knn.fit(a, np.arange(5))
----> 3 knn.predict(b)

C:\Anaconda3\lib\site-packages\sklearn\neighbors\regression.py in predict(self, X)
    142         X = check_array(X, accept_sparse='csr')
    143 
--> 144         neigh_dist, neigh_ind = self.kneighbors(X)
    145 
    146         weights = _get_weights(neigh_dist, self.weights)

C:\Anaconda3\lib\site-packages\sklearn\neighbors\base.py in kneighbors(self, X, n_neighbors, return_distance)
    355                 dist = pairwise_distances(
    356                     X, self._fit_X, self.effective_metric_, n_jobs=n_jobs,
--> 357                     **self.effective_metric_params_)
    358 
    359             neigh_ind = argpartition(dist, n_neighbors - 1, axis=1)

C:\Anaconda3\lib\site-packages\sklearn\metrics\pairwise.py in pairwise_distances(X, Y, metric, n_jobs, **kwds)
   1238         func = partial(distance.cdist, metric=metric, **kwds)
   1239 
-> 1240     return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
   1241 
   1242 

C:\Anaconda3\lib\site-packages\sklearn\metrics\pairwise.py in _parallel_pairwise(X, Y, func, n_jobs, **kwds)
   1081     if n_jobs == 1:
   1082         # Special case to avoid picklability checks in delayed
-> 1083         return func(X, Y, **kwds)
   1084 
   1085     # TODO: in some cases, backend='threading' may be appropriate

C:\Anaconda3\lib\site-packages\sklearn\metrics\pairwise.py in _pairwise_callable(X, Y, metric, **kwds)
   1119         iterator = itertools.product(range(X.shape[0]), range(Y.shape[0]))
   1120         for i, j in iterator:
-> 1121             out[i, j] = metric(X[i], Y[j], **kwds)
   1122 
   1123     return out

<ipython-input-17-7f81015a2d21> in <lambda>(a, b)
----> 1 myfunc = lambda a, b: ((a - b)**2).sum(axis=1, level=[0]).apply(np.sqrt).sum(axis=1).values

TypeError: _sum() got an unexpected keyword argument 'level'
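The last scikit-learn frame in the traceback, _pairwise_callable, calls metric(X[i], Y[j]) once per pair of rows. A small probe metric makes the argument types visible. This is a sketch on a toy two-column frame (X, probe_metric and seen are illustrative names); algorithm="brute" is set so the distances go through pairwise_distances as in the traceback, and internals may differ between scikit-learn versions:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

seen = []  # records (type, ndim) of every first argument the metric receives

def probe_metric(u, v):
    seen.append((type(u), u.ndim))
    return float(np.sqrt(((u - v) ** 2).sum()))  # plain Euclidean distance

X = pd.DataFrame(np.arange(10.0).reshape(5, 2), columns=["x1", "x2"])
knn = KNeighborsRegressor(n_neighbors=3, metric=probe_metric, algorithm="brute")
knn.fit(X, np.arange(5))
knn.predict(X.iloc[[-1]])  # a one-row DataFrame, i.e. 2-D input

print(set(seen))  # every call received a 1-D numpy.ndarray, not a DataFrame row
```

Even though a DataFrame goes into fit and predict, the metric only ever sees 1-D NumPy rows, and it must return a single number per pair rather than an array.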

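If you add from sklearn.metrics import make_scorer and then, after defining myfunc, use myfunc = make_scorer(myfunc)... does it work then?

@Max Power, a lambda should be equivalent to a def here. If you run type(myfunc), the output is function.

@MaxPower The metric here is not the same as a scorer. It should be a distance metric object; it does not follow the same pattern as the classification scores in sklearn.metrics.

As Vivek pointed out, the DataFrame is converted internally (in all estimators) to a NumPy array, so myfunc receives two 1-D NumPy rows (see the metric(X[i], Y[j]) call in the traceback) rather than a DataFrame and a Series. numpy.sum() does not accept the level argument, hence the error.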
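Following from that answer, one way to get the intended distance is to write the metric against 1-D NumPy rows and recover the level-0 groups by reshaping. This is a sketch that assumes the unstacked columns come out in the order (b,0), (b,1), (d,0), (d,1), (e,0), (e,1), so each block of N_INNER consecutive values belongs to one level-0 label; grouped_distance and N_INNER are hypothetical names. The metric returns one float per pair of rows, which is what scikit-learn expects from a callable metric:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Rebuild the unstacked frame (seeded for reproducibility).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 10, (10, 5)), columns=list("abcde"))
data["c"] = np.repeat([0, 1], 5)
data["a"] = np.tile(np.arange(5), 2)
data = data.set_index(["a", "c"]).unstack("c")

# Number of inner (c) columns per level-0 group; relies on the column order above.
N_INNER = data.columns.get_level_values("c").nunique()

def grouped_distance(u, v):
    """Sum over level-0 groups of the Euclidean distance within each group.

    u and v are 1-D NumPy rows, which is what scikit-learn passes in.
    """
    diff = (np.asarray(u, dtype=float) - np.asarray(v, dtype=float)).reshape(-1, N_INNER)
    return float(np.sqrt((diff ** 2).sum(axis=1)).sum())

knn = KNeighborsRegressor(n_neighbors=3, metric=grouped_distance, algorithm="brute")
knn.fit(data.to_numpy(), np.arange(5))
pred = knn.predict(data.iloc[[-1]].to_numpy())  # 2-D input: one query row
print(pred)
```

Reshaping replaces the level=[0] grouping that only exists on the DataFrame; if the column order differs, the rows would need to be reordered (or the groups indexed explicitly) before the metric sees them.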