Python TypeError: '<' not supported between instances of 'float' and 'str' in tree.py
I'm facing a strange problem and would appreciate your help. My training dataset objects are pure float32 numpy arrays, populated by a vectorizer. The problem must be one of the parameters I pass to RandomForestClassifier, because the fit goes through when I pass no parameters at all. I am sure there are no strings in the input:
X_train
memmap([0.25173673, 0.01420455, 0.00684149, ..., 0. , 0. ,
0. ], dtype=float32)
y_train
memmap([ 0., 0., 0., ..., -1., 1., 1.], dtype=float32)
However, when I run the RandomForest fit on the dataset, I get the following:
model_RandomForest = ek.RandomForestClassifier(n_estimators = 200, max_depth = 'auto', n_jobs = 1, random_state = 5,max_features = 'auto',min_samples_leaf = 100, verbose=1)
result_RandomForest = model_RandomForest.fit(X_train[train_rows], y_train[train_rows])
Traceback output:
~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\ensemble\forest.py in fit(self, X, y, sample_weight)
326 t, self, X, y, sample_weight, i, len(trees),
327 verbose=self.verbose, class_weight=self.class_weight)
--> 328 for i, t in enumerate(trees))
329
330 # Collect newly grown trees
~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable)
777 # was dispatched. In particular this covers the edge
778 # case of Parallel used with an exhausted iterator.
--> 779 while self.dispatch_one_batch(iterator):
780 self._iterating = True
781 else:
~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\externals\joblib\parallel.py in dispatch_one_batch(self, iterator)
623 return False
624 else:
--> 625 self._dispatch(tasks)
626 return True
627
~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\externals\joblib\parallel.py in _dispatch(self, batch)
586 dispatch_timestamp = time.time()
587 cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588 job = self._backend.apply_async(batch, callback=cb)
589 self._jobs.append(job)
590
~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in apply_async(self, func, callback)
109 def apply_async(self, func, callback=None):
110 """Schedule a func to be run"""
--> 111 result = ImmediateResult(func)
112 if callback:
113 callback(result)
~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in __init__(self, batch)
330 # Don't delay the application, to avoid keeping the input
331 # arguments in memory
--> 332 self.results = batch()
333
334 def get(self):
~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
132
133 def __len__(self):
~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\externals\joblib\parallel.py in <listcomp>(.0)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
132
133 def __len__(self):
~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\ensemble\forest.py in _parallel_build_trees(tree, forest, X, y, sample_weight, tree_idx, n_trees, verbose, class_weight)
119 curr_sample_weight *= compute_sample_weight('balanced', y, indices)
120
--> 121 tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
122 else:
123 tree.fit(X, y, sample_weight=sample_weight, check_input=False)
~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
788 sample_weight=sample_weight,
789 check_input=check_input,
--> 790 X_idx_sorted=X_idx_sorted)
791 return self
792
~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
181 min_samples_leaf = self.min_samples_leaf
182 else: # float
--> 183 if not 0. < self.min_samples_leaf <= 0.5:
184 raise ValueError("min_samples_leaf must be at least 1 "
185 "or in (0, 0.5], got %s"
TypeError: '<' not supported between instances of 'float' and 'str'
It looks like self.min_samples_leaf is a str. You could try float(self.min_samples_leaf).
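One way to confirm or rule out that suspicion is to look at the Python type of each hyperparameter before the classifier is built. The following is only a sketch: the `params` dict and the helper `non_numeric_params` are hypothetical and not part of scikit-learn, and the string value is planted deliberately to show what a bad entry would look like.

```python
import numbers

# Hypothetical hyperparameters, e.g. as they might look after being read
# from a config file or the command line without any conversion.
params = {
    "n_estimators": 200,
    "min_samples_leaf": "100",  # a str, not the int 100
    "n_jobs": 1,
    "random_state": 5,
}


def non_numeric_params(params, names):
    """Map each listed name to its type name when the value is not a real number."""
    return {
        name: type(params[name]).__name__
        for name in names
        if name in params and not isinstance(params[name], numbers.Real)
    }


suspects = non_numeric_params(
    params, ["n_estimators", "min_samples_leaf", "n_jobs", "random_state"]
)
print(suspects)  # {'min_samples_leaf': 'str'}
```

The same helper could be pointed at the dict returned by `model_RandomForest.get_params()`. Parameters that legitimately accept strings, such as max_features, are left out of the list of names on purpose.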
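min_samples_leaf belongs to the sklearn package. The input is correctly formatted.
Can you double-check this line: model_RandomForest = ek.RandomForestClassifier(n_estimators = 200, max_depth = 'auto', n_jobs = 1, random_state = 5, max_features = 'auto', min_samples_leaf = 100, verbose=1), specifically min_samples_leaf = 100? I ask because that check clearly returned False; somehow min_samples_leaf was not interpreted as an integer. If I had to hazard a guess, you are loading the parameters from somewhere and not converting them.
Although I was not passing it in as a parameter, I created a new var and cast it to integer and to float. Both resulted in the error =/
Forgive my persistent doubt, but the scikit-learn code leaves no room for min_samples_leaf to be a string unless you pass a string. This check returns True for integers: isinstance(min_samples_leaf, (numbers.Integral, np.integer)). You are evidently getting False there. However silly it sounds, can you make sure you are really passing an integer, exactly as in the code you posted?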
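To illustrate the point about the isinstance check, here is a minimal sketch that mirrors the branch shown in the traceback. It is not the scikit-learn source: `check_min_samples_leaf` and `outcome` are hypothetical names, the error messages are shortened, and np.integer is left out of the isinstance test so the sketch runs without numpy.

```python
import numbers


def check_min_samples_leaf(value):
    """Hypothetical stand-in for the min_samples_leaf branch quoted from tree.py."""
    if isinstance(value, numbers.Integral):
        if not 1 <= value:
            raise ValueError("min_samples_leaf must be at least 1")
        return value
    # Everything that is not an integer lands here, including a str.
    # Comparing a float with a str is what raises the TypeError.
    if not 0. < value <= 0.5:
        raise ValueError("min_samples_leaf must be in (0, 0.5]")
    return value


def outcome(value):
    """Return the accepted value, or the name of the exception raised."""
    try:
        return check_min_samples_leaf(value)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


for candidate in (100, 0.1, "100"):
    print(repr(candidate), "->", outcome(candidate))
```

In this sketch only the string reaches the float comparison and fails with a TypeError, which matches the last frame of the traceback. An int such as 100 never reaches that comparison at all.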