Python TypeError: '<' not supported between instances of 'float' and 'str' in tree.py

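I'm facing a strange problem and would appreciate any help. My training dataset objects are pure float32 numpy arrays, populated by a vectorizer. The problem must be one of the parameters I pass to RandomForestClassifier, because the fit succeeds when I pass no parameters at all. I'm sure there are no strings in the input: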

X_train
memmap([0.25173673, 0.01420455, 0.00684149, ..., 0.        , 0.        ,
        0.        ], dtype=float32)
y_train
memmap([ 0.,  0.,  0., ..., -1.,  1.,  1.], dtype=float32)
However, when I run the RandomForest fit on the dataset, I get the following:

model_RandomForest = ek.RandomForestClassifier(n_estimators = 200, max_depth = 'auto', n_jobs = 1, random_state = 5,max_features = 'auto',min_samples_leaf = 100, verbose=1)  
result_RandomForest = model_RandomForest.fit(X_train[train_rows], y_train[train_rows]) 
Traceback output:

~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\ensemble\forest.py in fit(self, X, y, sample_weight)
    326                     t, self, X, y, sample_weight, i, len(trees),
    327                     verbose=self.verbose, class_weight=self.class_weight)
--> 328                 for i, t in enumerate(trees))
    329 
    330             # Collect newly grown trees
~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable)
    777             # was dispatched. In particular this covers the edge
    778             # case of Parallel used with an exhausted iterator.
--> 779             while self.dispatch_one_batch(iterator):
    780                 self._iterating = True
    781             else:

~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\externals\joblib\parallel.py in dispatch_one_batch(self, iterator)
    623                 return False
    624             else:
--> 625                 self._dispatch(tasks)
    626                 return True
    627 

~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\externals\joblib\parallel.py in _dispatch(self, batch)
    586         dispatch_timestamp = time.time()
    587         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588         job = self._backend.apply_async(batch, callback=cb)
    589         self._jobs.append(job)
    590 

~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in apply_async(self, func, callback)
    109     def apply_async(self, func, callback=None):
    110         """Schedule a func to be run"""
--> 111         result = ImmediateResult(func)
    112         if callback:
    113             callback(result)

~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in __init__(self, batch)
    330         # Don't delay the application, to avoid keeping the input
    331         # arguments in memory
--> 332         self.results = batch()
    333 
    334     def get(self):

~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132 
    133     def __len__(self):

~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\externals\joblib\parallel.py in <listcomp>(.0)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132 
    133     def __len__(self):

~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\ensemble\forest.py in _parallel_build_trees(tree, forest, X, y, sample_weight, tree_idx, n_trees, verbose, class_weight)
    119             curr_sample_weight *= compute_sample_weight('balanced', y, indices)
    120 
--> 121         tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
    122     else:
    123         tree.fit(X, y, sample_weight=sample_weight, check_input=False)

~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    788             sample_weight=sample_weight,
    789             check_input=check_input,
--> 790             X_idx_sorted=X_idx_sorted)
    791         return self
    792 

~\Anaconda3\envs\emberenv\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    181             min_samples_leaf = self.min_samples_leaf
    182         else:  # float
--> 183             if not 0. < self.min_samples_leaf <= 0.5:
    184                 raise ValueError("min_samples_leaf must be at least 1 "
    185                                  "or in (0, 0.5], got %s"

TypeError: '<' not supported between instances of 'float' and 'str'
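
The failing comparison can be reproduced without scikit-learn. The sketch below is a simplified, hypothetical stand-in for the check at line 183 of tree.py (the names check_min_samples_leaf and describe are made up for illustration, and only numbers.Integral is tested rather than the library's full type tuple). It suggests that any value that is not an integer falls through to the float branch, where a str makes the chained comparison fail:

```python
import numbers


def check_min_samples_leaf(min_samples_leaf):
    """Hypothetical stand-in for the validation shown in the traceback."""
    if isinstance(min_samples_leaf, numbers.Integral):
        if min_samples_leaf < 1:
            raise ValueError("min_samples_leaf must be at least 1")
        return "int"
    # Anything that is not an integer is treated as a float fraction.
    # A str reaches this comparison and raises TypeError.
    if not 0. < min_samples_leaf <= 0.5:
        raise ValueError("min_samples_leaf must be in (0, 0.5]")
    return "float"


def describe(value):
    """Run the check and report either the branch taken or the TypeError."""
    try:
        return check_min_samples_leaf(value)
    except TypeError as exc:
        return "TypeError: " + str(exc)


print(describe(100))    # integer branch
print(describe(0.1))    # float branch
print(describe("100"))  # same TypeError message as in the traceback
```

Passing 100 or 0.1 goes through cleanly, while the string "100" produces the same message as the traceback, which is consistent with a str reaching min_samples_leaf somewhere before the fit.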

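The traceback shows that self.min_samples_leaf is a str. You could try float(self.min_samples_leaf).

min_samples_leaf belongs to the sklearn package. The input is formatted correctly.

Can you double-check this line:

model_RandomForest = ek.RandomForestClassifier(n_estimators = 200, max_depth = 'auto', n_jobs = 1, random_state = 5,max_features = 'auto',min_samples_leaf = 100, verbose=1)

and in particular min_samples_leaf = 100? I ask because this check evidently returned False: somehow min_samples_leaf was not interpreted as an integer. If I had to make a wild guess, you are loading the parameters from somewhere and not converting them.

Although I was not passing it in as a parameter, I created a new variable and cast it to integer and to float. Both resulted in the error =/

Forgive my persistent skepticism, but the scikit-learn code leaves no room for min_samples_leaf to be a string unless you pass a string. This check returns True for integers:

isinstance(min_samples_leaf, (numbers.Integral, np.integer))

and you are clearly getting False from it. However silly it sounds, can you make sure you are really passing an integer, exactly as in the code you posted?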
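
One way to act on that suggestion is to list the parameters the model actually holds and look for strings where numbers are expected. The sketch below is only an illustration: find_string_params is a hypothetical helper, and the params dictionary is made-up data imitating a value loaded from a config file without conversion. If ek.RandomForestClassifier is the scikit-learn estimator, the dictionary returned by model_RandomForest.get_params() could be passed in instead.

```python
def find_string_params(params, numeric_names):
    """Return the numeric parameters whose current value is a str."""
    return {
        name: params[name]
        for name in numeric_names
        if isinstance(params.get(name), str)
    }


# Hypothetical parameters, e.g. read from a file and never converted.
params = {"n_estimators": 200, "min_samples_leaf": "100", "random_state": 5}
numeric_names = ["n_estimators", "min_samples_leaf", "random_state"]

suspects = find_string_params(params, numeric_names)
print(suspects)
```

Any name that shows up in suspects is a candidate for the str that reached the comparison in tree.py.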