Dimension error in batch learning with a random forest classifier in Python

python scikit-learn


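I have a large DataFrame with about one million records and 19 features (+1 target variable). Because of a memory error I could not train my RF classifier on the full data (this is a multiclass classification problem with about 750 classes), so I resorted to batch learning. The model trains fine, but when I run the model.predict command it gives me the following ValueError:

ValueError: operands could not be broadcast together with shapes (231106,628) (231106,620) (231106,628)

My code is as follows: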

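import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Splitting into dependent and independent variables (the target is the first column of df)
X = df.iloc[:, 1:]
y = df.iloc[:, 0]

# Train-test split
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.25, random_state=1234)

# Materialize the batches as a list so they can be reused on every pass
data_splits = list(zip(np.array_split(train_X, 6), np.array_split(train_y, 6)))

rf_clf = RandomForestClassifier(warm_start=True, n_estimators=1, criterion='entropy', random_state=1234)

for i in range(10):  # 10 passes through the data
    for batch_X, batch_y in data_splits:
        rf_clf.fit(batch_X, batch_y)
        rf_clf.n_estimators += 1  # increment by one, so the next fit adds 1 tree

y_preds = rf_clf.predict(test_X)  # this call raises the ValueError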

Any help would be much appreciated. Any other suggestions are welcome.

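Found the answer. It was caused by inconsistent classes in the y variable across the data batches. Each batch contained a different subset of the roughly 750 classes, so trees fitted on different batches return class-probability arrays of different widths (628 and 620 columns in the error message), and those arrays cannot be combined at prediction time.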
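One way to act on this is to build the batches so that every batch contains every class. The following is a minimal sketch, not the asker's actual fix: it uses small synthetic data (X_demo, y_demo are made-up stand-ins for the real DataFrame) and takes the test folds of StratifiedKFold as batches, with an explicit check that the class set is identical in each batch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the real data (assumption): 600 rows, 19 features, 5 classes.
rng = np.random.RandomState(1234)
X_demo = rng.rand(600, 19)
y_demo = np.repeat(np.arange(5), 120)

all_classes = np.unique(y_demo)

# The test folds of a stratified split preserve class proportions,
# so each fold can serve as one batch containing every class.
skf = StratifiedKFold(n_splits=6, shuffle=True, random_state=1234)
batches = [fold_idx for _, fold_idx in skf.split(X_demo, y_demo)]

rf_clf = RandomForestClassifier(
    warm_start=True, n_estimators=1, criterion="entropy", random_state=1234
)

for i, idx in enumerate(batches):
    # Guard: refuse to fit on a batch whose class set differs from the full set.
    assert np.array_equal(np.unique(y_demo[idx]), all_classes)
    rf_clf.set_params(n_estimators=i + 1)  # grow the forest by one tree per batch
    rf_clf.fit(X_demo[idx], y_demo[idx])

y_preds = rf_clf.predict(X_demo)
print(len(rf_clf.estimators_), y_preds.shape)
```

With about 750 classes, some classes may have fewer rows than there are batches. Such classes cannot appear in every batch, so they would have to be merged, oversampled, or dropped before this approach works.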


Printing the shape of test_X gives (231106, 19).
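The input shape is therefore fine; the mismatch is in the second dimension of the per-tree outputs, which is the number of classes. The NumPy sketch below only imitates the kind of in-place accumulation that appears to fail inside predict (it is not scikit-learn's internal code), using a small hypothetical row count in place of 231106.

```python
import numpy as np

n_samples = 4  # small stand-in for the 231106 test rows

# Accumulator sized for a forest that currently knows 628 classes.
acc = np.zeros((n_samples, 628))
# Probabilities from a tree that was fitted on a batch with only 620 classes.
tree_proba = np.ones((n_samples, 620))

try:
    acc += tree_proba  # in-place add: 628 and 620 columns cannot be broadcast
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

The in-place addition reports three shapes (both operands and the output), which matches the pattern of three shapes in the error from the question.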