Python scikit-learn: "The least populated class in y has only 1 member"


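I'm trying to do random forest regression with scikit-learn. After loading the data with pandas, my first step is to split it into training and test sets. However, I get an error:

The least populated class in y has only 1 member

I've searched Google and found various instances of this error, but I still can't understand what it means.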

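import joblib
import pandas as pd
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline

training_file = "training_data.txt"
data = pd.read_csv(training_file, sep='\t')

y = data.Result
X = data.drop('Result', axis=1)
# This call raises the ValueError shown below (stratify=y on a continuous target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123, stratify=y)

pipeline = make_pipeline(preprocessing.StandardScaler(), RandomForestRegressor(n_estimators=100))

# 1.0 (all features) is what 'auto' meant for regressors; newer scikit-learn rejects 'auto'
hyperparameters = {'randomforestregressor__max_features': [1.0, 'sqrt', 'log2'],
                   'randomforestregressor__max_depth': [None, 5, 3, 1]}

# 10-fold cross-validated grid search over the pipeline
model = GridSearchCV(pipeline, hyperparameters, cv=10)
model.fit(X_train, y_train)

prediction = model.predict(X_test)

# Persist the fitted search object to disk
joblib.dump(model, 'ms5000.pkl')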
The train_test_split method produces this stack trace:

Traceback (most recent call last):
  File "/Users/justin.shapiro/Desktop/IPML_Model/model_definition.py", line 18, in <module>
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.22, random_state=123, stratify=y)
  File "/Library/Python/2.7/site-packages/sklearn/model_selection/_split.py", line 1700, in train_test_split
    train, test = next(cv.split(X=arrays[0], y=stratify))
  File "/Library/Python/2.7/site-packages/sklearn/model_selection/_split.py", line 953, in split
    for train, test in self._iter_indices(X, y, groups):
  File "/Library/Python/2.7/site-packages/sklearn/model_selection/_split.py", line 1259, in _iter_indices
    raise ValueError("The least populated class in y has only 1"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
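The condition in the last line of the traceback can be reproduced in isolation. This is a minimal sketch using hypothetical toy labels, not the asker's data: when stratify is passed, every distinct value of y is treated as a class, and a class with only one member cannot be divided between the training and test sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 6 samples with 2 features each
X = np.arange(12).reshape(6, 2)
# Label "c" occurs only once, so it cannot appear in both train and test
y = np.array(["a", "a", "b", "b", "b", "c"])

message = None
try:
    train_test_split(X, y, test_size=0.5, random_state=0, stratify=y)
except ValueError as exc:
    # scikit-learn checks per-class counts before drawing the split
    message = str(exc)

print(message)
```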

What does this error mean, and how can I get rid of it?

The error occurs on this line:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.22, random_state=123, stratify=y)

Try removing stratify=y from that call.
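A minimal sketch of the split with stratify=y removed. The DataFrame is a hypothetical stand-in for training_data.txt (random features and a continuous Result column); only the train_test_split call mirrors the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(123)
# Hypothetical stand-in for training_data.txt: two features and a continuous target
data = pd.DataFrame({
    "f1": rng.normal(size=100),
    "f2": rng.normal(size=100),
    "Result": rng.normal(size=100),
})

y = data.Result
X = data.drop("Result", axis=1)

# Plain shuffled split without stratify, suitable for a continuous regression target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
print(X_train.shape, X_test.shape)
```

The rest of the pipeline (scaler, random forest, grid search) can stay as in the question.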

@jshapy8 Explanation: this is a regression problem, and stratify is used to split the data into training and test sets according to the provided labels (for classification). So you cannot use stratification here.
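To see why stratification fails on a regression target, the sketch below counts how often each distinct target value occurs, using random hypothetical data in place of data.Result. It then shows one commonly used workaround that is not part of the answer above: stratifying on quantile bins of the target built with pd.qcut (the 5-bin choice is an assumption), while still training on the original continuous y.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical continuous target standing in for data.Result
y = pd.Series(rng.normal(size=200), name="Result")
X = pd.DataFrame({"f1": rng.normal(size=200)})

# stratify treats each distinct value of y as its own class
counts = y.value_counts()
# Random floats are almost all unique, so the smallest "class" has 1 member
print(len(counts), counts.min())

# Assumed workaround: stratify on 5 quantile bins of y instead of y itself
y_bins = pd.qcut(y, q=5, labels=False)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y_bins
)
# Each bin is represented proportionally in the test set
print(y_bins.loc[y_test.index].value_counts().sort_index())
```

Binning keeps the target's distribution similar in both splits; whether that matters depends on the data, and a plain unstratified split is usually adequate.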