Python ValueError: Target is multiclass but average='binary'. Please choose another average setting

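I am working with a tweet dataset that combines the training and test sets (combi = train.append(test, ignore_index=True)).

The training CSV has manually labeled sentiments of -1, 0, and 1 (negative, neutral, and positive), while the test set has no labels.

I want the code to use logistic regression and output the F1 score, but calling f1_score(yvalid, prediction_int) raises the ValueError shown in the title.

My code is as follows: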

from sklearn.feature_extraction.text import CountVectorizer
bow_vectorizer = CountVectorizer(max_df=0.90, min_df=2, max_features=1000, stop_words='english')
bow = bow_vectorizer.fit_transform(combi['tidy_tweet'])

from sklearn.feature_extraction.text import TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer(max_df=0.90, min_df=2, max_features=1000, stop_words='english')
tfidf = tfidf_vectorizer.fit_transform(combi['tidy_tweet'])

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

train_bow = bow[:1300,:]
test_bow = bow[1300:,:]

xtrain_bow, xvalid_bow, ytrain, yvalid = train_test_split(train_bow, train['label'], random_state=42, test_size=0.3)

lreg = LogisticRegression()
lreg.fit(xtrain_bow, ytrain) # training the model

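prediction = lreg.predict_proba(xvalid_bow)  # class probabilities, one column per class
prediction_int = prediction[:, 1] >= 0.3  # threshold the second probability column
prediction_int = prediction_int.astype(int)  # the np.int alias was removed in NumPy 1.24; builtin int is equivalent

f1_score(yvalid, prediction_int)  # this call raises the ValueError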
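
If you read the relevant documentation, you will see that the default value of the average parameter in f1_score is binary. Since the parameter is not specified here, it takes that default, which is not valid for multiclass classification (agreed, this may be a poor design choice).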
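
To see which kind of target scikit-learn considers a label vector to be, one option is the type_of_target helper. The sketch below is only illustrative: the two label lists are made up, and it assumes f1_score relies on the same kind of target check internally.

```python
from sklearn.utils.multiclass import type_of_target

# Hypothetical label vectors, not taken from the tweet data.
binary_labels = [0, 1, 1, 0]
sentiment_labels = [-1, 0, 1, 1, -1]  # three classes, like the labels in the question

print(type_of_target(binary_labels))     # binary
print(type_of_target(sentiment_labels))  # multiclass
```

Labels with three distinct values are reported as multiclass, which is the case where average='binary' is rejected.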

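As the error message suggests, you should explicitly choose and specify one of the other available values listed in the documentation. Here is the documentation example, with dummy multiclass data: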

from sklearn.metrics import f1_score
# dummy multi-class data, similar to yours:
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

f1_score(y_true, y_pred, average='macro') 
# 0.26666666666666666

f1_score(y_true, y_pred, average='micro')
# 0.33333333333333331

f1_score(y_true, y_pred, average='weighted') 
# 0.26666666666666666

f1_score(y_true, y_pred) 
# ValueError: Target is multiclass but average='binary'. Please choose another average setting.
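
Applied to labels like those in the question (-1, 0, 1), the same fix might look like the sketch below. It uses synthetic features from make_classification as a stand-in for the bag-of-words matrix, which is an assumption and not the original tweet data. It also uses lreg.predict instead of thresholding a single predict_proba column, because a threshold on one column can only produce 0/1 predictions and so cannot represent all three classes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bag-of-words features (assumption, not the tweet data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=42)
y = y - 1  # relabel 0, 1, 2 as -1, 0, 1 to mimic negative / neutral / positive

xtrain, xvalid, ytrain, yvalid = train_test_split(X, y, random_state=42, test_size=0.3)

lreg = LogisticRegression(max_iter=1000)
lreg.fit(xtrain, ytrain)

# predict() returns one of the three class labels for each sample.
prediction = lreg.predict(xvalid)

labels = [-1, 0, 1]
# Unweighted mean of the per-class F1 scores.
macro_f1 = f1_score(yvalid, prediction, labels=labels, average='macro')
# average=None returns one F1 score per class, in the order given by labels.
per_class_f1 = f1_score(yvalid, prediction, labels=labels, average=None)

print(macro_f1)
print(dict(zip(labels, per_class_f1)))
```

Which average to use depends on the goal: macro treats every class equally, weighted accounts for class frequencies, and average=None is useful for checking whether one sentiment class is predicted much worse than the others.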