
Python: How to efficiently compare the accuracy of all models


I have split the training data and initialized 11 classifier models, and now I want to compare them.

I am running VS Code on Ubuntu 18.04.

I tried:

import pandas as pd
from sklearn.model_selection import cross_val_score

# Prepare lists
models = [ran, knn, log, xgb, gbc, svc, ext, ada, gnb, gpc, bag]
scores = []

# Sequentially fit and cross validate all models
for mod in models:
    mod.fit(X_train, y_train)
    acc = cross_val_score(mod, X_train, y_train, scoring="accuracy", cv=10)
scores.append(acc.mean())

# Creating a table of results, ranked highest to lowest
results = pd.DataFrame({
    'Model': ['Random Forest', 'K Nearest Neighbour', 'Logistic Regression',
              'XGBoost', 'Gradient Boosting', 'SVC', 'Extra Trees',
              'AdaBoost', 'Gaussian Naive Bayes', 'Gaussian Process',
              'Bagging Classifier'],
    'Score': scores})

The last part returns:

ValueError: arrays must all be same length

I have counted twice; there really are 11 models.

What am I missing?

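There seems to be an indentation error in your code; see the edited code below. If you run len(scores) on your code, you will get 1, because the append call sits outside the loop and therefore adds only the last value.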
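The effect of the indentation can be seen without any models at all. In this minimal sketch, plain numbers stand in for the cross-validation means (hypothetical values, for illustration only): with the append dedented it runs once, after the loop, so only the value from the final iteration is kept.

```python
# Hypothetical stand-ins for the 11 cross-validation means
values = [0.1 * i for i in range(11)]

# append dedented: it runs once, after the loop has finished
outside = []
for v in values:
    acc = v
outside.append(acc)

# append indented: it runs on every iteration
inside = []
for v in values:
    acc = v
    inside.append(acc)

print(len(outside), len(inside))  # 1 11
```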

# Prepare lists
models = [ran, knn, log, xgb, gbc, svc, ext, ada, gnb, gpc, bag]         
scores = []

# Sequentially fit and cross validate all models
for mod in models:
    mod.fit(X_train, y_train)
    acc = cross_val_score(mod, X_train, y_train, scoring="accuracy", cv=10)
    scores.append(acc.mean())

After upvoting the previous answer, I went on to demonstrate that the error is indeed due to your scores.append() being outside your for loop:

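We don't need to actually fit any models; we can simulate the situation with the following modifications to your code, which don't change the essence of the problem: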

import numpy as np
import pandas as pd

models = ['ran', 'knn', 'log', 'xgb', 'gbc', 'svc', 'ext', 'ada', 'gnb', 'gpc', 'bag']         
scores = []
cv=10

# Sequentially fit and cross validate all models
for mod in models:
    acc = np.array([np.random.rand() for i in range(cv)]) # simulate your accuracy here
scores.append(acc.mean()) # as in your code, i.e. outside the for loop

# Create a dataframe of results
results = pd.DataFrame({
    'Model': ['Random Forest', 'K Nearest Neighbour', 'Logistic Regression', 'XGBoost', 'Gradient Boosting',  
    'SVC', 'Extra Trees', 'AdaBoost', 'Gaussian Naive Bayes', 'Gaussian Process', 'Bagging Classifier'],
    'Score': scores})
Unsurprisingly, this indeed reproduces your error:

ValueError: arrays must all be same length
because, as already pointed out in the other answer, your scores list has only one element, namely the acc.mean() from the last iteration of the loop:
len(scores)
# 1
scores
# [0.47317491043203785]
and hence pandas complains, since it cannot populate an 11-row dataframe.
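A minimal sketch of that failure, assuming pandas is installed: a 'Model' column with 11 names and a 'Score' column with the single leftover score make the DataFrame constructor raise ValueError. Lists are not broadcast the way scalars are. The exact message text differs between pandas versions, so the sketch only relies on the exception type.

```python
import pandas as pd

names = ['Random Forest', 'K Nearest Neighbour', 'Logistic Regression', 'XGBoost',
         'Gradient Boosting', 'SVC', 'Extra Trees', 'AdaBoost',
         'Gaussian Naive Bayes', 'Gaussian Process', 'Bagging Classifier']
scores = [0.47317491043203785]  # the single element left by the dedented append

try:
    pd.DataFrame({'Model': names, 'Score': scores})
    raised = False
except ValueError as err:
    raised = True
    print(err)  # wording depends on the pandas version
```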

Moving scores.append() inside the for loop, as suggested in the other answer, resolves the issue:

import numpy as np
import pandas as pd

models = ['ran', 'knn', 'log', 'xgb', 'gbc', 'svc', 'ext', 'ada', 'gnb', 'gpc', 'bag']
scores = []
cv=10

# Sequentially fit and cross validate all models
for mod in models:
    acc = np.array([np.random.rand() for i in range(cv)]) # simulate your accuracy here
    scores.append(acc.mean()) # moved inside the loop

# Create a dataframe of results
results = pd.DataFrame({
    'Model': ['Random Forest', 'K Nearest Neighbour', 'Logistic Regression', 'XGBoost', 'Gradient Boosting',  
    'SVC', 'Extra Trees', 'AdaBoost', 'Gaussian Naive Bayes', 'Gaussian Process', 'Bagging Classifier'],
    'Score': scores})
print(results)
# output:
                   Model     Score
0          Random Forest  0.492364
1    K Nearest Neighbour  0.624068
2    Logistic Regression  0.613653
3                XGBoost  0.536488
4      Gradient Boosting  0.484195
5                    SVC  0.381556
6            Extra Trees  0.274922
7               AdaBoost  0.509297
8   Gaussian Naive Bayes  0.362866
9       Gaussian Process  0.606538
10    Bagging Classifier  0.393950
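The comment in the question's code asks for a table "ranked highest to lowest", but pd.DataFrame simply keeps the order in which the rows were given. A minimal sketch, reusing the scores printed above as sample data (they come from the random simulation, not from real models): sort_values with ascending=False ranks the models, and reset_index(drop=True) renumbers the rows.

```python
import pandas as pd

# Scores copied from the printed output above (simulated, random values)
results = pd.DataFrame({
    'Model': ['Random Forest', 'K Nearest Neighbour', 'Logistic Regression', 'XGBoost',
              'Gradient Boosting', 'SVC', 'Extra Trees', 'AdaBoost',
              'Gaussian Naive Bayes', 'Gaussian Process', 'Bagging Classifier'],
    'Score': [0.492364, 0.624068, 0.613653, 0.536488, 0.484195, 0.381556,
              0.274922, 0.509297, 0.362866, 0.606538, 0.393950]})

# Rank from highest to lowest score and renumber the rows 0..10
ranked = results.sort_values('Score', ascending=False).reset_index(drop=True)
print(ranked)
```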

You may also want to keep in mind that you don't need the model.fit() part in your code; cross_val_score does all the necessary fitting by itself...
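A sketch of that point, assuming scikit-learn is installed. It uses the bundled iris dataset and three classifiers as stand-ins for the question's X_train, y_train and eleven models. cross_val_score clones the estimator and fits the clone on each training fold, so no prior fit is needed and the estimators in the dict stay unfitted. Keeping each name next to its estimator in one dict also guarantees that the Model and Score columns end up the same length.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Toy data standing in for X_train / y_train from the question
X_train, y_train = load_iris(return_X_y=True)

# Name -> unfitted estimator; a small subset standing in for the 11 models
models = {
    'Random Forest': RandomForestClassifier(random_state=0),
    'K Nearest Neighbour': KNeighborsClassifier(),
    'Logistic Regression': LogisticRegression(max_iter=1000),
}

rows = []
for name, mod in models.items():
    # No mod.fit() here: cross_val_score fits a clone of mod on each of the 10 folds
    acc = cross_val_score(mod, X_train, y_train, scoring="accuracy", cv=10)
    rows.append({'Model': name, 'Score': acc.mean()})  # inside the loop

results = pd.DataFrame(rows).sort_values('Score', ascending=False)
print(results)
```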

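Where exactly does the error occur? Please include the full error trace...

@desertnaut The error is returned at the pandas dataframe.

Did you check the answers below (i.e. moving scores.append() inline with the rest of the for loop)?

Welcome to SO; if one of the answers solved your problem, please accept it (see)

None of the answers good enough to accept?

No, that is not the problem. The error is returned from the last part of the code, the pandas dataframe.

@StanislavJirak That does not mean the answers are wrong; as your code stands, scores becomes a single-element list (i.e. you append only the last acc.mean() from the for loop), which does indeed produce the error you report; please include the full error trace...

@StanislavJirak, you can check the length of scores in your code; it will be 1. You are trying to create a dataframe with 11 entries in one column and 1 entry in the other, which raises the error. The error is thrown because the columns do not have the same length.

I understand now. But how do I create an empty array with 11 columns? I tried the code,

What do you mean? Why would you want to create an empty array? The remedy suggested here solves your problem; see my answer for corroboration...

Nice explanation. @StanislavJirak, I hope you now understand what was wrong with your code.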