Python ValueError: Expected 2D array, got 1D array instead when fitting a model

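I'm trying to build a model on a dataset loaded from Yellowbrick. After splitting the data into training and test sets, I wrote the following code to fit the model. However, I get this error: ValueError: Expected 2D array, got 1D array instead. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample. I don't understand why. Can anyone help? Here is the code: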

from yellowbrick.datasets import load_hobbies
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split as tts


#corpus = load_hobbies()
#X = TfidfVectorizer().fit_transform(corpus.data)
#y = LabelEncoder().fit_transform(corpus.target)
#
#X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2)
#
#model = MultinomialNB().fit(X_train, y_train)
#model.score(X_test, y_test)

corpus = load_hobbies()
X = corpus.data
y = corpus.target
#
X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2)
#
model = GaussianNB()
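model.fit(X_train, y_train)  # raises ValueError: Expected 2D array, got 1D array instead

Although you imported TfidfVectorizer, it doesn't look like you actually use it.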

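X = corpus.data returns a list of strings containing the raw text of every document. scikit-learn treats that list as a 1D array of shape (n_samples,), which is what the "Expected 2D array, got 1D array" error is about: a classifier expects a 2D feature matrix of shape (n_samples, n_features). You need to use TfidfVectorizer to convert this collection of raw documents into such a matrix.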
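
A minimal sketch of the shape problem, using a small hypothetical list of documents in place of corpus.data (the texts are made up for illustration and are not from the hobbies dataset):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for corpus.data: one raw string per document
docs = [
    "I baked bread this weekend",
    "The team won the football match",
    "A new recipe for sourdough bread",
    "Highlights from the football season",
]

# A list of strings becomes a 1D array: one entry per document, no feature axis.
# This is the kind of input GaussianNB.fit rejects with "Expected 2D array, got 1D array".
raw = np.asarray(docs)
print(raw.shape)  # (4,)

# TfidfVectorizer maps each document to a row of term weights,
# giving a 2D (sparse) matrix of shape (n_documents, n_terms).
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)
print(X.shape[0], len(tfidf.vocabulary_))
```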

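You also need to convert the resulting sparse matrix into a dense one with X.toarray(), because GaussianNB does not accept sparse input.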
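
The sketch below, again on hypothetical toy documents and labels, shows GaussianNB rejecting the sparse TF-IDF matrix and accepting the dense array returned by toarray(). scikit-learn raises a TypeError when sparse data is passed to an estimator that requires dense input.

```python
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy documents and labels
docs = [
    "I baked bread this weekend",
    "The team won the football match",
    "A new recipe for sourdough bread",
    "Highlights from the football season",
]
labels = ["cooking", "sports", "cooking", "sports"]

X_sparse = TfidfVectorizer().fit_transform(docs)  # scipy.sparse matrix

model = GaussianNB()
try:
    model.fit(X_sparse, labels)  # GaussianNB requires dense input
    rejected_sparse = False
except TypeError as err:
    rejected_sparse = True
    print("Sparse input rejected:", err)

# Densify: a regular NumPy array of shape (n_documents, n_terms)
X_dense = X_sparse.toarray()
model.fit(X_dense, labels)
predictions = model.predict(X_dense)
print(predictions)
```

Keep in mind that the dense array stores every entry, so its memory use grows with n_documents × n_terms, which can get large for big vocabularies.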

Once you've done this, you should be able to fit the model correctly and visualize it with Yellowbrick.

For example:

import numpy as np

from yellowbrick.datasets import load_hobbies
from yellowbrick.classifier import ClassificationReport

from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split as tts

# Load the data and create document vectors
corpus = load_hobbies()
tfidf = TfidfVectorizer()

X = tfidf.fit_transform(corpus.data)
y = corpus.target

# Turn sparse matrix into dense matrix
X = X.toarray()

# Split data into training and testing
X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2, random_state=42)

# Instantiate the classification model and visualizer
model = GaussianNB()
visualizer = ClassificationReport(model, support=True)

visualizer.fit(X_train, y_train)        # Fit the visualizer and the model
visualizer.score(X_test, y_test)        # Evaluate the model on the test data
visualizer.show()                       # Finalize and show the figure
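
A plain scikit-learn variant of the same idea, sketched as a Pipeline so that the vectorizer is fitted only on the training documents (in the example above, TfidfVectorizer sees the whole corpus before the split). The to_dense helper is a hypothetical name and the toy data stands in for load_hobbies(); this is one possible arrangement, not the answer's own code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(X):
    """Hypothetical helper: turn the sparse TF-IDF matrix into a dense array."""
    return X.toarray()


# Hypothetical toy data standing in for load_hobbies()
docs_train = [
    "I baked bread this weekend",
    "The team won the football match",
    "A new recipe for sourdough bread",
    "Highlights from the football season",
]
y_train = ["cooking", "sports", "cooking", "sports"]
docs_test = ["Fresh bread from the oven", "Late goal wins the match"]
y_test = ["cooking", "sports"]

# Vectorize -> densify -> classify; fit() learns the vocabulary from docs_train only
model = make_pipeline(
    TfidfVectorizer(),
    FunctionTransformer(to_dense, accept_sparse=True),
    GaussianNB(),
)
model.fit(docs_train, y_train)

accuracy = model.score(docs_test, y_test)  # mean accuracy on the held-out docs
print(accuracy)
```

If you switch to MultinomialNB, as in the commented-out code in the question, the densifying step is unnecessary because MultinomialNB accepts the sparse matrix directly.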

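Please edit your question to include the full error traceback. I'd guess the problem is with the label encoder: to transform data, you have to pass a matrix to the transformer, not an array. reshape(-1, 1) turns an array of shape (n_samples,) into a matrix of shape (n_samples, 1). Have you tried reshaping the values?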
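
The two reshape calls suggested in the error message produce different shapes. Below is a short NumPy sketch with a made-up numeric array; the second half assumes a hypothetical two-document text column to illustrate why reshaping alone would not fix this particular case, since the features would still be raw strings.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

values = np.array([1.0, 2.0, 3.0])  # shape (3,): 1D, no explicit feature axis

as_column = values.reshape(-1, 1)  # (3, 1): three samples with one feature each
as_row = values.reshape(1, -1)     # (1, 3): one sample with three features

print(values.shape, as_column.shape, as_row.shape)  # (3,) (3, 1) (1, 3)

# Reshaping raw text into a column gives the right number of dimensions,
# but the single "feature" is still a string, so GaussianNB cannot use it.
text_column = np.array(["bread recipe", "football match"]).reshape(-1, 1)
try:
    GaussianNB().fit(text_column, ["cooking", "sports"])
    text_rejected = False
except (ValueError, TypeError) as err:
    text_rejected = True
    print("Text column rejected:", err)
```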