Python ValueError: Found input variables with inconsistent numbers of samples when trying to fit a model: [2004, 2005]


I am trying to fit my model, but I keep getting the following error:

 y = column_or_1d(y, warn=True)
Traceback (most recent call last):
  File "/Users/amanpuranik/PycharmProjects/covid/fake news 2.py", line 107, in <module>
    model.fit(x_train,y_test)
  File "/Users/amanpuranik/PycharmProjects/covid/venv/lib/python3.7/site-packages/sklearn/naive_bayes.py", line 609, in fit
    X, y = self._check_X_y(X, y)
  File "/Users/amanpuranik/PycharmProjects/covid/venv/lib/python3.7/site-packages/sklearn/naive_bayes.py", line 475, in _check_X_y
    return check_X_y(X, y, accept_sparse='csr')
  File "/Users/amanpuranik/PycharmProjects/covid/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 765, in check_X_y
    check_consistent_length(X, y)
  File "/Users/amanpuranik/PycharmProjects/covid/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 212, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [3207, 802]
I don't know why I'm getting this error. What can I do to fix it? Here is my code:

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

data = pd.read_csv("/Users/amanpuranik/Desktop/fake-news-detection/data.csv")
data = data[['Headline', "Label"]]

x = np.array(data['Headline'])
y = np.array(data["Label"])

#lowercase (stemmed2 comes from an earlier stemming step, not shown here)
lower = [[word.lower() for word in headline] for headline in stemmed2] #start here

#convert lower into a list of strings
lower_sentences = [" ".join(x) for x in lower]
print(lower_sentences)

#organising
articles = []


for headline in lower:
    articles.append(headline)

#print(articles[0])

#creating the bag of words model

headline_bow = CountVectorizer()
headline_bow.fit(lower_sentences)
a = headline_bow.transform(lower_sentences)
print(a)
b = headline_bow.get_feature_names()

#testing and training part
yy = np.reshape(y,(-1,1))
lower2 = np.reshape(lower_sentences,(-1,1))
x_train, x_test, y_train, y_test = train_test_split(lower2, yy, test_size=0.2, random_state=1)

print(lower2.shape)
print(yy.shape)



#fitting on the model now

model = MultinomialNB() #don't forget the brackets here
model.fit(x_train,y_test) #this is where the error comes in 

Change `model.fit(x_train, y_test)` to `model.fit(x_train, y_train)`. I don't know whether that clears the error, but the original line is wrong either way: you can't pair training features with test labels, since the two splits have different numbers of samples.
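To see where the numbers in the traceback come from: with `test_size=0.2`, `train_test_split` puts roughly 80% of the 4009 rows (3207 + 802) into the training set and 20% into the test set, so `x_train` and `y_test` can never have matching lengths. A minimal sketch with synthetic stand-in data, since the original CSV isn't available here:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real headlines/labels
X = np.arange(4009).reshape(-1, 1)  # 4009 samples, matching 3207 + 802
y = np.arange(4009) % 2             # dummy labels, one per sample

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Train features pair with train labels, test features with test labels
print(X_train.shape[0], y_train.shape[0])  # 3207 3207
print(X_test.shape[0], y_test.shape[0])    # 802 802
```

Calling `model.fit(X_train, y_test)` here would raise exactly the ValueError above, because 3207 != 802.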

I think it should be `y = [column_or_1d(y, warn=True)]` rather than `y = column_or_1d(y, warn=True)`.

Oh wow, that went completely over my head; I thought it already was y_train. But now I'm getting another error: "TypeError: cannot perform reduce with flexible type".
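As for the follow-up "TypeError: cannot perform reduce with flexible type": that error typically appears when MultinomialNB receives an array of strings. In the code above, `train_test_split` is called on `lower2` (the raw sentences) instead of on `a`, the numeric matrix produced by `CountVectorizer`. A sketch of the working pipeline, with placeholder headlines standing in for the CSV data:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Placeholder headlines/labels standing in for the CSV columns
sentences = ["fake news spreads fast", "real report confirmed today",
             "shocking fake claim", "official real statement"]
labels = np.array(["Fake", "Real", "Fake", "Real"])

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences)  # sparse numeric matrix, not strings

# Split the numeric features; keep y one-dimensional (no reshape to (-1, 1))
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, random_state=1)

model = MultinomialNB()
model.fit(X_train, y_train)  # numeric features + matching train labels
print(model.predict(X_test))
```

Keeping `y` one-dimensional instead of reshaping it to `(-1, 1)` also avoids the `column_or_1d` warning visible at the top of the traceback.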