How do I resolve a ValueError in Python when running logistic regression?
I am getting a ValueError when running logistic regression. How can I fix it? I tried dropping the Survived column, but it still does not work.
Input:
X_train=train_df.drop("Survived",axis=1)
Y_train=train_df["Survived"]
X_test=test_df.drop("PassengerId",axis=1).copy()
X_train=train_df.drop("PassengerId",axis=1).copy()
X_train.head()
Y_train.head()
X_test.head()
X_train.shape,Y_train.shape,X_test.shape
Output:
Pclass  Sex  Age   Parch  Fare   EMbarked
3       0    34.5  0      7.82   2
3       1    47    0      7      0
2       0    62    0      9.68   2
3       0    27    0      8.66   0
3       1    22    1      12.2   0
((891, 7), (891,), (418, 6))
Input:
X_train.head()
Output:
Column1  Survived  Pclass  Sex  Age  Parch  Fare     Embarked
0        0         3       0    22   0      7.25     0
1        1         1       1    38   0      71.2833  1
2        1         3       1    26   0      7.925    0
3        1         1       1    35   0      53.1     0
4        0         3       0    35   0      8.05     0
Logistic regression:
logreg = LogisticRegression()
logreg.fit(X_train, Y_train)
Y_pred = logreg.predict(X_test)
acc_log = round(logreg.score(X_train, Y_train) * 100, 2)
acc_log
Error message:
ValueError Traceback (most recent call last)
<ipython-input-64-5854ca91fc64> in <module>
3 logreg = LogisticRegression()
4 logreg.fit(X_train, Y_train)
----> 5 Y_pred = logreg.predict(X_test)
6 acc_log = round(logreg.score(X_train, Y_train) * 100, 2)
7 acc_log
c:\users\user\appdata\local\programs\python\python37\lib\site-packages\sklearn\linear_model\base.py in predict(self, X)
287 Predicted class label per sample.
288 """
--> 289 scores = self.decision_function(X)
290 if len(scores.shape) == 1:
291 indices = (scores > 0).astype(np.int)
c:\users\user\appdata\local\programs\python\python37\lib\site-packages\sklearn\linear_model\base.py in decision_function(self, X)
268 if X.shape[1] != n_features:
269 raise ValueError("X has %d features per sample; expecting %d"
--> 270 % (X.shape[1], n_features))
271
272 scores = safe_sparse_dot(X, self.coef_.T,
ValueError: X has 6 features per sample; expecting 7
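X_train and X_test must have the same set of features. Your X_train still contains the unnecessary "Survived" column: the second assignment, X_train=train_df.drop("PassengerId",axis=1).copy(), starts again from train_df and overwrites the result of dropping "Survived", so the model is fitted on 7 features while X_test has only 6.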
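One way to confirm this is to compare the column sets directly. The following is a minimal sketch, assuming small made-up stand-ins for train_df and test_df (the values are illustrative, not the real data); it reproduces the overwrite and then drops both columns in a single call:

```python
import pandas as pd

# Hypothetical stand-ins for train_df / test_df (illustrative values only)
train_df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Sex": [0, 1, 1],
})
test_df = pd.DataFrame({
    "PassengerId": [892, 893],
    "Pclass": [3, 3],
    "Sex": [0, 1],
})

# Same pattern as in the question: the second assignment starts again
# from train_df, so the earlier drop of "Survived" is lost.
X_train = train_df.drop("Survived", axis=1)
X_train = train_df.drop("PassengerId", axis=1).copy()
X_test = test_df.drop("PassengerId", axis=1).copy()

# Columns present in the training features but absent from the test features
extra_in_train = sorted(set(X_train.columns) - set(X_test.columns))
print(extra_in_train)  # ['Survived']

# Dropping both columns in one call keeps the two feature sets aligned
X_train_fixed = train_df.drop(["Survived", "PassengerId"], axis=1)
print(list(X_train_fixed.columns) == list(X_test.columns))  # True
```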
A better approach is to select only the necessary columns from the DataFrame, like this:
necessary_columns = ['Pclass', 'Sex', 'Age', 'Parch', 'Fare', 'EMbarked']
X_train = train_df[necessary_columns]
Y_train = train_df["Survived"]
X_test = test_df[necessary_columns]
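To check the column-selection approach end to end, here is a minimal sketch on synthetic data. The make_frame helper, the random values, and max_iter=1000 are assumptions made for illustration; the column names are copied from the list above, including its spelling of 'EMbarked', and should be adjusted to match the real DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
necessary_columns = ['Pclass', 'Sex', 'Age', 'Parch', 'Fare', 'EMbarked']

def make_frame(n_rows, with_target):
    """Build a hypothetical, randomly filled stand-in for train_df / test_df."""
    frame = pd.DataFrame({
        "PassengerId": np.arange(n_rows),
        "Pclass": rng.integers(1, 4, n_rows),
        "Sex": rng.integers(0, 2, n_rows),
        "Age": rng.uniform(1, 80, n_rows),
        "Parch": rng.integers(0, 3, n_rows),
        "Fare": rng.uniform(5, 100, n_rows),
        "EMbarked": rng.integers(0, 3, n_rows),
    })
    if with_target:
        # Alternate 0/1 so that both classes are present
        frame["Survived"] = np.arange(n_rows) % 2
    return frame

train_df = make_frame(40, with_target=True)
test_df = make_frame(10, with_target=False)

# Select the same explicit feature list from both frames
X_train = train_df[necessary_columns]
Y_train = train_df["Survived"]
X_test = test_df[necessary_columns]

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, Y_train)
Y_pred = logreg.predict(X_test)

print(X_train.shape, X_test.shape, Y_pred.shape)  # (40, 6) (10, 6) (10,)
```

Because both X_train and X_test are built from the same list, predict receives exactly as many features as fit did.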
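The fitted model should be built from an explicit formula, y ~ x1, x2, ..., x6, that lists exactly the features it uses, to guard against a dataset whose columns do not match the fitted model's features. Thanks.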
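The idea of an explicit feature list can be enforced with a small guard that runs before both fit and predict. This is only a sketch: select_features and the names x1, x2, x3 are hypothetical and not part of pandas or scikit-learn.

```python
import pandas as pd

# Hypothetical feature names standing in for x1 ... x6
FEATURES = ["x1", "x2", "x3"]

def select_features(df, features=FEATURES):
    """Return only the model's feature columns, in a fixed order."""
    missing = [name for name in features if name not in df.columns]
    if missing:
        raise KeyError(f"missing feature columns: {missing}")
    return df.loc[:, features]

# Extra columns (id, y) and a different column order are both tolerated
train = pd.DataFrame({"id": [1, 2], "y": [0, 1], "x3": [5, 6], "x1": [1, 2], "x2": [3, 4]})
test = pd.DataFrame({"id": [3], "x1": [7], "x2": [8], "x3": [9]})

print(list(select_features(train).columns))  # ['x1', 'x2', 'x3']
print(list(select_features(test).columns))   # ['x1', 'x2', 'x3']
```

With this guard, a missing column fails early with a message naming the column, rather than surfacing later as a feature-count mismatch inside predict.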