x and y must have same first dimension, but have shapes (4536,32) and (1944,24)
Tags: machine-learning, spyder, decision-tree, analysis

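I am a beginner in machine learning and I am stuck. I would really appreciate some help. I have a dataset consisting of state name, month, temperature and rainfall. My code is: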

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data=pd.read_csv('cropdata.csv')
x=data.iloc[:, :-1].values
y=data.iloc[:, 4].values


district = pd.get_dummies(data['District'],drop_first = False)
month = pd.get_dummies(data['Month'],drop_first = False)
crop = pd.get_dummies(data['Crop'],drop_first = False)
data= pd.concat([data,district],axis=1)
data.drop('District', axis=1,inplace=True)
data= pd.concat([data,month],axis=1)
data.drop('Month', axis=1,inplace=True)
data= pd.concat([data,crop],axis=1)
data.drop('Crop', axis=1,inplace=True)

print(data.head(1))

train=data.iloc[:, 0:44].values
test=data.iloc[: ,44:].values

from sklearn.preprocessing import Imputer
imputer1 = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer1 = imputer1.fit(train[:, 0:44])
train[:, 0:44] = imputer1.transform(train[:, 0:44])


from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(train,test,test_size=0.3)

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)


from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import load_iris
clf=DecisionTreeRegressor(max_depth = 19,random_state = None)

#Fitting the classifier into training set
clf.fit(X_train,y_train)
pred=clf.predict(X_test)

print(pred)
predx=pred.round()


from sklearn.metrics import accuracy_score
# Finding the accuracy of the model
a=accuracy_score(y_test,pred.round())
print("The accuracy of this model is: ", a*100)


from sklearn import tree
iris = load_iris()
clf = clf.fit(iris.data, iris.target)
plt.figure(figsize=(10,10))
tree.plot_tree(clf);
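
The model's accuracy is 70%, but I get the following error:

ValueError: x and y must have same first dimension, but have shapes (4536,44) and (1944,12)

I don't understand what I can do to get rid of this error, or how to plot a graph for this problem.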

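According to your error, X_train contains 4536 rows of the training dataset. Each of those rows needs its own target value (label), so y_train should also contain 4536 labels.

But your y_train contains only 1944 labels, which does not match the number X_train requires.

Every row of X_train needs a corresponding label; you have supplied labels for some rows and left the rest unlabeled.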
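
One way to check this is to print the shape of every array just before the line that fails. The sketch below is only an illustration: it uses placeholder arrays with the shapes from the error message rather than the real data, and same_first_dimension is a made-up helper name. Note also that 4536 and 1944 are 70% and 30% of 6480 rows, which is what test_size=0.3 would give, so it may be worth checking whether a training array is being paired with a test array somewhere.

```python
import numpy as np

def same_first_dimension(a, b):
    """Return True when both arrays have the same number of rows."""
    return a.shape[0] == b.shape[0]

# Placeholder arrays with the shapes from the error message (not the real data).
X_train = np.zeros((4536, 44))
y_train = np.zeros((4536, 12))
y_test = np.zeros((1944, 12))

# Print every shape so a mismatch is visible at a glance.
for name, arr in [("X_train", X_train), ("y_train", y_train), ("y_test", y_test)]:
    print(name, arr.shape)

ok_pair = same_first_dimension(X_train, y_train)   # rows line up
bad_pair = same_first_dimension(X_train, y_test)   # 4536 rows vs 1944 rows

print("X_train / y_train match:", ok_pair)
print("X_train / y_test match:", bad_pair)
```

If the pair that does not match is the one passed to the failing call, that call is the place to fix.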

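I'm guessing the error comes from the line
clf.fit(X_train,y_train)
Can you try running only
clf=DecisionTreeRegressor()
and the next line, to confirm that you still get the error? If so, you should check whether you have defined y_train correctly.

How do I split the dataset into train and test based on columns???

Split it based on rows.

Does the dataset contain the same number of columns?

What solution do you suggest?

Can you upload the dataset and the code you wrote, so that I can trace your problem easily?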
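
On the row-versus-column question, a small sketch may help. In the question's code, train and test are column slices (features and targets), and train_test_split then does the row split. The example below writes both steps out with NumPy on a made-up 100 x 6 table; the column counts and the 30% test share are assumptions for illustration, not the real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the encoded table: 100 rows, 6 columns.
# Assumption: the first 4 columns are features, the last 2 are targets.
table = rng.random((100, 6))

# Column split: decides what is predicted from what. Row count is unchanged.
features = table[:, :4]
targets = table[:, 4:]

# Row split: decides which samples are used for training and for testing.
# The same shuffled indices are applied to both arrays so rows stay paired.
indices = rng.permutation(len(table))
n_test = int(round(0.3 * len(table)))
test_idx, train_idx = indices[:n_test], indices[n_test:]

X_train, X_test = features[train_idx], features[test_idx]
y_train, y_test = targets[train_idx], targets[test_idx]

print(X_train.shape, y_train.shape)  # (70, 4) (70, 2)
print(X_test.shape, y_test.shape)    # (30, 4) (30, 2)
```

After both steps, X_train and y_train have the same number of rows, and so do X_test and y_test; only the column counts differ.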