Python error: CSV file header is included in the decision tree calculation in the scikit-learn library


I run the following code to create a decision tree with the scikit-learn library:

import numpy as np
from sklearn.model_selection import train_test_split
import os
from sklearn.tree import export_graphviz
import graphviz

#progress 1
path="/mnt/d/TestDecisionTree/datasets"
os.chdir(path)
os.getcwd()

#progress 2
dataset=np.loadtxt("internetlogit.csv", delimiter=",")
x=dataset[:,0:5]
y=dataset[:,5]

#progress 3
from sklearn.tree import DecisionTreeRegressor
X_train, X_test, y_train, y_test = train_test_split(x, y)
tree = DecisionTreeRegressor().fit(X_train,y_train)

#progress 4
print("Training set accuracy: {:.3f}".format(tree.score(X_train, y_train)))
print("Test set accuracy: {:.3f}".format(tree.score(X_test, y_test)))

#progress 5
dtree = tree.predict(x)
print(dtree)

#progress 6
percentageerror_tree=((y-dtree)/dtree)*100
percentageerror_tree

#progress 7
np.mean(percentageerror_tree)

#progress 8
export_graphviz(tree,out_file="result/tree.dot")

with open("result/tree.dot") as f:
    dot_graph = f.read()

graphviz.Source(dot_graph)
My sample data is the following dataset, stored in the file internetlogit.csv:

age,gender,webpages,videohours,income,usage
36,0,32,0.061388889,6021,0
33,0,49,8.516666667,10239,1
46,1,22,0,1374,0
53,0,16,2.762222222,5376,0
27,1,30,0,1393,0
21,1,23,2.641111111,4866,0
42,0,30,0,1673,0
...
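But I get this error in progress 2:

ValueError: could not convert string to float: 'age'

This means the header row of the CSV file is taking part in the decision tree calculation, which should not happen. How can I solve this problem?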


Thanks for your help.

Using pandas would be the easiest immediate fix, for example:

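import pandas as pd
from io import StringIO
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# In-memory copy of the sample rows; pd.read_csv treats the first
# non-blank line as the header, so it becomes the column names.
csv_file = StringIO("""
age,gender,webpages,videohours,income,usage
36,0,32,0.061388889,6021,0
33,0,49,8.516666667,10239,1
46,1,22,0,1374,0
53,0,16,2.762222222,5376,0
27,1,30,0,1393,0
21,1,23,2.641111111,4866,0
42,0,30,0,1673,0
""")

df = pd.read_csv(csv_file)
y = df["usage"]                  # target column
x = df.drop(["usage"], axis=1)   # the five feature columns

X_train, X_test, y_train, y_test = train_test_split(x, y)
tree = DecisionTreeRegressor().fit(X_train, y_train)
print(tree)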
The ValueError occurs on line 13 of the code as posted because the column names cannot be interpreted as floats.
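
To see why the header triggers the error, here is a minimal sketch that reproduces it on an in-memory copy of the first rows. The `StringIO` buffer and the names `sample` and `error_message` are assumptions made for illustration; they stand in for reading `internetlogit.csv`. `np.loadtxt` tries to convert every field to a float, so the first header field fails. The exact wording of the message may differ between NumPy versions.

```python
from io import StringIO

import numpy as np

# In-memory stand-in for internetlogit.csv (assumption: header is the first line).
sample = (
    "age,gender,webpages,videohours,income,usage\n"
    "36,0,32,0.061388889,6021,0\n"
    "33,0,49,8.516666667,10239,1\n"
)

try:
    np.loadtxt(StringIO(sample), delimiter=",")
    error_message = None
except ValueError as exc:
    # The header row is parsed like data, so "age" cannot become a float.
    error_message = str(exc)

print(error_message)
```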

If you don't want to use pandas, you can also pass skiprows to np.loadtxt:

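# csv_file is the StringIO buffer from the pandas example. It begins with a
# blank line, so two lines (the blank line and the header) are skipped.
# For a file whose first line is the header, skiprows=1 would be used instead.
csv_file.seek(0)  # rewind in case the buffer was already read by pd.read_csv
dataset = np.loadtxt(csv_file, delimiter=",", skiprows=2)
x = dataset[:, 0:5]
y = dataset[:, 5]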
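
As a self-contained sketch of the `skiprows` approach for input whose very first line is the header (as in the file shown in the question), `skiprows=1` should be enough. The `StringIO` buffer named `sample` is a hypothetical stand-in for the real file and holds only the first three data rows.

```python
from io import StringIO

import numpy as np

# Stand-in for internetlogit.csv.
# Assumption: the header is the very first line, with no blank line before it.
sample = (
    "age,gender,webpages,videohours,income,usage\n"
    "36,0,32,0.061388889,6021,0\n"
    "33,0,49,8.516666667,10239,1\n"
    "46,1,22,0,1374,0\n"
)

# Skip only the header row; every remaining field parses as a float.
dataset = np.loadtxt(StringIO(sample), delimiter=",", skiprows=1)
x = dataset[:, 0:5]  # feature columns: age through income
y = dataset[:, 5]    # target column: usage

print(x.shape, y.shape)
```

With the real file, the same call would take the path `"internetlogit.csv"` in place of the buffer.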
ValueError: could not convert string to float: 'age'