How do I import data from a CSV file in Python using pandas.read_csv?

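I'm trying to solve a decision tree problem in Python using scikit-learn and pandas. The dataset is available as a CSV file. When I try to load the data in Python, I get an error that says "ValueError: could not convert string to float: 'CustomerID'". I don't know what I'm doing wrong in my code.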

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
col_names=['CustomerID','Gender','Car Type', 'Shirt Size','Class']
pima=pd.read_csv("F:\Current semster courses\Machine Learning\ML_A1_Fall2019\Q2_dataset.csv",header=None, names=col_names)
pima.head()
feature_cols=['CustomerID','Gender','Car Type', 'Shirt Size']
X=pima[feature_cols]
y=pima.Class
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
clf = DecisionTreeClassifier()

# Train Decision Tree Classifier
clf = clf.fit(X_train,y_train)

# Predict the response for the test dataset
y_pred = clf.predict(X_test)
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
Can anyone tell me what I'm doing wrong?

Dataset:

CustomerID  Gender  Car Type    Shirt Size  Class
1            M      Family       Small      C0
2            M      Sports       Medium     C0
3            M      Sports       Medium     C0
4            M      Sports       Large      C0
5            M      Sports     Extra Large  C0
6            M      Sports     Extra Large  C0
7            F      Sports       Small      C0
8            F      Sports       Small      C0
9            F      Sports       Medium     C0
10           F      Luxury       Large      C0
11           M      Family       Large      C1
12           M      Family     Extra Large  C1
13           M      Family       Medium     C1
14           M      Luxury    Extra Large   C1
15           F      Luxury       Small      C1
16           F      Luxury       Small      C1
17           F      Luxury       Medium     C1
18           F      Luxury       Medium     C1
19           F      Luxury       Medium     C1
20           F      Luxury       Large      C1

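Ah, OK. The problem is that your data is categorical, which scikit-learn can't use directly. You first need to convert it to numeric data. The method pd.get_dummies() does this by taking a single column with multiple categorical values and converting it into multiple columns, each containing a 1 or 0 that indicates which category is "True".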

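Also, you should drop the 'CustomerID' column from the features. It is an arbitrary identifier and has nothing to do with whether a row belongs to a given class.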
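To make the encoding step concrete, here is a minimal sketch on three rows taken from the dataset in the question. The variable names `df` and `encoded` are only illustrative. It drops the identifier column first and then lets `pd.get_dummies()` expand each remaining categorical column into one indicator column per category.

```python
import pandas as pd

# Three illustrative rows taken from the dataset in the question
df = pd.DataFrame({
    "CustomerID": [1, 7, 10],
    "Gender": ["M", "F", "F"],
    "Car Type": ["Family", "Sports", "Luxury"],
})

# Drop the identifier, then one-hot encode the remaining categorical columns
encoded = pd.get_dummies(df.drop(columns=["CustomerID"]))

# Each original column becomes one indicator column per category
print(encoded.columns.tolist())
```

Depending on the pandas version, the indicator columns hold either 0/1 integers or True/False booleans; scikit-learn accepts both.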

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

col_names=['CustomerID','Gender','Car Type', 'Shirt Size','Class']
data = [['1',  'M', 'Family', 'Small',      'C0'], 
        ['2',  'M', 'Sports', 'Medium',     'C0'], 
        ['3',  'M', 'Sports', 'Medium',     'C0'], 
        ['4',  'M', 'Sports', 'Large',      'C0'], 
        ['5',  'M', 'Sports', 'Extra Large','C0'], 
        ['6',  'M', 'Sports', 'Extra Large','C0'], 
        ['7',  'F', 'Sports', 'Small',      'C0'], 
        ['8',  'F', 'Sports', 'Small',      'C0'], 
        ['9',  'F', 'Sports', 'Medium',     'C0'], 
        ['10', 'F', 'Luxury', 'Large',      'C0'], 
        ['11', 'M', 'Family', 'Large',      'C1'], 
        ['12', 'M', 'Family', 'Extra Large','C1'], 
        ['13', 'M', 'Family', 'Medium',     'C1'], 
        ['14', 'M', 'Luxury', 'Extra Large','C1'], 
        ['15', 'F', 'Luxury', 'Small',      'C1']]

#pima=pd.read_csv("F:\Current semster courses\Machine ...
pima=pd.DataFrame(data, columns = col_names)
# Convert the categorical data to multiple columns of numerical data for the decision tree
pima = pd.get_dummies(pima, prefix=['CustomerID','Gender','Car Type', 'Shirt Size','Class'])
print(pima)

#feature_cols=['CustomerID','Gender','Car Type','Shirt Size']
# Use only the encoded feature columns; the Class_* columns are the target and must not be features
feature_cols=['Gender_F', 'Gender_M',
       'Car Type_Family', 'Car Type_Luxury', 'Car Type_Sports',
       'Shirt Size_Extra Large', 'Shirt Size_Large', 'Shirt Size_Medium',
       'Shirt Size_Small']
X=pima[feature_cols]
y=pima[['Class_C0', 'Class_C1']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

print("X_train =", X_train) 
print("X_test =", X_test) 
print("y_train =", y_train)
print("y_test =", y_test )
clf = DecisionTreeClassifier()

# Train Decision Tree Classifier
clf = clf.fit(X_train,y_train)

# Predict the response for the test dataset
y_pred = clf.predict(X_test)
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
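
The error message in the question names 'CustomerID', which suggests the header line itself was read as a data row: `header=None` tells `read_csv` that the file has no header line, so the first line ends up in the data. The sketch below reads from an in-memory string instead of the real file and assumes the file is comma-separated with a header line (the actual delimiter is not shown in the question). It uses a handful of rows from the dataset and keeps `Class` as a plain label column rather than encoding it.

```python
import io
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Stand-in for Q2_dataset.csv; assumes comma-separated values with a header line
csv_text = """CustomerID,Gender,Car Type,Shirt Size,Class
1,M,Family,Small,C0
4,M,Sports,Large,C0
7,F,Sports,Small,C0
10,F,Luxury,Large,C0
11,M,Family,Large,C1
14,M,Luxury,Extra Large,C1
15,F,Luxury,Small,C1
20,F,Luxury,Large,C1
"""

col_names = ['CustomerID', 'Gender', 'Car Type', 'Shirt Size', 'Class']

# header=None treats the header line as data, so 'CustomerID' shows up as a value
wrong = pd.read_csv(io.StringIO(csv_text), header=None, names=col_names)
print(wrong.iloc[0].tolist())

# header=0 consumes the header line; names= can still rename the columns
pima = pd.read_csv(io.StringIO(csv_text), header=0, names=col_names)

# Encode only the feature columns; CustomerID is left out and Class stays as the label
X = pd.get_dummies(pima[['Gender', 'Car Type', 'Shirt Size']])
y = pima['Class']

clf = DecisionTreeClassifier(random_state=1).fit(X, y)
print(clf.predict(X.iloc[:2]))
```

With the real file, the `io.StringIO(...)` argument would be replaced by the file path, ideally written as a raw string (`r"..."`) so the backslashes in a Windows path are not interpreted as escape sequences.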

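Could you provide a few lines of the CSV, or even upload the complete file somewhere, so we can reproduce the problem?
I've added a screenshot of my data.
Would you mind pasting it as text as well, so I can copy and paste it?
I've added the dataset.
Why can't you just do pd.read_csv('file.csv')? It reads fine for me.
I don't want my data to be treated as float or integer; I want it to be treated as strings.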
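
On the last comment: if every column should be read as a string, `read_csv` accepts `dtype=str`. The snippet below is a small sketch using an in-memory stand-in for the file (the contents are illustrative, not the full dataset). Note that scikit-learn's decision tree still needs numeric features, so string columns would have to be encoded before fitting.

```python
import io
import pandas as pd

# Illustrative stand-in for the real CSV file
csv_text = "CustomerID,Gender,Class\n1,M,C0\n2,F,C1\n"

# dtype=str keeps every column as text instead of inferring int or float
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(df.dtypes)
print(df["CustomerID"].tolist())
```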