Python: How can I classify with float data after normalization?


python scikit-learn


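I'm working with the "Adult" dataset and trying to run KNN on a few of its columns, which I copied into a new DataFrame and partly normalized. When I run the following, I get the error ValueError: Unknown label type: 'continuous'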

clf = KNeighborsClassifier()
clf.fit(X_train, y_train)
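For context, scikit-learn classifiers validate the target before fitting, and `sklearn.utils.multiclass.type_of_target` reports how a target array is interpreted. The sketch below uses made-up toy values (not the Adult data) to show that whole-number hours count as discrete classes, while min-max-scaled hours with fractional parts are "continuous" and trigger this ValueError.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils.multiclass import type_of_target

# Made-up toy features: a 0/1 "White" flag and a scaled age
X = np.array([[1, 0.10], [0, 0.50], [1, 0.90], [0, 0.30]])
y_scaled = np.array([0.25, 0.40, 0.40, 0.75])  # scaled hours: fractional floats
y_int = np.array([40, 35, 40, 50])              # raw hours: whole numbers

print(type_of_target(y_scaled))  # continuous
print(type_of_target(y_int))     # multiclass

# Whole-number labels are accepted as classes...
KNeighborsClassifier(n_neighbors=1).fit(X, y_int)

# ...but a continuous target is rejected by the classifier
try:
    KNeighborsClassifier(n_neighbors=1).fit(X, y_scaled)
except ValueError as err:
    print("ValueError:", err)
```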

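After researching the error online, it seems I need to use a label encoder after normalizing the data, because the target is now float instead of int. However, I'm having trouble using the label encoder. The code I'm using is: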

import numpy as np  # Import the necessary packages
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
from sklearn.preprocessing import *
from sklearn.neighbors import KNeighborsClassifier 
from sklearn.model_selection import train_test_split
url2="http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data" #Reading in Data from a freely and easily available source on the internet
Adult = pd.read_csv(url2, header=None, skipinitialspace=True)  # skipinitialspace=True strips the extra space after each comma
##Assigning reasonable column names to the dataframe
Adult.columns = ["age","workclass","fnlwgt","education","educationnum","maritalstatus","occupation",  
                 "relationship","race","sex","capitalgain","capitalloss","hoursperweek","nativecountry",
                 "less50kmoreeq50kn"]
Adult.loc[Adult.loc[:, "race"] == "Amer-Indian-Eskimo", "race"] = "Other"  # Consolidate categorical data in the race column

Adult.loc[:, "race"].value_counts().plot(kind='bar')  # Plot the consolidated categorical data in the race column
plt.title('race after consolidation')
plt.show()

Adult.loc[:, "White"] = (Adult.loc[:, "race"] == "White").astype(int)  # One-hot encode the categorical race column into new 0/1 columns
Adult.loc[:, "Black"] = (Adult.loc[:, "race"] == "Black").astype(int)
Adult.loc[:, "Asian-Pac-Islander"] = (Adult.loc[:, "race"] == "Asian-Pac-Islander").astype(int)
Adult.loc[:, "Other"] = (Adult.loc[:, "race"] == "Other").astype(int)

Adult.loc[:,"Other"] #Verifying One-hot encoding for Other column

Adult = Adult.drop("race", axis=1) #removing the obsolete column "race"

Minage = min(Adult.loc[:,"age"])  #MinMax normalizing the age column
Maxage = max(Adult.loc[:,"age"])
MinMaxage = (Adult.loc[:,"age"] - Minage)/(Maxage - Minage)

Minhours = min(Adult.loc[:,"hoursperweek"])  # MinMax normalizing the hoursperweek column
Maxhours = max(Adult.loc[:,"hoursperweek"])
MinMaxhours = (Adult.loc[:,"hoursperweek"] - Minhours)/(Maxhours - Minhours)

# Create a DataFrame holding the normalized data (plus the White flag)
df2 = pd.DataFrame({
    "White": Adult.loc[:, "White"],
    "MinMaxage": MinMaxage,
    "MinMaxhours": MinMaxhours,
})

X = np.array(df2.drop(columns=['MinMaxhours']))  # positional axis argument was removed in pandas 2.0
y = np.array(df2['MinMaxhours'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
clf = KNeighborsClassifier()
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(accuracy)

clf.predict(X_test)
y_test
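Scaling matters for the features, because KNN ranks neighbors by distance; the target does not enter the distance computation. As a side note, the manual min-max formula in the script, (x - min) / (max - min), matches scikit-learn's `MinMaxScaler`. A minimal check on made-up values (not the Adult data):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up ages and hours standing in for the Adult columns
df = pd.DataFrame({"age": [17, 39, 50, 90], "hoursperweek": [1, 40, 45, 99]})

# Manual min-max scaling, column by column, as in the script
manual = (df - df.min()) / (df.max() - df.min())

# Same transformation with scikit-learn (returns a NumPy array)
scaled = MinMaxScaler().fit_transform(df[["age", "hoursperweek"]])

print(np.allclose(manual.to_numpy(), scaled))  # True
```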

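Can someone help me label-encode the data so that I can run KNN on it? I've looked it up on the sklearn website and in various examples, but I'm still having trouble applying it to my dataset. I get the error message when I try to fit the data with

clf.fit(X_train, y_train)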
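Label-encoding the scaled target would silence the error, but it's worth seeing what it does: `LabelEncoder` maps each distinct value to an integer code, so every distinct float becomes its own unrelated class. A sketch with made-up scaled hours:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up min-max-scaled hours values
y_scaled = np.array([0.40, 0.35, 0.40, 0.50, 0.99])

enc = LabelEncoder()
y_codes = enc.fit_transform(y_scaled)

# Each distinct float gets its own integer class code (sorted order):
# 0.35 -> 0, 0.40 -> 1, 0.50 -> 2, 0.99 -> 3
print(y_codes)  # [1 0 1 2 3]
```

The codes carry no notion of closeness: to a classifier, class 1 (0.40) is no more similar to class 2 (0.50) than to class 3 (0.99), which is why encoding a continuous target this way rarely gives a useful model.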

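It looks like you have a regression problem rather than a classification problem. You're trying to predict the MinMaxhours variable, which is a real number. If you want to predict real numbers, you should use the regression version of the nearest neighbors algorithm. To get predictions, the following code should work: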

from sklearn.neighbors import KNeighborsRegressor
clf = KNeighborsRegressor()
clf.fit(X_train, y_train)
y_test_pred = clf.predict(X_test)
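A self-contained version of the regression approach, using synthetic stand-in data (not the Adult dataset; the target relationship is invented purely for illustration). Note that for a regressor, `score()` returns R² rather than accuracy:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins: a 0/1 "White" flag and a scaled age in [0, 1)
n = 200
white = rng.integers(0, 2, size=n)
age = rng.random(n)
X = np.column_stack([white, age])

# Invented continuous target in [0, 1], loosely tied to age
y = np.clip(0.3 + 0.2 * age + rng.normal(0, 0.05, size=n), 0, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

reg = KNeighborsRegressor(n_neighbors=5)
reg.fit(X_train, y_train)          # float targets are fine for a regressor
y_test_pred = reg.predict(X_test)  # real-valued predictions
print(reg.score(X_test, y_test))   # R^2, not classification accuracy
```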

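I've already considered that. But if I try to run a classification model on the normalized data, I'll still hit the same problem, right? Or is it just uncommon to combine normalization with classification?

You can't classify a continuous float target. What exactly are you trying to predict? You could predict the probability that the number of hours falls into a certain category, e.g. 35-40 hours; that is a classification problem. Or you could try to approximate the exact number of hours by building a regression model; then your output is a real number rather than a probability.

OK, I built a classification model without the normalized data and, as you'd expect, the accuracy was poor. But yes, I'm trying to predict hours from "age" and "White". I just thought there was a way to run the classification model again on the normalized data. From other examples, it looks like I can build a classification model that uses float inputs, but the predicted values can't be floats?

Yes, that's correct. Float inputs are fine for a classification model, but the predictions are always class labels. With two classes, the prediction is one of the two labels (e.g. 1 or 0), for instance when predicting whether the animal in a picture is a cat or a dog. With more than two classes (0 = dog, 1 = cat, 2 = bird, etc.), the prediction is one of those integer labels. Hope this clarifies it!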
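The "hours category" idea from the comments can be sketched by binning hours into discrete bands with `pd.cut` and fitting a classifier on float inputs. The bin edges, band names, and data below are hypothetical and synthetic (not the Adult dataset); `predict_proba` gives the per-band probabilities mentioned above:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for the question's columns
n = 300
df = pd.DataFrame({
    "White": rng.integers(0, 2, size=n),
    "MinMaxage": rng.random(n),
    "hoursperweek": rng.integers(1, 100, size=n),  # whole hours 1..99
})

# Hypothetical bands: (0, 34], (34, 40], (40, 99]
bins = [0, 34, 40, 99]
labels = ["<35", "35-40", ">40"]
df["hours_band"] = pd.cut(df["hoursperweek"], bins=bins, labels=labels)

X = df[["White", "MinMaxage"]].to_numpy()    # float inputs are fine
y = df["hours_band"].astype(str).to_numpy()  # discrete class labels

clf = KNeighborsClassifier(n_neighbors=15).fit(X, y)
print(clf.classes_)               # the three band labels
print(clf.predict(X[:3]))         # one band label per row
print(clf.predict_proba(X[:3]))   # probability of each band per row
```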