Testing a machine learning model with categorical variables in Python

python machine-learning

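I have a dataset like this

print(regressor.predict([[1,0,1000,2000,3000]]))

As you can see, there is one categorical variable, state.

I then encoded the categorical variable.

If I want to test my model with specific data, I do this:

print(regressor.predict([[1,0,1000,2000,3000]]))

This works fine. What I would like to do, though, is pass the name in directly at test time, for example New York or Florida.

How can I achieve this?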

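Machine learning models can only handle numeric data, which is why you had to encode your state variable. There are a few ways to achieve what you describe:

a) Use a function that returns the encoded value of the state, so that you can write something like this:

print(regressor.predict([[1,0,1000,func("New York"),3000]]))

b) Use one-hot encoding, which creates as many columns as there are categories for each categorical variable.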

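Because ML models accept only numeric input, you have to encode your test data in the same way as the training data and then pass it to the model.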
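
As a rough sketch of options (a) and (b) combined, the helper below turns a state name into one-hot columns before the row is passed to the model. The names `STATE_COLUMNS`, `encode_state` and `build_row` are made up for illustration, and the column order (New York first, then Florida) is an assumption taken from the encoded rows shown in this thread; it has to match whatever order was used when the model was trained.

```python
# Assumption: the model was trained with two one-hot columns in this order.
STATE_COLUMNS = ["New York", "Florida"]


def encode_state(state):
    """Return the one-hot list for a state name, e.g. [1, 0] for New York."""
    if state not in STATE_COLUMNS:
        raise ValueError(f"Unknown state: {state!r}")
    return [1 if state == known else 0 for known in STATE_COLUMNS]


def build_row(state, *numeric_features):
    """Prepend the encoded state to the numeric features."""
    return encode_state(state) + list(numeric_features)


row = build_row("Florida", 1000, 2000, 3000)
print(row)  # [0, 1, 1000, 2000, 3000]

# With a fitted model this row would be passed as: regressor.predict([row])
```

Raising an error for an unknown state is a design choice here; silently encoding it as all zeros would also run, but the model would then predict for a category it never saw.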

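This is not elegant at all, but you could write if ... elif statements that depend on the input, for example:

a = input("Please enter the state: ")
# Use == for comparison; a single = is assignment and is a syntax error here
if a == "New York":
    print(regressor.predict([[1,0,1000,2000,3000]]))
elif a == "Florida":
    print(regressor.predict([[0,1,1000,2000,3000]]))
else:
    print("Invalid state selected")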

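You can use scikit-learn to transform and inverse-transform categorical values.

With a fitted encoder (called le below), you can then call your model like this: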

print(regressor.predict([[1,0,1000,le.transform(["New York"])[0],3000]]))
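
The call above relies on an encoder named `le` that has already been fitted. A minimal sketch of how that might look with scikit-learn's `LabelEncoder` follows; the two state names are only the ones mentioned in the question. Note that this produces a single integer column, so it only fits a model that was trained on that integer encoding rather than on one-hot columns.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["New York", "Florida"])  # learn the categories seen during training

# Classes are stored in sorted order, so Florida -> 0 and New York -> 1.
code = le.transform(["New York"])[0]    # state name -> integer code
name = le.inverse_transform([code])[0]  # integer code -> state name
print(code, name)
```

Calling `le.transform` on a state that was not seen during `fit` raises a `ValueError`, which is a convenient guard against typos at prediction time.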

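As others have already mentioned, a model only accepts numbers as input. For that reason we usually write a preprocessing function that can be applied to both the training set and the test set.

In this case you need to define a function that converts the input vector into a numeric vector, which can then be fed to your machine learning model:

Inputs -> Preprocessing -> Model

This preprocessing has to be identical to what you used during training, otherwise you will not get the results you expect.

So when you build a model, the complete "model" can actually be a wrapper around the underlying model you use. For example: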

class MyModel:

    def __init__(self):
        # Inputs and other variables such as hyperparameters go here
        self.model = Model()  # Placeholder: initialise a model of your choice

    def preprocess(self, list_to_preprocess):
        # Convert the raw rows into numeric features X and targets y,
        # using exactly the same encoding for training and prediction
        raise NotImplementedError("Implement your own preprocessing here")

    def train(self, train_set):
        X_train, y_train = self.preprocess(train_set)
        self.model.fit(X_train, y_train)

    def predict(self, test_set):
        # If test_set is a single vector, reshape it first, then preprocess
        X_test, y_test = self.preprocess(test_set)
        pred = self.model.predict(X_test)

        # Evaluate here using pred and y_test if needed
        return pred

So in the end, to make a prediction, call MyModel.predict rather than Model.predict to get what you want.
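
If you would rather not write the wrapper by hand, scikit-learn's `Pipeline` and `ColumnTransformer` follow the same Inputs -> Preprocessing -> Model idea. The sketch below is only an illustration: the column names (`state`, `spend_a`, ...), the tiny training set, and the choice of `LinearRegression` are all made up, since the actual dataset and regressor are not shown in this thread.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up training data; column names and values are illustrative only.
train = pd.DataFrame({
    "state": ["New York", "Florida", "New York", "Florida"],
    "spend_a": [1000, 1500, 2000, 2500],
    "spend_b": [2000, 2500, 3000, 3500],
    "spend_c": [3000, 3500, 4000, 4500],
    "target": [10.0, 12.0, 14.0, 16.0],
})
feature_cols = ["state", "spend_a", "spend_b", "spend_c"]

# One-hot encode the state column and pass the numeric columns through as-is.
preprocess = ColumnTransformer(
    transformers=[("state", OneHotEncoder(handle_unknown="ignore"), ["state"])],
    remainder="passthrough",
)

# The pipeline applies the same preprocessing in fit() and predict().
model = Pipeline([("preprocess", preprocess), ("regressor", LinearRegression())])
model.fit(train[feature_cols], train["target"])

# At prediction time the state is given by name, not as encoded numbers.
new_row = pd.DataFrame([["New York", 1000, 2000, 3000]], columns=feature_cols)
prediction = model.predict(new_row)
print(prediction)
```

With `handle_unknown="ignore"`, a state that was not present during training is encoded as all zeros instead of raising an error; drop that argument if you would rather have unknown names rejected.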

LabelEncoder is intended only for labels (target values), so it is not recommended for this purpose. I would suggest using an alternative instead.

@VedangWaradpande I completely agree.