Python: What is the purpose of encoding if I can't use it in predict?


This is a follow-up question.

I thought the reason we use OneHotEncoding is to convert string data into a numpy array, right?

Then the prediction statement
val_predictions = soccer_model.predict(val_X)
should work just as it does with the encoded data.

Here is my code so far:

import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Set option to display all the rows and columns in the dataset. If there are more rows, adjust number accordingly.
pd.set_option('display.max_rows', 5000)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# pandas needs to know which column holds dates before the file is imported (via parse_dates),
# hence this step.
date_col = ['Date']
df = pd.read_csv(
    r'C:\Users\harsh\Documents\My Dream\Desktop\Machine Learning\Attempt1\Historical Data\Concat_Cleaned.csv'
    , parse_dates=date_col, skiprows=0, low_memory=False)

# Converting/defining the columns
# Before you define column types, you need to fill all NaN with a value. We will be reconverting them later
df = df.fillna(101)
# Defining column types
convert_dict = {'League_Division': str,
                'HomeTeam': str,
                'AwayTeam': str,
                'Full_Time_Home_Goals': int,
                'Full_Time_Away_Goals': int,
                'Full_Time_Result': str,
                'Half_Time_Home_Goals': int,
                'Half_Time_Away_Goals': int,
                'Half_Time_Result': str,
                'Attendance': int,
                'Referee': str,
                'Home_Team_Shots': int,
                'Away_Team_Shots': int,
                'Home_Team_Shots_on_Target': int,
                'Away_Team_Shots_on_Target': int,
                'Home_Team_Hit_Woodwork': int,
                'Away_Team_Hit_Woodwork': int,
                'Home_Team_Corners': int,
                'Away_Team_Corners': int,
                'Home_Team_Fouls': int,
                'Away_Team_Fouls': int,
                'Home_Offsides': int,
                'Away_Offsides': int,
                'Home_Team_Yellow_Cards': int,
                'Away_Team_Yellow_Cards': int,
                'Home_Team_Red_Cards': int,
                'Away_Team_Red_Cards': int,
                'Home_Team_Bookings_Points': float,
                'Away_Team_Bookings_Points': float,
                }

df = df.astype(convert_dict)

# Revert the placeholder value 101 back to NaN to recover the original missing entries
df = df.replace('101', np.nan, regex=True)
df = df.replace(101, np.nan, regex=True)

# Clean dataset by dropping null rows
data = df.dropna(axis=0)

# Column that you want to predict = y
y = data.Full_Time_Home_Goals

# Columns fed into the model to make predictions (independent variables); must not include column y
features = ['HomeTeam', 'AwayTeam', 'Full_Time_Away_Goals', 'Full_Time_Result']
# Create X
X = data[features]

# Split into validation and training data
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# Specify Model
soccer_model = DecisionTreeRegressor(random_state=1)

# Define and fit a OneHotEncoder to transform the feature columns into a numeric (sparse) array
enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(train_X)

transformed_train_X = enc.transform(train_X)

# Fit Model
soccer_model.fit(transformed_train_X, train_y)

#  Make validation predictions and calculate mean absolute error
val_predictions = soccer_model.predict(val_X)
val_mae = mean_absolute_error(val_predictions, val_y)
print("Validation MAE when not specifying max_leaf_nodes : {:,.0f}".format(val_mae))
The line that raises the error is:

val_predictions = soccer_model.predict(val_X)
The error I get is:

ValueError: could not convert string to float: 'Wolves'

You can find my sample dataset here.
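The error can be reproduced on a hypothetical toy frame (column names mirror the question, the values are made up). Once the tree has been fitted on the encoded matrix, it expects as many numeric columns as the encoder produced, so passing the raw string columns to predict() fails with a ValueError like the one quoted above. A sketch, assuming a recent scikit-learn where fitted estimators expose n_features_in_:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data; the column names mirror the question's features
train_X = pd.DataFrame({"HomeTeam": ["Wolves", "Arsenal", "Chelsea", "Wolves"],
                        "AwayTeam": ["Chelsea", "Wolves", "Arsenal", "Arsenal"]})
train_y = [2, 1, 0, 3]
val_X = pd.DataFrame({"HomeTeam": ["Wolves"], "AwayTeam": ["Chelsea"]})

enc = OneHotEncoder(handle_unknown="ignore").fit(train_X)
soccer_model = DecisionTreeRegressor(random_state=1)
soccer_model.fit(enc.transform(train_X), train_y)

# The tree only knows the encoded layout: 6 one-hot columns, not the 2 raw columns
print(soccer_model.n_features_in_)

raised = False
try:
    soccer_model.predict(val_X)  # raw strings: the same mistake as in the question
except ValueError as err:
    raised = True
    print("ValueError:", err)
```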

Look at these lines again:

transformed_train_X = enc.transform(train_X)

# Fit Model
soccer_model.fit(transformed_train_X, train_y)
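What you did is encode train_X and fit soccer_model on the encoded data. That is what the model expects. So to use it, you have to apply the same fitted encoder to the validation data as well, i.e. you should do: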

transformed_val_X = enc.transform(val_X)

#  Make validation predictions and calculate mean absolute error
val_predictions = soccer_model.predict(transformed_val_X)
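One way to make it impossible to forget the transform step is to bundle the encoder and the model in a scikit-learn Pipeline (an alternative to the fix above, not part of the original answer). The pipeline fits the encoder on the training data only and reuses that fitted encoder inside predict(), so raw string columns can be passed directly. A minimal sketch on hypothetical toy data; "Everton" is deliberately absent from the training rows to show that handle_unknown='ignore' encodes an unseen category as all zeros instead of raising:

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data; column names mirror the question
train_X = pd.DataFrame({"HomeTeam": ["Wolves", "Arsenal", "Chelsea", "Wolves"],
                        "AwayTeam": ["Chelsea", "Wolves", "Arsenal", "Arsenal"]})
train_y = [2, 1, 0, 3]
# "Everton" never appears in the training data
val_X = pd.DataFrame({"HomeTeam": ["Wolves", "Everton"],
                      "AwayTeam": ["Chelsea", "Arsenal"]})
val_y = [2, 1]

# fit() fits the encoder, then the tree on the encoded matrix;
# predict() reuses the same fitted encoder before calling the tree
soccer_model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),  # unseen teams -> all-zero columns
    DecisionTreeRegressor(random_state=1),
)
soccer_model.fit(train_X, train_y)

val_predictions = soccer_model.predict(val_X)  # raw strings are fine here
val_mae = mean_absolute_error(val_y, val_predictions)
print(val_predictions, val_mae)
```

With the question's real data, features also contains the numeric Full_Time_Away_Goals column, which OneHotEncoder would treat as a set of categories as well; restricting the encoder to the string columns (for example with a ColumnTransformer) is a separate design choice.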