Python: Do I need to preprocess new data again before predicting with a saved model?
I have a saved model that I want to load to make predictions on new data. I ran the model on the new data, but the predictions are completely wrong. Do I need to preprocess the new data again before predicting? This is the code I used to train and save the model:
# Load the sensor data and split it into features and target
import numpy as np
from numpy import loadtxt
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Data_Sensor.csv')
dataset.head()
X = dataset.iloc[:, 1:3].values
y = dataset.iloc[:, -1].values
# Encode the target labels as integers
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
# Hold out 20% of the rows for testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Standardize the features: fit on the training set only, then reuse for the test set
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Build, train and save a small binary classifier
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 2))
model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
model.fit(X_train, y_train, batch_size = 10, epochs = 100)
model.save('model.h5')
# Evaluate on the (scaled) test set
y_pred = model.predict(X_test)
print(y_pred)
y_pred = (y_pred > 0.5)
print(y_pred)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
This is the code where I load the model to predict on new data, but the results are wrong:
import numpy as np
import pandas as pd
import sklearn
from tensorflow.keras.models import load_model
model = load_model('model.h5')
import pandas as pd
dataset = pd.read_csv('Data_Sensor.csv')
dataset.head()
X = dataset.iloc[:, 1:3].values
print(X)
model.predict(X)
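Yes, new data must be preprocessed before prediction in exactly the same way as the training data was. For example, you need to keep the fitted StandardScaler and apply its transform to the new data before calling model.predict. Note that get_params and set_params only carry the scaler's constructor settings, not the fitted mean_ and scale_ values, so to restore it you should persist the fitted object itself (for example with pickle or joblib).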
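A minimal sketch of keeping the fitted scaler between the training script and the prediction script. The toy arrays, the scaler.pkl file name and the temporary directory are assumptions made for illustration; they are not part of the original question, and pickle is just one possible way to persist the object.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the two sensor feature columns.
X_train = np.array([[10.0, 200.0],
                    [12.0, 220.0],
                    [14.0, 240.0],
                    [16.0, 260.0]])

# Training time: fit the scaler once and persist the fitted object.
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
scaler_path = os.path.join(tempfile.mkdtemp(), "scaler.pkl")  # hypothetical path
with open(scaler_path, "wb") as f:
    pickle.dump(sc, f)

# Prediction time (possibly another script): restore the scaler and
# only transform. Calling fit_transform here would re-estimate the
# statistics from the new data and no longer match the trained model.
with open(scaler_path, "rb") as f:
    restored = pickle.load(f)

X_new = np.array([[13.0, 230.0]])  # hypothetical new sample
X_new_scaled = restored.transform(X_new)

# X_new_scaled is the array that would be passed to model.predict(...).
print(X_new_scaled)
```

In the question's prediction script, this corresponds to loading the scaler next to load_model('model.h5') and calling model.predict on the transformed X rather than on the raw values.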
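As suggested in the comments below, a better way to do this with Keras is to add a BatchNormalization layer at the start of the model. It normalizes the inputs inside the model, using statistics learned during training, which has a comparable effect to the standard scaler, and it is saved together with the rest of the model: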
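from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization, Dense
model = Sequential()
# Normalize the raw inputs inside the model, so no external scaler is needed
model.add(BatchNormalization())
model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 2))
model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))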
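Some models have a normalization layer that can do this for you.
I know, but if the OP is going to retrain anyway, we can simply add tf.keras.layers.BatchNormalization.
@Yonlif That is a valid suggestion for the OP; maybe add it as a comment on the question :)
@Yonlif Or let me add it to my answer.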