Machine learning: How to overfit data with Keras?
I am trying to build a simple regression model with Keras and TensorFlow. In my problem the data has the form (x, y), where x and y are simply numbers. I would like to build a Keras model that predicts y using x as input.

Since I think a picture explains this better, here is my data:

We can debate whether the data is good or not, but for my problem I cannot really cheat with it.

My Keras model is the following (the data is split into 30% test (X_test, y_test) and 70% training (X_train, y_train)):

Note: X contains both X_test and X_train.

Plotting the predictions, I get (the blue squares are the predictions predict_Y):
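I have played a lot with the number of layers, the activation functions and other parameters. My goal is to find the best parameters for training the model, but the actual question here is slightly different: I am having a hard time forcing the model to overfit the data (as you can see from the results above).

Does anyone have an idea of how to reproduce overfitting?

This is the result I would like to get:
(the red dots are underneath the blue squares!)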
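To make the target concrete: "red dots underneath blue squares" means zero error on the training points, whatever happens elsewhere. A minimal sketch of that behaviour, using a 1-nearest-neighbour lookup instead of a neural network (a swapped-in technique, chosen only because it memorizes by construction), on the first five training pairs and first three test pairs of the data in this question. The helper names `predict_1nn` and `mse` are made up for this illustration.

```python
# First five training pairs and first three test pairs from the question's data.
X_train = [0.704619794270697, 0.6779457393024553, 0.8207082120250023,
           0.8588819357831449, 0.8692320257603844]
y_train = [0.7766424210611557, 0.8210846773655833, 0.9996114311913593,
           0.8041331063189883, 0.9980525368790883]
X_test = [0.9749191829308574, 0.8735366740730178, 0.8882783211709133]
y_test = [0.975653685270635, 0.9096752789481569, 0.6653736469114154]


def predict_1nn(x, xs, ys):
    """Return the y of the stored point whose x is closest to the query."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[nearest]


def mse(xs_eval, ys_eval, xs, ys):
    """Mean squared error of the 1-NN lookup on (xs_eval, ys_eval)."""
    errors = [predict_1nn(x, xs, ys) - y for x, y in zip(xs_eval, ys_eval)]
    return sum(e * e for e in errors) / len(errors)


train_mse = mse(X_train, y_train, X_train, y_train)  # memorized: exactly 0
test_mse = mse(X_test, y_test, X_train, y_train)     # unseen points: not 0
print(train_mse, test_mse)
```

A network that overfits in the sense asked for here should approach the same pattern: training error near zero, test error noticeably larger.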
EDIT:
Here is the data used in the example above; you can copy and paste it directly into a Python interpreter:
X_train = [0.704619794270697, 0.6779457393024553, 0.8207082120250023, 0.8588819357831449, 0.8692320257603844, 0.6878750931810429, 0.9556331888763945, 0.77677964510883, 0.7211381534179618, 0.6438319113259414, 0.6478339581502052, 0.9710222750072649, 0.8952188423349681, 0.6303124926673513, 0.9640316662124185, 0.869691568491902, 0.8320164648420931, 0.8236399177660375, 0.8877334038470911, 0.8084042532069621, 0.8045680821762038]
y_train = [0.7766424210611557, 0.8210846773655833, 0.9996114311913593, 0.8041331063189883, 0.9980525368790883, 0.8164056182686034, 0.8925487603333683, 0.7758207470960685, 0.37345286573743475, 0.9325789202459493, 0.6060269037514895, 0.9319771743389491, 0.9990691225991941, 0.9320002808310418, 0.9992560731072977, 0.9980241561997089, 0.8882905258641204, 0.4678339275898943, 0.9312152374846061, 0.9542371205095945, 0.8885893668675711]
X_test = [0.9749191829308574, 0.8735366740730178, 0.8882783211709133, 0.8022891400991644, 0.8650601322313454, 0.8697902997857514, 1.0, 0.8165876695985228, 0.8923841531760973]
y_test = [0.975653685270635, 0.9096752789481569, 0.6653736469114154, 0.46367666660348744, 0.9991817903431941, 1.0, 0.9111205717076893, 0.5264993912088891, 0.9989199241685126]
X = [0.704619794270697, 0.77677964510883, 0.7211381534179618, 0.6478339581502052, 0.6779457393024553, 0.8588819357831449, 0.8045680821762038, 0.8320164648420931, 0.8650601322313454, 0.8697902997857514, 0.8236399177660375, 0.6878750931810429, 0.8923841531760973, 0.8692320257603844, 0.8877334038470911, 0.8735366740730178, 0.8207082120250023, 0.8022891400991644, 0.6303124926673513, 0.8084042532069621, 0.869691568491902, 0.9710222750072649, 0.9556331888763945, 0.8882783211709133, 0.8165876695985228, 0.6438319113259414, 0.8952188423349681, 0.9749191829308574, 1.0, 0.9640316662124185]
Y = [0.7766424210611557, 0.7758207470960685, 0.37345286573743475, 0.6060269037514895, 0.8210846773655833, 0.8041331063189883, 0.8885893668675711, 0.8882905258641204, 0.9991817903431941, 1.0, 0.4678339275898943, 0.8164056182686034, 0.9989199241685126, 0.9980525368790883, 0.9312152374846061, 0.9096752789481569, 0.9996114311913593, 0.46367666660348744, 0.9320002808310418, 0.9542371205095945, 0.9980241561997089, 0.9319771743389491, 0.8925487603333683, 0.6653736469114154, 0.5264993912088891, 0.9325789202459493, 0.9990691225991941, 0.975653685270635, 0.9111205717076893, 0.9992560731072977]
where X contains the list of x values and Y the corresponding y values. (X_test, y_test) and (X_train, y_train) are two non-overlapping subsets of (X, Y).
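For reference, a 70/30 split into non-overlapping subsets can be sketched as below. This is only an assumption about how such a split might be produced, not the code used for the question; `split_train_test` and the stand-in data are hypothetical. With 30 points it yields 21 training and 9 test points, the same sizes as the lists above.

```python
import random


def split_train_test(xs, ys, test_fraction=0.3, seed=0):
    """Shuffle the indices once, then cut them into disjoint test/train parts."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(xs) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = ([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    test = ([xs[i] for i in test_idx], [ys[i] for i in test_idx])
    return train, test


# Stand-in data with the same size as X and Y in the question (30 points).
xs = [i / 29 for i in range(30)]
ys = [x * x for x in xs]

(X_train, y_train), (X_test, y_test) = split_train_test(xs, ys)
print(len(X_train), len(X_test))  # 21 9
```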
To predict and display the model results, I simply use matplotlib (imported as plt):
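As mentioned in the comments, you should make a Python array (with NumPy), like this:
Myarray = [[0.65, 1], [0.85, 0.5], ....]
Then you just call the specific part of the array that you need to predict on. Here the first value of each pair is the x-axis value, so you can use it to look up what is stored in Myarray.
There are plenty of resources for learning this kind of thing.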
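A small sketch of what "calling the specific part" of such an array could look like in plain Python. The first two pairs are the ones shown in this answer; the third pair and the helper `column` are made up for illustration.

```python
# The first two pairs appear in the answer; the third is a hypothetical extra.
pairs = [[0.65, 1.0], [0.85, 0.5], [0.70, 0.8]]


def column(rows, index):
    """Pick one column (0 = x value, 1 = y value) out of a list of pairs."""
    return [row[index] for row in rows]


x_values = column(pairs, 0)
y_values = column(pairs, 1)

# Keras-style input layout: one row per sample, one column per feature.
model_input = [[x] for x in x_values]
print(x_values, y_values, model_input)
```

With NumPy the same selection is `np.asarray(pairs)[:, 0]` for the x column and `np.asarray(pairs)[:, :1]` for the (n, 1)-shaped model input.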
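One problem you may be running into is that you do not have enough training data for the model to fit well. In your example you only have 21 training instances, each with a single feature. Broadly speaking, neural network models need on the order of 10K or more training instances to produce a decent model.

Consider the following code, which generates a noisy sine wave and tries to train a densely connected feed-forward neural network to fit the data. My model has two hidden layers, each with 50 hidden units and a ReLU activation function. The experiments are parameterized by the variable num_points, which I will increase.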
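import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(7)

def generate_data(num_points=100):
    # Noisy sine wave: y = 3*sin(x) + N(0, 1) noise, x in [0, 2*pi]
    X = np.linspace(0.0, 2.0 * np.pi, num_points).reshape(-1, 1)
    noise = np.random.normal(0, 1, num_points).reshape(-1, 1)
    y = 3 * np.sin(X) + noise
    return X, y

def run_experiment(X_train, y_train, X_test, batch_size=64):
    num_points = X_train.shape[0]

    # Two hidden layers of 50 ReLU units, linear output for regression
    model = keras.Sequential()
    model.add(layers.Dense(50, input_shape=(1,), activation='relu'))
    model.add(layers.Dense(50, activation='relu'))
    model.add(layers.Dense(1, activation='linear'))
    model.compile(loss="mse", optimizer="adam", metrics=["mse"])
    history = model.fit(X_train, y_train, epochs=10,
                        batch_size=batch_size, verbose=0)

    yhat = model.predict(X_test, batch_size=batch_size)

    plt.figure(figsize=(5, 5))
    plt.plot(X_train, y_train, "ro", markersize=2, label='True')
    # yhat is computed on X_test, so plot it against X_test
    plt.plot(X_test, yhat, "bo", markersize=1, label='Predicted')
    plt.ylim(-5, 5)
    plt.title('N=%d points' % (num_points))
    plt.legend()
    plt.grid()
    plt.show()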
This is how I call the code:
num_points = 100
X, y = generate_data(num_points)
run_experiment(X, y, X)
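Now, if I run the experiment with num_points = 100, the model predictions (blue) do a terrible job of fitting the true noisy sine wave (red).
Now, here is num_points = 1000:
Here is num_points = 10000:
And here is num_points = 100000:
As you can see, for the NN architecture I chose, adding more training instances allows the neural network to (over)fit the data better.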
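One thing worth keeping in mind when reading these plots: with epochs=10 and batch_size=64 held fixed, the number of optimizer updates also grows with num_points, because each epoch runs one update per batch. The sketch below just does that arithmetic (the helper name `optimizer_steps` is made up here); it does not train anything.

```python
import math


def optimizer_steps(num_points, epochs=10, batch_size=64):
    """Weight updates performed by fit(): one per batch, per epoch."""
    return epochs * math.ceil(num_points / batch_size)


for n in (100, 1000, 10000, 100000):
    print(n, optimizer_steps(n))
# 100 20
# 1000 160
# 10000 1570
# 100000 15630
```

So part of the improvement from N=100 to N=100000 may simply come from the network receiving far more updates. On a small dataset, raising the number of epochs is one way to test this; the 21-point script further down uses 40000 epochs with a full batch of 21, which gives 40000 updates.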
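If you do have a lot of training instances and want to overfit the data on purpose, you can either increase the capacity of the neural network or reduce regularization. Specifically, you can turn the following knobs:
- increase the number of layers
- increase the number of hidden units
- increase the number of features per data instance
- reduce regularization (e.g. by removing dropout layers)
- use a more complex neural network architecture (e.g. transformer blocks instead of an RNN)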
Related reading:
- Universal approximation theorem
- Zhang 2016, "Understanding deep learning requires rethinking generalization"
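The capacity knobs in this answer (more layers, more hidden units) can be made concrete by counting trainable parameters of a fully connected network: each dense layer contributes inputs × units weights plus one bias per unit. A small sketch, with the hypothetical helper `dense_param_count`:

```python
def dense_param_count(layer_sizes):
    """Trainable parameters of a dense network.

    layer_sizes lists the width of every layer, input first, output last.
    Each layer has n_in * n_out weights plus n_out biases.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# The sine-wave model in this answer: 1 input, two hidden layers of 50, 1 output.
print(dense_param_count([1, 50, 50, 1]))         # 2701
# Wider and deeper: three hidden layers of 100 units.
print(dense_param_count([1, 100, 100, 100, 1]))  # 20501
```

Even the small model has far more parameters than the 21 training points in the question, which suggests capacity is not the limiting factor there. How long and how well the optimizer is allowed to run matters as well.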
Plotting code from the question (matplotlib imported as plt):
predict_Y = model.predict(X)
plt.plot(X, Y, "ro", X, predict_Y, "bs")
plt.show()
Output of the script below on the training set (per-point predictions, then the summary errors):
           x    y_true    y_pred     error
0   0.704620  0.776642  0.773753 -0.002889
1   0.677946  0.821085  0.819597 -0.001488
2   0.820708  0.999611  0.999813  0.000202
3   0.858882  0.804133  0.805160  0.001026
4   0.869232  0.998053  0.997862 -0.000190
5   0.687875  0.816406  0.814692 -0.001714
6   0.955633  0.892549  0.893117  0.000569
7   0.776780  0.775821  0.779289  0.003469
8   0.721138  0.373453  0.374007  0.000554
9   0.643832  0.932579  0.912565 -0.020014
10  0.647834  0.606027  0.607253  0.001226
11  0.971022  0.931977  0.931549 -0.000428
12  0.895219  0.999069  0.999051 -0.000018
13  0.630312  0.932000  0.930252 -0.001748
14  0.964032  0.999256  0.999204 -0.000052
15  0.869692  0.998024  0.997859 -0.000165
16  0.832016  0.888291  0.887883 -0.000407
17  0.823640  0.467834  0.460728 -0.007106
18  0.887733  0.931215  0.932790  0.001575
19  0.808404  0.954237  0.960282  0.006045
20  0.804568  0.888589  0.906829  0.018240
{'me': -0.00015776709314323828,
 'mae': 0.00329163070145315,
 'mse': 4.0713782563067185e-05,
 'rmse': 0.006380735268216915}
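As a cross-check, the four summary numbers can be recomputed from the printed error column alone. The sketch below uses only the standard library; the helper name `summarize` is made up, and because the printed errors are rounded to six decimals the results agree with the dictionary above only to about that precision.

```python
import math

# The 'error' column (y_pred - y_true) exactly as printed in the table.
errors = [-0.002889, -0.001488, 0.000202, 0.001026, -0.000190, -0.001714,
          0.000569, 0.003469, 0.000554, -0.020014, 0.001226, -0.000428,
          -0.000018, -0.001748, -0.000052, -0.000165, -0.000407, -0.007106,
          0.001575, 0.006045, 0.018240]


def summarize(errs):
    """Mean error, mean absolute error, mean squared error and its root."""
    n = len(errs)
    mse = sum(e * e for e in errs) / n
    return {
        'me': sum(errs) / n,
        'mae': sum(abs(e) for e in errs) / n,
        'mse': mse,
        'rmse': math.sqrt(mse),
    }


metrics = summarize(errors)
print(metrics)
```

Most of the squared error comes from just two rows (indices 9 and 20, with errors of about 0.02 and 0.018). Every other training point is reproduced to within about 0.007.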
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt
# Set seed just to have reproducible results
np.random.seed(84)
tf.random.set_seed(84)
# Load data from the post
# https://stackoverflow.com/questions/61252785/how-to-overfit-data-with-keras
X_train = np.array([0.704619794270697, 0.6779457393024553, 0.8207082120250023,
                    0.8588819357831449, 0.8692320257603844, 0.6878750931810429,
                    0.9556331888763945, 0.77677964510883, 0.7211381534179618,
                    0.6438319113259414, 0.6478339581502052, 0.9710222750072649,
                    0.8952188423349681, 0.6303124926673513, 0.9640316662124185,
                    0.869691568491902, 0.8320164648420931, 0.8236399177660375,
                    0.8877334038470911, 0.8084042532069621,
                    0.8045680821762038])
Y_train = np.array([0.7766424210611557, 0.8210846773655833, 0.9996114311913593,
                    0.8041331063189883, 0.9980525368790883, 0.8164056182686034,
                    0.8925487603333683, 0.7758207470960685,
                    0.37345286573743475, 0.9325789202459493,
                    0.6060269037514895, 0.9319771743389491, 0.9990691225991941,
                    0.9320002808310418, 0.9992560731072977, 0.9980241561997089,
                    0.8882905258641204, 0.4678339275898943, 0.9312152374846061,
                    0.9542371205095945, 0.8885893668675711])
X_test = np.array([0.9749191829308574, 0.8735366740730178, 0.8882783211709133,
                   0.8022891400991644, 0.8650601322313454, 0.8697902997857514,
                   1.0, 0.8165876695985228, 0.8923841531760973])
Y_test = np.array([0.975653685270635, 0.9096752789481569, 0.6653736469114154,
                   0.46367666660348744, 0.9991817903431941, 1.0,
                   0.9111205717076893, 0.5264993912088891, 0.9989199241685126])
X = np.array([0.704619794270697, 0.77677964510883, 0.7211381534179618,
              0.6478339581502052, 0.6779457393024553, 0.8588819357831449,
              0.8045680821762038, 0.8320164648420931, 0.8650601322313454,
              0.8697902997857514, 0.8236399177660375, 0.6878750931810429,
              0.8923841531760973, 0.8692320257603844, 0.8877334038470911,
              0.8735366740730178, 0.8207082120250023, 0.8022891400991644,
              0.6303124926673513, 0.8084042532069621, 0.869691568491902,
              0.9710222750072649, 0.9556331888763945, 0.8882783211709133,
              0.8165876695985228, 0.6438319113259414, 0.8952188423349681,
              0.9749191829308574, 1.0, 0.9640316662124185])
Y = np.array([0.7766424210611557, 0.7758207470960685, 0.37345286573743475,
              0.6060269037514895, 0.8210846773655833, 0.8041331063189883,
              0.8885893668675711, 0.8882905258641204, 0.9991817903431941, 1.0,
              0.4678339275898943, 0.8164056182686034, 0.9989199241685126,
              0.9980525368790883, 0.9312152374846061, 0.9096752789481569,
              0.9996114311913593, 0.46367666660348744, 0.9320002808310418,
              0.9542371205095945, 0.9980241561997089, 0.9319771743389491,
              0.8925487603333683, 0.6653736469114154, 0.5264993912088891,
              0.9325789202459493, 0.9990691225991941, 0.975653685270635,
              0.9111205717076893, 0.9992560731072977])
# Reshape all data to be of the shape (batch_size, 1)
X_train = X_train.reshape((-1, 1))
Y_train = Y_train.reshape((-1, 1))
X_test = X_test.reshape((-1, 1))
Y_test = Y_test.reshape((-1, 1))
X = X.reshape((-1, 1))
Y = Y.reshape((-1, 1))
# Is data scaled? NNs do well with bounded data.
assert np.all(X_train >= 0) and np.all(X_train <= 1)
assert np.all(Y_train >= 0) and np.all(Y_train <= 1)
assert np.all(X_test >= 0) and np.all(X_test <= 1)
assert np.all(Y_test >= 0) and np.all(Y_test <= 1)
assert np.all(X >= 0) and np.all(X <= 1)
assert np.all(Y >= 0) and np.all(Y <= 1)
# Build a model with variable number of hidden layers.
# We will use Keras functional API.
# https://www.perfectlyrandom.org/2019/06/24/a-guide-to-keras-functional-api/
n_dense_layers = 30 # increase this to get more complicated models
# Define the layers first.
input_tensor = Input(shape=(1,), name='input')
layers = []
for i in range(n_dense_layers):
    layers += [Dense(units=50, activation='relu', name=f'dense_layer_{i}')]
    if i > 0 and i % 5 == 0:
        # avg over batches not features
        layers += [BatchNormalization(axis=1)]
sigmoid_layer = Dense(units=1, activation='sigmoid', name='sigmoid_layer')
# Connect the layers using Keras Functional API
mid_layer = input_tensor
for dense_layer in layers:
    mid_layer = dense_layer(mid_layer)
output_tensor = sigmoid_layer(mid_layer)
model = Model(inputs=[input_tensor], outputs=[output_tensor])
optimizer = Adam(learning_rate=0.0005)
model.compile(optimizer=optimizer, loss='mae', metrics=['mae'])
model.fit(x=[X_train], y=[Y_train], epochs=40000, batch_size=21)
# Predict on various datasets
Y_train_pred = model.predict(X_train)
# Create a dataframe to inspect results manually
train_df = pd.DataFrame({
    'x': X_train.reshape((-1)),
    'y_true': Y_train.reshape((-1)),
    'y_pred': Y_train_pred.reshape((-1))
})
train_df['error'] = train_df['y_pred'] - train_df['y_true']
print(train_df)
# A dictionary to store all the errors in one place.
train_errors = {
    'me': np.mean(train_df['error']),
    'mae': np.mean(np.abs(train_df['error'])),
    'mse': np.mean(np.square(train_df['error'])),
    'rmse': np.sqrt(np.mean(np.square(train_df['error']))),
}
print(train_errors)
# Make a plot to visualize true vs predicted
plt.figure(1)
plt.clf()
plt.plot(train_df['x'], train_df['y_true'], 'r.', label='y_true')
plt.plot(train_df['x'], train_df['y_pred'], 'bo', alpha=0.25, label='y_pred')
plt.grid(True)
plt.xlabel('x')
plt.ylabel('y')
plt.title(f'Train data. MSE={np.round(train_errors["mse"], 5)}.')
plt.legend()
plt.show(block=False)
plt.savefig('true_vs_pred.png')