Keras LSTM sequence classification: loss and validation loss decrease by a trivial amount, accuracy and validation accuracy stay the same
Before I officially give up hope of applying deep learning to stock prediction, I was hoping to get a quick second pair of eyes.

The goal is to use an LSTM to predict one of two classes. The positive class corresponds to sequences that led to a price increase of 5% or more over the next six periods; the negative class corresponds to sequences that did not. As expected, this leads to some class imbalance, roughly 6:1 negative to positive. The problem right now, though, is that the model shows the same accuracy in every epoch and predicts only the negative class. That makes me think there may be something wrong with my model structure. The input is a dataframe that includes price data and a handful of moving averages:
price_open price_high price_low price_close ma_8 ma_13 ma_21 ma_55 6prd_pctchange entry_flag
time_period_start
11-02-2016 23:00 10.83280 10.98310 10.72591 10.96000 10.932415 10.855693 10.960608 11.087525 0.008535 0.0
11-03-2016 03:00 10.96016 11.02560 10.96000 11.00003 10.937569 10.873219 10.948081 11.075059 0.004544 0.0
11-03-2016 07:00 11.00007 11.14997 10.91000 11.00006 10.954170 10.919378 10.929689 11.062878 -0.007442 0.0
11-03-2016 11:00 11.05829 11.14820 10.90001 10.99208 10.959396 10.923376 10.912183 11.057317 0.008392 0.0
11-03-2016 15:00 10.90170 11.03112 10.70000 10.91529 10.938490 10.933783 10.890906 11.048504 0.006289 0.0
11-03-2016 19:00 10.89420 10.95000 10.82460 10.94980 10.944640 10.947429 10.882745 11.041227 0.005234 0.0
11-03-2016 23:00 10.94128 11.08475 10.88404 11.08475 10.974350 10.957118 10.888859 11.032288 0.011382 0.0
11-04-2016 03:00 11.02761 11.22778 10.94360 10.99813 10.987517 10.967185 10.893531 11.023518 -0.000173 0.0
11-04-2016 07:00 10.95076 11.01814 10.92000 10.92100 10.982642 10.964934 10.904055 11.011691 -0.007187 0.0
11-04-2016 11:00 10.94511 11.06298 10.89000 10.99557 10.982085 10.958244 10.914692 11.000365 0.000318 0.0
It has been converted into numpy arrays with a length of 6 periods and normalized with scikit-learn's MinMaxScaler. For example, the first sequence looks like this:
array([[0. , 0.16552483, 0.09965385, 0.52742716, 0. ,
0. , 1. , 1. ],
[0.5648144 , 0.37805671, 1. , 0.9996461 , 0.19101228,
0.19104958, 0.83911884, 0.73073358],
[0.74180673, 1. , 0.80769231, 1. , 0.80630067,
0.69421501, 0.60290376, 0.46764059],
[1. , 0.99114867, 0.76926923, 0.90586292, 1. ,
0.73780155, 0.37807623, 0.34751414],
[0.30555679, 0.40566085, 0. , 0. , 0.22515636,
0.85124563, 0.104818 , 0.15716305],
[0.27229589, 0. , 0.47923077, 0.40710157, 0.45309243,
1. , 0. , 0. ]])
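The array above has a 0 and a 1 in every column, which suggests each 6-period window was scaled on its own rather than with one scaler fitted over the whole dataframe. Below is a minimal sketch of that windowing and per-window min-max scaling. It is a plain numpy stand-in for MinMaxScaler with its default (0, 1) range, not the asker's actual preprocessing code; the helper names `make_windows` and `minmax_per_window` are hypothetical, and only the `price_open` column from the table is used.

```python
import numpy as np

def make_windows(values, window=6):
    """Stack overlapping windows: (n_rows, n_features) -> (n_windows, window, n_features)."""
    values = np.asarray(values, dtype=float)
    n_windows = values.shape[0] - window + 1
    return np.stack([values[i:i + window] for i in range(n_windows)])

def minmax_per_window(windows):
    """Scale every feature to [0, 1] separately inside each window."""
    lo = windows.min(axis=1, keepdims=True)
    hi = windows.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on flat columns
    return (windows - lo) / span

# First seven price_open values from the table, as a single-feature example.
price_open = np.array(
    [10.83280, 10.96016, 11.00007, 11.05829, 10.90170, 10.89420, 10.94128]
).reshape(-1, 1)

X = minmax_per_window(make_windows(price_open, window=6))
print(X.shape)                  # (2, 6, 1)
print(np.round(X[0, :, 0], 4))  # first window of price_open, scaled
```

The first scaled window matches the first column of the array shown above. Note that scaling each window independently discards the absolute price level and the size of the moves, which may or may not be what you want for this task.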
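When I build, compile, and fit a model on these sequences, my results plateau very quickly and the model ends up predicting only the negative class: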
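# Imports (standalone Keras assumed; use tensorflow.keras if that is what is installed):
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Constants:
loss = 'binary_crossentropy'
optimizer = 'adam'
epochs = 12
batch_size = 300

# Compile model:
model = Sequential()
model.add(LSTM(100))  # input shape is inferred from X_train on the first call to fit
model.add(Dense(1, activation='sigmoid'))
model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])

# Fit without shuffling, then print the layer summary:
results = model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=1, validation_data=(X_test, y_test), shuffle=False)
model.summary()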
It outputs:
Epoch 1/12
22/22 [==============================] - 0s 16ms/step - loss: 0.5696 - accuracy: 0.8410 - val_loss: 0.3953 - val_accuracy: 0.8885
Epoch 2/12
22/22 [==============================] - 0s 10ms/step - loss: 0.4355 - accuracy: 0.8473 - val_loss: 0.3569 - val_accuracy: 0.8885
Epoch 3/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4379 - accuracy: 0.8473 - val_loss: 0.3612 - val_accuracy: 0.8885
Epoch 4/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4320 - accuracy: 0.8473 - val_loss: 0.3554 - val_accuracy: 0.8885
Epoch 5/12
22/22 [==============================] - 0s 10ms/step - loss: 0.4338 - accuracy: 0.8473 - val_loss: 0.3577 - val_accuracy: 0.8885
Epoch 6/12
22/22 [==============================] - 0s 10ms/step - loss: 0.4297 - accuracy: 0.8473 - val_loss: 0.3554 - val_accuracy: 0.8885
Epoch 7/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4303 - accuracy: 0.8473 - val_loss: 0.3570 - val_accuracy: 0.8885
Epoch 8/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4273 - accuracy: 0.8473 - val_loss: 0.3558 - val_accuracy: 0.8885
Epoch 9/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4285 - accuracy: 0.8473 - val_loss: 0.3577 - val_accuracy: 0.8885
Epoch 10/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4254 - accuracy: 0.8473 - val_loss: 0.3565 - val_accuracy: 0.8885
Epoch 11/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4270 - accuracy: 0.8473 - val_loss: 0.3581 - val_accuracy: 0.8885
Epoch 12/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4243 - accuracy: 0.8473 - val_loss: 0.3569 - val_accuracy: 0.8885
Model: "sequential_6"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_6 (LSTM) (None, 100) 42400
_________________________________________________________________
dense_6 (Dense) (None, 1) 101
=================================================================
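One quick sanity check on the summary above is the parameter count. For a standard LSTM layer with biases, the count is 4 × (units × (input_dim + units) + units), one block per gate. The sketch below only evaluates that formula; the helper names are hypothetical. It may be worth noting that 42400 corresponds to 5 input features, whereas an 8-feature input would give 43600, so the printed summary could come from a run with a different feature set.

```python
def lstm_param_count(units, input_dim):
    """Trainable parameters of a standard LSTM layer with biases.

    Each of the 4 gates has an input kernel (input_dim x units),
    a recurrent kernel (units x units), and a bias (units).
    """
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(units, input_dim):
    """Trainable parameters of a Dense layer with bias."""
    return units * input_dim + units

print(lstm_param_count(100, 8))    # 43600
print(lstm_param_count(100, 5))    # 42400, the value in the summary
print(dense_param_count(1, 100))   # 101
```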
A quick check shows that it predicts only the negative class:
predictions = model.predict(X_test)
predictions_round = [1 if x > 0.5 else 0 for x in predictions]
pd.Series(predictions_round).value_counts()
0 1641
dtype: int64
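A constant val_accuracy of 0.8885 is what an "always negative" predictor would score if about 183 of the 1641 test samples are positive. That count is inferred from 0.8885 × 1641 and is an assumption, not a number given in the post. The sketch below builds stand-in labels with that split and shows why accuracy alone hides the problem: the majority-class baseline reaches the same accuracy with zero recall on the positive class.

```python
import numpy as np

n_test = 1641   # from the value_counts output above
n_pos = 183     # ASSUMPTION: inferred from val_accuracy 0.8885, not stated in the post

# Stand-in labels with the inferred class split, and an all-negative prediction.
y_test = np.concatenate([np.zeros(n_test - n_pos, dtype=int),
                         np.ones(n_pos, dtype=int)])
y_pred = np.zeros(n_test, dtype=int)

accuracy = (y_pred == y_test).mean()
true_pos = int(((y_pred == 1) & (y_test == 1)).sum())
false_neg = int(((y_pred == 0) & (y_test == 1)).sum())
recall = true_pos / (true_pos + false_neg)

print(round(accuracy, 4))  # 0.8885
print(recall)              # 0.0
```

With an imbalance like this, recall, precision, or a confusion matrix on the positive class says more about whether the model has learned anything than accuracy does.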
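I'll be the first to say that this may simply be because predicting stock price entry points is a very noisy task. But I would also expect the model to at least make some wrong guesses rather than always guessing the same class. To me, this looks like a problem with how I built the model or how I structured the inputs.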
X_train.shape and y_train.shape give me (6561, 6, 8) and (6561,), respectively.
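Thanks in advance for your help.

What is the prevalence of "positive" cases in the training data? What you are seeing may be the "imbalanced classes" problem. That said, this kind of stock prediction really is unlikely to make you rich. Any signal that might exist has already been extracted by more sophisticated models with far more computing power than you have, and the observed stock price is, almost by definition, the residual randomness.

Not sure how the sigmoid and ReLU activations interact here… sigmoid + rounding gives 0 for anything <≈ 10. Maybe try softmax?

@MatthiasWinkelmann I responded in the original post. Thanks for your help.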
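Following up on the class imbalance comment: one common thing to try is weighting the loss so that errors on the rare positive class count for more. The sketch below computes "balanced" weights, n_samples / (n_classes × class_count), in plain numpy. The helper name `balanced_class_weights` is hypothetical, and the 5559/1002 split is an assumption inferred from the training accuracy of 0.8473 on 6561 samples, not a figure from the post. The resulting dict can be passed to Keras through the `class_weight` argument of `model.fit`.

```python
import numpy as np

def balanced_class_weights(y):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    y = np.asarray(y).astype(int)
    classes, counts = np.unique(y, return_counts=True)
    n_samples, n_classes = y.size, classes.size
    return {int(c): n_samples / (n_classes * int(n)) for c, n in zip(classes, counts)}

# ASSUMPTION: stand-in labels with a split inferred from accuracy 0.8473 on 6561 samples.
y_train = np.concatenate([np.zeros(5559, dtype=int), np.ones(1002, dtype=int)])

class_weight = balanced_class_weights(y_train)
print(class_weight)

# Hypothetical usage with the model from the question:
# model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
#           validation_data=(X_test, y_test), class_weight=class_weight)
```

This does not make the signal any stronger; it only stops the optimizer from being rewarded for ignoring the positive class. Lowering the 0.5 decision threshold on the predicted probabilities is another option worth comparing.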