Python LSTM model problem


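As an assignment, I am studying different stock prediction models in order to compare their performance. In this case, my goal is to have the model rely on the predictive input parameters rather than on the last values. I tried to implement an LSTM model, but I always get the following error: 'ValueError: Found input variables with inconsistent numbers of samples: [1281, 1300]'

Does anyone have a suggestion for how to overcome this?

Please see the code below: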

y = df_DJI['Close'].iloc[1:].dropna()
X = df_DJI.drop(['Close'],axis=1).iloc[1:].dropna()

look_back = 40
forward_days = 10
num_periods = 20

from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import LSTM
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler
#model = keras.Sequential()

X_train, X_test,y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=0)
NUM_NEURONS_FirstLayer = 128
NUM_NEURONS_SecondLayer = 64
EPOCHS = 220

#Build the model
model = Sequential()
model.add(LSTM(NUM_NEURONS_FirstLayer,input_shape=(look_back,1), return_sequences=True))
model.add(LSTM(NUM_NEURONS_SecondLayer,input_shape=(NUM_NEURONS_FirstLayer,1)))
model.add(Dense(10))
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(X_train,y_train,epochs=EPOCHS,validation_data=(X_validate,y_validate),shuffle=True,batch_size=2, verbose=2)

The error output is:

   ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-0312bf3a55cf> in <module>
     18 #model = keras.Sequential()
     19 
---> 20 X_train, X_test,y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=0)
     21 NUM_NEURONS_FirstLayer = 128
     22 NUM_NEURONS_SecondLayer = 64

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
   2125         raise TypeError("Invalid parameters passed: %s" % str(options))
   2126 
-> 2127     arrays = indexable(*arrays)
   2128 
   2129     n_samples = _num_samples(arrays[0])

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in indexable(*iterables)
    291     """
    292     result = [_make_indexable(X) for X in iterables]
--> 293     check_consistent_length(*result)
    294     return result
    295 

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    254     uniques = np.unique(lengths)
    255     if len(uniques) > 1:
--> 256         raise ValueError("Found input variables with inconsistent numbers of"
    257                          " samples: %r" % [int(l) for l in lengths])
    258 

ValueError: Found input variables with inconsistent numbers of samples: [1281, 1300]

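The error occurs because all inputs to train_test_split must have the same number of samples. Here X has 1281 rows and y has 1300. I therefore assume that dropping the NaNs separately leaves X and y with unequal shapes, because a column other than 'Close' contains more NaNs than 'Close' does.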
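The mismatch can be reproduced on a small scale. The sketch below uses a hypothetical toy DataFrame in place of df_DJI (the 'Open' and 'Volume' columns and all values are made up for illustration) and applies the same separate dropna() calls as the question:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_DJI: "Volume" has more NaNs than "Close".
df = pd.DataFrame({
    "Close":  [10.0, 11.0, np.nan, 13.0, 14.0, 15.0],
    "Open":   [9.5, 10.5, 11.5, 12.5, 13.5, 14.5],
    "Volume": [100.0, np.nan, 120.0, np.nan, 140.0, 150.0],
})

# Same pattern as in the question: NaNs are dropped separately for y and X,
# so each side loses a different set of rows.
y = df["Close"].iloc[1:].dropna()
X = df.drop(["Close"], axis=1).iloc[1:].dropna()

print(len(X), len(y))   # 3 4 -> lengths differ, train_test_split would raise

# Per-column NaN counts show which column is responsible.
print(df.isna().sum())
```

Running df_DJI.isna().sum() on the real data should show in the same way which column holds the extra NaNs.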

Solution: drop the NaNs before splitting into X and y, as follows:

df_DJI.dropna(inplace=True)
y = df_DJI["Close"].iloc[1:]
X = df_DJI.drop(["Close"], axis=1).iloc[1:]
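As a quick check that this ordering fixes the split, here is a minimal sketch on a hypothetical toy DataFrame (the column names other than 'Close' and all values are assumptions). It uses the non-inplace form of dropna() so that the original frame stays untouched:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df_DJI with NaNs in different columns.
df = pd.DataFrame({
    "Close":  [10.0, 11.0, np.nan, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0],
    "Open":   [9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5],
    "Volume": [100.0, np.nan, 120.0, np.nan, 140.0, 150.0, 160.0, 170.0, 180.0, 190.0],
})

# Drop every row that has a NaN in any column first, then split into X and y.
clean = df.dropna()
y = clean["Close"].iloc[1:]
X = clean.drop(["Close"], axis=1).iloc[1:]

# Both sides now come from the same rows, so lengths and indices match.
print(len(X), len(y))            # 6 6
print(X.index.equals(y.index))   # True

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))  # 4 2
```

Because X and y are cut from the same cleaned frame, train_test_split no longer raises the inconsistent-samples error.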