Python LSTM: adding the encoder's hidden state to the decoder to improve performance
I am trying to transfer the hidden state of an LSTM from the encoder layer to the decoder layer, as shown in the linked example. My data consists of randomly generated sine waves (that is, the wavelength and phase are determined at random, and so is the length of each sequence), and the network is trained to take in a number of sine waves and predict their continuation. Without transferring the hidden state, my code is as follows:
from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed, Lambda, Dropout, Activation, RepeatVector
from keras.callbacks import ModelCheckpoint
import numpy as np

features_num = 5

encoder_inputs = Input(shape=(None, features_num))
encoder = LSTM(40, return_state=False)
encoder_outputs = encoder(encoder_inputs)

decoder_input = RepeatVector(150)(encoder_outputs)

decoder_lstm = LSTM(40, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_input)
decoder_outputs = TimeDistributed(Dense(features_num))(decoder_outputs)

model = Model(encoder_inputs, decoder_outputs)
print(model.summary())
model.compile(loss='mean_squared_error', optimizer='adam')
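For context, `RepeatVector(150)` simply tiles the encoder's 40-dimensional output vector across 150 time steps, so the decoder receives the same context vector at every step. A minimal numpy sketch of what that layer does (the batch size of 2 here is just illustrative; the shapes otherwise mirror the model above):

```python
import numpy as np

# Hypothetical encoder output: a batch of 2 sequences, each summarized as a 40-dim vector
encoder_out = np.random.rand(2, 40)

# RepeatVector(150) tiles that vector along a new time axis -> (batch, 150, 40)
repeated = np.repeat(encoder_out[:, np.newaxis, :], 150, axis=1)

print(repeated.shape)  # (2, 150, 40)
# Every time step carries an identical copy of the context vector
print(np.array_equal(repeated[:, 0, :], repeated[:, 149, :]))  # True
```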
def create_wavelength(min_wavelength, max_wavelength, fluxes_in_wavelength, category):
    # category :: 0 - train ; 2 - validate ; 4 - test. 1;3;5 - dead space
    c = (category + np.random.random()) / 6
    k = fluxes_in_wavelength
    base = (np.trunc(k * np.random.random() * (max_wavelength - min_wavelength)) + k * min_wavelength) / k
    answer = base + c / k
    return answer
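The `category` argument keeps the train, validation, and test sets disjoint by reserving non-overlapping bands of the fractional part of the wavelength: category `c` gets fractional parts in `[c/6, (c+1)/6)`. A quick numpy check of that property (the function is duplicated here only to make the sketch self-contained, and it is called with the same arguments as in `make_line` below):

```python
import numpy as np

def create_wavelength(min_wavelength, max_wavelength, fluxes_in_wavelength, category):
    # Same logic as above: category selects a disjoint band of the fractional part
    c = (category + np.random.random()) / 6
    k = fluxes_in_wavelength
    base = (np.trunc(k * np.random.random() * (max_wavelength - min_wavelength)) + k * min_wavelength) / k
    return base + c / k

np.random.seed(0)
for category in (0, 2, 4):
    samples = np.array([create_wavelength(30, 10, 1, category) for _ in range(1000)])
    fracs = samples % 1.0
    # With k=1 the base is integer-valued, so the fractional part is exactly c,
    # which lies inside [category/6, (category+1)/6) -- the sets can never overlap
    print(category, fracs.min(), fracs.max())
```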
def make_line(length, category):
    shift = np.random.random()
    wavelength = create_wavelength(30, 10, 1, category)
    a = np.arange(length)
    answer = np.sin(a / wavelength + shift)
    return answer
def make_data(seq_num, seq_len, dim, category):
    data = np.array([]).reshape(0, seq_len, dim)
    for i in range(seq_num):
        mini_data = np.array([]).reshape(0, seq_len)
        for j in range(dim):
            line = make_line(seq_len, category)
            line = line.reshape(1, seq_len)
            mini_data = np.append(mini_data, line, axis=0)
        mini_data = np.swapaxes(mini_data, 1, 0)
        mini_data = mini_data.reshape(1, seq_len, dim)
        data = np.append(data, mini_data, axis=0)
    return data
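The inner loop stacks `dim` waves as rows of a `(dim, seq_len)` array, and `swapaxes` then transposes this into the `(seq_len, dim)` layout Keras expects, i.e. one feature vector per time step. A small numpy sketch of that transposition, with constant "waves" so the result is easy to read:

```python
import numpy as np

dim, seq_len = 3, 5

# Three waves stacked as rows, as in the inner loop of make_data above
mini_data = np.vstack([np.full(seq_len, j) for j in range(dim)])  # shape (3, 5)

# swapaxes reorders to (seq_len, dim): each row now holds one value per feature
mini_data = np.swapaxes(mini_data, 1, 0)

print(mini_data.shape)  # (5, 3)
print(mini_data[0])     # [0 1 2] -- the value of each feature at time step 0
```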
def train_generator():
    while True:
        sequence_length = np.random.randint(150, 300) + 150
        data = make_data(1000, sequence_length, features_num, 0)  # category=0 in train
        x_train = data[:, :-150, :]  # all but the last 150
        y_train = data[:, -150:, :]  # the last 150
        yield x_train, y_train
def val_generator():
    while True:
        sequence_length = np.random.randint(150, 300) + 150
        data = make_data(1000, sequence_length, features_num, 2)  # category=2 in val
        x_val = data[:, :-150, :]  # all but the last 150
        y_val = data[:, -150:, :]  # the last 150
        yield x_val, y_val
def test_maker():
    sequence_length = np.random.randint(150, 300) + 150
    data = make_data(1000, sequence_length, features_num, 4)  # category=4 in test
    x_test = data[:, :-150, :]  # all but the last 150
    y_test = data[:, -150:, :]  # the last 150
    return x_test, y_test
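All three functions split each batch the same way along the time axis: the network sees everything up to the last 150 steps and must predict those final 150. The slicing can be sketched on a toy array (smaller numbers substituted for readability):

```python
import numpy as np

horizon = 4  # stands in for the 150-step prediction horizon above
data = np.arange(2 * 10 * 1).reshape(2, 10, 1)  # (seq_num, seq_len, dim)

x = data[:, :-horizon, :]  # all but the last `horizon` steps
y = data[:, -horizon:, :]  # the last `horizon` steps

print(x.shape, y.shape)  # (2, 6, 1) (2, 4, 1)
# x and y together cover each sequence exactly once, with no overlap
print(np.array_equal(np.concatenate([x, y], axis=1), data))  # True
```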
filepath_for_w = 'flux_vi_model.h5'
checkpointer = ModelCheckpoint(filepath_for_w, monitor='val_loss', verbose=0, save_best_only=True, mode='auto', period=1)
model.fit_generator(train_generator(), callbacks=[checkpointer], steps_per_epoch=30, epochs=1000, verbose=1, validation_data=val_generator(), validation_steps=30)
model.save(filepath_for_w)  # note: the original passed the literal string 'filepath_for_w' instead of the variable

x, y = test_maker()
a = model.predict(x)

np.save('a.npy', a)
np.save('y.npy', y)
np.save('x.npy', x)

print(np.mean(np.absolute(y - a)))
The printed result is the mean distance between the actual final 150 points of the sine waves and their predicted values.

With this code, I get a result of 0.065.

When I tried to make use of the LSTM's hidden state, to my surprise, my results got worse. I used the same code, replacing the model with:
encoder_inputs = Input(shape=(None, features_num))
encoder = LSTM(40, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]

decoder_input = RepeatVector(150)(encoder_outputs)

decoder_lstm = LSTM(40, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_input, initial_state=encoder_states)
decoder_outputs = TimeDistributed(Dense(features_num))(decoder_outputs)
The result was 0.101, which suggests that giving the decoder access to the encoder's hidden state actually reduced its ability to predict the continuation of the sine waves.

Is my approach wrong here? Can the hidden state not be used to improve the prediction, or did I build the model incorrectly?

Can you point to a paper or report showing that this method of passing the hidden state from the encoder to the decoder actually works (i.e., improves anything)? As far as I know, there is a weight-tying method in which the decoder weights are the transpose of the encoder's (for dense layers; with LSTM gating there is more adaptation involved). It is used for regularization and has indeed been reported to improve results on NLP problems, typically in embedding layers. I am not familiar with any work using it in the context of predicting sine waves (which is, of course, a toy problem), but in the link I provided, this method is used to connect an encoder LSTM and a decoder LSTM in a machine-translation context.
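The weight tying mentioned above can be illustrated outside Keras. In this minimal linear-autoencoder sketch (pure numpy, illustrative only; the matrix and function names are made up for the example), the decoder reuses the transpose of the encoder matrix instead of learning its own parameters, which halves the parameter count and acts as a regularizer:

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_hidden = 8, 3
W_enc = rng.normal(size=(n_features, n_hidden))  # the only learned weight matrix

def encode(x):
    return x @ W_enc      # (batch, n_features) -> (batch, n_hidden)

def decode(h):
    return h @ W_enc.T    # tied: decoder weights are the encoder's transpose

x = rng.normal(size=(5, n_features))
recon = decode(encode(x))

print(recon.shape)  # (5, 8)
# Tying means no separate decoder matrix exists: both maps share W_enc's parameters
print(W_enc.size)   # 24 parameters instead of 48
```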