Python: Highlighting important words in a sentence using deep learning


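I am trying to highlight the important words in the IMDB dataset, i.e. the words that contribute most to the sentiment analysis prediction.

The dataset looks like this:

X_train - the review as a string

Y_train - 0 or 1

After embedding the X_train values with GloVe embeddings, I can feed them to a neural network.

Now my question is: how can I highlight the most important words, the way deepmoji.mit.edu does?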

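What I have tried:

I tried splitting the input sentences into bigrams and training a 1D CNN on them. Later, to find the important words in X_test, we just split X_test into bigrams and compute their probabilities. It works, but it is not accurate.

I tried a prebuilt Hierarchical Attention Network and it worked. I got what I wanted, but I could not work out every line and concept from the code. It is like a black box to me.

I know how a neural network works; I can code one from scratch with numpy and manual backpropagation. I have a detailed understanding of how an LSTM works and what the forget, update, and output gates actually output. But I still could not figure out how to extract the attention weights, or how to shape the data into a 3D array (what is the timestep in our 2D data?).

So any kind of guidance is welcome.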

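Here is a version with attention (not hierarchical), but you should be able to work out how to make it work with hierarchy too; if not, I can help with that as well. The trick is to define two models: one for training (model) and one for extracting the attention values (model_with_attention_output). The full code is in the listing further down.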

The output will be a numpy array with an attention value for each word: the higher the value, the more important the word was.
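To turn those attention values into an actual highlight, one option is to mark the words with the largest weights. The sketch below is only an illustration: `highlight_words` and `top_k` are hypothetical names, the weights are made up, and it assumes one weight per word (for a single sentence that would be something like `predict(...)[1][0, :, 0]` from the second model).

```python
def highlight_words(words, weights, top_k=2):
    """Wrap the top_k highest-weighted words in ** markers.

    `words` and `weights` are parallel sequences (one weight per word).
    """
    if len(words) != len(weights):
        raise ValueError("need exactly one weight per word")
    # Word positions sorted by weight, largest first.
    ranked = sorted(range(len(words)), key=lambda i: weights[i], reverse=True)
    top = set(ranked[:top_k])
    return " ".join(f"**{w}**" if i in top else w for i, w in enumerate(words))


# Made-up weights for illustration only.
words = ["the", "movie", "was", "absolutely", "wonderful"]
weights = [0.05, 0.10, 0.05, 0.30, 0.50]
print(highlight_words(words, weights))
# the movie was **absolutely** **wonderful**
```

Marking a fixed number of words is just one choice; a threshold on the weight, or a colour gradient per word, would work the same way.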

Edit: you might want to replace lstm with embs in the Multiply to get better interpretations, but that will lead to worse performance.

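Perhaps the Cross Validated network would be a better fit for this question.

@Cesar I will check it out. I got the concept, but I am getting a strange error when running the code (my version of the code). The error is: 'Node' object has no attribute 'output_masks'.

Can you provide a link to the full code on GitHub or Pastebin? I am making a lot of mistakes. Thanks.

Also, the output of the lstm has dimension (?, 10), while the attention values have dimension (?, 1), being the output of the time-distributed dense softmax layer. According to the docs, keras.backend.Multiply() takes input tensors that all have the same dimensions. Am I missing something?

Pretty sure the error you get is because you translated the Keras operations into Tensorflow operations. The Multiply layer in particular is problematic, since you do not even wrap it in a Lambda layer. Mixing Tensorflow and Keras operations is dangerous, because Keras builds its own graph and inserting arbitrary Tensorflow code into it often does not work. The Multiply docs may not specify this behaviour, but it certainly works properly: each of the 10 outputs gets multiplied by the 1 value from the respective attention timestep, similar to how numpy operations work.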
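The last comment compares the multiplication to how numpy operations work. A NumPy-only sketch of that broadcasting (not Keras code; the array values are made up, and 10 units are used to match the shapes quoted in the comment):

```python
import numpy as np

batch, timesteps, units = 2, 3, 10

# Stand-in for the per-timestep LSTM outputs: shape (2, 3, 10).
lstm_out = np.ones((batch, timesteps, units))

# One attention weight per timestep: shape (2, 3, 1).
attention = np.array([[[0.2], [0.3], [0.5]],
                      [[0.6], [0.3], [0.1]]])

# The trailing axis of size 1 is broadcast across all 10 units,
# so every unit of a timestep is scaled by that timestep's weight.
scaled = lstm_out * attention
print(scaled.shape)  # (2, 3, 10)
```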

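# Written for Tensorflow 1.9 and Keras 2.2.0
# Should be backwards compatible down to Keras 2.0.9 and tf 1.5
from keras.models import Model
from keras.layers import (Input, Embedding, LSTM, TimeDistributed, Dense,
                          Softmax, Multiply, Lambda)
from keras import backend as K  # used by the Lambda layer below
import numpy as np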

dictionary_size=1000

def create_models():
  # Input: a sequence of word indexes.
  # Keras supports variable-length sequences when the input shape is (None,)
  # Output shape is (batch, timesteps)
  inp = Input((None,))
  # Embed each word into a vector of size 10
  # Output shape is (batch, timesteps, 10)
  embs = Embedding(dictionary_size, 10)(inp)
  # Run an LSTM over these vectors and return its output at every timestep
  # Output shape is (batch, timesteps, 5)
  lstm = LSTM(5, return_sequences=True)(embs)
  ## Attention block
  # Transform each timestep into 1 value (attention_value)
  # Output shape is (batch, timesteps, 1)
  attention = TimeDistributed(Dense(1))(lstm)
  # Softmax over axis 1 (time) forces the attention values of a sequence
  # to sum to 1, effectively assigning a "weight" to each timestep
  # Output shape is still (batch, timesteps, 1)
  attention_vals = Softmax(axis=1)(attention)
  # Scale each encoded timestep by its weight; the size-1 axis broadcasts:
  # (batch, timesteps, 5) * (batch, timesteps, 1) = (batch, timesteps, 5)
  scaled_vecs = Multiply()([lstm,attention_vals])
  # Sum the scaled timesteps into one vector, i.e. a weighted sum over time
  # Output shape is (batch, 5): the time dimension is collapsed
  context_vector = Lambda(lambda x: K.sum(x,axis=1))(scaled_vecs)
  ## End of attention block
  # Final sigmoid unit produces the prediction
  out = Dense(1,activation='sigmoid')(context_vector)

  model = Model(inp, out)
  model_with_attention_output = Model(inp, [out, attention_vals])
  model.compile(optimizer='adam',loss='binary_crossentropy')
  return model, model_with_attention_output

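model, model_with_attention_output = create_models()

# Toy example: one sentence of three word indexes, labelled 1
model.fit(np.array([[1,2,3]]), np.array([1]), batch_size=1)
# predict returns [prediction, attention values]; take the attention values
print('Attention over each word: ', model_with_attention_output.predict(np.array([[1,2,3]]), batch_size=1)[1])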
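The question also asked how 2D data becomes a 3D array. A rough NumPy walk-through of the shapes may help. It is a simplified sketch, not the Keras model above: it skips the LSTM and the Dense bias and scores the embeddings directly, and the random `emb_table` and `w` are placeholders for weights that would normally be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
dictionary_size, emb_dim = 1000, 10

# 2D input: (batch, timesteps), one integer word index per timestep.
word_ids = np.array([[1, 2, 3]])

# The embedding lookup adds the feature axis: (batch, timesteps, emb_dim).
emb_table = rng.normal(size=(dictionary_size, emb_dim))
embs = emb_table[word_ids]

# One score per timestep, in the spirit of TimeDistributed(Dense(1)):
# (batch, timesteps, 1).
w = rng.normal(size=(emb_dim, 1))
scores = embs @ w

# Softmax over the time axis so the weights of each sentence sum to 1.
exp = np.exp(scores - scores.max(axis=1, keepdims=True))
attention_vals = exp / exp.sum(axis=1, keepdims=True)

# The weighted sum over time collapses the timestep axis: (batch, emb_dim).
context_vector = (embs * attention_vals).sum(axis=1)

print(embs.shape, attention_vals.shape, context_vector.shape)
# (1, 3, 10) (1, 3, 1) (1, 10)
```

So the timestep is simply the position of a word in the sentence: the 2D array holds one index per position, and the third axis appears once each index is replaced by its embedding vector.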