Vectorizing a list of strings with Word2Vec to feed into Keras Sequential layers




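I am trying to build a custom word embedding model with fastText that represents my data (a list of sentences) as vectors, so that I can "feed" it to a Keras CNN for abusive language detection.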

My tokenized data is stored in a list like this:

data = [['is',
      'this',
      'a',
      'news',
      'if',
      'you',
      'have',
      'no',
      'news',
      'than',
      'shutdown',
      'the',
      'channel'],
     ['if',
      'interest',
      'rate',
      'will',
      'hike',
      'by',
      'fed',
      'then',
      'what',
      'is',
      'the',
      'effect',
      'on',
      'nifty']]

I am currently applying the fastText model as follows:

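from gensim.models import FastText  # gensim >= 4.0 (gensim 3.x used size= instead of vector_size=)
model = FastText(data, vector_size=100, window=5, min_count=5, workers=16, sg=0, negative=5)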

Then I do:

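import numpy as np
from gensim.models import FastText

model = FastText(data, min_count=1)

documents = []

for document in data:
    word_vectors = []
    for word in document:
        word_vectors.append(model.wv[word])  # each vector has shape (100,)
    documents.append(np.concatenate(word_vectors))  # 1-D: (len(document) * 100,)

document_matrix = np.concatenate(documents)  # still 1-D: (total_tokens * 100,)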
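The flat shape comes from the two np.concatenate calls: each model.wv[word] is a 1-D array of length 100, so concatenating a document's word vectors gives one long 1-D array, and concatenating all documents gives an even longer one. Conv1D instead expects a 3-D batch of shape (num_documents, max_len, vector_size). Below is a minimal sketch of one common fix, assuming zero-padding and truncation to a fixed max_len is acceptable: stack each document's vectors into a 2-D (tokens, dim) array, then pad. The names documents_to_tensor, lookup and max_len are hypothetical; with a trained gensim model you would pass something like lookup=lambda w: model.wv[w].

```python
import numpy as np


def documents_to_tensor(documents, lookup, max_len, dim):
    """Turn tokenized documents into a (num_docs, max_len, dim) float32 array.

    Documents shorter than max_len are zero-padded at the end; longer ones
    are truncated. `lookup` maps a word to a 1-D vector of length dim.
    """
    batch = np.zeros((len(documents), max_len, dim), dtype=np.float32)
    for i, document in enumerate(documents):
        tokens = document[:max_len]
        if not tokens:
            continue  # an empty document stays all zeros
        # np.stack keeps one row per word instead of flattening like np.concatenate
        batch[i, :len(tokens)] = np.stack([lookup(word) for word in tokens])
    return batch


def fake_lookup(word, dim=4):
    """Stand-in for model.wv[word]: a vector filled with the word's length."""
    return np.full(dim, len(word), dtype=np.float32)


docs = [["is", "this", "news"], ["rate", "hike"]]

flat = np.concatenate([np.concatenate([fake_lookup(w) for w in d]) for d in docs])
print(flat.shape)  # (20,): the 1-D result of the concatenate-based loop

x = documents_to_tensor(docs, fake_lookup, max_len=5, dim=4)
print(x.shape)     # (2, 5, 4): ready for a Conv1D with input shape (5, 4)
```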

Obviously, document_matrix is not suitable as input for my Keras model:

from keras.models import Sequential
from keras import layers

model = Sequential()
model.add(layers.Conv1D(filters=250, kernel_size=4, padding='same', input_shape=(1,)))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(250, activation='relu'))
model.add(layers.Dense(3, activation='sigmoid'))
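A minimal sketch of the same architecture with an input shape that matches a padded 3-D tensor of word vectors, assuming tf.keras (TensorFlow 2.x) and the hypothetical values MAX_LEN = 50 and EMBED_DIM = 100 (the FastText vector size). The key change is replacing input_shape=(1,) with (MAX_LEN, EMBED_DIM): Conv1D slides over a sequence axis and also needs a feature axis per step. The random arrays only stand in for real data. The loss is an assumption too: three sigmoid outputs pair with binary_crossentropy (independent labels); for three mutually exclusive classes, softmax with categorical_crossentropy would be the usual choice.

```python
import numpy as np
from tensorflow import keras

MAX_LEN = 50     # hypothetical: tokens per document after padding/truncation
EMBED_DIM = 100  # must equal the FastText vector size

model = keras.Sequential([
    # Each sample is a (MAX_LEN, EMBED_DIM) matrix: one word vector per step.
    keras.Input(shape=(MAX_LEN, EMBED_DIM)),
    keras.layers.Conv1D(filters=250, kernel_size=4, padding="same"),
    keras.layers.GlobalMaxPooling1D(),
    keras.layers.Dense(250, activation="relu"),
    keras.layers.Dense(3, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stand-in batch shaped like the output of a padding step.
x = np.random.rand(8, MAX_LEN, EMBED_DIM).astype("float32")
y = np.random.randint(0, 2, size=(8, 3)).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
print(model.predict(x, verbose=0).shape)  # (8, 3)
```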

I have run out of ideas for how to make the output of the embedding match the input of Keras.
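An alternative that avoids building one large float tensor up front: map each word to an integer id and let a frozen Keras Embedding layer hold the FastText vectors. A minimal sketch, assuming tf.keras and a hypothetical vocab list; in real use each matrix row would be model.wv[word], while here random vectors stand in. Row 0 is reserved for padding, and the vectors are copied in with set_weights once the layer is built.

```python
import numpy as np
from tensorflow import keras

EMBED_DIM = 100
MAX_LEN = 20

vocab = ["is", "this", "a", "news", "rate", "hike"]   # hypothetical vocabulary
word_to_id = {w: i + 1 for i, w in enumerate(vocab)}  # id 0 is reserved for padding
rng = np.random.default_rng(0)

# Stand-in for the FastText vectors: row i would be model.wv[word] in real use.
embedding_matrix = np.zeros((len(vocab) + 1, EMBED_DIM), dtype="float32")
for w, i in word_to_id.items():
    embedding_matrix[i] = rng.standard_normal(EMBED_DIM)


def encode(document):
    """Map tokens to ids (dropping unknown words) and pad with 0 up to MAX_LEN."""
    ids = [word_to_id[w] for w in document if w in word_to_id][:MAX_LEN]
    return ids + [0] * (MAX_LEN - len(ids))


x = np.array([encode(["is", "this", "a", "news"]), encode(["rate", "hike"])], dtype="int32")

embedding_layer = keras.layers.Embedding(
    input_dim=embedding_matrix.shape[0],
    output_dim=EMBED_DIM,
    trainable=False,  # keep the pretrained vectors fixed
)
model = keras.Sequential([
    keras.Input(shape=(MAX_LEN,), dtype="int32"),
    embedding_layer,  # outputs (batch, MAX_LEN, EMBED_DIM) for the Conv1D
    keras.layers.Conv1D(filters=250, kernel_size=4, padding="same"),
    keras.layers.GlobalMaxPooling1D(),
    keras.layers.Dense(250, activation="relu"),
    keras.layers.Dense(3, activation="sigmoid"),
])
# The layer is built once the model knows its input shape; copy the vectors in.
embedding_layer.set_weights([embedding_matrix])

print(model.predict(x, verbose=0).shape)  # (2, 3)
```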

Thank you all so much, you are the best!

Lisa

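You can use model[YOURKEYWORD] to get the representation of each word from the word2vec model. Some words may not have an embedding in your word2vec model, so you can wrap the lookup in a try/except block in your code.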
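A minimal sketch of the suggested try/except, assuming gensim 4.x, where vectors are looked up through model.wv and a missing word raises KeyError. Falling back to a zero vector is one possible choice (skipping the word is another); vector_or_zeros is a hypothetical helper. A plain dict raises KeyError the same way, which the usage below relies on.

```python
import numpy as np


def vector_or_zeros(wv, word, dim):
    """Return wv[word], or a zero vector if the word has no embedding.

    `wv` can be a gensim KeyedVectors (e.g. model.wv) or any mapping
    that raises KeyError for unknown keys.
    """
    try:
        return np.asarray(wv[word], dtype=np.float32)
    except KeyError:
        return np.zeros(dim, dtype=np.float32)


toy_wv = {"news": np.ones(3, dtype=np.float32)}
print(vector_or_zeros(toy_wv, "news", 3))      # [1. 1. 1.]
print(vector_or_zeros(toy_wv, "shutdown", 3))  # [0. 0. 0.]
```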

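Hi! Thanks! I switched to FastText to avoid the OOV problem and did the following:

documents = []
for document in textList:
    word_vectors = []
    for word in document:  # or logic for separating tokens
        word_vectors.append(model.wv[word])
    documents.append(np.concatenate(word_vectors))
document_matrix = np.concatenate(documents)

Now the matrix has shape (22938600,), which does not fit the input shape of Sequential. Do you know what I can do? Thanks

Please edit your question based on this comment. It is hard to understand the code from your comment.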
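The reported shape can be read arithmetically. Assuming gensim's default vector size of 100 (FastText(sentences, min_count=1) does not set one), 22,938,600 / 100 = 229,386, i.e. every token vector in the corpus laid end to end. Reshaping to (-1, 100) recovers one row per token, but the document boundaries are lost; they can be restored with np.split if the document lengths are known. A minimal sketch with a tiny stand-in array (dim and lengths are hypothetical values); the per-document arrays would still need padding to a common length before going into Conv1D.

```python
import numpy as np

dim = 4          # stand-in for the FastText vector size (100 in the question)
lengths = [3, 2] # number of tokens in each document
flat = np.arange(sum(lengths) * dim, dtype=np.float32)  # shape (20,), like (22938600,)

per_token = flat.reshape(-1, dim)               # (5, 4): one row per token
boundaries = np.cumsum(lengths)[:-1]            # split points between documents: [3]
per_document = np.split(per_token, boundaries)  # arrays of shape (3, 4) and (2, 4)

print([d.shape for d in per_document])  # [(3, 4), (2, 4)]
print(22938600 // 100)                  # 229386 token vectors in the question's matrix
```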