Keras: error when classifying text data with word2vec


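I want to build embeddings from my own word dataset and then use my own labeled data to train and test a model. To that end, I have created my own word embeddings with word2vec, but I run into a problem when training the model on the labeled data.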

I get an error when I try to train the model. My model-creation code:

# create the tokenizer
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_train)
encoded_docs = tokenizer.texts_to_sequences(X_train)

max_length = max([len(s.split()) for s in X_train])
X_train = pad_sequences(encoded_docs, maxlen=max_length, padding='post')

tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_test)
encoded_docs = tokenizer.texts_to_sequences(X_test)

X_test = pad_sequences(encoded_docs, maxlen=max_length, padding='post')


# setup the embedding layer
embeddings = Embedding(input_dim=embedding_matrix.shape[0], output_dim=embedding_matrix.shape[1],
                  weights=[embedding_matrix],input_length= max_length, trainable=False)

new_model = Sequential()
new_model.add(embeddings)
new_model.add(Conv1D(filters=128, kernel_size=5, activation='relu'))
new_model.add(MaxPooling1D(pool_size=2))
new_model.add(Flatten())
new_model.add(Dense(1, activation='sigmoid'))
This is how I create the embedding matrix:

embedding_matrix = np.zeros((len(model.wv.vocab), vector_dim))
for i in range(len(model.wv.vocab)):
    embedding_vector = model.wv[model.wv.index2word[i]]
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
Doing this, I get the following error:

 WARNING:tensorflow:From /Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:1290: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Epoch 1/10
Traceback (most recent call last):
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,2] = 1049 is not in [0, 1045)
     [[Node: embedding_1/GatherV2 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, embedding_1/GatherV2/axis)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/embedding-tut/src/main.py", line 359, in <module>
    custom_keras_model(embedding_matrix, model.wv)
  File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/Collaboration/embedding-tut/src/main.py", line 295, in custom_keras_model
    new_model.fit(X_train, y_train, epochs=10, verbose=2)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/models.py", line 867, in fit
    initial_epoch=initial_epoch)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/engine/training.py", line 1598, in fit
    validation_steps=validation_steps)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/engine/training.py", line 1183, in _fit_loop
    outs = f(ins_batch)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2273, in __call__
    **self.session_kwargs)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,2] = 1049 is not in [0, 1045)
     [[Node: embedding_1/GatherV2 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, embedding_1/GatherV2/axis)]]

Caused by op 'embedding_1/GatherV2', defined at:
  File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/Collaboration/embedding-tut/src/main.py", line 359, in <module>
    custom_keras_model(embedding_matrix, model.wv)
  File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/Collaboration/embedding-tut/src/main.py", line 278, in custom_keras_model
    new_model.add(embeddings)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/models.py", line 442, in add
    layer(x)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/engine/topology.py", line 602, in __call__
    output = self.call(inputs, **kwargs)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/layers/embeddings.py", line 134, in call
    out = K.gather(self.embeddings, inputs)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1134, in gather
    return tf.gather(reference, indices)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 2736, in gather
    return gen_array_ops.gather_v2(params, indices, axis, name=name)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3065, in gather_v2
    "GatherV2", params=params, indices=indices, axis=axis, name=name)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): indices[27,2] = 1049 is not in [0, 1045)
     [[Node: embedding_1/GatherV2 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, embedding_1/GatherV2/axis)]]


Process finished with exit code 1
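
One possible reading of the error: `indices[27,2] = 1049 is not in [0, 1045)` says that token index 1049 was looked up in an embedding matrix with only 1045 rows. The matrix above is ordered by word2vec's vocabulary, while the padded sequences carry the Keras Tokenizer's own indices, which start at 1 and are assigned independently of word2vec. Below is a minimal sketch of one way to line the two up. It assumes a `word_index` dict shaped like the one `Tokenizer.word_index` produces and a plain dict `vectors` standing in for `model.wv`. Both hold toy, hypothetical data, and `build_embedding_matrix` is an illustrative helper, not part of Keras or gensim.

```python
import numpy as np


def build_embedding_matrix(word_index, vectors, vector_dim):
    """Return a matrix whose row i holds the vector for tokenizer index i."""
    # Tokenizer indices start at 1; row 0 stays all-zero for the padding index.
    matrix = np.zeros((len(word_index) + 1, vector_dim))
    for word, idx in word_index.items():
        vec = vectors.get(word)
        # Words without a pretrained vector keep an all-zero row.
        if vec is not None:
            matrix[idx] = vec
    return matrix


# Toy stand-ins (hypothetical): a tokenizer-style word_index and a vector lookup.
word_index = {"sensor": 1, "network": 2, "unseen": 3}
vectors = {"sensor": np.array([0.1, 0.2]), "network": np.array([0.3, 0.4])}

embedding_matrix = build_embedding_matrix(word_index, vectors, vector_dim=2)

# Every index in the padded input must be a valid row of the matrix.
padded = np.array([[1, 2, 0], [3, 1, 0]])
assert padded.max() < embedding_matrix.shape[0]
print(embedding_matrix.shape)  # (4, 2)
```

With a matrix built this way, `input_dim=embedding_matrix.shape[0]` covers every index the tokenizer can emit. The test set would presumably also need to be encoded with the same fitted tokenizer rather than a second one, since a new `Tokenizer` fitted on `X_test` assigns different indices to the same words.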