Multi-label text classification with Keras/Theano and LSTM

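I'm trying to run LSTM multi-label text classification with Keras/Theano.

I have a text/label CSV. The text is plain text and the labels are numbers: nine in total, from 1 to 9.

I don't think I've configured the model correctly for this problem. My code so far is: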

import keras.preprocessing.text
import numpy as np
# on import, Keras prints: Using Theano backend.

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM

import pandas
data = pandas.read_csv("for_keras_text_label.csv", sep = ',', quotechar = '"', header = 0)

x = data['text']
y = data['label']  # integer class labels, 1..9

x = x.values
y = y.values
tk = keras.preprocessing.text.Tokenizer(nb_words=2000, filters=keras.preprocessing.text.base_filter(), lower=True, split=" ")
tk.fit_on_texts(x)

x = tk.texts_to_sequences(x)
max_len = 80
print("max_len %d" % max_len)
print('Pad sequences (samples x time)')

x = sequence.pad_sequences(x, maxlen=max_len)

# the model
max_features = 20000  # Embedding vocabulary size; must be larger than the largest word index the Tokenizer emits
model = Sequential()
model.add(Embedding(max_features, 128, input_length=max_len, dropout=0.2))
model.add(LSTM(128, dropout_W=0.2, dropout_U=0.2))
model.add(Dense(9))  # 9 output units, so valid class indices are 0..8
model.add(Activation('softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop', metrics=["accuracy"])

# run
model.fit(x, y=y, batch_size=200, nb_epoch=1, verbose=1, validation_split=0.2, shuffle=True)
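For reference, the sequence.pad_sequences(x, maxlen=max_len) step turns the variable-length word-index lists into a single (samples, max_len) integer array, which the Embedding layer needs because of input_length=max_len. Assuming Keras's defaults (padding='pre', truncating='pre', pad value 0), short sequences are left-padded with zeros and long ones keep only their last max_len tokens; 0 is safe as padding because Tokenizer word indices start at 1. pad_pre below is a hypothetical NumPy re-implementation of just those defaults, not the Keras source.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Hypothetical sketch of pad_sequences' default behaviour
    (padding='pre', truncating='pre'): returns an int32 array of
    shape (len(sequences), maxlen)."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        trunc = list(seq)[-maxlen:]          # keep the last maxlen tokens
        out[i, maxlen - len(trunc):] = trunc  # left-pad the rest with `value`
    return out

# Toy word-index sequences, like those texts_to_sequences would return.
seqs = [[5, 12, 7], [3, 1, 4, 1, 5, 9]]
print(pad_pre(seqs, maxlen=4).tolist())  # prints [[0, 5, 12, 7], [4, 1, 5, 9]]
```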

I get this error:

IndexError: index 9 is out of bounds for axis 1 with size 9
Apply node that caused the error:
AdvancedIncSubtensor{inplace=False,  set_instead_of_inc=True}(Alloc.0, TensorConstant{1}, ARange{dtype='int64'}.0, Elemwise{Cast{int32}}.0)
Toposort index: 213
Inputs types: [TensorType(float32, matrix), TensorType(int8, scalar), TensorType(int64, vector), TensorType(int32, vector)]
Inputs shapes: [(200, 9), (), (200,), (200,)]
Inputs strides: [(36, 4), (), (8,), (4,)]
Inputs values: ['not shown', array(1, dtype=int8), 'not shown', 'not shown']
Outputs clients: [[Reshape{2}(AdvancedIncSubtensor{inplace=False,  set_instead_of_inc=True}.0, MakeVector{dtype='int64'}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "/home/ubuntu/anaconda3/envs/theano/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2827, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/ubuntu/anaconda3/envs/theano/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-14-5264b8e23f0a>", line 7, in <module>
    model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop', metrics=["accuracy"])
  File "/home/ubuntu/anaconda3/envs/theano/lib/python2.7/site-packages/keras/models.py", line 578, in compile
    **kwargs)
  File "/home/ubuntu/anaconda3/envs/theano/lib/python2.7/site-packages/keras/engine/training.py", line 604, in compile
    sample_weight, mask)
  File "/home/ubuntu/anaconda3/envs/theano/lib/python2.7/site-packages/keras/engine/training.py", line 303, in weighted
    score_array = fn(y_true, y_pred)
  File "/home/ubuntu/anaconda3/envs/theano/lib/python2.7/site-packages/keras/objectives.py", line 45, in sparse_categorical_crossentropy
    return K.sparse_categorical_crossentropy(y_pred, y_true)
  File "/home/ubuntu/anaconda3/envs/theano/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 1079, in sparse_categorical_crossentropy
    target = T.extra_ops.to_one_hot(target, nb_class=output.shape[-1])
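The last frame is the telling one: to_one_hot(target, nb_class=output.shape[-1]) builds a matrix with one column per output unit (9 here) and writes a 1 at column target[i] in each row. The sketch below mimics that step in plain NumPy; to_one_hot_np is a hypothetical helper, not Theano's implementation, and the labels are made-up values in the question's 1..9 range. NumPy's fancy indexing raises an IndexError with the same wording as above, because the columns are numbered 0..8.

```python
import numpy as np

def to_one_hot_np(labels, num_classes):
    """Hypothetical NumPy stand-in for the one-hot step in the traceback:
    builds a (len(labels), num_classes) matrix with one 1 per row."""
    out = np.zeros((len(labels), num_classes), dtype=np.float32)
    # Fancy indexing: row i gets a 1 in column labels[i].
    out[np.arange(len(labels)), labels] = 1.0
    return out

# Labels as in the question (1..9) against a 9-unit output layer.
labels = np.array([1, 5, 9])
try:
    to_one_hot_np(labels, num_classes=9)
except IndexError as err:
    # prints: IndexError: index 9 is out of bounds for axis 1 with size 9
    print("IndexError:", err)
```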

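Have you tried subtracting 1 from the labels? Theano indexing is 0-based, so a 9-element array has no index 9 (the largest index is 8). With sparse_categorical_crossentropy, each integer label is one-hot encoded into output.shape[-1] = 9 columns (the last line of the traceback), so a label of 9 falls outside the valid range 0..8.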
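In the question's code that amounts to one extra line, y = y - 1, after the labels are read into y. A minimal NumPy sketch of the shift and of mapping predictions back to the original labels follows; y_raw and probs are made-up stand-ins for the CSV label column and for the (n_samples, 9) probability array that model.predict would return.

```python
import numpy as np

NUM_CLASSES = 9

# Made-up raw labels standing in for data['label']: integers 1..9.
y_raw = np.array([1, 3, 9, 2, 9, 7])

# Shift to 0-based class indices 0..8, matching Dense(9) + softmax.
y = y_raw - 1
assert y.min() >= 0 and y.max() < NUM_CLASSES, "labels must lie in 0..NUM_CLASSES-1"

# Made-up softmax output for two samples (each row sums to 1).
probs = np.array([
    [0.90, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02],
    [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.91],
])
# argmax gives a 0-based class index; add 1 to recover the original 1..9 label.
pred_labels = probs.argmax(axis=1) + 1
print(pred_labels)  # prints [1 9]
```

Shifting the labels, rather than widening the output to Dense(10), keeps every output unit tied to a real class; with Dense(10), unit 0 would simply never be a correct answer.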