Keras GRU model: "invalid literal for int() with base 10"


My input is just a CSV file with 50K rows and two columns, for Arabic sentiment analysis, but I keep hitting an error when I try to train the data on a stacked GRU model.

The error I keep getting is:

ValueError: invalid literal for int() with base 10: 'الهمانياسن'
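This ValueError is raised by Python's own int() conversion whenever it receives a non-numeric string, so its appearance is a strong hint that raw text, rather than the tokenizer's integer output, is reaching a numeric step such as pad_sequences. A minimal reproduction of the error class:

```python
# int() raises this exact ValueError for any non-numeric string.
# In a Keras text pipeline it typically fires when raw text, rather
# than tokenized integer sequences, reaches pad_sequences.
try:
    int("الهمانياسن")
except ValueError as e:
    print(e)  # invalid literal for int() with base 10: 'الهمانياسن'
```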

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, GRU

NB_WORDS = 5000  # vocabulary size; keep in sync with the Embedding input dim

X_train, X_test, y_train, y_test = train_test_split(df.text, df.sentiment, test_size=0.1, random_state=37)
assert X_train.shape[0] == y_train.shape[0]
assert X_test.shape[0] == y_test.shape[0]
tk = Tokenizer(num_words=NB_WORDS,
               filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
               lower=True,
               split=" ")
tk.fit_on_texts(X_train)
def one_hot_seq(seqs, nb_features=NB_WORDS):
    # Multi-hot encode each sequence: row i gets 1.0 at every word
    # index present in seqs[i] (word order is discarded)
    ohs = np.zeros((len(seqs), nb_features))
    for i, s in enumerate(seqs):
        ohs[i, s] = 1.
    return ohs

# texts_to_sequences must run before one_hot_seq can consume its output
X_train_seq = tk.texts_to_sequences(X_train)
X_test_seq = tk.texts_to_sequences(X_test)

X_train_oh = one_hot_seq(X_train_seq)
X_test_oh = one_hot_seq(X_test_seq)
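As a sanity check on one_hot_seq: the fancy-indexed assignment builds a multi-hot (bag-of-words) matrix of shape (n_samples, nb_features), which discards word order; this is also why its output cannot feed the Embedding/GRU model, which expects padded integer sequences. A tiny run with a hypothetical 8-word vocabulary:

```python
import numpy as np

def one_hot_seq(seqs, nb_features=8):  # 8 = hypothetical tiny vocabulary
    # Row i gets 1.0 at every word index present in seqs[i]; a
    # multi-hot / bag-of-words encoding, so word order is discarded.
    ohs = np.zeros((len(seqs), nb_features))
    for i, s in enumerate(seqs):
        ohs[i, s] = 1.
    return ohs

m = one_hot_seq([[1, 3], [2, 5, 2]])
print(m.shape)  # (2, 8)
```

Note that the repeated index 2 in the second sequence still yields a single 1.0, so word counts are lost as well as order.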

max_words = 500
top_words = 5000
# Pad the tokenized integer sequences, not the raw text: calling
# pad_sequences on raw strings is what raises
# "ValueError: invalid literal for int() with base 10"
X_train = sequence.pad_sequences(X_train_seq, maxlen=max_words)
X_test = sequence.pad_sequences(X_test_seq, maxlen=max_words)

model = Sequential()
model.add(Embedding(top_words, 100, input_length=max_words))
model.add(GRU(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

# Train
# The Embedding/GRU model expects the padded integer sequences; the
# one-hot matrices have shape (n, NB_WORDS) and would not fit, and
# y_train_oh was never defined, so use y_train directly
model.fit(X_train, y_train, epochs=3, batch_size=64)

# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

# Predict the label for test data
y_predict = model.predict(X_test)
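The order of these preprocessing steps is the crux: texts_to_sequences must produce integer sequences before anything pads or encodes them, and pad_sequences must never see raw text. A dependency-free sketch of that flow (toy whitespace tokenizer; all names here are hypothetical stand-ins for the Keras calls):

```python
from collections import Counter

def fit_vocab(texts, num_words):
    # Rank words by frequency and keep the num_words most common,
    # assigning integer ids starting at 1 (0 is reserved for padding)
    counts = Counter(w for t in texts for w in t.split())
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common(num_words))}

def to_sequences(texts, vocab):
    # Raw text becomes lists of integers; out-of-vocabulary words are dropped
    return [[vocab[w] for w in t.split() if w in vocab] for t in texts]

def pad(seqs, maxlen):
    # Left-pad with zeros / truncate to a fixed length, as pad_sequences does
    return [([0] * max(0, maxlen - len(s)) + s)[-maxlen:] for s in seqs]

texts = ["good movie", "bad bad movie"]
vocab = fit_vocab(texts, num_words=10)
padded = pad(to_sequences(texts, vocab), maxlen=4)
print(padded)  # every row is now a fixed-length list of ints
```

With the real Keras objects, tk.texts_to_sequences plays the role of to_sequences and sequence.pad_sequences the role of pad; handing pad the raw texts list instead of the integer sequences reproduces the ValueError above.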