
Numpy deep learning model throws an error after the first epoch

Tags: numpy, keras, deep-learning

I am trying to train a binary classification model. It is sentiment analysis on tweets, but the model throws an error after the first epoch. It must be something about the input sizes, but I cannot pin down exactly which input is causing the problem. Any help would be appreciated.

Thanks a lot.

I have already tried many different sizes, but the problem persists.

import pandas as pd
import os
import numpy as np
from sklearn.model_selection import train_test_split
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense


df = pd.read_csv('twitter-sentiment-analysis2/train.csv',encoding='latin-1')
df.drop(['ItemID'], axis=1, inplace=True)
label=list(df.Sentiment)
text=list(df.SentimentText)
tokenizer = Tokenizer(filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',lower=True,split=' ')
tokenizer.fit_on_texts(text)
vocab = tokenizer.word_index
X_train, X_test, y_train, y_test = train_test_split(text, label, test_size=0.1,random_state=42)

X_train_word_ids = tokenizer.texts_to_sequences(X_train)
X_test_word_ids = tokenizer.texts_to_sequences(X_test)
x_train = pad_sequences(X_train_word_ids, maxlen=50)
x_test= pad_sequences(X_test_word_ids, maxlen=50)

glove_dir = 'glove6b100dtxt/'
embeddings_index = {}
f = open(os.path.join(glove_dir, 'glove.6B.100d.txt'))
for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
f.close()

print('Found %s word vectors.' % len(embeddings_index))


embedding_dim = 100 #data comes from my GloVe
max_words=50
maxlen=50
embedding_matrix = np.zeros((max_words, embedding_dim))
for word, i in vocab.items():
    embedding_vector = embeddings_index.get(word)
    if i < max_words:
        if embedding_vector is not None:
            # Words not found in embedding index will be all-zeros.
            embedding_matrix[i] = embedding_vector

model = Sequential()
model.add(Embedding(max_words, embedding_dim, input_length=maxlen))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.layers[0].set_weights([embedding_matrix])
model.layers[0].trainable = False
model.compile(optimizer='rmsprop',loss='binary_crossentropy',metrics=['acc'])
history = model.fit(x_train, y_train,epochs=10,batch_size=32,validation_split=0.1,shuffle=True)
model.save_weights('pre_trained_glove_model.h5')
Can anyone give me some pointers on where to look? Thanks again.

Here is the error:

File "HM3.py", line 58, in <module>
    history = model.fit(x_train, y_train,epochs=10,batch_size=32,validation_split=0.1,shuffle=True)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[26,39] = 31202 is not in [0, 50)
     [[{{node embedding_1/embedding_lookup}} = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, embedding_1/embedding_lookup/axis)]]
The embedding you created can only hold 50 distinct words, but when you fit the tokenizer you indexed every word that occurs in the training data. The error is telling you that a word with index 31202 cannot be looked up in an embedding whose valid index range is [0, 50).
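A quick way to confirm the mismatch, assuming the variables from the question (tokenizer, x_train) are still in scope, is to compare the number of indexed words with the 50-row limit:

# tokenizer.word_index assigns an index (starting at 1) to every word seen
# by fit_on_texts, so the largest index can far exceed max_words = 50.
print('distinct words indexed:', len(tokenizer.word_index))
print('largest token index in x_train:', x_train.max())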


One solution is to enlarge the embedding input so that it covers every word that occurs in the training set. Another is to reserve a zero index with an all-zeros embedding and remap every training word with index >= 50 to that zero index.
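A minimal sketch of the first option, reusing the variables already defined in the question's code (vocab, embeddings_index, embedding_dim, maxlen): size the embedding matrix and layer to the full vocabulary instead of 50.

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

# word_index starts at 1, so the matrix needs len(vocab) + 1 rows;
# row 0 stays all-zeros and is only used for padding.
max_words = len(vocab) + 1

embedding_matrix = np.zeros((max_words, embedding_dim))
for word, i in vocab.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:  # words missing from GloVe keep the zero row
        embedding_matrix[i] = embedding_vector

model = Sequential()
model.add(Embedding(max_words, embedding_dim, input_length=maxlen))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.layers[0].set_weights([embedding_matrix])
model.layers[0].trainable = False
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

With every token index now inside [0, max_words), the embedding lookup no longer fails during the first epoch.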

Comments: Please include the error. / Done, included :) / Will try it. Thanks a lot!
max_words=50
...
model.add(Embedding(max_words, embedding_dim, input_length=maxlen))
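If you prefer to keep the 50-row embedding that these quoted lines set up, a minimal sketch of the second option is to remap every out-of-range token index to 0, whose embedding row is all zeros (the same row pad_sequences uses for padding); this assumes x_train and x_test are the padded arrays from the question:

import numpy as np

max_words = 50  # same cap as in the question

# Any index the 50-row embedding cannot hold is replaced by the zero index.
x_train = np.where(x_train < max_words, x_train, 0)
x_test = np.where(x_test < max_words, x_test, 0)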