Python TensorFlow data format requirements
I'm a bit confused about how to read data into TensorFlow. I'm trying to build an LSTM and use tf.nn.embedding_lookup to look up the vector representations, but I can't seem to get it to run.
My data currently looks like this:
Out[494]:
   sentiment                                      glove_indexes
0          0  [574305, 1294, 939107, 657375, 571132, 1013429...
1          0                                           [500519]
2          4                                    [560941, 93286]
3          0  [972036, 569274, 478483, 1051901, 684125, 6482...
4          0  [156951, 572457, 465860, 132739, 284963, 11483...
I also have a dictionary, glove_ids, which I can index with these values to get the vector representations of the words.
I thought I could just call
embed = tf.nn.embedding_lookup(glove_ids, inputs_data)
to get the vector representations, but this does not work. Can anyone help me set this up correctly?
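For what it's worth, tf.nn.embedding_lookup takes its params argument as a tensor (an embedding matrix with one row per id), not a Python dictionary, so passing glove_ids directly would not be expected to work. A minimal sketch of one way around this, using only NumPy: remap the sparse GloVe indexes to dense row numbers and stack the vectors into a matrix. The toy glove_ids contents, the 25-dimensional vector size, and the helper name build_embedding_matrix are assumptions for illustration.

```python
import numpy as np

def build_embedding_matrix(glove_ids, embed_size):
    """Stack a {glove_index: vector} dict into a dense float32 matrix.

    Row 0 is reserved for padding (all zeros). Hypothetical helper.
    Returns the matrix and a {glove_index: row_number} mapping.
    """
    matrix = np.zeros((len(glove_ids) + 1, embed_size), dtype=np.float32)
    row_of = {}
    for row, glove_index in enumerate(sorted(glove_ids), start=1):
        matrix[row] = np.asarray(glove_ids[glove_index], dtype=np.float32)
        row_of[glove_index] = row
    return matrix, row_of

# Toy stand-in for the real dictionary (assumed 25-dimensional vectors).
rng = np.random.default_rng(0)
glove_ids = {idx: rng.normal(size=25) for idx in (1294, 500519, 560941, 93286)}

embedding_matrix, row_of = build_embedding_matrix(glove_ids, embed_size=25)

# One padded sentence: 0 means "padding", other entries are dense row numbers.
sentence = [560941, 93286]
rows = np.array([0, 0] + [row_of[i] for i in sentence])

# Integer-array indexing is the NumPy equivalent of an embedding lookup.
embedded = embedding_matrix[rows]
print(embedded.shape)  # (4, 25)
```

A matrix built this way could then be passed as params to tf.nn.embedding_lookup, with an integer placeholder of shape [None, maxSeqLength] holding the row numbers instead of the raw GloVe indexes.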
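Edit
I tried a workaround that does not work either. I'm just hoping for some general guidance on how to approach this.
I now have a vector of length 18, which I'm using as the maximum sentence length in words, and each entry in epoch_x_train has length 25, which is the length of the embedding. I believe this is correct and that every word has the right embedding. getTrainBatch randomly pulls in new data to fit the model. I get the error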
ValueError: setting an array element with a sequence.
def getTrainBatch():
    labels = []
    arr = np.zeros([batch_size, maxSeqLength])
    for i in range(batch_size):
        num = randint(0, len(train_dat))
        labels.append(y_train[num])
        arr[i] = x_train[num]
    return arr, labels

def my_lookup(dat):
    new = []
    for i in range(len(dat)):
        temp = []
        for j in range(len(dat[i])):
            if dat[i][j] == 0:
                temp.append(list(np.zeros(maxSeqLength)))
            else:
                temp.append(glove_ids[dat[i][j]])
        new.append(temp)
    return new

maxSeqLength = 18
x_train = train_dat['glove_indexes']
x_train = np.array(x_train)
x_train = sequence.pad_sequences(x_train, maxlen=maxSeqLength)
y_train = train_dat['sentiment']
y_train = np.where(y_train == 4, 1, 0)
y_train = np.array(y_train)

lstm_size = 256
batch_size = 500
learning_rate = 0.001
embed_size = GloVeEncodingsSize
n_outputs = 2

X = tf.placeholder(tf.float32, [None, embed_size, maxSeqLength])
Y = tf.placeholder(tf.int32, [None])
basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=lstm_size)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
logits = tf.layers.dense(states, n_outputs)
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=Y, logits=logits)
loss = tf.reduce_mean(xentropy)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
correct = tf.nn.in_top_k(logits, Y, 1)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
init = tf.global_variables_initializer()

n_epochs = 100
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        epoch_x_train, epoch_y_train = getTrainBatch()
        epoch_x_train = my_lookup(epoch_x_train)
        sess.run(training_op, feed_dict={X: epoch_x_train, Y: epoch_y_train})
        acc_train = accuracy.eval(feed_dict={X: epoch_x_train, Y: epoch_y_train})
        print(epoch, "Train accuracy:", acc_train)
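One possible cause of the ValueError, judging only from the code above: my_lookup pads with np.zeros(maxSeqLength), which has 18 values, while the GloVe vectors have embed_size entries (25 according to the description), so the nested list is ragged and cannot be converted to a rectangular array. Separately, the placeholder is declared as [None, embed_size, maxSeqLength], whereas tf.nn.dynamic_rnn by default expects inputs laid out as [batch, time, features], which here would be [None, maxSeqLength, embed_size]. A minimal NumPy sketch of a lookup that always returns a rectangular array follows; the toy glove_ids, the value 25, and the name lookup_batch are assumptions.

```python
import numpy as np

max_seq_length = 18   # maxSeqLength in the question
embed_size = 25       # assumed value of GloVeEncodingsSize

def lookup_batch(index_batch, glove_ids):
    """Turn a (batch, max_seq_length) array of GloVe indexes into a
    (batch, max_seq_length, embed_size) float32 array.

    Index 0 is treated as padding and left as a zero vector of
    length embed_size (not max_seq_length).
    """
    index_batch = np.asarray(index_batch, dtype=np.int64)
    out = np.zeros(index_batch.shape + (embed_size,), dtype=np.float32)
    for i, row in enumerate(index_batch):
        for j, idx in enumerate(row):
            if idx != 0:
                out[i, j] = glove_ids[int(idx)]
    return out

# Toy data standing in for the real dictionary and a pre-padded batch.
glove_ids = {1294: np.ones(embed_size), 500519: np.full(embed_size, 2.0)}
batch = np.zeros((2, max_seq_length))
batch[0, -2:] = [1294, 500519]
batch[1, -1] = 500519

features = lookup_batch(batch, glove_ids)
print(features.shape)  # (2, 18, 25)
```

With an array of this shape, a placeholder declared as [None, maxSeqLength, embed_size] would match what is being fed.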
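Edit again
After more googling, it looks like this error comes from the feed_dict. I don't understand why it is wrong, though. I tried supplying y as a response in [1, 0] format, and also as just a 1 or 0 per row of x_train.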
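On the label format: tf.nn.sparse_softmax_cross_entropy_with_logits takes integer class ids of shape [batch_size], which matches the Y placeholder in the code, so one-hot rows such as [1, 0] would need to be collapsed to a single integer per example first. A small sketch of that conversion in NumPy; the helper name to_class_ids is hypothetical.

```python
import numpy as np

def to_class_ids(labels):
    """Return labels as a 1-D int32 array of class ids.

    Accepts either one-hot rows (2-D) or plain class ids (1-D).
    """
    labels = np.asarray(labels)
    if labels.ndim == 2:
        # One-hot rows: the position of the largest entry is the class id.
        return labels.argmax(axis=1).astype(np.int32)
    return labels.astype(np.int32)

one_hot = [[1, 0], [0, 1], [1, 0]]
plain = [0, 1, 0]

print(to_class_ids(one_hot))  # [0 1 0]
print(to_class_ids(plain))    # [0 1 0]
```

Either way the fed value ends up as a flat integer array, so the labels are unlikely to be what triggers "setting an array element with a sequence"; the input features are the more probable source.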
Full error message
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\py35\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-77-7960c1e2188b>", line 12, in <module>
    sess.run(training_op, feed_dict={X: np.array(epoch_x_train), Y: np.array(epoch_y_train)})
  File "C:\ProgramData\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
    run_metadata_ptr)
  File "C:\ProgramData\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1089, in _run
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
  File "C:\ProgramData\Anaconda3\envs\py35\lib\site-packages\numpy\core\numeric.py", line 531, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
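The traceback shows the failure inside np.asarray, which session.run calls on each fed value, and that usually means the value is not rectangular. A hedged sketch of a check that could be run on a batch before feeding it: inner_lengths is a hypothetical helper, and the ragged example mimics mixing 18-long padding vectors with 25-long embeddings.

```python
import numpy as np

def inner_lengths(batch):
    """Return the set of vector lengths found at the innermost level of a
    nested list shaped like batch -> sequence -> vector."""
    return {len(vector) for sequence in batch for vector in sequence}

# A batch of one sentence with two "words".
ragged = [[list(np.zeros(18)), list(np.ones(25))]]
regular = [[list(np.zeros(25)), list(np.ones(25))]]

print(inner_lengths(ragged))   # two different lengths, so not feedable
print(inner_lengths(regular))  # {25}

# Converting the ragged batch to a float array is expected to fail.
try:
    np.asarray(ragged, dtype=np.float32)
except ValueError as err:
    print("ValueError:", err)
```

If the set contains more than one length, the batch cannot be turned into a single float32 array and should be fixed before it reaches feed_dict.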
How is train_dat defined? This looks a lot like ...
@Maxim Thanks for your reply. The format was wrong.