Python: For a deep network where each batch consists of two inputs (S and I), how do I compute whether each sample Si matches every I?


For a deep network where each batch consists of two inputs (S and I), how do I compute whether each sample Si matches every I (every example in the batch)? I wrote the program below, which uses a TensorArray to compute the attention of each Si over all I; S and I are related through a tangent-like function (in the code, one is the sine and the other the cosine of the same random input). Ideally each Si would attend most strongly to its corresponding I, but at convergence every Si attends equally to every I. Any suggestions?

import tensorflow as tf
from tensorflow.python.ops import tensor_array_ops
import tensorflow.contrib.layers as layers
import numpy as np

batch_size = 64
word_len = 16
word_emb_dim = 512
feature_dim = 512



x = tf.placeholder(dtype=tf.float32, shape=[None, 1024], name="S")

sentence = tf.placeholder(dtype=tf.float32, shape=[None, 1024], name="I")

# target = tf.placeholder(dtype=tf.float32,shape=[None,64],name = "target")
batch_size = tf.shape(sentence)[0]
labels = tf.eye(batch_size)


loss_array = tensor_array_ops.TensorArray(dtype=tf.float32, size=64, dynamic_size=False, infer_shape=True)
attention_array = tensor_array_ops.TensorArray(dtype=tf.float32, size=64, dynamic_size=False, infer_shape=True)

x_pre = layers.fully_connected(
        x,
        num_outputs=1024,
        # activation_fn=tf.nn.relu,
        scope="pre",
        reuse=tf.AUTO_REUSE)
sentence_tp = layers.fully_connected(
        sentence,
        num_outputs=1024,
        # activation_fn=tf.nn.relu,
        scope="s_pre",
        reuse=tf.AUTO_REUSE)
def body(i,loss_array,attention_array):
    # Score x_pre[i] against every row of sentence_tp: one dot product per batch example.
    res = tf.tile(tf.expand_dims(tf.expand_dims(x_pre[i], 1), 0), [batch_size, 1, 1])
    res = tf.matmul(tf.expand_dims(sentence_tp, 1), res)
    res = tf.reshape(res, [batch_size])
    # Attention that sample i places on its own (matching) example.
    attention = tf.reduce_sum(labels[i] * tf.nn.softmax(res, 0), 0)
    # Cross-entropy against the one-hot row labels[i]: sample i should match example i.
    tp_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels[i], logits=res))
    loss_array = loss_array.write(i, tp_loss)
    attention_array = attention_array.write(i, attention)
    return i+1 , loss_array , attention_array

_, loss_res,attention_res = tf.while_loop(cond=lambda i, _1,_2: i < 64,
                         body=body,
                         loop_vars=[tf.constant(0), loss_array,attention_array])

loss = tf.reduce_mean(loss_res.stack())
attention_all = attention_res.stack()

vars = tf.trainable_variables()

dis_optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.5, beta2=0.9)
dis_grads = tf.gradients(loss, vars)
dis_grads_and_vars = list(zip(dis_grads, vars))

for grad, var in dis_grads_and_vars:
  print("var:", var, "   ", grad)
dis_train_op = dis_optimizer.apply_gradients(grads_and_vars=dis_grads_and_vars)
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for i in range(1000000):
    # S and I are tied through sin/cos of the same random input.
    org = np.random.uniform(low=0.0, high=100, size=(64, 1024))
    image = np.sin(org)
    s = np.cos(org)
    feed_dict = {x: image, sentence: s}
    a, _, loss_val = sess.run([attention_all, dis_train_op, loss], feed_dict)
    print(loss_val)
    print("attention:", a)
It converges with the attention on every example equal to 1/batch_size.
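For reference, the per-sample while loop above builds an S-to-I score matrix one row at a time. The same batch-matching objective can be written without a TensorArray by forming all pairwise scores with a single matmul and using the identity matrix as the row-wise softmax cross-entropy target. A minimal sketch in TF 1.x style; s_feat and i_feat are hypothetical stand-ins for the projected features (e.g. x_pre and sentence_tp above), not part of the original code:

# Vectorized sketch of the batch-matching objective (illustrative, TF 1.x style).
# s_feat / i_feat are hypothetical placeholders for the projected S and I features.
import tensorflow as tf

s_feat = tf.placeholder(tf.float32, [None, 1024], name="s_feat")  # one row per Si
i_feat = tf.placeholder(tf.float32, [None, 1024], name="i_feat")  # one row per I
n = tf.shape(s_feat)[0]

# All pairwise scores in one matmul: entry (i, j) scores Si against Ij.
scores = tf.matmul(s_feat, i_feat, transpose_b=True)  # [n, n]

# Each Si should match its own I, so the target distribution is the identity.
labels = tf.eye(n)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=scores))

# Row-wise softmax: attention of each Si over all I; the diagonal is the
# attention each Si places on its matching I.
attention = tf.nn.softmax(scores, axis=1)

The diagonal of this row-wise softmax is the same quantity the loop accumulates in attention_array, so it can be used to check whether the attention actually stays stuck at 1/batch_size.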