Unrolling a TensorFlow loop to avoid frequent GPU kernel launch overhead


Consider the program below, in which I iteratively compute B += A many times:

import numpy as np
import tensorflow as tf

A = tf.constant(np.random.randn(1000000))
B = tf.constant(np.random.randn(1000000))

init = tf.global_variables_initializer()
with tf.Session() as sess:
  sess.run(init)
  for i in range(100):
    B = tf.add(A, B)

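Clearly, the loop above triggers at least 100 kernel launches, which seems unnecessary because I am really just doing this addition in place. Is there a way to avoid the kernel launch overhead? Ideally, I am looking for a TensorFlow API solution (with only a single call to run) that does not change the logic of B += A.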

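You can do this with a loop that runs inside the graph. After the loop has run, res contains the final value. Note that the original tensor is not actually changed and still contains its starting value.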
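A minimal sketch of an in-graph loop that needs only one call to run, assuming the construct meant here is tf.while_loop (an assumption: the later remark about parallel_iterations matches a parameter of that function). The helper name add_n_times and the import through tensorflow.compat.v1 are illustrative choices, not part of the original answer.

```python
import numpy as np
# Assumption: TF >= 1.14 or 2.x; on older 1.x releases use `import tensorflow as tf`.
import tensorflow.compat.v1 as tf


def add_n_times(a_np, b_np, n):
    """Hypothetical helper: adds a to b n times using a single sess.run call."""
    graph = tf.Graph()  # explicit graph, so this also works where eager mode is the default
    with graph.as_default():
        A = tf.constant(a_np)
        B = tf.constant(b_np)

        # Loop state is (iteration counter, running sum).
        def cond(i, acc):
            return i < n

        def body(i, acc):
            return i + 1, acc + A

        _, res = tf.while_loop(cond, body, (tf.constant(0), B))

        with tf.Session(graph=graph) as sess:
            # One run call executes all n iterations inside the TensorFlow runtime.
            res_val, b_val = sess.run([res, B])
    return res_val, b_val


a = np.random.randn(1000000)
b = np.random.randn(1000000)
res_val, b_val = add_n_times(a, b, 100)
print(np.allclose(res_val, b + 100 * a))  # loop result vs. closed form, up to rounding
print(np.array_equal(b_val, b))           # the constant B still holds its starting value
```

This removes the per-iteration Python and session overhead. The sketch does not establish whether the number of GPU kernel launches is reduced, since each iteration still performs its own addition.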

Basically, you are creating 100 assign and add operations in the graph, which is probably not what you want.

This code should do what you want:

    import numpy as np
    import tensorflow as tf

    A = tf.constant(np.random.randn(1000000))
    # B has to be a variable so we can assign to it
    B = tf.Variable(np.random.randn(1000000))

    # Add the assign and addition operators to the graph
    assign_to_B_op = B.assign(tf.add(A, B)) 

    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        # To ensure we don't add new ops to the Graph by mistake.
        sess.graph.finalize()
        sess.run(init)
        for i in range(100):
            sess.run(assign_to_B_op)
            print(B.eval())

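The first thing you need to understand about TF is that you have to separate the definition of the graph from its execution. This will save you time debugging and hunting for inefficiencies later, when you work on real problems.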

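Your current problem arises because you did not do this. Inside the loop you create a new graph node on every iteration (100 times). If you want to see it, check your graph in TensorBoard. If you are lazy, just increase the iteration count to a very large value and your program will crash with an error along the lines of "graph is bigger than 2Gb".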
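The graph growth described above can be observed directly by counting operations. This is a small sketch: the helper names ops_added_by_python_loop and ops_added_when_defined_once are hypothetical, and the tensorflow.compat.v1 import assumes TF >= 1.14 or 2.x.

```python
# Assumption: TF >= 1.14 or 2.x; on older 1.x releases use `import tensorflow as tf`.
import tensorflow.compat.v1 as tf


def ops_added_by_python_loop(n):
    """Counts graph ops created by calling tf.add inside a Python loop."""
    graph = tf.Graph()
    with graph.as_default():
        A = tf.constant([1.0, 2.0])
        B = tf.constant([3.0, 4.0])
        before = len(graph.get_operations())
        for _ in range(n):
            B = tf.add(A, B)  # each call appends a new node to the graph
        return len(graph.get_operations()) - before


def ops_added_when_defined_once(n):
    """Counts graph ops created while running a pre-built assign op n times."""
    graph = tf.Graph()
    with graph.as_default():
        A = tf.constant([1.0, 2.0])
        B = tf.Variable([3.0, 4.0])
        step = B.assign(tf.add(A, B))  # defined once, outside the loop
        init = tf.global_variables_initializer()
        before = len(graph.get_operations())
        with tf.Session(graph=graph) as sess:
            sess.run(init)
            for _ in range(n):
                sess.run(step)  # executes existing nodes only
        return len(graph.get_operations()) - before


print(ops_added_by_python_loop(100))     # grows with the iteration count
print(ops_added_when_defined_once(100))  # stays constant
```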

To do this in a better way, define the graph once and then execute it. To reassign a value, use the assign operator:

import tensorflow as tf

A = tf.constant(3)  # change to your random stuff
B = tf.Variable(1)  # change to your random stuff
B_new = B.assign(tf.add(A, B))  # defined once, outside the loop

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        res = sess.run(B_new)
    print(res)

Finally, you clearly do not need a loop at all:

import tensorflow as tf

A = tf.constant(3)
B = tf.constant(1)
C = 100 * A + B  # same result as adding A to B 100 times

with tf.Session() as sess:
    print(sess.run(C))
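The same shortcut carries over to the random arrays from the question. This is a sketch under the assumption that small floating-point rounding differences between repeated addition and a single multiply-add are acceptable. closed_form is a hypothetical helper name, and the tensorflow.compat.v1 import assumes TF >= 1.14 or 2.x.

```python
import numpy as np
# Assumption: TF >= 1.14 or 2.x; on older 1.x releases use `import tensorflow as tf`.
import tensorflow.compat.v1 as tf


def closed_form(a_np, b_np, n):
    """Computes n * A + B in the graph with a single sess.run call."""
    graph = tf.Graph()
    with graph.as_default():
        A = tf.constant(a_np)
        B = tf.constant(b_np)
        C = n * A + B  # one multiply and one add instead of n adds
        with tf.Session(graph=graph) as sess:
            return sess.run(C)


a = np.random.randn(1000000)
b = np.random.randn(1000000)

# Reference: the repeated addition from the question, done in NumPy.
expected = b.copy()
for _ in range(100):
    expected += a

# Equal up to rounding; bitwise equality is not guaranteed for floats.
print(np.allclose(closed_form(a, b, 100), expected))
```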

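Have you considered creating the random samples as a tensor and then indexing into a subset of that tensor for each element of the loop? I think you could use epochs.

I read on the official site that this function runs in parallel. However, I do need a dependency between iterations. Can you tell me how to get there?

The number of parallel operations can be controlled with the function's parallel_iterations parameter. If needed, you can set it yourself. However, in the present case I doubt you will notice any difference in the result.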
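To illustrate the parallel_iterations parameter, the sketch below runs the same dependent loop with the default setting and with parallel_iterations=1. It assumes the function under discussion is tf.while_loop, and the value 1 is chosen here purely for illustration. run_loop is a hypothetical helper name.

```python
import numpy as np
# Assumption: TF >= 1.14 or 2.x; on older 1.x releases use `import tensorflow as tf`.
import tensorflow.compat.v1 as tf


def run_loop(a_np, b_np, n, parallel_iterations=10):
    """Adds a to b n times in-graph; 10 is the documented default of tf.while_loop."""
    graph = tf.Graph()
    with graph.as_default():
        A = tf.constant(a_np)
        B = tf.constant(b_np)
        # Each iteration consumes the previous running sum, so the data
        # dependency orders the additions regardless of this setting.
        _, res = tf.while_loop(
            lambda i, acc: i < n,
            lambda i, acc: (i + 1, acc + A),
            (tf.constant(0), B),
            parallel_iterations=parallel_iterations,
        )
        with tf.Session(graph=graph) as sess:
            return sess.run(res)


a = np.random.randn(1000)
b = np.random.randn(1000)
default_result = run_loop(a, b, 100)
serial_result = run_loop(a, b, 100, parallel_iterations=1)
print(np.allclose(default_result, serial_result))
```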