FLOPs in Python TensorFlow: matrix multiplication

python tensorflow

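Inspired by a related question, I tried to measure the FLOPs required by a TensorFlow matrix multiplication.

For two matrices A and B of sizes (m x p) and (p x n), respectively, the resulting matrix C=AB of size (m x n) has mn entries. Each entry requires p multiplications and (p-1) additions. Hence, the total number of operations is mn(2p-1).

Using the code from the linked question/answer, TensorFlow outputs m*n*2p; see the code below.

Why is this approximation returned instead of the theoretical value? In the worst case, p=1, the approximation is a factor of 2 larger than the correct value.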

import numpy as np
import tensorflow as tf
g = tf.Graph()
run_meta = tf.RunMetadata()
with g.as_default():
    A=tf.convert_to_tensor(np.random.rand(13,9))
    B=tf.convert_to_tensor(np.random.rand(9,7))
    C = tf.matmul(A,B) # shape=[13,7]

    opts = tf.profiler.ProfileOptionBuilder.float_operation()
    flops = tf.profiler.profile(g, run_meta=run_meta, cmd='op', options=opts)
    if flops is not None:
        print('Flops should be ', 13*7*(2*9-1))
        print('Approximation 2*13*7*9=',2*13*7*9) 
        print('TF stats gives',flops.total_float_ops)

#Output: 
#Flops should be  1547
#Approximation 2*13*7*9= 1638
#TF stats gives 1638
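The two counts in the question can be checked without TensorFlow. The following is a minimal sketch, assuming each multiplication and each addition counts as one operation; the helper name `matmul_flop_counts` is hypothetical and not part of any library.

```python
def matmul_flop_counts(m, p, n):
    """Return (exact, approx) operation counts for an (m x p) @ (p x n) product.

    Hypothetical helper for illustration only.
    """
    # Each of the m*n entries needs p multiplications and (p - 1) additions.
    exact = m * n * (2 * p - 1)
    # Counting p additions per entry instead gives the value TensorFlow reports.
    approx = 2 * m * n * p
    return exact, approx


exact, approx = matmul_flop_counts(13, 9, 7)
print(exact, approx)    # 1547 1638
print(approx - exact)   # 91 = 13 * 7, one extra addition per output entry
```

The gap between the two counts is always m*n, that is, exactly one addition per entry of C.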

I'm not sure why, but I think this is the theoretical value as it is coded in TensorFlow:

...

@ops.RegisterStatistics("MatMul", "flops")
def _calc_mat_mul_flops(graph, node):
  """Calculates the compute resources needed for MatMul."""
  transpose_a = node.attr["transpose_a"].b
  a_shape = graph_util.tensor_shape_from_node_def_name(graph, node.input[0])
  a_shape.assert_is_fully_defined()
  if transpose_a:
    k = int(a_shape[0])
  else:
    k = int(a_shape[1])
  output_shape = graph_util.tensor_shape_from_node_def_name(graph, node.name)
  output_shape.assert_is_fully_defined()
  output_count = np.prod(output_shape.as_list())
  return ops.OpStats("flops", (k * output_count * 2))

...
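To see what the registered statistic computes, its arithmetic can be restated in plain Python. This is only a sketch under stated assumptions: `matmul_flops_like_tf` is a hypothetical name, and it derives the output shape from the two input shapes instead of reading it from the graph as the snippet above does.

```python
def matmul_flops_like_tf(a_shape, b_shape, transpose_a=False, transpose_b=False):
    """Mirror the k * output_count * 2 formula on plain shape tuples.

    Hypothetical helper; not a TensorFlow API.
    """
    # Contracted dimension k, chosen the same way as in the snippet above.
    k = a_shape[0] if transpose_a else a_shape[1]
    m = a_shape[1] if transpose_a else a_shape[0]
    k_b = b_shape[1] if transpose_b else b_shape[0]
    n = b_shape[0] if transpose_b else b_shape[1]
    if k != k_b:
        raise ValueError("inner dimensions do not match")
    output_count = m * n  # number of entries in the result
    return k * output_count * 2


print(matmul_flops_like_tf((13, 9), (9, 7)))                    # 1638
print(matmul_flops_like_tf((9, 13), (9, 7), transpose_a=True))  # 1638
```

The formula charges two operations per product term, so it yields 2mnp rather than mn(2p-1).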

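I think this is because, in practice, the summation is usually coded like this (pseudocode below):

total = 0
for i in 0...p
  total += x[i] * y[i]

That is, the first element x[0]*y[0] is added to total (which is 0 at that point), which yields p additions rather than p-1.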

You could try to be clever and avoid this extra addition:

total = x[0] * y[0]
for i in 1...p
  total += x[i] * y[i]

... but what happens if p==0? Ouch, we need to add an extra comparison:

if p > 0
  total = x[0] * y[0]
  for i in 1...p
    total += x[i] * y[i]
else
  total = 0

The catch is that this comparison is not a flop and will not show up in your flop count, yet in practice it costs as much as a simple addition, if not more.
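The two accumulation styles can be compared directly by counting operations as they run. This is a minimal sketch, assuming one flop per multiplication and one per addition; the function names are hypothetical, and the comparison in the second variant is deliberately left out of the count.

```python
def dot_from_zero(x, y):
    """Accumulate into a total that starts at 0; returns (value, flop_count)."""
    total, flops = 0.0, 0
    for xi, yi in zip(x, y):
        total += xi * yi  # one multiplication and one addition
        flops += 2
    return total, flops


def dot_seeded(x, y):
    """Seed the total with the first product; returns (value, flop_count)."""
    p = len(x)
    if p > 0:  # this comparison is not counted as a flop
        total, flops = x[0] * y[0], 1
        for i in range(1, p):
            total += x[i] * y[i]
            flops += 2
    else:
        total, flops = 0.0, 0
    return total, flops


x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(dot_from_zero(x, y))  # (32.0, 6): 2p flops
print(dot_seeded(x, y))     # (32.0, 5): 2p - 1 flops
```

Both variants return the same value; only the counted additions differ, by exactly one per dot product.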

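Bottom line:

- The flop count is probably correct if the implementation does not "optimize" away the initial addition
- This "optimization" may not actually speed up your code
- Take flop counts with a grain of salt, and don't worry too much about vanishing terms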
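How quickly the extra term vanishes can be seen from the ratio 2p / (2p-1) between the reported and the exact count, which does not depend on m or n. A small sketch, with `overcount_ratio` as a hypothetical helper name:

```python
def overcount_ratio(p):
    """Ratio of the 2*m*n*p count to the exact m*n*(2p - 1) count."""
    return (2 * p) / (2 * p - 1)


for p in (1, 9, 100, 1000):
    print(p, overcount_ratio(p))
```

For p=1 the ratio is 2, the worst case mentioned in the question; it approaches 1 as p grows.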