TensorFlow profiler outputs 2 FLOPs for Conv2D instead of 1


I was wondering if anyone knows why the FLOP count for a Conv2D operation is 2 rather than 1. In the example below, the input is a 1x1 image with 1 channel, and the batch size is 1. The number of features in the convolution is also 1, with no bias. Ideally the number of multiplications should be 1, but the TF profiler reports 2 FLOPs. Do FLOPs include something other than multiplications? Thanks.

Here is an example:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # assuming you have a gpu0
import tensorflow as tf
from tensorflow.keras import backend as K  # use the tf.keras backend to match the tf.keras model below


def load_pb(pb):
    with tf.gfile.GFile(pb, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')
        return graph


def freeze_session(session, keep_var_names=None, output_names=None, clear_devices=True):
    from tensorflow.python.framework.graph_util import convert_variables_to_constants
    graph = session.graph
    with graph.as_default():
        freeze_var_names = list(set(v.op.name for v in tf.global_variables()).difference(keep_var_names or []))
        output_names = output_names or []
        output_names += [v.op.name for v in tf.global_variables()]
        input_graph_def = graph.as_graph_def()
        if clear_devices:
            for node in input_graph_def.node:
                node.device = ""
        frozen_graph = convert_variables_to_constants(session, input_graph_def, output_names, freeze_var_names)
        return frozen_graph


# define the model
inp = tf.keras.layers.Input(batch_shape=(1, 1, 1, 1), name='input')
x = tf.keras.layers.Conv2D(1, kernel_size=(1, 1), strides=(1, 1), padding='same', name='conv', use_bias=False)(inp)
out = tf.keras.layers.Flatten(name='output')(x)
model = tf.keras.models.Model(inputs=inp, outputs=out)
model.summary()

# freeze the model
output_graph_def = freeze_session(K.get_session(), output_names=[out.op.name for out in model.outputs])
with tf.gfile.GFile('graph.pb', "wb") as f:
    f.write(output_graph_def.SerializeToString())

# load the protobuf and perform tf profiling
g2 = load_pb('./graph.pb')
with g2.as_default():
    opts = tf.profiler.ProfileOptionBuilder.float_operation()
    flops = tf.profiler.profile(g2, run_meta=tf.RunMetadata(), cmd='scope', options=opts)
    print('FLOP', flops.total_float_ops)
The output is:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (1, 1, 1, 1)              0
_________________________________________________________________
conv (Conv2D)                (1, 1, 1, 1)              1
_________________________________________________________________
output (Flatten)             (1, 1)                    0
=================================================================
Total params: 1
Trainable params: 1
Non-trainable params: 0
_________________________________________________________________
Converted 1 variables to const ops.
Parsing Inputs...
=========================Options=============================
-max_depth                  10000
-min_bytes                  0
-min_peak_bytes             0
-min_residual_bytes         0
-min_output_bytes           0
-min_micros                 0
-min_accelerator_micros     0
-min_cpu_micros             0
-min_params                 0
-min_float_ops              1
-min_occurrence             0
-step                       -1
-order_by                   float_ops
-account_type_regexes       .*
-start_name_regexes         .*
-trim_name_regexes          
-show_name_regexes          .*
-hide_name_regexes          
-account_displayed_op_only  true
-select                     float_ops
-output                     stdout:
==================Model Analysis Report======================
Doc:
scope: The nodes in the model graph are organized by their names, which is hierarchical like filesystem.
flops: Number of float operations. Note: Please read the implementation for the math behind it.
Profile:
node name | # float_ops
_TFProfRoot (--/2 flops)
  conv/Conv2D (2/2 flops)
======================End of Report==========================
FLOP 2

Consider a setup almost identical to yours, but where the convolution instead has n input channels. Then there are n multiplications, and the results of all of them are accumulated into a sum. Now, arguably you could initialize the sum with the result of the first multiplication and then accumulate only the remaining (n-1) multiplications. But that would be special-casing the first multiplication; it makes more sense to initialize the sum to 0 and then accumulate all n multiplications into it. In particular, for n=1 this leads to the seemingly absurd situation where

sum = 0
mult = w1 * a1
sum = sum + mult
This counts as 2 FLOPs, or 1 MAC (multiply-accumulate) operation.
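Generalizing this counting rule, the profiler's FLOP count for a convolution comes out to twice the number of multiply-accumulates. A minimal sketch of that arithmetic (a hypothetical helper, not the profiler's actual implementation):

```python
def conv2d_flops(out_h, out_w, out_channels, kernel_h, kernel_w, in_channels):
    """FLOPs for a Conv2D as counted by the TF profiler:
    each multiply-accumulate (MAC) is counted as 2 float ops."""
    macs = out_h * out_w * out_channels * kernel_h * kernel_w * in_channels
    return 2 * macs  # 1 multiply + 1 add per MAC

# The 1x1 convolution from the question: a single MAC -> 2 FLOPs,
# which matches the profiler's report of "conv/Conv2D (2/2 flops)".
print(conv2d_flops(1, 1, 1, 1, 1, 1))  # 2
```

So the reported count of 2 is not a bug: it is the multiply plus the accumulation into the (zero-initialized) sum.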