What is gradient repacking in TensorFlow?

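When running the TensorFlow benchmarks from the terminal, we can specify several parameters. One of them is called gradient_repacking. What does it mean, and how would one set it?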

python tf_cnn_benchmarks.py --data_format=NCHW --batch_size=256 \
--model=resnet50 --optimizer=momentum --variable_update=replicated \
--nodistortions --gradient_repacking=8 --num_gpus=8 \
--num_epochs=90 --weight_decay=1e-4 --data_dir=${DATA_DIR} --use_fp16 \
--train_dir=${CKPT_DIR}

For anyone searching for this in the future: gradient repacking affects the all-reduce in replicated mode. From the flag definition:

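from absl import flags  # assumption: absl.flags, so the snippet runs on its own; the benchmark may import its flags module differently

# Each string fragment now ends with a space so the concatenated help text reads
# correctly, and the duplicated "of" has been removed.
flags.DEFINE_integer('gradient_repacking', 0, 'Use gradient repacking. It '
                     'currently only works with replicated mode. At the end of '
                     'each step, it repacks the gradients for more efficient '
                     'cross-device transportation. A non-zero value specifies '
                     'the number of split packs that will be formed.',
                     lower_bound=0)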
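The help text only says that the gradients are "repacked" into a given number of "split packs". A minimal pure-Python sketch of that idea, assuming packing means concatenating every gradient into one flat buffer and cutting it into num_packs roughly equal chunks (the helpers repack and unpack are hypothetical names, not tf_cnn_benchmarks functions):

```python
from typing import List, Sequence


def repack(grads: Sequence[Sequence[float]], num_packs: int) -> List[List[float]]:
    """Hypothetical: concatenate all gradients, then split into num_packs chunks."""
    if num_packs < 1:
        raise ValueError("num_packs must be >= 1")
    flat = [x for g in grads for x in g]
    # Spread elements as evenly as possible: the first `extra` packs get one more.
    base, extra = divmod(len(flat), num_packs)
    packs, start = [], 0
    for i in range(num_packs):
        size = base + (1 if i < extra else 0)
        packs.append(flat[start:start + size])
        start += size
    return packs


def unpack(packs: Sequence[Sequence[float]], lengths: Sequence[int]) -> List[List[float]]:
    """Hypothetical inverse of repack: rebuild per-gradient lists of the given lengths."""
    flat = [x for p in packs for x in p]
    out, start = [], 0
    for n in lengths:
        out.append(flat[start:start + n])
        start += n
    return out


grads = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0, 7.0, 8.0, 9.0]]
print(repack(grads, 2))
print(repack(grads, 4))
```

Under this reading, a larger value means more, smaller chunks crossing devices; with one pack, everything travels as a single large buffer.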

As for the best value, I have seen both --gradient_repacking=8 and --gradient_repacking=2 used.
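Since there is no single recommended value, one practical way to set it on a given machine is to try a few values and compare the throughput each run reports. A small sketch that builds the question's command for several gradient_repacking values (benchmark_command is a hypothetical helper; the other flags are copied unchanged from the question):

```python
from typing import List

# Flags copied from the command in the question. ${DATA_DIR} and ${CKPT_DIR}
# stay as shell variables and are expanded when a line is pasted into a shell.
BASE_FLAGS = [
    "--data_format=NCHW", "--batch_size=256", "--model=resnet50",
    "--optimizer=momentum", "--variable_update=replicated", "--nodistortions",
    "--num_gpus=8", "--num_epochs=90", "--weight_decay=1e-4",
    "--data_dir=${DATA_DIR}", "--use_fp16", "--train_dir=${CKPT_DIR}",
]


def benchmark_command(gradient_repacking: int) -> List[str]:
    """Hypothetical helper: the question's command with one gradient_repacking value."""
    if gradient_repacking < 0:
        # Mirrors lower_bound=0 in the flag definition; 0 (the default) disables repacking.
        raise ValueError("gradient_repacking must be >= 0")
    return ["python", "tf_cnn_benchmarks.py", *BASE_FLAGS,
            f"--gradient_repacking={gradient_repacking}"]


# Print one command per candidate value; run each and compare the reported throughput.
for packs in (0, 2, 4, 8):
    print(" ".join(benchmark_command(packs)))
```

For a sweep like this, a much shorter run than the 90 epochs in the question is usually enough to compare throughput between values.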

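My best guess is that this parameter sets the number of pieces the gradients are split into so they can be shared with the other workers. In that case, 8 (for your --num_gpus=8) would seem to mean that every GPU shares with every other GPU (i.e., all-to-all), while 2 would mean sharing only with neighbors, in a ring.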
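The help text itself only describes the value as a number of packs; it does not say whether it also selects a communication pattern. A toy simulation makes the effect of the pack count concrete, assuming each pack is reduced across all devices independently with a naive element-wise sum (a stand-in for illustration, not the benchmark's actual all-reduce implementation; split_even and allreduce_in_packs are hypothetical names):

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def split_even(flat: Sequence[float], num_packs: int) -> List[Vector]:
    """Split a flat buffer into num_packs chunks whose sizes differ by at most one."""
    base, extra = divmod(len(flat), num_packs)
    packs, start = [], 0
    for i in range(num_packs):
        size = base + (1 if i < extra else 0)
        packs.append(list(flat[start:start + size]))
        start += size
    return packs


def allreduce_in_packs(per_device_grads: Sequence[Sequence[Vector]],
                       num_packs: int) -> Tuple[Vector, List[int]]:
    """Sum gradients across devices, one separate reduction per pack.

    Returns the reduced flat gradient and the size of each reduction.
    """
    # Flatten each device's gradients, then split every device's buffer the same way.
    flats = [[x for g in grads for x in g] for grads in per_device_grads]
    device_packs = [split_even(f, num_packs) for f in flats]
    reduced: Vector = []
    sizes: List[int] = []
    for p in range(num_packs):
        # Element-wise sum of pack p over all devices (the "all-reduce" stand-in).
        chunk = [sum(vals) for vals in zip(*(dp[p] for dp in device_packs))]
        reduced.extend(chunk)
        sizes.append(len(chunk))
    return reduced, sizes


# Two simulated devices, each holding two gradient tensors.
devices = [[[1.0, 2.0, 3.0], [4.0]], [[10.0, 20.0, 30.0], [40.0]]]
for n in (1, 2, 3):
    print(n, allreduce_in_packs(devices, n))
```

In this model the pack count changes how many reductions run and how large each one is, while the summed gradient is identical for every value.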

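Given that Horovod uses its own all-reduce algorithm, it makes sense that setting --gradient_repacking has no effect when --variable_update=horovod; the help text above also says it currently only works with replicated mode.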