For the same network, loss, and initialization, TensorFlow consistently achieves a smaller error than PyTorch
I implemented two identical networks for CIFAR10, one in TensorFlow and one in PyTorch. Both use the same weight-initialization distributions (Xavier uniform for the weights, zeros for the biases), yet TensorFlow consistently outperforms PyTorch in terms of minimum test error (although PyTorch seems to be almost 9% faster!). Both were optimized with SGD. Even when fed exactly the same initial values, TensorFlow still reaches a smaller error. These are typical curves from a simulation:
The red curve is TensorFlow's test error and the orange curve is PyTorch's. Both were initialized with exactly the same values.
Since the code is fairly long, I only include the implementation of the architectures here. The complete, reproducible code for both implementations can be found as Jupyter notebooks.
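(As an aside on what "exactly the same initial values" involves: TF stores conv kernels as [height, width, in_channels, out_channels] while PyTorch stores them as [out_channels, in_channels, height, width], so sharing initial values between the two models requires a transpose. Below is a minimal sketch of such a conversion, assuming the init dict holds TF-layout NumPy arrays; the helper name is mine, not from the original code.)

import numpy as np

def tf_init_to_torch(init):
    # Hypothetical helper: convert TF-layout initial weights to PyTorch layout.
    # Conv kernels: TF [H, W, in_ch, out_ch] -> PyTorch [out_ch, in_ch, H, W].
    torch_init = {
        'conv1': np.transpose(init['conv1'], (3, 2, 0, 1)),
        'conv2': np.transpose(init['conv2'], (3, 2, 0, 1)),
        # The dense kernel additionally depends on how the conv output is
        # flattened (NHWC order in TF vs NCHW in PyTorch), so a plain
        # transpose is not necessarily enough; left unchanged here.
        'logits': init['logits'],
    }
    return torch_init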
The TF network implementation:
def tf_model(graph, init=None):
    with graph.as_default():
        if init:
            conv1_init = init['conv1']
            conv2_init = init['conv2']
            logits_init = init['logits']
            conv1_init = tf.constant_initializer(conv1_init)
            conv2_init = tf.constant_initializer(conv2_init)
            logits_init = tf.constant_initializer(logits_init)
        else:
            conv1_init = tf.contrib.layers.xavier_initializer()
            conv2_init = tf.contrib.layers.xavier_initializer()
            logits_init = tf.contrib.layers.xavier_initializer()

        with tf.name_scope('Input'):
            x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name='x')
            y = tf.placeholder(tf.int32, shape=[None], name='y')
            keep_prob = tf.placeholder_with_default(1.0 - dropout_rate, shape=())

        with tf.device('/device:GPU:0'):
            with tf.name_scope('conv1'):
                conv1 = tf.layers.conv2d(x,
                                         filters=6,
                                         kernel_size=5,
                                         strides=1,
                                         padding='valid',
                                         kernel_initializer=conv1_init,
                                         bias_initializer=tf.initializers.zeros,
                                         activation=tf.nn.relu,
                                         name='conv1')
                max_pool1 = tf.nn.max_pool(value=conv1,
                                           ksize=(1, 2, 2, 1),
                                           strides=(1, 2, 2, 1),
                                           padding='SAME',
                                           name='max_pool1')
                dropout1 = tf.nn.dropout(max_pool1, keep_prob=keep_prob)

            with tf.name_scope('conv2'):
                conv2 = tf.layers.conv2d(dropout1,
                                         filters=12,
                                         kernel_size=3,
                                         strides=1,
                                         padding='valid',
                                         bias_initializer=tf.initializers.zeros,
                                         activation=tf.nn.relu,
                                         kernel_initializer=conv2_init,
                                         name='conv2')
                max_pool2 = tf.nn.max_pool(value=conv2,
                                           ksize=(1, 2, 2, 1),
                                           strides=(1, 2, 2, 1),
                                           padding='VALID',
                                           name='max_pool2')
                dropout2 = tf.nn.dropout(max_pool2, keep_prob=keep_prob)

            with tf.name_scope('logits'):
                flatten = tf.layers.Flatten()(max_pool2)
                logits = tf.layers.dense(flatten,
                                         units=10,
                                         kernel_initializer=logits_init,
                                         bias_initializer=tf.initializers.zeros,
                                         name='logits')

    return x, y, keep_prob, logits
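(For context, a minimal sketch of how this graph might be trained with plain SGD; the loss, learning rate, and feeding loop below are my assumptions, not the original training code.)

import tensorflow as tf

dropout_rate = 0.0  # global read inside tf_model's keep_prob placeholder

graph = tf.Graph()
x, y, keep_prob, logits = tf_model(graph)
with graph.as_default():
    # Softmax cross-entropy over the 10 CIFAR10 classes, minimized with plain SGD.
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
    train_op = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)
    init_op = tf.global_variables_initializer()

with tf.Session(graph=graph) as sess:
    sess.run(init_op)
    for batch_x, batch_y in batches:  # assumed iterator over CIFAR10 mini-batches
        sess.run(train_op, feed_dict={x: batch_x, y: batch_y})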
The PyTorch implementation:
class TorchModel(nn.Module):
    def __init__(self, dropout_rate=0.0, init=None):
        super(TorchModel, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels=3,
                      out_channels=6,
                      kernel_size=5,
                      padding=0,
                      bias=True),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(p=dropout_rate))
        if init:
            conv1_init = init['conv1']
            self.conv1[0].weight = nn.Parameter(torch.FloatTensor(conv1_init))
        else:
            torch.nn.init.xavier_uniform_(self.conv1[0].weight)
        torch.nn.init.zeros_(self.conv1[0].bias)

        self.conv2 = nn.Sequential(
            nn.Conv2d(in_channels=6,
                      out_channels=12,
                      kernel_size=3,
                      bias=True),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(p=dropout_rate))
        if init:
            conv2_init = init['conv2']
            self.conv2[0].weight = nn.Parameter(torch.FloatTensor(conv2_init))
        else:
            torch.nn.init.xavier_uniform_(self.conv2[0].weight)
        torch.nn.init.zeros_(self.conv2[0].bias)

        self.logits = nn.Linear(432, 10)
        if init:
            logits_init = init['logits']
            logits_init = np.reshape(logits_init, [10, 432])
            self.logits.weight = nn.Parameter(torch.FloatTensor(logits_init))
        else:
            torch.nn.init.xavier_uniform_(self.logits.weight)
        torch.nn.init.zeros_(self.logits.bias)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)
        x = self.logits(x)
        return x
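(And the matching PyTorch training side, again as a sketch with an assumed learning rate and data loader rather than the original training code.)

import torch
import torch.nn as nn

model = TorchModel(dropout_rate=0.0).cuda()   # on GPU, matching tf.device('/device:GPU:0')
criterion = nn.CrossEntropyLoss()             # softmax cross-entropy, as in the TF graph
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # assumed learning rate

model.train()
for batch_x, batch_y in loader:               # assumed DataLoader yielding NCHW tensors
    optimizer.zero_grad()
    loss = criterion(model(batch_x.cuda()), batch_y.cuda())
    loss.backward()
    optimizer.step()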
I would also add that the reason I am asking is that for the other optimization algorithms I am currently testing (they are quite involved, which is why I give a simple SGD example here), the situation is exactly reversed: PyTorch consistently outperforms TF in terms of minimum error.
I would be glad to hear your thoughts. Have you had the same experience? Am I missing something in the SGD setups for TensorFlow and PyTorch?
Thanks. I have experience with TensorFlow but not with PyTorch, so I googled around and found some roughly similar complaints. You may get some insight from them: one person mentions he was running PyTorch on the CPU instead of the GPU, and elsewhere tf.contrib.layers.fully_connected is named as a possible culprit, along with differences in double precision.

@InonPeled Thanks! I will take a look.

I suggest using the same seed for weight initialization when making comparisons like this. In TF that would be tf.random.set_random_seed, while for PyTorch it is torch.manual_seed.
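(For concreteness, seeding both frameworks might look like the following; SEED is an arbitrary fixed value of my choosing.)

import torch
import tensorflow as tf

SEED = 0  # arbitrary fixed value

# PyTorch: seed before constructing TorchModel so its initializers are deterministic.
torch.manual_seed(SEED)

# TensorFlow 1.x: set the graph-level seed inside the graph being built
# (tf.random.set_random_seed is the equivalent alias in later 1.x releases).
graph = tf.Graph()
with graph.as_default():
    tf.set_random_seed(SEED)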