Python 如何平衡GAN中的发生器和鉴别器性能?

Python 如何平衡GAN中的发生器和鉴别器性能?,python,machine-learning,deep-learning,pytorch,generative-adversarial-network,Python,Machine Learning,Deep Learning,Pytorch,Generative Adversarial Network,这是我第一次与GANs合作,我面临一个问题,即鉴别器的性能一再优于生成器。我正试图从中复制PA模型,希望能帮到我 我已经阅读了很多关于GANs如何工作的论文,并且还学习了一些教程来更好地理解它们。此外,我读过关于如何克服主要不稳定性的文章,但我找不到克服这种行为的方法 在我的环境中,我使用的是PyTorch和BCELoss()。接下来,我将使用以下训练循环: criterion = nn.BCELoss() train_d = False # Discriminator true optim_d

这是我第一次与GANs合作,我面临一个问题,即鉴别器的性能一再优于生成器。我正试图从中复制
PA
模型,希望能帮到我

我已经阅读了很多关于GANs如何工作的论文,并且还学习了一些教程来更好地理解它们。此外,我读过关于如何克服主要不稳定性的文章,但我找不到克服这种行为的方法

在我的环境中,我使用的是
PyTorch
BCELoss()。接下来,我将使用以下训练循环:

criterion = nn.BCELoss()
train_d = False
# Discriminator true
optim_d.zero_grad()
disc_train_real = target.to(device)
batch_size = disc_train_real.size(0)
label = torch.full((batch_size,), 1, device=device).cuda()
output_d = discriminator(disc_train_real).view(-1)
loss_d_real = criterion(output_d, label).cuda()
if lossT:
    loss_d_real *= 2
if loss_d_real.item() > 0.3:
    loss_d_real.backward()
    train_d = True
D_x = output_d.mean().item()
# Discriminator false
output_g = generator(image)
output_d = discriminator(output_g.detach()).view(-1)
label.fill_(0)
loss_d_fake = criterion(output_d, label).cuda()
D_G_z1 = output_d.mean().item()
if lossT:
    loss_d_fake *= 2
loss_d = loss_d_real + loss_d_fake
if loss_d_fake.item() > 0.3:
    loss_d_fake.backward()
    train_d = True
if train_d:
    optim_d.step()

# Generator
label.fill_(1)
output_d = discriminator(output_g).view(-1)
loss_g = criterion(output_d, label).cuda()
D_G_z2 = output_d.mean().item()
if lossT:
    loss_g *= 2

loss_g.backward()
optim_g.step()
而且,经过一段时间的和解,一切似乎都很顺利:

Epoch 1/5 - Step: 1900/9338  Loss G: 3.057388  Loss D: 0.214545  D(x): 0.940985  D(G(z)): 0.114064 / 0.114064
Time for the last step: 51.55 s    Epoch ETA: 01:04:13
Epoch 1/5 - Step: 2000/9338  Loss G: 2.984724  Loss D: 0.222931  D(x): 0.879338  D(G(z)): 0.159163 / 0.159163
Time for the last step: 52.68 s    Epoch ETA: 01:03:24
Epoch 1/5 - Step: 2100/9338  Loss G: 2.824713  Loss D: 0.241953  D(x): 0.905837  D(G(z)): 0.110231 / 0.110231
Time for the last step: 50.91 s    Epoch ETA: 01:02:29
Epoch 1/5 - Step: 2200/9338  Loss G: 2.807455  Loss D: 0.252808  D(x): 0.908131  D(G(z)): 0.218515 / 0.218515
Time for the last step: 51.72 s    Epoch ETA: 01:01:37
Epoch 1/5 - Step: 2300/9338  Loss G: 2.470529  Loss D: 0.569696  D(x): 0.620966  D(G(z)): 0.512615 / 0.350175
Time for the last step: 51.96 s    Epoch ETA: 01:00:46
Epoch 1/5 - Step: 2400/9338  Loss G: 2.148863  Loss D: 1.071563  D(x): 0.809529  D(G(z)): 0.114487 / 0.114487
Time for the last step: 51.59 s    Epoch ETA: 00:59:53
Epoch 1/5 - Step: 2500/9338  Loss G: 2.016863  Loss D: 0.904711  D(x): 0.621433  D(G(z)): 0.440721 / 0.435932
Time for the last step: 52.03 s    Epoch ETA: 00:59:02
Epoch 1/5 - Step: 2600/9338  Loss G: 2.495639  Loss D: 0.949308  D(x): 0.671085  D(G(z)): 0.557924 / 0.420826
Time for the last step: 52.66 s    Epoch ETA: 00:58:12
Epoch 1/5 - Step: 2700/9338  Loss G: 2.519842  Loss D: 0.798667  D(x): 0.775738  D(G(z)): 0.246357 / 0.265839
Time for the last step: 51.20 s    Epoch ETA: 00:57:19
Epoch 1/5 - Step: 2800/9338  Loss G: 2.545630  Loss D: 0.756449  D(x): 0.895455  D(G(z)): 0.403628 / 0.301851
Time for the last step: 51.88 s    Epoch ETA: 00:56:27
Epoch 1/5 - Step: 2900/9338  Loss G: 2.458109  Loss D: 0.653513  D(x): 0.820105  D(G(z)): 0.379199 / 0.103250
Time for the last step: 53.50 s    Epoch ETA: 00:55:39
Epoch 1/5 - Step: 3000/9338  Loss G: 2.030103  Loss D: 0.948208  D(x): 0.445385  D(G(z)): 0.303225 / 0.263652
Time for the last step: 51.57 s    Epoch ETA: 00:54:47
Epoch 1/5 - Step: 3100/9338  Loss G: 1.721604  Loss D: 0.949721  D(x): 0.365646  D(G(z)): 0.090072 / 0.232912
Time for the last step: 52.19 s    Epoch ETA: 00:53:55
Epoch 1/5 - Step: 3200/9338  Loss G: 1.438854  Loss D: 1.142182  D(x): 0.768163  D(G(z)): 0.321164 / 0.237878
Time for the last step: 50.79 s    Epoch ETA: 00:53:01
Epoch 1/5 - Step: 3300/9338  Loss G: 1.924418  Loss D: 0.923860  D(x): 0.729981  D(G(z)): 0.354812 / 0.318090
Time for the last step: 52.59 s    Epoch ETA: 00:52:11
也就是说,发生器上的梯度较高,并且在一段时间后开始减小,同时鉴别器上的梯度升高。至于损耗,发电机下降,而鉴别器上升。如果与教程相比,我想这是可以接受的

这里是我的第一个问题:我注意到在教程中(通常)当
D_G_z1
上升时,
D_G_z2
下降(反之亦然),而在我的示例中,这种情况发生得很少。这只是巧合还是我做错了什么

鉴于此,我让培训程序继续进行,但现在我注意到:

Epoch 3/5 - Step: 1100/9338  Loss G: 4.071329  Loss D: 0.031608  D(x): 0.999969  D(G(z)): 0.024329 / 0.024329
Time for the last step: 51.41 s    Epoch ETA: 01:11:24
Epoch 3/5 - Step: 1200/9338  Loss G: 3.883331  Loss D: 0.036354  D(x): 0.999993  D(G(z)): 0.043874 / 0.043874
Time for the last step: 51.63 s    Epoch ETA: 01:10:29
Epoch 3/5 - Step: 1300/9338  Loss G: 3.468963  Loss D: 0.054542  D(x): 0.999972  D(G(z)): 0.050145 / 0.050145
Time for the last step: 52.47 s    Epoch ETA: 01:09:40
Epoch 3/5 - Step: 1400/9338  Loss G: 3.504971  Loss D: 0.053683  D(x): 0.999972  D(G(z)): 0.052180 / 0.052180
Time for the last step: 50.75 s    Epoch ETA: 01:08:41
Epoch 3/5 - Step: 1500/9338  Loss G: 3.437765  Loss D: 0.056286  D(x): 0.999941  D(G(z)): 0.058839 / 0.058839
Time for the last step: 52.20 s    Epoch ETA: 01:07:50
Epoch 3/5 - Step: 1600/9338  Loss G: 3.369209  Loss D: 0.062133  D(x): 0.955688  D(G(z)): 0.058773 / 0.058773
Time for the last step: 51.05 s    Epoch ETA: 01:06:54
Epoch 3/5 - Step: 1700/9338  Loss G: 3.290109  Loss D: 0.065704  D(x): 0.999975  D(G(z)): 0.056583 / 0.056583
Time for the last step: 51.27 s    Epoch ETA: 01:06:00
Epoch 3/5 - Step: 1800/9338  Loss G: 3.286248  Loss D: 0.067969  D(x): 0.993238  D(G(z)): 0.063815 / 0.063815
Time for the last step: 52.28 s    Epoch ETA: 01:05:09
Epoch 3/5 - Step: 1900/9338  Loss G: 3.263996  Loss D: 0.065335  D(x): 0.980270  D(G(z)): 0.037717 / 0.037717
Time for the last step: 51.59 s    Epoch ETA: 01:04:16
Epoch 3/5 - Step: 2000/9338  Loss G: 3.293503  Loss D: 0.065291  D(x): 0.999873  D(G(z)): 0.070188 / 0.070188
Time for the last step: 51.85 s    Epoch ETA: 01:03:25
Epoch 3/5 - Step: 2100/9338  Loss G: 3.184164  Loss D: 0.070931  D(x): 0.999971  D(G(z)): 0.059657 / 0.059657
Time for the last step: 52.14 s    Epoch ETA: 01:02:34
Epoch 3/5 - Step: 2200/9338  Loss G: 3.116310  Loss D: 0.080597  D(x): 0.999850  D(G(z)): 0.074931 / 0.074931
Time for the last step: 51.85 s    Epoch ETA: 01:01:42
Epoch 3/5 - Step: 2300/9338  Loss G: 3.142180  Loss D: 0.073999  D(x): 0.995546  D(G(z)): 0.054752 / 0.054752
Time for the last step: 51.76 s    Epoch ETA: 01:00:50
Epoch 3/5 - Step: 2400/9338  Loss G: 3.185711  Loss D: 0.072601  D(x): 0.999992  D(G(z)): 0.076053 / 0.076053
Time for the last step: 50.53 s    Epoch ETA: 00:59:54
Epoch 3/5 - Step: 2500/9338  Loss G: 3.027437  Loss D: 0.083906  D(x): 0.997390  D(G(z)): 0.082501 / 0.082501
Time for the last step: 52.06 s    Epoch ETA: 00:59:03
Epoch 3/5 - Step: 2600/9338  Loss G: 3.052374  Loss D: 0.085030  D(x): 0.999924  D(G(z)): 0.073295 / 0.073295
Time for the last step: 52.37 s    Epoch ETA: 00:58:12
不仅
D(x)
再次增加并且几乎保持在1,而且
D_G_z1
D_G_z2
始终显示相同的值。此外,从损失来看,很明显鉴别器的性能优于生成器。这种行为在这个时代的其余部分和下一个时代一直持续到训练结束

因此,我的第二个问题是:这正常吗?如果没有,我在程序中做错了什么?如何实现更稳定的训练

编辑:我已尝试按照建议使用
MSELoss()
对网络进行训练,以下是输出:

Epoch 1/1 - Step: 100/9338  Loss G: 0.800785  Loss D: 0.404525  D(x): 0.844653  D(G(z)): 0.030439 / 0.016316
Time for the last step: 55.22 s    Epoch ETA: 01:25:01
Epoch 1/1 - Step: 200/9338  Loss G: 1.196659  Loss D: 0.014051  D(x): 0.999970  D(G(z)): 0.006543 / 0.006500
Time for the last step: 51.41 s    Epoch ETA: 01:21:11
Epoch 1/1 - Step: 300/9338  Loss G: 1.197319  Loss D: 0.000806  D(x): 0.999431  D(G(z)): 0.004821 / 0.004724
Time for the last step: 51.79 s    Epoch ETA: 01:19:32
Epoch 1/1 - Step: 400/9338  Loss G: 1.198960  Loss D: 0.000720  D(x): 0.999612  D(G(z)): 0.000000 / 0.000000
Time for the last step: 51.47 s    Epoch ETA: 01:18:09
Epoch 1/1 - Step: 500/9338  Loss G: 1.212810  Loss D: 0.000021  D(x): 0.999938  D(G(z)): 0.000000 / 0.000000
Time for the last step: 52.18 s    Epoch ETA: 01:17:11
Epoch 1/1 - Step: 600/9338  Loss G: 1.216168  Loss D: 0.000000  D(x): 0.999945  D(G(z)): 0.000000 / 0.000000
Time for the last step: 51.24 s    Epoch ETA: 01:16:02
Epoch 1/1 - Step: 700/9338  Loss G: 1.212301  Loss D: 0.000000  D(x): 0.999970  D(G(z)): 0.000000 / 0.000000
Time for the last step: 51.61 s    Epoch ETA: 01:15:02
Epoch 1/1 - Step: 800/9338  Loss G: 1.214397  Loss D: 0.000005  D(x): 0.999973  D(G(z)): 0.000000 / 0.000000
Time for the last step: 51.58 s    Epoch ETA: 01:14:04
Epoch 1/1 - Step: 900/9338  Loss G: 1.212016  Loss D: 0.000003  D(x): 0.999932  D(G(z)): 0.000000 / 0.000000
Time for the last step: 52.20 s    Epoch ETA: 01:13:13
Epoch 1/1 - Step: 1000/9338  Loss G: 1.215162  Loss D: 0.000000  D(x): 0.999988  D(G(z)): 0.000000 / 0.000000
Time for the last step: 52.28 s    Epoch ETA: 01:12:23
Epoch 1/1 - Step: 1100/9338  Loss G: 1.216291  Loss D: 0.000000  D(x): 0.999983  D(G(z)): 0.000000 / 0.000000
Time for the last step: 51.78 s    Epoch ETA: 01:11:28
Epoch 1/1 - Step: 1200/9338  Loss G: 1.215526  Loss D: 0.000000  D(x): 0.999978  D(G(z)): 0.000000 / 0.000000
Time for the last step: 51.88 s    Epoch ETA: 01:10:35

可以看出,情况变得更糟。此外,重新阅读一遍,第4.2.4节(对抗性训练)指出,使用的对抗性损失函数是一个
BCELoss()
,因为我希望解决
MSELoss()

解释GAN损失时出现的消失梯度问题,因为实际损失值有点像黑色艺术

问题1:鉴别器/发生器优势之间的摆动频率将根据几个主要因素(以我的经验)而变化:学习率和批量大小,这将影响传播损失。使用的特定损失指标将影响D&G网络培训方式的差异。EnhanceNet论文(用于基线)和教程也使用均方误差损失-您使用的是二进制交叉熵损失,这将改变网络收敛的速率。我不是专家,所以这里有一个很好的例子。当你改变损失函数时,你会很好奇你的网络是否会有不同的行为——试试看,然后更新问题

问题2:随着时间的推移,D和G损失都应该稳定在一个值上,但是,很难判断它们是否收敛于强性能,或者它们是否收敛于模式崩溃/梯度减小()之类的原因。我发现最好的方法是实际检查生成图像的横截面,或者目视检查输出,或者在生成的图像集中使用某种感知度量(SSIM、PSNR、PIQ等)

在寻找ans时,您可能会发现一些其他有用的线索:

在解释损失方面有一些相当好的建议


伊恩·古德费罗对于如何平衡D&G培训也有一些可靠的想法。

@经验丰富的投稿人:如果我的回答很奇怪,请提前道歉-这对于造成堆栈溢出来说是一个全新的问题-因此我非常乐意接受关于如何改进我的答案的提示。谢谢你的帮助。我已经更新了问题,提供了您建议的输出。长话短说:使用
MSELoss()。请检查一下。此外,我已经一遍又一遍地阅读了这篇论文,我必须说,我认为我所联系的实现在使用这种损失和学习率方面是错误的,这在论文中有明确的规定。我真正感兴趣的是批量大小:我使用的是
batch\u size=16
,你认为这会影响培训吗?你有什么建议?