Handling negative values produced by my nn model in Python

I have a simple nn model that looks like this:
class TestRNN(nn.Module):
    def __init__(self, batch_size, n_steps, n_inputs, n_neurons, n_outputs):
        super(TestRNN, self).__init__()
        ...
        self.basic_rnn = nn.RNN(self.n_inputs, self.n_neurons)
        self.FC = nn.Linear(self.n_neurons, self.n_outputs)

    def forward(self, X):
        ...
        lstm_out, self.hidden = self.basic_rnn(X, self.hidden)
        out = self.FC(self.hidden)
        return out.view(-1, self.n_outputs)
I am using criterion = nn.CrossEntropyLoss() to calculate my error. The order of operations looks like this:
# get the inputs
x, y = data
# forward + backward + optimize
outputs = model(x)
loss = criterion(outputs, y)
My training data x is normalized and looks like this:
tensor([[[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[2.6164e-02, 2.6164e-02, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 1.3108e-05],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[9.5062e-01, 3.1036e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[0.0000e+00, 1.3717e-05, 3.2659e-07, ..., 0.0000e+00,
0.0000e+00, 3.2659e-07]],
[[5.1934e-01, 5.4041e-01, 6.8083e-06, ..., 0.0000e+00,
0.0000e+00, 6.8083e-06],
[5.2340e-01, 6.0007e-01, 2.7062e-06, ..., 0.0000e+00,
0.0000e+00, 2.7062e-06],
[8.1923e-01, 5.7346e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
[[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0714e-01, 7.0708e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 7.0407e-06],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
...,
[[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.1852e-01, 2.3411e-02, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0775e-01, 7.0646e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 3.9888e-06],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
[[5.9611e-01, 5.8796e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0710e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.7538e-01, 2.4842e-01, 1.7787e-06, ..., 0.0000e+00,
0.0000e+00, 1.7787e-06],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
[[5.2433e-01, 5.2433e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[1.3155e-01, 1.3155e-01, 0.0000e+00, ..., 8.6691e-02,
9.7871e-01, 0.0000e+00],
[7.4412e-01, 6.6311e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 9.6093e-07]]])
The typical outputs and y that get passed to the criterion look like this:
tensor([[-0.0513],
[-0.0445],
[-0.0514],
[-0.0579],
[-0.0539],
[-0.0323],
[-0.0521],
[-0.0294],
[-0.0372],
[-0.0518],
[-0.0516],
[-0.0501],
[-0.0312],
[-0.0496],
[-0.0436],
[-0.0514],
[-0.0518],
[-0.0465],
[-0.0530],
[-0.0471],
[-0.0344],
[-0.0502],
[-0.0536],
[-0.0594],
[-0.0356],
[-0.0371],
[-0.0513],
[-0.0528],
[-0.0621],
[-0.0404],
[-0.0403],
[-0.0562],
[-0.0510],
[-0.0580],
[-0.0516],
[-0.0556],
[-0.0063],
[-0.0459],
[-0.0494],
[-0.0460],
[-0.0631],
[-0.0525],
[-0.0454],
[-0.0509],
[-0.0522],
[-0.0426],
[-0.0527],
[-0.0423],
[-0.0572],
[-0.0308],
[-0.0452],
[-0.0555],
[-0.0479],
[-0.0513],
[-0.0514],
[-0.0498],
[-0.0514],
[-0.0471],
[-0.0505],
[-0.0467],
[-0.0485],
[-0.0520],
[-0.0517],
[-0.0442]], device='cuda:0', grad_fn=<ViewBackward>)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], device='cuda:0')
When the criterion is applied, I get the following error (running with CUDA_LAUNCH_BLOCKING=1):
/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
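The failing assertion `t >= 0 && t < n_classes` can also be triggered without CUDA. A minimal sketch with hypothetical toy values (the exact exception type depends on the PyTorch version):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# outputs has a single class column (shape [N, 1] means n_classes = 1),
# so any target label of 1 violates t >= 0 && t < n_classes.
outputs = torch.randn(4, 1)
y = torch.tensor([0, 0, 1, 0])

err = None
try:
    criterion(outputs, y)
except (IndexError, RuntimeError) as e:  # exception type varies by PyTorch version
    err = e
print(type(err).__name__ if err is not None else "no error")
```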
Is the fact that my model outputs negative values what is causing the above error message, and how can I fix it?

TL;DR

You have two options:
1. Make the second dimension of outputs have size 2 instead of 1.
2. Use nn.BCEWithLogitsLoss instead of nn.CrossEntropyLoss.
I think the problem is not the negative numbers. It is the shape of outputs. Looking at your array y, I see that you have two different classes (maybe more, but let's assume 2). This means the last dimension of outputs should be 2, because outputs needs to give a "score" for each of the two different classes (see). The scores can be negative, zero, or positive. But the shape of your outputs is [64, 1], not [64, 2] as required.
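A minimal sketch of option 1 with hypothetical toy scores and labels, showing that negative scores are perfectly acceptable once outputs has shape [N, 2]:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# One score (logit) per class: shape [N, 2] for two classes.
# Negative scores are fine; CrossEntropyLoss applies log-softmax internally.
outputs = torch.tensor([[-0.0513,  0.0200],
                        [ 0.1000, -0.3000],
                        [-0.2000, -0.0100],
                        [ 0.0000,  0.2500]])
y = torch.tensor([0, 0, 1, 0])  # class indices, dtype int64

loss = criterion(outputs, y)
print(loss.item())
```

In the model above, this presumably corresponds to constructing the final nn.Linear layer with n_outputs = 2 (an assumption about how the constructor arguments are used).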
One of the steps performed by the nn.CrossEntropyLoss() object is to convert these scores into a probability distribution over the two classes. This is done using the softmax operation. However, when doing binary classification (that is, classification with only 2 classes, as in our current case), there is another option: give a score for only one class, convert it into a probability for that class using the sigmoid function, and then compute 1 - p on top of that to get the probability of the other class. This option means that outputs only needs to give a score for one of the two classes, as in your current case. To choose this option, you need to replace nn.CrossEntropyLoss with nn.BCEWithLogitsLoss. You can then pass outputs and y to it much as you currently do (note, however, that the shape of outputs needs to match the shape of y exactly, so in your example you need to pass outputs[:, 0] instead of outputs. You also need to convert y to float: y.float(). The call is therefore criterion(outputs[:, 0], y.float())).
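A minimal sketch of option 2 with toy values (the four scores are taken from the outputs shown above; the labels are hypothetical):

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

# One score per sample, shape [N, 1], as the current model produces.
outputs = torch.tensor([[-0.0513], [-0.0445], [-0.0514], [-0.0579]])
y = torch.tensor([0, 0, 1, 0])

# Shapes and dtypes must match: outputs[:, 0] is [N] and y.float() is [N].
loss = criterion(outputs[:, 0], y.float())
print(loss.item())

# sigmoid(score) is the probability of class 1; 1 - p covers the other class.
p = torch.sigmoid(outputs[:, 0])
```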
I need to choose between two classes, True (1) and Zero (0), though what you said makes a lot of sense.