Python: "Input and hidden tensors are not at the same device" when running an LSTM on multiple GPUs


python pytorch


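I am trying to train an LSTM layer in PyTorch, using 4 GPUs. During initialization I call .cuda() to move the hidden state to the GPU. However, when I run the code on multiple GPUs, I get the following runtime error: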

RuntimeError: Input and hidden tensors are not at the same device

I tried to solve this by calling .cuda() inside the forward function, as follows:

self.hidden = (self.hidden[0].type(torch.FloatTensor).cuda(), self.hidden[1].type(torch.FloatTensor).cuda())

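This line seems to fix the error, but I am concerned about whether the updated hidden state ends up on a different GPU. Should I move the tensors back to the CPU at the end of forward for each batch, or is there another way to solve this?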

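When you call .cuda() on a tensor, PyTorch moves it to the current GPU device, which is GPU 0 by default. With data parallelism, the model is replicated onto the other GPUs, so the tensor ends up on a different GPU from the model replica that uses it. That mismatch causes the runtime error you are seeing.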
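One way to avoid pinning the hidden state to a fixed GPU is to create it inside forward on whatever device the incoming batch already lives on. The following is a minimal sketch, not the asker's model: the class name HiddenOnInputDevice and all layer sizes are assumptions made up for illustration. It falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn


class HiddenOnInputDevice(nn.Module):
    """Hypothetical LSTM wrapper; the sizes below are illustrative assumptions."""

    def __init__(self, input_size=8, hidden_size=16, num_layers=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

    def init_hidden(self, x):
        # new_zeros allocates on the same device and dtype as x, so every
        # DataParallel replica gets a hidden state on its own GPU.
        shape = (self.num_layers, x.size(0), self.hidden_size)
        return x.new_zeros(shape), x.new_zeros(shape)

    def forward(self, x):
        hidden = self.init_hidden(x)
        output, _ = self.lstm(x, hidden)
        return output


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = HiddenOnInputDevice().to(device)
if torch.cuda.device_count() > 1:
    # Only wrap when several GPUs are actually visible.
    model = nn.DataParallel(model)

batch = torch.randn(4, 5, 8, device=device)  # [B x T x features]
output = model(batch)                        # [B x T x hidden_size]
```

With this pattern there is no need to move anything back to the CPU at the end of forward. Note that DataParallel rebuilds its replicas on every forward call, so a hidden state stored as an attribute on a replica would not carry over between batches; if state must persist, returning it from forward and passing it back in as an input is one option.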

The correct way to implement data parallelism for recurrent neural networks is as follows:

import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class MyModule(nn.Module):
    # ... __init__, other methods, etc.

    # padded_input is of shape [B x T x *] (batch_first mode) and contains
    # the sequences sorted by lengths
    #   B is the batch size
    #   T is max sequence length
    def forward(self, padded_input, input_lengths):
        total_length = padded_input.size(1)  # get the max sequence length
        packed_input = pack_padded_sequence(padded_input, input_lengths,
                                            batch_first=True)
        packed_output, _ = self.my_lstm(packed_input)
        output, _ = pad_packed_sequence(packed_output, batch_first=True,
                                        total_length=total_length)
        return output

m = MyModule().cuda()
dp_m = nn.DataParallel(m)
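The snippet above leaves out __init__, so it cannot be run as posted. Below is a hedged, filled-in variant for experimentation: the class name PackedLSTM, the layer sizes, and the sample batch are all assumptions, and the .cpu() call on the lengths is added because recent PyTorch versions expect the lengths on the CPU. It falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class PackedLSTM(nn.Module):
    """Hypothetical completion of the module above; sizes are assumptions."""

    def __init__(self, input_size=8, hidden_size=16):
        super().__init__()
        self.my_lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

    def forward(self, padded_input, input_lengths):
        # Remember the padded length so every replica returns the same T.
        total_length = padded_input.size(1)
        packed_input = pack_padded_sequence(
            padded_input, input_lengths.cpu(), batch_first=True)
        packed_output, _ = self.my_lstm(packed_input)
        output, _ = pad_packed_sequence(
            packed_output, batch_first=True, total_length=total_length)
        return output


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = PackedLSTM().to(device)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

padded = torch.randn(4, 6, 8, device=device)  # [B x T x features]
lengths = torch.tensor([6, 5, 3, 2])          # sorted, longest first
out = model(padded, lengths)                  # [B x T x hidden_size]
```

Passing the lengths as a tensor lets DataParallel split them along the batch dimension together with the inputs. The total_length argument matters because each replica only sees part of the batch; without it, replicas whose longest sequence is shorter would return outputs with a smaller T, and gathering the results would fail.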

For a multi-GPU setup, you also need to set the CUDA_VISIBLE_DEVICES environment variable accordingly.
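The variable is usually set in the shell that launches the script, but it can also be set from Python, provided this happens before CUDA is initialized (the safest place is before importing torch). A small sketch; the helper name select_gpus and the GPU indices are assumptions for illustration:

```python
import os


def select_gpus(gpu_ids):
    """Hypothetical helper: expose only the given GPU indices to this process."""
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Assumed indices for a 4-GPU machine; adjust to the GPUs you want to use.
visible = select_gpus([0, 1, 2, 3])

# Import torch only after this point so the setting is picked up.
```

Inside the process, the visible GPUs are renumbered starting from 0, so device indices in the code refer to positions in this list rather than to physical GPU IDs.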


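First, how are you running it on multiple GPUs? Are you using …?

Yes, after initializing the model I run this line: model = torch.nn.DataParallel(model), and I set CUDA_VISIBLE_DEVICES when running the code.

Did you solve this problem? Thanks.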