Size mismatch when using multiple CUDA devices (DataParallel) in PyTorch


I am using a custom model like this:

class SimpleNN(nn.Module):
    def __init__(self, vectors_size, features_size, hidden_size=15, dropout_rate=0.1):
        super(SimpleNN, self).__init__()
        
        self.vectors_size = vectors_size
        self.features_size = features_size
        self.hidden_size = hidden_size
        self.dropout_rate = dropout_rate
        
        self.vectors_hidden = nn.Sequential(
            nn.Dropout(self.dropout_rate),
            nn.Linear(vectors_size, vectors_size//2),
            nn.Tanh(),
            nn.Linear(vectors_size//2, features_size),
            nn.Tanh()
        )
        self.hidden = nn.Sequential(
            nn.Linear(features_size*2, hidden_size),
            nn.ReLU(),
        )
         
        self.output = nn.Linear(hidden_size, 2)
        
    def forward(self, pairs, features):
        """
        features: (n_samples, features_size)
        """
        vectors = pairs2vectors(train_pub, pairs).to(device)
        embedding_features = self.vectors_hidden(vectors)
        combined_features = torch.cat([features, embedding_features], dim=1)
        return self.output(self.hidden(combined_features))

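This model works fine when I use only one CUDA device, but after wrapping it in DataParallel as shown below, it always tells me that the sizes of features and embedding_features do not match. I found that the n_samples dimension of features is not what I expected, unlike the other batch data. I do not know why this happens or how to fix it.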

if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    model = nn.DataParallel(model)

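By the way, the error message was attached as a picture.

Actually, for the arguments pairs and features of the forward method, the sizes are (batch_size, pairs) and (batch_size, features_size). In my train function, the code is as follows: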

for batch_pairs, batch_data, batch_labels in tqdm(batch_iter(train_pairs, train_data, train_labels, train_batch_size), desc='Epoch'):
    train_iter += 1

    optimizer.zero_grad()

    # Number of examples in this batch
    batch_size = len(batch_data)
    cum_examples += batch_size
    pred_labels = model(batch_pairs, batch_data)

    # Cross-entropy loss with class weights
    loss_func = torch.nn.CrossEntropyLoss(weight=weight)
    loss = loss_func(pred_labels, batch_labels)

    # Backpropagation
    loss.backward()

    # Clip the gradient norm
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), clip_grad)

    # Update the parameters
    optimizer.step()

I think my batch_pairs and batch_features already have the batch dimension, don't they?

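According to the documentation for torch.nn.DataParallel, my input will be chunked into several parts for processing. In my case, it seems that only the features argument is chunked and pairs is not. Maybe that is because pairs is not a torch.Tensor instance? I will try some more; I think this mechanism is unreasonable.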
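The observation above can be reproduced without any GPUs. The sketch below is a simplified, CPU-only stand-in for the way DataParallel distributes the arguments of forward: tensors are split along dimension 0, while objects that contain no tensors (such as a Python list of index pairs) reach every replica in full. The helper simulate_scatter and the sample data are hypothetical and only illustrate the behavior; the real implementation also recurses into tuples, lists, and dicts looking for tensors.

```python
import torch


def simulate_scatter(arg, n_replicas):
    """Simplified model of how DataParallel hands one argument to its replicas.

    Assumption: tensors are chunked along dim 0; anything else is passed
    to every replica unchanged. This is not the library's actual code.
    """
    if isinstance(arg, torch.Tensor):
        return list(torch.chunk(arg, n_replicas, dim=0))
    return [arg for _ in range(n_replicas)]


batch_size, features_size, n_replicas = 8, 4, 2

features = torch.randn(batch_size, features_size)     # a tensor: gets split
pairs_list = [(i, i + 1) for i in range(batch_size)]   # a plain list: copied whole
pairs_tensor = torch.tensor(pairs_list)                # a tensor: gets split

feature_chunks = simulate_scatter(features, n_replicas)
list_copies = simulate_scatter(pairs_list, n_replicas)
tensor_chunks = simulate_scatter(pairs_tensor, n_replicas)

for r in range(n_replicas):
    print(
        f"replica {r}: feature rows={feature_chunks[r].shape[0]}, "
        f"list pairs={len(list_copies[r])}, "
        f"tensor pairs={tensor_chunks[r].shape[0]}"
    )
```

With the plain list, each replica would see 4 feature rows but all 8 pairs, which is the kind of mismatch torch.cat reports. With pairs converted to a tensor, both arguments are split the same way.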

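I tried converting the pairs to vectors outside the model and passing them to the forward method, but it still does not work.

The issue may be that the batch dimension is not being passed in the input data. If so, nn.DataParallel might be splitting on the wrong dimension. You also mentioned features: (n_samples, features_size), which means the batch size is not passed in the input. Please add a batch dimension to your data. For a batch size of 1, the input shape should be [1, features]. So in your case, it would be [1, n_samples, features_size]. Hope this helps.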
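Whether an extra leading dimension helps can be checked by looking at how a split along dimension 0 treats each layout. The sketch below uses torch.chunk on the CPU as a stand-in for the split that DataParallel performs; that substitution and the sizes are assumptions made only for illustration.

```python
import torch

n_replicas = 2
n_samples, features_size = 8, 4

flat = torch.randn(n_samples, features_size)   # (n_samples, features_size)
wrapped = flat.unsqueeze(0)                     # (1, n_samples, features_size)

# Split each layout along dim 0, as a two-replica setup would.
flat_chunks = torch.chunk(flat, n_replicas, dim=0)
wrapped_chunks = torch.chunk(wrapped, n_replicas, dim=0)

print([tuple(c.shape) for c in flat_chunks])     # [(4, 4), (4, 4)]
print([tuple(c.shape) for c in wrapped_chunks])  # [(1, 8, 4)]
```

If n_samples already is the batch dimension, the first layout is divided evenly between the replicas. The second layout has a leading dimension of size 1 that cannot be divided, so only a single chunk is produced.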


This question is about an error in PyTorch. It has nothing to do with CUDA programming and should not be tagged as CUDA programming. Please do not re-add it.

OK, I suspected this problem might be caused by CUDA, so I added the cuda tag. Sorry, and thanks for the reminder.

Maybe I expressed it wrongly: n_samples is the batch size. I will add more information to my post. Thanks.
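Given that n_samples is the batch size, one possible way to make both arguments split consistently is to pass only tensors to forward and to do the vector lookup inside the model. The sketch below is a hypothetical variant, not the original SimpleNN: the class PairModel, the item_dim and n_items parameters, and the nn.Embedding table that stands in for the external pairs2vectors helper are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class PairModel(nn.Module):
    """Hypothetical variant of SimpleNN whose forward takes only tensors."""

    def __init__(self, n_items, item_dim, features_size, hidden_size=15):
        super().__init__()
        # Assumption: per-item vectors live in a lookup table owned by the
        # model, so each replica holds them on its own device.
        self.item_vectors = nn.Embedding(n_items, item_dim)
        self.vectors_hidden = nn.Sequential(
            nn.Linear(2 * item_dim, features_size),
            nn.Tanh(),
        )
        self.hidden = nn.Sequential(
            nn.Linear(features_size * 2, hidden_size),
            nn.ReLU(),
        )
        self.output = nn.Linear(hidden_size, 2)

    def forward(self, pairs, features):
        # pairs: LongTensor (batch, 2); features: FloatTensor (batch, features_size)
        vectors = self.item_vectors(pairs).flatten(start_dim=1)  # (batch, 2 * item_dim)
        embedding_features = self.vectors_hidden(vectors)
        combined = torch.cat([features, embedding_features], dim=1)
        return self.output(self.hidden(combined))


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = PairModel(n_items=100, item_dim=3, features_size=4)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model.to(device)

pairs = torch.randint(0, 100, (8, 2), device=device)
features = torch.randn(8, 4, device=device)
logits = model(pairs, features)
print(tuple(logits.shape))  # (8, 2)
```

Because both inputs are tensors with the batch on dimension 0, they are split in the same way, and no call to a global device is needed inside forward.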