Inconsistent output between PyTorch Python and LibTorch C++ when exporting for iOS

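I have trained a HuggingFace RoBERTa model on my own data (a very specialized use case, hence the small model and vocabulary!) and tested it successfully in Python. I exported the traced model to LibTorch for iOS, but the predictions on the device do not match those in Python (they give different argmax token indices). My conversion script: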

# torch = 1.5.0
# transformers = 3.2.0
import torch
from transformers import RobertaConfig, RobertaForMaskedLM

config = RobertaConfig(
    vocab_size=858,
    max_position_embeddings=258,
    num_attention_heads=6,
    num_hidden_layers=4,
    type_vocab_size=1,
    torchscript=True,
)

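# Note: from_pretrained is a classmethod, so the config passed to the
# constructor here does not reach the loaded model; pass config=config to
# from_pretrained instead if these settings are meant to apply.
model = RobertaForMaskedLM(config=config).from_pretrained('./trained_RoBERTa')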
model.cpu()
model.eval()

example_input = torch.LongTensor(1, 256).random_(0, 857).cpu()
traced_model = torch.jit.trace(model, example_input)
traced_model.save('./exports/trained_RoBERTa.pt')
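
One way to narrow down where a mismatch comes from is to compare the traced module with the eager one in Python before anything reaches the device. The sketch below does this with a hypothetical stand-in module (TinyLM, not the trained RoBERTa) so that it runs without the checkpoint; the vocabulary size and sequence length are taken from the script above, and everything else is an assumption.

```python
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Hypothetical stand-in for the masked-LM: token ids -> (logits,)."""

    def __init__(self, vocab_size=858, hidden=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, input_ids):
        # Return a tuple, like a model configured with torchscript=True.
        return (self.head(self.embed(input_ids)),)


torch.manual_seed(0)
model = TinyLM().eval()
example_input = torch.randint(0, 858, (1, 256), dtype=torch.long)
traced = torch.jit.trace(model, example_input)

with torch.no_grad():
    eager_logits = model(example_input)[0]
    traced_logits = traced(example_input)[0]

# Compare both the predicted token ids and the raw logits.
same_argmax = torch.equal(eager_logits.argmax(dim=2), traced_logits.argmax(dim=2))
max_abs_diff = (eager_logits - traced_logits).abs().max().item()
print(same_argmax, max_abs_diff)
```

If the argmax indices agree here but not on the device, the export itself is a less likely culprit than the code that feeds or reads the tensor on iOS.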

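I previously ran into a problem with another (vision) model that I had trained in Python on a GPU and converted to LibTorch for iOS. I solved it by adding map_location={'cuda:0': 'cpu'} to the torch.load() call in my conversion script. So I'm wondering: 1) does that make sense as a possible explanation in this case? 2) how do I add the map_location option when loading with the .from_pretrained() syntax?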
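
For readers unfamiliar with the map_location workaround mentioned above, here is a minimal sketch of what it does with plain torch.load(). The in-memory buffer and the state dictionary are made up for illustration. It does not settle how .from_pretrained() places weights; note only that the conversion script already calls model.cpu() before tracing.

```python
import io

import torch

# Hypothetical checkpoint: a single small tensor saved to an in-memory buffer.
state = {"weight": torch.arange(6, dtype=torch.float32).reshape(2, 3)}
buffer = io.BytesIO()
torch.save(state, buffer)
buffer.seek(0)

# map_location remaps tensor storages at load time; here everything is
# placed on the CPU regardless of the device it was saved from.
loaded = torch.load(buffer, map_location=torch.device("cpu"))
device_type = loaded["weight"].device.type
print(device_type)
```

The dictionary form used in the question, {'cuda:0': 'cpu'}, is another accepted value for map_location and remaps only the listed device.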

In case my Obj-C++ handling of the prediction results is wrong, here is the Obj-C++ code that runs on the device:

- (NSArray<NSArray<NSNumber*>*>*)predictText:(NSArray<NSNumber*>*)tokenIDs {
    try {
        long count = tokenIDs.count;
        long* buffer = new long[count];
        for(int i=0; i < count;  i++) {
            buffer[i] = tokenIDs[i].intValue;
        }
        at::Tensor tensor = torch::from_blob(buffer, {1, (int64_t)count}, at::kLong);
        torch::autograd::AutoGradMode guard(false);
        at::AutoNonVariableTypeMode non_var_type_mode(true);
        auto outputTuple = _impl.forward({tensor}).toTuple();

        auto outputTensor = outputTuple->elements()[0].toTensor();
        auto sizes = outputTensor.sizes();
        // len will be tokens * vocab size -- sizes[1] * sizes[2] (sizes[0] is batch_size = 1)
        auto positions = sizes[1];
        auto tokens = sizes[2];
        float* floatBuffer = outputTensor.data_ptr<float>();
        if (!floatBuffer) {
            return nil;
        }
        // MARK: This is probably a slow way to create this 2D NSArray
        NSMutableArray* results = [[NSMutableArray alloc] initWithCapacity: positions];
        for (int i = 0; i < positions; i++) {
            NSMutableArray* weights = [[NSMutableArray alloc] initWithCapacity: tokens];
            for (int j = 0; j < tokens; j++) {
                [weights addObject:@(floatBuffer[i*positions + j])];
            }
            [results addObject:weights];
        }
        return [results copy];
    } catch (const std::exception& exception) {
        NSLog(@"%s", exception.what());
    }
    return nil;
}

The traced model evaluated in Python:

traced_test = traced_model(input)
pred = torch.argmax(traced_test[0], dim=2).squeeze(0)
pred_str = tokenizer.decode(pred[1:-1].tolist())
print(pred_str)

This makes me think something is wrong with the iOS Obj-C++ execution. The code that loads the traced model/export does call .eval() on the model, by the way (I realize a missing .eval() could be one explanation for differing outputs):

- (nullable instancetype)initWithFileAtPath:(NSString*)filePath {
    self = [super init];
    if (self) {
        try {
            auto qengines = at::globalContext().supportedQEngines();
            if (std::find(qengines.begin(), qengines.end(), at::QEngine::QNNPACK) != qengines.end()) {
                at::globalContext().setQEngine(at::QEngine::QNNPACK);
            }
            _impl = torch::jit::load(filePath.UTF8String);
            _impl.eval();
        } catch (const std::exception& exception) {
            NSLog(@"%s", exception.what());
            return nil;
        }
    }
    return self;
}

Update 3: Uhhhmmm... this is definitely a facepalm moment (after wasting a weekend)... I decided to return a flat NSArray from Obj-C and do the 2D array reshape in Swift, and apart from one token being shifted (I think it is just [CLS]), the output is now correct. I guess my Obj-C really is that rusty. Sadly, I still can't see the problem, but it works now, so I'm giving up.
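
Regarding the reshape mentioned in Update 3: for a contiguous tensor of shape [1, positions, vocab], the element at (i, j) sits at flat offset i * vocab + j, whereas the loop in predictText indexes with i*positions + j. The following pure-Python sketch shows how the two strides diverge. The helper names and the toy sizes are hypothetical, and it assumes the output buffer is contiguous and row-major.

```python
def reshape_logits(flat, positions, vocab):
    """Split a flat row-major buffer of positions * vocab values into rows."""
    if len(flat) != positions * vocab:
        raise ValueError("buffer length does not match positions * vocab")
    return [flat[i * vocab:(i + 1) * vocab] for i in range(positions)]


def argmax_per_position(rows):
    """Index of the largest value in each row (first one wins on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


# Toy data: 3 positions, vocabulary of 5, one clear winner per position.
positions, vocab = 3, 5
winners = [4, 0, 2]
flat = [1.0 if j == winners[i] else 0.0
        for i in range(positions) for j in range(vocab)]

# Row stride equal to the vocabulary size recovers the intended rows.
correct = argmax_per_position(reshape_logits(flat, positions, vocab))

# Row stride equal to the number of positions reads overlapping,
# misaligned windows of the same buffer.
wrong_rows = [[flat[i * positions + j] for j in range(vocab)]
              for i in range(positions)]
wrong = argmax_per_position(wrong_rows)

print(correct, wrong)
```

Row 0 looks right with either stride, since both start at offset 0, which can make this kind of indexing slip easy to overlook.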