C++: Storing multiple messages in one Protocol Buffers binary file

c++ python protocol-buffers

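I want to store repeated messages in a single file. At the moment I have to wrap the repeated message in another message. Is there a way around this?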

package foo;

message Box {
  required int32 tl_x = 1;
  required int32 tl_y = 2;
  required int32 w = 3;
  required int32 h = 4;
}

message Boxes {
  repeated Box boxes = 1;
}

Here is what the Protocol Buffers documentation says about storing multiple messages:

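If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer. (If you want to avoid copying bytes to a separate buffer, check out the CodedInputStream class (in both C++ and Java), which can be told to limit reads to a certain number of bytes.)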
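The size-prefix approach from the documentation can be sketched in Python roughly as follows. This is a minimal illustration, not an official API: the helper names `write_framed` and `read_framed` are made up for this example, the 4-byte little-endian prefix is one possible choice of size encoding, and plain byte strings stand in for the output of a real message's `SerializeToString()`.

```python
import io
import struct


def write_framed(stream, payload: bytes) -> None:
    """Write a 4-byte little-endian length, then the payload bytes."""
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)


def read_framed(stream):
    """Yield each payload in turn until the stream is exhausted."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise EOFError("truncated length prefix")
        (size,) = struct.unpack("<I", header)
        payload = stream.read(size)
        if len(payload) < size:
            raise EOFError("truncated message body")
        yield payload


# Stand-ins for serialized messages (assumption: real code would pass
# the bytes returned by box.SerializeToString()).
buf = io.BytesIO()
for blob in (b"\x08\x01", b"", b"\x08\x02\x10\x03"):
    write_framed(buf, blob)

buf.seek(0)
frames = list(read_framed(buf))
```

Each element of `frames` could then be handed to the generated class's parser. An empty payload is kept as a valid frame, since a message with no fields set can serialize to zero bytes.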

There is also a conventional way to implement this in C++ and Java. See this Stack Overflow thread for details.

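Protobuf does not support this. It can serialize a single message, but the serialized message carries no information about its type (Box or Boxes) or its length. So if you want to store multiple messages, you also have to store the type and length of each one. The writing algorithm (in pseudocode) might look like this: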

for every message {
    write(type_of_message) // 1 byte long
    write(length_of_serialized_message) // 4 bytes long
    write(serialized_message)
}

The loading algorithm:

while (!end_of_file) {
    type = read(1) // 1 byte
    length = read(4) // 4 bytes
    buffer = read(length)
    switch (type) {
      case 1:
         deserialise_message_1(buffer)
         break
      case 2:
         deserialise_message_2(buffer)
         break
    }
}
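The type-plus-length scheme in the pseudocode could look like this in Python. It is only a sketch under stated assumptions: the tag values `TYPE_BOX` and `TYPE_BOXES`, the helper names, and the little-endian byte order are all invented for illustration, and the decoders simply label the raw bytes where real code would call something like the generated class's `FromString`.

```python
import io
import struct

# 1-byte type tag followed by a 4-byte little-endian length (5 bytes total).
HEADER = struct.Struct("<BI")

TYPE_BOX = 1    # hypothetical tag values, chosen for this example
TYPE_BOXES = 2


def write_record(stream, type_tag: int, payload: bytes) -> None:
    """Write the header (type, length) and then the serialized message."""
    stream.write(HEADER.pack(type_tag, len(payload)))
    stream.write(payload)


def read_records(stream, decoders):
    """Yield decoders[type_tag](payload) for every record in the stream."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < HEADER.size:
            raise EOFError("truncated record header")
        type_tag, size = HEADER.unpack(header)
        payload = stream.read(size)
        if len(payload) < size:
            raise EOFError("truncated record body")
        yield decoders[type_tag](payload)


# Placeholder decoders; with protobuf these might be Box.FromString and
# Boxes.FromString (assumption, not exercised here).
decoders = {
    TYPE_BOX: lambda data: ("Box", data),
    TYPE_BOXES: lambda data: ("Boxes", data),
}

buf = io.BytesIO()
write_record(buf, TYPE_BOX, b"ab")
write_record(buf, TYPE_BOXES, b"")
buf.seek(0)
records = list(read_records(buf, decoders))
```

A dictionary lookup plays the role of the `switch` statement, so an unknown tag raises `KeyError` instead of being silently skipped.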

In Java you can use delimited messages. For C++, see the linked question.

Basically, in C++, following the description above:

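#include <cstdint>
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl_lite.h>

const unsigned bufLength = 256;
unsigned char buffer[bufLength];
Message protoMessage;  // placeholder for your generated type, e.g. foo::Box

google::protobuf::io::ArrayOutputStream arrayOutput(buffer, bufLength);
google::protobuf::io::CodedOutputStream codedOutput(&arrayOutput);

// 4-byte little-endian length prefix, followed by the serialized message.
// ByteSizeLong() replaces the deprecated ByteSize().
codedOutput.WriteLittleEndian32(static_cast<uint32_t>(protoMessage.ByteSizeLong()));
protoMessage.SerializeToCodedStream(&codedOutput);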

In Python you have to work it out yourself.
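For the Python side, one possible approach is to use the same framing as Java's delimited messages, where each message is preceded by its size encoded as a base-128 varint. The sketch below uses only the standard library. The function names are hypothetical, and the payloads are plain bytes standing in for serialized messages.

```python
import io


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def read_varint(stream):
    """Return the next varint, or None at a clean end of stream."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7


def write_delimited(stream, payload: bytes) -> None:
    """Write a varint size prefix followed by the payload."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream):
    """Yield each size-prefixed payload until the stream ends."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) < size:
            raise EOFError("truncated message body")
        yield payload


buf = io.BytesIO()
payloads = [b"\x08\x01", b"x" * 300]  # stand-ins for serialized messages
for blob in payloads:
    write_delimited(buf, blob)
buf.seek(0)
decoded = list(read_delimited(buf))
```

Compared with a fixed 4-byte prefix, the varint costs a single byte for messages shorter than 128 bytes, at the price of a slightly more involved reader.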

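I was just working on this problem and ended up going with Parquet. Parquet is well suited to storing a set of Protobuf messages in a file, and it makes working with them later much easier.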

This code will create the Parquet file:

Path path = new Path("/tmp/mydata.parq");
CompressionCodecName codecName = CompressionCodecName.SNAPPY;
int blockSize = 134217728;
int pageSize = 1048576;
boolean enableDictionary = true;
boolean validating = false;

ProtoParquetWriter<Message> writer
    = new ProtoParquetWriter<>(
        path,
        Box.class,
        codecName,
        blockSize,
        pageSize,
        enableDictionary,
        validating
    );

for (Message message : messages) {
    writer.write(message);
}

writer.close();

It may not suit your use case, but I thought it was worth mentioning here.