Java: how to write a working decoder for a .pb file based on a given .proto

java protocol-buffers

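Based on the answer to this question, I think I have been using an "incorrect decoder" for my .pb file.

Following the provided example, I tried to write something similar to start pulling the data apart, and wrote the following: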

import cc.refectorie.proj.relation.protobuf.DocumentProtos.Document;
import cc.refectorie.proj.relation.protobuf.DocumentProtos.Document.Sentence;

import java.io.FileInputStream;


public class ListDocument
{
    // Iterates through all sentences in the Document and prints their tokens.
    static void print(Document document)
    {
        for (Sentence sentence : document.getSentencesList())
        {
            for (int i = 0; i < sentence.getTokensCount(); i++)
            {
                System.out.println(" getTokens(" + i + "): " + sentence.getTokens(i));
            }
        }
    }

    // Main function: reads the entire file as a single Document and prints
    //   the tokens inside.
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage:  ListDocument DOCUMENT_FILE");
            System.exit(-1);
        }

        // Parse the whole file as one Document message.
        try (FileInputStream input = new FileInputStream(args[0])) {
            Document document = Document.parseFrom(input);
            print(document);
        }
    }
}

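So, as I said above, I think this has to do with my not having defined the decoder correctly. Is there a way to look at the .proto file I am trying to use and work out how to read all of the data?

Is there a way to look at this .proto file and see what I am doing wrong?

Here are the first few lines of the file I am trying to read: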

Ü
&/guid/9202a8c04000641f8000000003221072&/guid/9202a8c04000641f80000000004cfd50NA"Ö

S/m/vinci8/data1/riedel/projects/relation/kb/nyt1/docstore/2007-joint/1850511.xml.pb„€€€øÿÿÿÿƒ€€€øÿÿÿÿ"PERSON->PERSON"'inverse_false|PERSON|on bass and|PERSON"/inverse_false|with|PERSON|on bass and|PERSON|on"7inverse_false|, with|PERSON|on bass and|PERSON|on drums"$inverse_false|PERSON|IN NN CC|PERSON",inverse_false|with|PERSON|IN NN CC|PERSON|on"4inverse_false|, with|PERSON|IN NN CC|PERSON|on drums"`str:Dave[NMOD]->|PERSON|[PMOD]->with[ADV]->was[ROOT]<-on[PRD]<-bass[PMOD]<-|PERSON|[NMOD]->Barry"]str:Dave[NMOD]->|PERSON|[PMOD]->with[ADV]->was[ROOT]<-on[PRD]<-bass[PMOD]<-|PERSON|[NMOD]->on"Rstr:Dave[NMOD]->|PERSON|[PMOD]->with[ADV]->was[ROOT]<-on[PRD]<-bass[PMOD]<-|PERSON"Adep:[NMOD]->|PERSON|[PMOD]->[ADV]->[ROOT]<-[PRD]<-[PMOD]<-|PERSON"dir:->|PERSON|->-><-<-<-|PERSON"Sstr:PERSON|[PMOD]->with[ADV]->was[ROOT]<-on[PRD]<-bass[PMOD]<-|PERSON|[NMOD]->Barry"Adep:PERSON|[PMOD]->[ADV]->[ROOT]<-[PRD]<-[PMOD]<-|PERSON|[NMOD]->"dir:PERSON|->-><-<-<-|PERSON|->"Pstr:PERSON|[PMOD]->with[ADV]->was[ROOT]<-on[PRD]<-bass[PMOD]<-|PERSON|[NMOD]->on"Adep:PERSON|[PMOD]->[ADV]->[ROOT]<-[PRD]<-[PMOD]<-|PERSON|[NMOD]->"dir:PERSON|->-><-<-<-|PERSON|->"Estr:PERSON|[PMOD]->with[ADV]->was[ROOT]<-on[PRD]<-bass[PMOD]<-|PERSON*ŒThe occasion was suitably exceptional : a reunion of the 1970s-era Sam Rivers Trio , with Dave Holland on bass and Barry Altschul on drums ."¬
S/m/vinci8/data1/riedel/projects/relation/kb/nyt1/docstore/2007-joint/1849689.xml.pb†€€€øÿÿÿÿ…€€€øÿÿÿÿ"PERSON->PERSON"'inverse_false|PERSON|on bass and|PERSON"/inverse_false|with|PERSON|on bass and|PERSON|on"7inverse_false|, with|PERSON|on bass and|PERSON|on drums"$inverse_false|PERSON|IN NN CC|PERSON",inverse_false|with|PERSON|IN NN CC|PERSON|on"4inverse_false|, with|PERSON|IN NN CC|PERSON|on drums"cstr:Dave[NMOD]->|PERSON|[PMOD]->with[NMOD]->Trio[NULL]<-on[NMOD]<-bass[PMOD]<-|PERSON|[NMOD]->Barry"`str:Dave[NMOD]->|PERSON|[PMOD]->with[NMOD]->Trio[NULL]<-on[NMOD]<-bass[PMOD]<-|PERSON|[NMOD]->on"Ustr:Dave[NMOD]->|PERSON|[PMOD]->with[NMOD]->Trio[NULL]<-on[NMOD]<-bass[PMOD]<-|PERSON"Cdep:[NMOD]->|PERSON|[PMOD]->[NMOD]->[NULL]<-[NMOD]<-[PMOD]<-|PERSON"dir:->|PERSON|->-><-<-<-|PERSON"Vstr:PERSON|[PMOD]->with[NMOD]->Trio[NULL]<-on[NMOD]<-bass[PMOD]<-|PERSON|[NMOD]->Barry"Cdep:PERSON|[PMOD]->[NMOD]->[NULL]<-[NMOD]<-[PMOD]<-|PERSON|[NMOD]->"dir:PERSON|->-><-<-<-|PERSON|->"Sstr:PERSON|[PMOD]->with[NMOD]->Trio[NULL]<-on[NMOD]<-bass[PMOD]<-|PERSON|[NMOD]->on"Cdep:PERSON|[PMOD]->[NMOD]->[NULL]<-[NMOD]<-[PMOD]<-|PERSON|[NMOD]->"dir:PERSON|->-><-<-<-|PERSON|->"Hstr:PERSON|[PMOD]->with[NMOD]->Trio[NULL]<-on[NMOD]<-bass[PMOD]<-|PERSON*ÊTonight he brings his energies and expertise to the Miller Theater for the festival 's thrilling finale : a reunion of the 1970s Sam Rivers Trio , with Dave Holland on bass and Barry Altschul on drums .â
&/guid/9202a8c04000641f80000000004cfd50&/guid/9202a8c04000641f8000000003221072NA"Ù

There are two points of confusion here:

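- The root object is Relation, not Document (in fact, only Relation and the relation reference message are even used).
- The .pb file actually contains multiple objects, each varint-delimited, i.e. prefixed by its length expressed as a varint.

So Relation.parseDelimitedFrom should work, and with it I do get data back.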
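
To illustrate what "varint-delimited" means at the byte level, here is a minimal Java sketch that splits such a stream into individual message payloads using only the JDK. The names DelimitedFrames, readVarint and splitFrames are made up for this illustration and are not part of the protobuf API. With the generated classes you would normally skip all of this and call Relation.parseDelimitedFrom(input) in a loop until it returns null, which it does at end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class DelimitedFrames {

    // Reads one base-128 varint of at most 5 bytes.
    // Returns -1 if the stream is already at its end.
    static long readVarint(InputStream in) throws IOException {
        long result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b < 0) {
                if (shift == 0) {
                    return -1; // clean end of stream, no more frames
                }
                throw new EOFException("stream ended inside a varint");
            }
            result |= (long) (b & 0x7F) << shift; // low 7 bits carry data
            if ((b & 0x80) == 0) {
                return result; // high bit clear: last byte of the varint
            }
        }
        throw new IOException("length prefix is longer than 5 bytes");
    }

    // Splits a stream of [varint length][payload] records into payloads.
    static List<byte[]> splitFrames(InputStream in) throws IOException {
        List<byte[]> frames = new ArrayList<>();
        long length;
        while ((length = readVarint(in)) >= 0) {
            if (length > Integer.MAX_VALUE) {
                throw new IOException("frame too large: " + length);
            }
            byte[] frame = new byte[(int) length];
            int offset = 0;
            while (offset < frame.length) {
                int n = in.read(frame, offset, frame.length - offset);
                if (n < 0) {
                    throw new EOFException("stream ended inside a frame");
                }
                offset += n;
            }
            frames.add(frame);
        }
        return frames;
    }

    public static void main(String[] args) throws IOException {
        // Synthetic input: one 3-byte frame, then one 200-byte frame.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(3);
        out.write(new byte[3], 0, 3);
        out.write(0xC8); // 0xC8 0x01 is the varint encoding of 200
        out.write(0x01);
        out.write(new byte[200], 0, 200);

        List<byte[]> frames = splitFrames(new ByteArrayInputStream(out.toByteArray()));
        for (int i = 0; i < frames.size(); i++) {
            System.out.println("frame " + i + ": " + frames.get(i).length + " bytes");
        }
    }
}
```

Each byte[] returned by splitFrames is one serialized message, so it could be handed to the generated parseFrom(byte[]) of the message type the file really contains.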

Old; obsolete; exploratory:
I extracted your 4 documents and ran them through a small test rig:
        ProcessFile("testNegative.pb");
        ProcessFile("testPositive.pb");
        ProcessFile("trainNegative.pb");
        ProcessFile("trainPositive.pb");

where ProcessFile first dumps the first 10 bytes as hex and then tries to process the file through a ProtoReader. The results are:
Processing: testNegative.pb
dc 16 0a 26 2f 67 75 69 64 2f
> Document
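Unexpected end-group in source data; this usually means the source data is corrupt

Yes; agreed. Read as a field header, the leading varint DC 16 is wire type 4 (end-group), field 363; your Document does not define a field 363, and even if it did, starting with an end-group makes no sense.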
Processing: testPositive.pb
d5 0f 0a 26 2f 67 75 69 64 2f
> Document
250: Fixed32, Unexpected field
14: Fixed32, Unexpected field
6: String, Unexpected field
6: Variant, Unexpected field
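Unexpected end-group in source data; this usually means the source data is corrupt

Here we cannot see the offending data in the hex dump, but again: these initial fields look nothing like your data, and the reader readily confirms that the data is corrupt.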
Processing: trainNegative.pb
d1 09 0a 26 2f 67 75 69 64 2f
> Document
154: Fixed64, Unexpected field
7: Fixed64, Unexpected field
6: Variant, Unexpected field
6: Variant, Unexpected field
Unexpected end-group in source data; this usually means the source data is corrupt

Same as above.
Processing: trainPositive.pb
cf 75 0a 26 2f 67 75 69 64 2f
> Document
1881: 7, Unexpected field
Invalid wire-type; this usually means you have over-written a file without truncating or setting the length; see http://stackoverflow.com/q/2152978/23354

CF 75 is a two-byte varint with wire type 7 (which is not defined in the specification).
Your data is complete garbage. Sorry.
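
The field numbers and wire types reported above follow directly from the first two bytes of each file. Here is a minimal Java sketch of that arithmetic (TagDecoder and its methods are hypothetical names used only for this illustration): decode the varint, take the low three bits as the wire type and the remaining bits as the field number. The same varint can just as well be read as a length prefix, which is the interpretation a delimited stream calls for.

```java
class TagDecoder {

    // Wire type names from the protobuf encoding; values 6 and 7 are undefined.
    static final String[] WIRE_TYPES = {
        "varint", "64-bit", "length-delimited", "start-group", "end-group", "32-bit"
    };

    // Decodes a base-128 varint from the given byte values.
    static int decodeVarint(int... bytes) {
        int result = 0;
        int shift = 0;
        for (int b : bytes) {
            if (shift >= 35) {
                throw new IllegalArgumentException("varint too long");
            }
            result |= (b & 0x7F) << shift; // low 7 bits carry data
            if ((b & 0x80) == 0) {
                return result; // high bit clear: final byte
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }

    static int fieldNumber(int tag) {
        return tag >>> 3;
    }

    static int wireType(int tag) {
        return tag & 7;
    }

    static String describe(int... bytes) {
        int value = decodeVarint(bytes);
        int wt = wireType(value);
        String name = wt < WIRE_TYPES.length ? WIRE_TYPES[wt] : "undefined";
        return "varint=" + value + " field=" + fieldNumber(value)
                + " wireType=" + wt + " (" + name + ")";
    }

    public static void main(String[] args) {
        System.out.println("testPositive.pb  d5 0f: " + describe(0xd5, 0x0f)); // field 250, wire type 5
        System.out.println("trainNegative.pb d1 09: " + describe(0xd1, 0x09)); // field 154, wire type 1
        System.out.println("trainPositive.pb cf 75: " + describe(0xcf, 0x75)); // field 1881, wire type 7
    }
}
```

Read as lengths rather than tags, the same three varints are 2005, 1233 and 15055, which are plausible sizes for a first message in each file.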

Bonus round, with test-multiple.pb from the comments (after gz decompression):
this is identical to testNegative.pb, so it fails for exactly the same reason.
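I know this is more than two years old, but here is a general way to read these delimited protocol buffers in Python. The function you mention, parseDelimitedFrom, is not available in the Python implementation of protocol buffers. For anyone who might need it, here is a small workaround. This code is adapted from the following: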
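What generated the file you are passing in? Can you provide a small file that demonstrates the problem? The tgz you linked to contains 4 separate files; which one is the source of the problem? (Apart from the error messages and method names, the code you have looks fine.)

When you say "what generated", you mean a file like the AddPerson.java file in the example, right? I have to say I really don't know, because this is data from a research paper whose results I am trying to reproduce. I have just posted the text of the .pb file I am trying to read, or its first few lines anyway.

No, I mean the file you are passing in. Protobuf data files are not text, they are binary data... but if you don't know what generated it, it is hard to know whether the code is correct. (You are trying to parse the whole file as a single Document object... is that what you expect?)

Yes, the .pb file is the file I am passing in, and that is what I posted there, its first few lines. The file that generated it would be the equivalent of the AddPerson.java file in the documentation, but I don't have that file. I would think I can more or less tell what is in it from this .proto file, though, can't I?

Which of the 4 files do you want to process? testNegative.pb? testPositive.pb? trainNegative.pb? trainPositive.pb?

Hmm, that is interesting, and likely to cause a headache. Thanks very much, though. Could you also check?

@S.Matthew_English sure; downloading now

@S.Matthew_English by the way, I also tried running the other files through gz decompression in case their naming was mixed up, but the "magic number" is wrong, so they are not gz data.

Has anyone checked whether the data might be in delimited format, i.e. a stream of messages each prefixed with its varint size? Try using parseDelimitedFrom() instead of parseFrom(). It is just a guess, but it is common.

If the filename values look like /guid/9202a8c04000641f8000000003221072, then @KentonVarda is right about the varint length prefix; I am digging deeper, but look...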
Processing: test-multiple.pb
dc 16 0a 26 2f 67 75 69 64 2f
> Document
Unexpected end-group in source data; this usually means the source data is corrupt

# _DecodeVarint32 is a private helper of the protobuf package.
from google.protobuf.internal.decoder import _DecodeVarint32

import Document_pb2


def read_several_pbfs(filename, class_of_pb):
    """Read every varint-length-prefixed message of type class_of_pb from filename."""
    result = []
    with open(filename, 'rb') as f:
        buf = f.read()
    n = 0
    while n < len(buf):
        # Each message is preceded by its length, encoded as a varint.
        msg_len, n = _DecodeVarint32(buf, n)
        msg_buf = buf[n:n + msg_len]
        n += msg_len
        read_data = class_of_pb()
        read_data.ParseFromString(msg_buf)
        result.append(read_data)
    return result


filename = "trainPositive.pb"
relations = read_several_pbfs(filename, Document_pb2.Relation)