Python: writing an invalid Avro file: "Length is negative: -40"

Tags: python, python-2.7, avro, avro-tools

I am trying to write an Avro file from Python, for the most part following an existing example.

I have a valid schema:

{"namespace": "example.avro",
 "type": "record",
 "name": "Stock",
 "fields": [
     {"name": "ticker_symbol", "type": "string"},
     {"name": "sector",  "type": "string"},
     {"name": "change", "type": "float"},
     {"name": "price",  "type": "float"}
 ]
}
Here is the relevant code:

from io import BytesIO

from avro import schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

avro_schema = schema.parse(open("stock.avsc", "rb").read())
output = BytesIO()
writer = DataFileWriter(output, DatumWriter(), avro_schema)

for i in range(1000):
    writer.append(_generate_fake_data())
writer.flush()

with open('record.avro', 'wb') as f:
    f.write(output.getvalue())
However, when I try to read this file with the avro-tools CLI:

avro-tools fragtojson --schema-file stock.avsc ./record.avro  --no-pretty
I get the following error:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/usr/local/Cellar/avro-tools/1.8.2/libexec/avro-tools-1.8.2.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -40
    at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:336)
    at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
    at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:422)
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:414)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:181)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
    at org.apache.avro.tool.BinaryFragmentToJsonTool.run(BinaryFragmentToJsonTool.java:82)
    at org.apache.avro.tool.Main.run(Main.java:87)
    at org.apache.avro.tool.Main.main(Main.java:76)
I'm fairly sure the relevant error is

 Malformed data. Length is negative: -40
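but I don't know what I'm doing wrong. I suspect that I'm writing the Avro file incorrectly.

I want to write to a byte array (rather than directly to a file as in the example) because eventually I will use boto3 to send this Avro buffer to AWS Kinesis Firehose.

I was using the wrong tool to read the file. I should have used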

avro-tools tojson ./record.avro
instead of fragtojson as in the question. The difference is that fragtojson is for a single Avro datum, while tojson is for an entire file.
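
The -40 in the error message can be reproduced by hand. Avro object container files start with the four magic bytes "Obj" followed by 0x01, and Avro stores string lengths as zigzag variable-length integers. The sketch below uses only the standard library (the helper names are hypothetical) and decodes the start of a container file the way a raw datum reader would when it expects the length of the ticker_symbol string:

```python
AVRO_CONTAINER_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_zigzag_long(data, pos=0):
    """Decode one Avro long (zigzag varint) from data, starting at pos.

    Returns a tuple (value, next_pos).
    """
    data = bytearray(data)  # gives integer bytes on both Python 2 and 3
    raw = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this number
            break
        shift += 7
    # Undo the zigzag mapping (0, -1, 1, -2, ... stored as 0, 1, 2, 3, ...)
    return (raw >> 1) ^ -(raw & 1), pos


def looks_like_container_file(data):
    """True if data starts with the Avro container magic."""
    return bytes(data[:4]) == AVRO_CONTAINER_MAGIC


# A fragment reader treats the leading 'O' (0x4F = 79) as a string length.
length, _ = read_zigzag_long(AVRO_CONTAINER_MAGIC)
print(length)  # -40
```

So the file itself can be perfectly valid: the negative length comes from reading the container header as if it were a bare record.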

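"I want to write to a byte array (rather than directly to a file as in the example) because eventually I will use boto3 to send this Avro buffer to AWS Kinesis Firehose."

In that case you don't need DataFileWriter. What you need is: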

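import io

import avro.io

# avro_schema is the parsed schema from the question
datum_writer = avro.io.DatumWriter(avro_schema)

output = io.BytesIO()
encoder = avro.io.BinaryEncoder(output)
for i in range(1000):
    # Each datum is written back to back, with no file header and no schema
    datum_writer.write(_generate_fake_data(), encoder)

data_bytes = output.getvalue()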

If you want to print the contents of data_bytes, just decode it with a BinaryDecoder.
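
To make the layout of data_bytes concrete, here is a minimal standard-library sketch of what a binary decoder has to do for the Stock schema: each string is a zigzag varint length followed by UTF-8 bytes, and each float is 4 little-endian bytes. The helper names and the sample record are hypothetical, and the code handles only this one schema. In real code, avro.io.DatumReader(avro_schema).read(...) with an avro.io.BinaryDecoder wrapped around io.BytesIO(data_bytes) should do the same job for any schema.

```python
import struct

STRING_FIELDS = ("ticker_symbol", "sector")
FLOAT_FIELDS = ("change", "price")


def encode_long(n):
    """Avro long: zigzag-map, then emit 7 bits per byte, low group first."""
    raw = (n << 1) ^ (n >> 63)
    out = bytearray()
    while raw & ~0x7F:
        out.append((raw & 0x7F) | 0x80)
        raw >>= 7
    out.append(raw)
    return bytes(out)


def decode_long(buf, pos):
    """Inverse of encode_long; returns (value, next_pos)."""
    raw = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (raw >> 1) ^ -(raw & 1), pos
        shift += 7


def encode_stock(rec):
    """Encode one Stock record as a bare Avro datum (no header, no schema)."""
    out = bytearray()
    for key in STRING_FIELDS:
        text = rec[key].encode("utf-8")
        out += encode_long(len(text)) + text
    for key in FLOAT_FIELDS:
        out += struct.pack("<f", rec[key])
    return bytes(out)


def decode_stock(buf, pos):
    """Decode one Stock record from a bytearray; returns (record, next_pos)."""
    rec = {}
    for key in STRING_FIELDS:
        length, pos = decode_long(buf, pos)
        rec[key] = bytes(buf[pos:pos + length]).decode("utf-8")
        pos += length
    for key in FLOAT_FIELDS:
        (rec[key],) = struct.unpack_from("<f", buf, pos)
        pos += 4
    return rec, pos


def decode_all(data):
    """Decode back-to-back Stock records until the buffer is exhausted."""
    buf = bytearray(data)
    pos = 0
    records = []
    while pos < len(buf):
        rec, pos = decode_stock(buf, pos)
        records.append(rec)
    return records


sample = {"ticker_symbol": "AMZN", "sector": "TECH", "change": 0.5, "price": 100.25}
data_bytes = encode_stock(sample) * 2  # two records, concatenated
for record in decode_all(data_bytes):
    print(record)
```

Because nothing in the byte stream marks record boundaries or names the fields, the reader must already know the schema, which is why fragtojson requires --schema-file.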

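The tag is enough to indicate that you're using Python. Don't add it to the title unnecessarily. If you want to know why, see the relevant section.

Sounds good, thanks for the edit.

I was using the wrong tool to read the file. I should have used avro-tools tojson ./record.avro instead of fragtojson as in the question. The difference is that fragtojson is for a single Avro datum, while tojson is for an entire file.

If I do it this way, the schema won't be included in the output Avro data, right? My understanding is that the schema should be included in the payload I send to AWS.

@ErtySeidohl Yes, the schema is not included, only the binary data. However, nothing stops you from transmitting the schema some other way, for example by wrapping the data bytes together with other metadata in the same Kinesis stream... Worth a try :)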
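
One way the wrapping idea from the last comment might look is sketched below. The envelope layout (a JSON object with "schema" and "payload" keys, with the payload base64-encoded) and the function names are assumptions made purely for illustration; they are not an Avro or Kinesis convention, and the consumer would have to agree on the same layout.

```python
import base64
import json


def wrap_with_schema(data_bytes, schema_json):
    """Bundle schemaless Avro bytes with their schema in one message.

    Hypothetical envelope: {"schema": <parsed schema>, "payload": <base64 text>}.
    """
    envelope = {
        "schema": json.loads(schema_json),
        "payload": base64.b64encode(data_bytes).decode("ascii"),
    }
    return json.dumps(envelope).encode("utf-8")


def unwrap(message):
    """Return (data_bytes, schema_dict) from a message built by wrap_with_schema."""
    envelope = json.loads(message.decode("utf-8"))
    return base64.b64decode(envelope["payload"]), envelope["schema"]


schema_json = '{"type": "record", "name": "Stock", "fields": []}'
message = wrap_with_schema(b"\x08AMZN", schema_json)
payload, schema = unwrap(message)
print(schema["name"], payload)
```

Alternatively, the DataFileWriter approach from the question already embeds the schema in the container file header, at the cost of a larger payload per message.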