Appending items with a different schema to an existing Avro file using Python

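I've just started using Avro (and Python) and want to test schema evolution. I prepared two schemas: first I save data with the first schema, then I append new data and save it with schema 2. Writing raises no errors, but I can't deserialize the data afterwards. I suspect my syntax is wrong. How can I add items with a new schema to an existing file?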

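import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

# avro-python3 spells this Parse; newer avro releases name it avro.schema.parse.
with open("user.avsc", "r") as f:
    schema = avro.schema.Parse(f.read())

# Write two records with the first schema ("wb" creates the file and its header).
writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
writer.append({"name": "Anna", "favorite_number": 1})
writer.append({"name": "Jan", "favorite_number": 13, "favorite_color": "blue"})
writer.close()

# Read the file back using the schema stored in its header.
reader = DataFileReader(open("users.avro", "rb"), DatumReader())
for user in reader:
    print(user)
reader.close()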

According to the specification, an Avro object file can contain only one schema.
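The "Invalid UTF-8 input bytes" message logged in the question is consistent with this. A minimal sketch, using only the standard library, decodes those logged bytes as an Avro metadata map (a zigzag varint entry count followed by length-prefixed keys and values). The helper names `read_long` and `read_header_metadata` are made up for this illustration and are not part of the avro package. The result suggests that the second writer placed a complete new file header, with its own `avro.schema` entry, after the existing data, and that the reader then tried to decode that header as a record.

```python
# Bytes copied from the "Invalid UTF-8 input bytes" log line in the question.
# The log truncates the value, so the schema JSON itself is incomplete.
logged = b'\x01\x04\x14avro.codec\x08null\x16avro.schema\xf0\x05{"type": "record", "n'


def read_long(buf, pos):
    """Decode one Avro long (zigzag varint) at pos; return (value, new_pos)."""
    shift = 0
    raw = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1), pos


def read_header_metadata(buf, pos):
    """Walk one block of an Avro metadata map: entry count, then key/value pairs."""
    count, pos = read_long(buf, pos)
    entries = []
    for _ in range(count):
        key_len, pos = read_long(buf, pos)
        key = buf[pos:pos + key_len].decode("utf-8")
        pos += key_len
        val_len, pos = read_long(buf, pos)
        value = buf[pos:pos + val_len]  # may be cut short by the truncated log
        pos += val_len
        entries.append((key, val_len, value))
    return entries


# Skip the leading 0x01, the version byte that ends the "Obj\x01" magic.
entries = read_header_metadata(logged, 1)
for key, declared_len, value in entries:
    print(key, declared_len, value)
```

The two keys that come out are `avro.codec` and `avro.schema`, which are the metadata entries of a container file header rather than fields of a user record. Byte 30 of the logged string, the one named in the `UnicodeDecodeError`, is the first length byte of the embedded schema.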

Schema evolution is defined as having a reader schema that differs from the writer schema while still being able to read the old data.


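For example, you can read a file that has no favorite movie field, as long as the reader schema defines the favorite movie with a default of None (null).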
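The rule for record fields can be sketched with plain dictionaries. This is only a model of the resolution step, not the avro library itself, and the reader schema below is hypothetical because the question does not show `user.avsc` or `user2.avsc`. With the avro package, the reader schema would instead be passed as the second argument to `DatumReader`.

```python
import json

# Hypothetical reader schema: the field list is an assumption for illustration.
READER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["null", "int"], "default": null},
    {"name": "favorite_film", "type": ["null", "string"], "default": null}
  ]
}
""")

_MISSING = object()  # sentinel, so that a default of null is still a default


def resolve_record(written, reader_schema):
    """Project a record decoded with the writer schema onto the reader schema.

    Fields the reader does not declare are dropped; fields the writer never
    wrote take the reader's default, and having no default is an error.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in written:
            resolved[name] = written[name]
        else:
            default = field.get("default", _MISSING)
            if default is _MISSING:
                raise ValueError("no default for missing field %r" % name)
            resolved[name] = default
    return resolved


# A record as written with the old schema (taken from the question's output).
old_record = {"name": "Anna", "favorite_number": 1, "favorite_color": None}
print(resolve_record(old_record, READER_SCHEMA))
```

Here `favorite_color` is dropped because the reader does not declare it, and `favorite_film` is filled with its default. The file on disk is never modified; only the view the reader produces changes.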

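Hi, thanks a lot. I had already tried reading with a different schema, but I thought I could somehow overwrite the schema while appending new items to the file. Apparently not. Thanks for your answer.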
{'name': 'Anna', 'favorite_number': 1, 'favorite_color': None}
{'name': 'Jan', 'favorite_number': 13, 'favorite_color': 'blue'}
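import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

with open("user2.avsc", "r") as f:
    schema2 = avro.schema.Parse(f.read())

# The failing attempt from the question: reopening the file in "ab" mode and
# passing schema2 writes a second file header after the existing data.
writer = DataFileWriter(open("users.avro", "ab"), DatumWriter(), schema2)
writer.append({"name": "Eva", "favorite_number": 5, "favorite_food": "raclette"})
writer.append({"name": "Adam", "favorite_number": 122, "favorite_color": "black", "favorite_film": "Gone with the wind"})
writer.close()

reader = DataFileReader(open("users.avro", "rb"), DatumReader())
for user in reader:
    print(user)
reader.close()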
Invalid UTF-8 input bytes: b'\x01\x04\x14avro.codec\x08null\x16avro.schema\xf0\x05{"type": "record", "n'


{'name': 'Anna', 'favorite_number': 1, 'favorite_color': None}
{'name': 'Jan', 'favorite_number': 13, 'favorite_color': 'blue'}
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-128-cbc8ab11fe9f> in <module>
      1 reader = DataFileReader(open("users.avro", "rb"), DatumReader())
----> 2 for user in reader:
      3     print (user)
      4 reader.close()

~\Anaconda3\lib\site-packages\avro\datafile.py in __next__(self)
    524         self._read_block_header()
    525 
--> 526     datum = self.datum_reader.read(self.datum_decoder)
    527     self._block_count -= 1
    528     return datum

~\Anaconda3\lib\site-packages\avro\io.py in read(self, decoder)
    487     if self.reader_schema is None:
    488       self.reader_schema = self.writer_schema
--> 489     return self.read_data(self.writer_schema, self.reader_schema, decoder)
    490 
    491   def read_data(self, writer_schema, reader_schema, decoder):

~\Anaconda3\lib\site-packages\avro\io.py in read_data(self, writer_schema, reader_schema, decoder)
    532       return self.read_union(writer_schema, reader_schema, decoder)
    533     elif writer_schema.type in ['record', 'error', 'request']:
--> 534       return self.read_record(writer_schema, reader_schema, decoder)
    535     else:
    536       fail_msg = "Cannot read unknown schema type: %s" % writer_schema.type

~\Anaconda3\lib\site-packages\avro\io.py in read_record(self, writer_schema, reader_schema, decoder)
    732       readers_field = readers_fields_dict.get(field.name)
    733       if readers_field is not None:
--> 734         field_val = self.read_data(field.type, readers_field.type, decoder)
    735         read_record[field.name] = field_val
    736       else:

~\Anaconda3\lib\site-packages\avro\io.py in read_data(self, writer_schema, reader_schema, decoder)
    510       return decoder.read_boolean()
    511     elif writer_schema.type == 'string':
--> 512       return decoder.read_utf8()
    513     elif writer_schema.type == 'int':
    514       return decoder.read_int()

~\Anaconda3\lib\site-packages\avro\io.py in read_utf8(self)
    260     except UnicodeDecodeError as exn:
    261       logger.error('Invalid UTF-8 input bytes: %r', input_bytes)
--> 262       raise exn
    263 
    264   def check_crc32(self, bytes):

~\Anaconda3\lib\site-packages\avro\io.py in read_utf8(self)
    257     input_bytes = self.read_bytes()
    258     try:
--> 259       return input_bytes.decode('utf-8')
    260     except UnicodeDecodeError as exn:
    261       logger.error('Invalid UTF-8 input bytes: %r', input_bytes)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 30: invalid continuation byte