Hadoop Streaming mangles Avro produced by Python

Tags: python, hadoop, mapreduce, avro

I have a fairly simple script that takes Twitter data in JSON format and converts it to an Avro file.

from avro import schema, datafile, io
import json, sys
from types import *

def main():
    if len(sys.argv) < 2:
        print "Usage: cat input.json | python2.7 JSONtoAvro.py output"
        return

    s = schema.parse(open("tweet.avsc").read())
    f = open(sys.argv[1], 'wb')

    writer = datafile.DataFileWriter(f, io.DatumWriter(), s, codec = 'deflate')

    failed = 0

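    for line in sys.stdin:
        line = line.strip()

        # Skip lines that are not valid JSON
        try:
            data = json.loads(line)
        except ValueError as detail:
            continue

        # Echo and count records that do not match the schema
        try:
            writer.append(data)
        except io.AvroTypeException as detail:
            print line
            failed += 1

    # Flush and close the Avro container file once all input is consumed
    writer.close()

    print str(failed) + " failed in schema"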

if __name__ == '__main__':
    main()
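
The read, parse, and append loop in the script can be exercised without Avro installed. The following is only a sketch in Python 3 (the script above targets Python 2.7): `convert_lines` and `make_demo_append` are hypothetical names, and `append` stands in for `writer.append`, raising `TypeError` where the real writer would raise `io.AvroTypeException`.

```python
import json


def convert_lines(lines, append):
    """Parse each line as JSON and hand it to `append`.

    Returns a (written, skipped, failed) tuple:
    skipped = lines that are not valid JSON,
    failed  = records that `append` rejected.
    """
    written = skipped = failed = 0
    for line in lines:
        line = line.strip()
        try:
            data = json.loads(line)
        except ValueError:
            # json.JSONDecodeError is a subclass of ValueError
            skipped += 1
            continue
        try:
            append(data)
            written += 1
        except TypeError:
            # Assumption: stand-in for avro's io.AvroTypeException
            failed += 1
    return written, skipped, failed


def make_demo_append(sink):
    """Build a hypothetical append() that only accepts dicts with an 'id' key."""
    def append(record):
        if not isinstance(record, dict) or "id" not in record:
            raise TypeError("record does not match the schema")
        sink.append(record)
    return append


sample_lines = [
    '{"id": 1, "text": "caf\\u00e9"}',  # valid JSON, matches the toy schema
    'not json',                          # skipped: invalid JSON
    '{"text": "no id"}',                 # failed: rejected by append
]
records = []
counts = convert_lines(sample_lines, make_demo_append(records))
print(counts)
```

Passing the writer in as a callable keeps the loop testable on its own, separate from the Avro container file and from Hadoop Streaming.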

I am struggling to solve this problem. Any suggestions would be greatly appreciated.

Not sure whether you are still looking for an answer. The \u looks like a Unicode escape. Try something like this:

   resp = json.dumps(line) 
   data = json.loads(resp)
If the Unicode notation is what causes the error, dumps will resolve the problem.
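
To check what this round trip actually does, here is a small Python 3 sketch using only the standard `json` module. The sample line is made up for illustration. It suggests two things worth verifying against your own data: `json.loads` already decodes `\u` escapes on its own, and calling `json.dumps` on a raw line produces a JSON string literal, so loading it back returns the same string rather than a dict.

```python
import json

# Hypothetical input line containing a \u escape (the backslash is literal here)
line = '{"text": "caf\\u00e9"}'

# Parsing the line directly decodes the escape into the real character
record = json.loads(line)

# dumps() on a *string* yields a JSON string literal; loads() gives the string back
round_tripped = json.loads(json.dumps(line))

# dumps() on the parsed record re-escapes non-ASCII by default (ensure_ascii=True)
escaped = json.dumps(record)

# With ensure_ascii=False the character is written as-is
unescaped = json.dumps(record, ensure_ascii=False)

print(type(record).__name__, type(round_tripped).__name__)
print(escaped)
print(unescaped)
```

If `writer.append` expects a dict that matches the schema, the value to pass is `record`, not `round_tripped`.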
