Converting BSON to JSON in Python
My situation is that I can only read the BSON file as bytes (through Apache Beam's bytes coder), so I have the contents of the BSON file as bytes. Now I am trying to convert them to JSON. My code is:
from bson import json_util
import apache_beam as beam
from apache_beam import coders


class ParseBsontoJson(beam.DoFn):
    def process(self, element):
        print(type(element))
        # data = bson.BSON.decode(bson.BSON(element))
        data = element.decode('utf-8')
        # data = bson.decode_all(element)
        # data2 = json_util.dumps(data)
        # print(type(data))
        return [data]


# pipeline_options and known_args are defined elsewhere (not shown)
p = beam.Pipeline(options=pipeline_options)
# This gives me a PCollection of bytes (elements)
test = (p
        | 'test_r' >> beam.io.ReadFromText(known_args.input + '/' + 'test.bson', coder=coders.BytesCoder())
        | 'test_parse' >> beam.ParDo(ParseBsontoJson()))  # <- here I have the problem
data = element.decode('latin-1').encode("utf-8")
data2 = json_util.dumps(data)
print(data2)
where element is one line of the BSON file.
What I get is:
{
"$binary": "w5MBAAAHX2lkAF5cw5/Dvj3Cu3FsIcKMOsO+Am5hbWUABgAAAHNjZW5lAAdhY2NvdW50SWQAXlzDn8OKw6B1OEp4TcKmXQdkYXRhQ2VudGVyAF5cw5/DizpiwrIWY8KLw5EIB3RzaGlydFNpemUAXlzDn8OLPcK7cWwhwow6Sgdvc0ltYWdlAF5cw5/Diz3Cu3FsIcKMOksHcHJvdmlzaW9uZWRCeVdvcmtPcmRlcgBeXMOfw5Q9wrtxbCHCjDpcB3BoeXNpY2FsU2VydmVyTm9kZQBeXMOfw5U9wrtxbCHCjDpmBGJvbmRzAHoAAAADMAByAAAAEGluZGV4AAAAAAAEbmljcwAoAAAAAzAAIAAAAAJtYWMAEgAAADhjOmZkOjFiOjAwOjlmOjk5AAAAB25ldHdvcmsAXlzDn8OmOmLCshZjwovDkTwCaXB2NEFkZHJlc3MADgAAADM4LjEzMy4xNjQuNjAAAAAEbHVucwAUAAAABzAAXlzDn8O+PcK7cWwhwow6w70AAnN0YXR1cwAOAAAAZGVwcm92aXNpb25lZAAJX3VwZGF0ZWQAMMO4w4rCmnABAAAJX2NyZWF0ZWQAMMO4w4rCmnABAAACX2V0YWcAKQAAAGMxN2QyZGNjOGU2ZGQ3ZGQ1NGI1ZGQzMjVlYjkzMDcyZTE2NWVmZjEAAMK5AgAAB19pZABeXMOgCTpiwrIWY8KLw5HCsQJuYW1lAAUAAABhd2F5AAdhY2NvdW50SWQAXlzDn8OKOmLCshZjwovDkQMHZGF0YUNlbnRlcgBeXMOfw4s6YsKyFmPCi8ORCQd0c2hpcnRTaXplAF5cw5/DjDpiwrIWY8KLw5E=",
"$type": "00"
}
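One way to see what that $binary value holds is to undo the encoding steps by hand. The sketch below uses only the standard library and assumes the value was produced by the latin-1 to UTF-8 round trip shown above. It looks at just the first 12 base64 characters of the output, and the variable names are illustrative.

```python
import base64
import struct

# First 12 characters of the "$binary" value from the output above.
prefix_b64 = "w5MBAAAHX2lk"

# Step 1: base64 gives back the UTF-8 bytes that json_util.dumps received.
as_utf8 = base64.b64decode(prefix_b64)

# Step 2: undo element.decode('latin-1').encode('utf-8') to recover the file bytes.
raw = as_utf8.decode("utf-8").encode("latin-1")

# Step 3: a BSON document starts with a little-endian int32 holding its total size.
(doc_len,) = struct.unpack("<i", raw[:4])

print(raw)          # raw BSON bytes: length prefix, type byte, field name
print(doc_len)      # declared size of the first document, in bytes
print(hex(raw[4]))  # element type byte; 0x7 is ObjectId in the BSON spec
```

The recovered bytes are a length prefix of 467, the type byte 0x07 (ObjectId) and the field name _id. So the value is still raw BSON: json_util.dumps was given bytes rather than a decoded document, which is why it produced a binary wrapper instead of a JSON view of the fields.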
I tried other suggestions from similar StackOverflow answers, such as
bson.decode_all(element)
but they do not convert it to a JSON view.
element.decode('latin-1')
gives
"b²cÓ:nameimpactaccountId^\à=»ql!<RdataCenter^\à=»ql!<UtshirtSize^\à:b²cÒÞosImage^\à:b²cÒßprovisionedByWorkOrder^\à:b²cÒäphysicalServerNode^\à=»ql!<]bonds|0tindexnics(0 mac5d:b1:d1:82:d5:99network^"
Can you show how you obtain element? Which Python version are you using?
Edited the question. Python 3.7. I am using Apache Beam's ReadFromText with a bytes coder.
So element is not a plain dict (or dict-like object)? What is type(element)?
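On the question of how element is obtained: BSON is a binary format, and a reader that splits on newlines can cut a document at any 0x0A byte. Each BSON document starts with a little-endian int32 holding its total size in bytes, so a stream of concatenated documents can be split on those length prefixes instead. Below is a minimal standard-library sketch, assuming the whole file is available as one bytes object. split_bson_documents is a hypothetical helper, not part of bson or Apache Beam.

```python
import struct


def split_bson_documents(data: bytes) -> list:
    """Split concatenated BSON documents using their int32 length prefixes."""
    docs = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack_from("<i", data, offset)
        # The smallest valid document is 5 bytes: the prefix plus a 0x00 terminator.
        if size < 5 or offset + size > len(data):
            raise ValueError("truncated or corrupt document at offset %d" % offset)
        docs.append(data[offset:offset + size])
        offset += size
    return docs


# Hand-built BSON for {"a": 10} and {"b": 2}. The int32 value 10 is the
# byte 0x0A, which a line-based reader would treat as a newline.
doc_a = b"\x0c\x00\x00\x00\x10a\x00\x0a\x00\x00\x00\x00"
doc_b = b"\x0c\x00\x00\x00\x10b\x00\x02\x00\x00\x00\x00"
stream = doc_a + doc_b

chunks = split_bson_documents(stream)
by_lines = stream.split(b"\n")

print(chunks == [doc_a, doc_b])    # split on length prefixes
print(by_lines == [doc_a, doc_b])  # split on newlines, for comparison
```

Each chunk is then one complete document that could be passed to bson.decode (or bson.BSON(chunk).decode() in older PyMongo releases), and the resulting dict to json_util.dumps. How to get whole-file bytes into the Beam pipeline is left out of this sketch.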