Python 2.7: Writing JSON with dict properties to Google Cloud Datastore

Using Apache Beam (Python 2.7 SDK), I am trying to write JSON files as entities to Google Cloud Datastore.

Sample JSON:

{
"CustId": "005056B81111",
"Name": "John Smith", 
"Phone": "827188111",
"Email": "john@xxx.com", 
"addresses": [
    {"type": "Billing", "streetAddress": "Street 7", "city": "Malmo", "postalCode": "CR0 4UZ"},
    {"type": "Shipping", "streetAddress": "Street 6", "city": "Stockholm", "postalCode": "YYT IKO"}
]
}

I wrote an Apache Beam pipeline that consists of three main steps:

1. beam.io.ReadFromText(input_file_path)
2. beam.ParDo(CreateEntities())
3. WriteToDatastore(project)
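ReadFromText emits one element per line of the input file, so steps 1 and 2 only work if each line holds one complete JSON object (newline-delimited JSON); a pretty-printed object like the sample above would arrive as separate fragments. The following stdlib-only sketch mimics steps 1 and 2 outside Beam under that assumption. The helper name `lines_to_key_and_props` is hypothetical.

```python
import json


def lines_to_key_and_props(lines, key_field="CustId"):
    """Mimic ReadFromText + CreateEntities without Beam (hypothetical helper).

    Assumes every non-empty line is one complete JSON object
    (newline-delimited JSON), because ReadFromText yields one element per line.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        key_name = record.pop(key_field)  # becomes the entity key name
        yield key_name, record
```

Each yielded pair corresponds to what `CreateEntities` feeds into `add_key_path` (the key name) and `add_properties` (the remaining fields).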

In step 2, I convert the JSON object (a dict) into an entity:

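import json

import apache_beam as beam
from google.cloud.proto.datastore.v1 import entity_pb2
from googledatastore import helper as datastore_helper

class CreateEntities(beam.DoFn):
  """Convert one line of JSON text into a Datastore entity."""
  def process(self, element):
    element = element.encode('ascii', 'ignore')  # drop non-ASCII characters
    element = json.loads(element)
    Id = element.pop('CustId')  # CustId becomes the key name
    entity = entity_pb2.Entity()
    datastore_helper.add_key_path(entity.key, 'CustomerDF', Id)
    # Fails for 'addresses', whose value is a list of dicts
    datastore_helper.add_properties(entity, element)
    return [entity]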

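This works for the basic properties. However, it fails for addresses, because each address is itself a dict object. I read a similar post, but it did not give the exact code for converting a dict into an entity.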

I tried setting the addresses element to an entity, as below, but it did not work:

element['addresses'] = entity_pb2.Entity()
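Assigning a single empty `entity_pb2.Entity()` replaces the whole list of addresses with one blank entity, so the address data is lost. One alternative is to turn each nested dict into its own embedded entity. The sketch below does the recursion in plain Python and leaves the actual entity construction to a caller-supplied `make_entity` function; `embed_nested`, `to_properties` and `make_entity` are hypothetical names. In the Beam job, `make_entity` would presumably create an `entity_pb2.Entity()` and call `datastore_helper.add_properties` on it, assuming your version of the `googledatastore` helper accepts `Entity` objects as embedded-entity values (verify this before relying on it).

```python
def embed_nested(value, make_entity):
    """Recursively replace every dict inside `value` with make_entity(props).

    `make_entity` is a caller-supplied factory (hypothetical); lists stay lists
    so they become array values, and scalars pass through unchanged.
    """
    if isinstance(value, dict):
        props = {k: embed_nested(v, make_entity) for k, v in value.items()}
        return make_entity(props)
    if isinstance(value, list):
        return [embed_nested(v, make_entity) for v in value]
    return value


def to_properties(record, make_entity):
    """Convert a record's values; the top level stays a plain dict so it can
    be passed to add_properties on the outer entity."""
    return {k: embed_nested(v, make_entity) for k, v in record.items()}


# Possible factory inside the DoFn (assumption, not executed here):
#
#   def make_entity(props):
#       embedded = entity_pb2.Entity()
#       datastore_helper.add_properties(embedded, props)
#       return embedded
#
#   datastore_helper.add_properties(entity, to_properties(element, make_entity))
```

Injecting the factory keeps the traversal testable without the Datastore libraries installed.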

Other references:

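Are you trying to store it as a repeated structured property (ndb.StructuredProperty)?

Apparently, in Dataflow the keys are flattened, and for a repeated structured property each individual property of the structured-property object becomes an array. So I think you need to write it like this: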

datastore_helper.add_properties(entity, {
    # ... the other properties ...
    "addresses.type": ["Billing", "Shipping"],
    "addresses.streetAddress": ["Street 7", "Street 6"],
    "addresses.city": ["Malmo", "Stockholm"],
    "addresses.postalCode": ["CR0 4UZ", "YYT IKO"],
})
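Building those parallel lists by hand is error-prone, so the following sketch derives them from the parsed record. It assumes every item in the repeated field is a dict and pads missing sub-fields with `None` so that index i of every array describes the same address; the name `flatten_repeated` and the padding choice are illustrative assumptions, and whether your Datastore client and NDB model accept `None` entries should be checked.

```python
def flatten_repeated(record, field):
    """Turn record[field] (a list of dicts) into dotted, parallel-array properties.

    Returns a new dict; missing sub-fields are padded with None so the arrays
    stay aligned (illustrative assumption).
    """
    items = record.get(field, [])
    sub_names = []
    for item in items:
        for name in item:
            if name not in sub_names:
                sub_names.append(name)  # keep first-seen order

    flat = {k: v for k, v in record.items() if k != field}
    for name in sub_names:
        flat["%s.%s" % (field, name)] = [item.get(name) for item in items]
    return flat
```

The result can then be passed directly to `datastore_helper.add_properties(entity, ...)` in place of the hand-written dict.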

Alternatively, if you are trying to save it as an ndb.JsonProperty, you can do the following:

datastore_helper.add_properties(entity, {
    # ... the other properties ...
    "addresses": json.dumps(element['addresses']),  # one JSON-encoded value
})
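The JSON-property route trades queryability for simplicity: the nested data becomes one opaque value that must be decoded on read (for example with `JSON.parse` in the Node.js app mentioned in the comments). Below is a minimal stdlib sketch of the encode/decode round trip; `encode_nested`, `decode_nested` and the choice to encode every dict or list value are illustrative assumptions. Under Python 2, `json.dumps` returns a byte `str`, so check how your helper version maps `str` (text versus blob) if the property type matters to the reader.

```python
import json


def encode_nested(record):
    """Return a copy of record with every dict/list value serialized to JSON."""
    out = {}
    for key, value in record.items():
        if isinstance(value, (dict, list)):
            # sort_keys makes the stored text deterministic
            out[key] = json.dumps(value, sort_keys=True)
        else:
            out[key] = value
    return out


def decode_nested(props, json_fields):
    """Inverse of encode_nested for the named fields."""
    out = dict(props)
    for key in json_fields:
        if key in out:
            out[key] = json.loads(out[key])
    return out
```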

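I know this is an old question, but I had a similar problem (albeit with Python 3.6 and NDB) and wrote a function that converts every dict inside a dict into an entity. It uses recursion to walk through all the nodes and converts them as needed:
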
from google.cloud import datastore


def dict_to_entity(data):
    """Recursively convert every dict inside data into a datastore.Entity."""

    # the data can be a dict or a list, and they are iterated over differently
    # also create a new object to store the child objects
    if isinstance(data, dict):
        childiterator = data.items()
        new_data = {}
    elif isinstance(data, list):
        childiterator = enumerate(data)
        new_data = []
    else:
        # scalars are returned unchanged
        return data

    for i, child in childiterator:

        # if the child is a dict or a list, continue drilling...
        if isinstance(child, (dict, list)):
            new_child = dict_to_entity(child)
        else:
            new_child = child

        # add the child data to the new object
        if isinstance(data, dict):
            new_data[i] = new_child
        else:
            new_data.append(new_child)

    # convert the new object to Entity if needed
    if isinstance(data, dict):
        child_entity = datastore.Entity()
        child_entity.update(new_data)
        return child_entity
    else:
        return new_data

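Thanks @Alex. I am now trying to use NDB. For that I import it with `from google.appengine.ext import ndb`, but when I try `pip install google.appengine.ext` it fails with "Could not find a version that satisfies the requirement google.appengine.ext (from versions: ) No matching distribution found for google.appengine.ext". I ran `gcloud components install app-engine-python`, but I still get the same error.

Oh, ndb can only be used from App Engine standard. But I assumed you had an App Engine app that you are trying to send these records to.

I am not using App Engine. I am running an Apache Beam pipeline with the Python (2.7) SDK. This pipeline reads .JSON files from Cloud Storage and writes them to Cloud Datastore. So from your reply, it seems ndb cannot be used.

Right, but once these records are in Datastore, how do you plan to read and use them?

Yes, we have an App Engine app that accesses the records in Datastore. It is built with Node.js. So, can I try to solve this with Apache Beam?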