How to read a BigQuery nested table with the Dataflow Python SDK (Google Cloud Dataflow, Apache Beam)


How can I read a nested structure with the Apache Beam Python SDK?

lines = p | io.Read(io.BigQuerySource('project:test.beam_in'))

results in

"reason": "invalidQuery",
"message": "Cannot output multiple independently repeated fields at the same time. Found classification_item_distribution and category_cat_name"

Is it possible to read nested structures?

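This is a property of BigQuery. There are two ways to run this kind of query: disable result flattening (through BigQuery), or explicitly flatten the fields in the query.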

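With the current Python SDK, only the latter is available. For guidance on where and how to call the FLATTEN function, see the BigQuery legacy SQL query reference.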
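As a rough illustration of explicitly flattening a field in the query, the sketch below builds a legacy SQL query that applies FLATTEN to one of the two repeated fields named in the error message. The helper name `build_flatten_query` and the choice of `SELECT *` are assumptions made for illustration. Whether flattening a single field is enough depends on the actual table schema, which the question does not show.

```python
def build_flatten_query(table, repeated_field, columns="*"):
    """Return a legacy SQL query that flattens one repeated field of `table`.

    `table` uses the legacy SQL form project:dataset.table.
    This helper is hypothetical and not part of the Beam SDK.
    """
    return "SELECT {cols} FROM FLATTEN([{table}], {field})".format(
        cols=columns, table=table, field=repeated_field)


# Table and field names are taken from the question above.
query = build_flatten_query(
    "project:test.beam_in", "classification_item_distribution")
print(query)

# With Beam installed, the query would be passed in place of the table name:
# lines = p | io.Read(io.BigQuerySource(query=query))
```

Listing explicit column names instead of `*` is usually the safer choice, since it makes clear which repeated fields end up in the output.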

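The ability to disable flattening has been filed as a feature request, in case you want to subscribe to updates or join the discussion.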

You can now read nested results directly in Beam Python by adding flatten_results=False when creating the source:

lines = p | io.Read(io.BigQuerySource('project:test.beam_in', flatten_results=False))

See the source for details.
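Once flattening is disabled, each row should arrive as a Python dict in which a repeated field is a list. The sketch below shows one way such a row could be expanded into one flat dict per list element inside the pipeline. The helper `explode_repeated`, the `id` column, and the sample values are all hypothetical; only the field name `category_cat_name` comes from the error message above.

```python
def explode_repeated(row, field):
    """Yield one copy of `row` per element of the repeated `field`.

    Rows whose repeated field is missing or empty yield nothing.
    """
    values = row.get(field) or []
    for value in values:
        flat = dict(row)      # shallow copy so the input row is not mutated
        flat[field] = value   # replace the list with a single element
        yield flat


# Hypothetical row shaped like an unflattened BigQuery result.
sample_row = {"id": 1, "category_cat_name": ["books", "music"]}
exploded = list(explode_repeated(sample_row, "category_cat_name"))
print(exploded)

# In a pipeline this could be applied as:
# flat_lines = lines | beam.FlatMap(explode_repeated, "category_cat_name")
```

Doing the expansion in the pipeline rather than in the query keeps the two repeated fields independent, so each can be expanded or left nested as needed.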
Thanks. Interestingly, the Java SDK allows nested structures without any extra configuration.