Google BigQuery: Load Avro file to GCS with nested records using custom column names

google-bigquery google-cloud-storage

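I am trying to load an Avro file with nested records. One of the records has a union of schemas. When the file is loaded into BigQuery, it creates a very long name for each union element, such as com_mycompany_data_nestedClassname_value. That name is long. I am wondering whether there is a way to specify the name without the full package name as a prefix.

For example, here is the Avro schema: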

{
    "type": "record",
    "name": "EventRecording",
    "namespace": "com.something.event",
    "fields": [
        {
            "name": "eventName",
            "type": "string"
        },
        {
            "name": "eventTime",
            "type": "long"
        },
        {
            "name": "userId",
            "type": "string"
        },
        {
            "name": "eventDetail",
            "type": [
                {
                    "type": "record",
                    "name": "Network",
                    "namespace": "com.something.event",
                    "fields": [
                        {
                            "name": "hostName",
                            "type": "string"
                        },
                        {
                            "name": "ipAddress",
                            "type": "string"
                        }
                    ]
                },
                {
                    "type": "record",
                    "name": "DiskIO",
                    "namespace": "com.something.event",
                    "fields": [
                        {
                            "name": "path",
                            "type":  "string"
                        },
                        {
                            "name": "bytesRead",
                            "type": "long"
                        }
                    ]
                }
            ]
        }
    ]
}
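To make the naming concrete, here is a minimal Python sketch that reproduces the long column names for the union branches of a schema like the one above. The rule it applies (namespace with dots turned into underscores, then the record name, then a `_value` suffix) is inferred only from the example name quoted in this question, so treat it as an assumption rather than a documented BigQuery rule. The function name `union_field_names` is hypothetical, and the schema is trimmed to the union field.

```python
import json

# Trimmed copy of the question's schema: only the union field is kept.
SCHEMA = json.loads("""
{
  "type": "record", "name": "EventRecording", "namespace": "com.something.event",
  "fields": [
    {"name": "eventDetail", "type": [
      {"type": "record", "name": "Network", "namespace": "com.something.event",
       "fields": [{"name": "hostName", "type": "string"}]},
      {"type": "record", "name": "DiskIO", "namespace": "com.something.event",
       "fields": [{"name": "path", "type": "string"}]}
    ]}
  ]
}
""")


def union_field_names(schema):
    """Return {field name: [long column names]} for top-level union fields.

    Assumed rule (inferred from the question, not from documentation):
    <namespace with '.' replaced by '_'>_<record name>_value
    """
    result = {}
    for field in schema["fields"]:
        field_type = field["type"]
        if not isinstance(field_type, list):
            continue  # an Avro union is written as a JSON array
        names = []
        for branch in field_type:
            # Only named record branches are handled in this sketch.
            if isinstance(branch, dict) and branch.get("type") == "record":
                namespace = branch.get("namespace", schema.get("namespace", ""))
                full_name = f"{namespace}.{branch['name']}" if namespace else branch["name"]
                names.append(full_name.replace(".", "_") + "_value")
        result[field["name"]] = names
    return result


print(union_field_names(SCHEMA))
```

Primitive union branches and nested unions are ignored here; the sketch only covers the record branches that appear in the question.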

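This produces long field names. Is it possible to turn a long field name such as eventDetail.com_something_event_Network_value into something like eventDetail.Network?

Avro loading is not as flexible as it should be in BigQuery (a basic example is that it does not support loading a subset of the fields, i.e. a reader schema). In addition, BigQuery does not currently support renaming columns. The only options are to recreate your table with the proper names (create a new table from the existing table) or to recreate the table from your previous table.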
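The "create a new table from the existing table" option can be sketched as a query that rebuilds the nested column with shorter field names. The Python helper below only builds the SQL text. It assumes BigQuery standard SQL (`CREATE TABLE ... AS SELECT`, `SELECT * REPLACE` and `STRUCT(... AS ...)`), and the table names `dataset.events_raw` and `dataset.events`, like the helper name `build_rename_query`, are hypothetical.

```python
def build_rename_query(source_table, target_table, struct_column, renames):
    """Build a CREATE TABLE AS SELECT statement that renames struct fields.

    renames maps each existing (long) field name inside struct_column
    to the short name it should have in the new table.
    """
    # One "old AS new" entry per struct field, in the order given.
    fields = ",\n    ".join(
        f"{struct_column}.{old} AS {new}" for old, new in renames.items()
    )
    return (
        f"CREATE TABLE `{target_table}` AS\n"
        f"SELECT * REPLACE (\n"
        f"  STRUCT(\n"
        f"    {fields}\n"
        f"  ) AS {struct_column}\n"
        f")\n"
        f"FROM `{source_table}`"
    )


# Hypothetical table names; the field names come from the question.
QUERY = build_rename_query(
    source_table="dataset.events_raw",
    target_table="dataset.events",
    struct_column="eventDetail",
    renames={
        "com_something_event_Network_value": "Network",
        "com_something_event_DiskIO_value": "DiskIO",
    },
)
print(QUERY)
```

The generated statement copies every other column unchanged and replaces only eventDetail, so the original table stays untouched until the new one has been checked.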
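It does not appear to be documented, but you can in fact specify a schema when loading an Avro file in order to load a subset of the fields. This also works when loading into an existing table whose schema is a subset of the Avro file's schema. Only the fields in the specified (or table) schema are loaded. I have tested this.

Do you have an example of how to load data using a schema subset?

This is the schema from a sample Avro file. It represents a complex number with two fields, re and im.
{"type": "record", "name": "cpx", "fields": [{"name": "re", "type": "long", "doc": "re field doc."}, {"name": "im", "type": "long", "doc": "im field doc."}]}
This command loads only the re field:
bq load --source_format=AVRO dataset.table path/to/file.avro 're:integer'
The table is created with only one field, re:integer.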
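For files with more fields, the inline schema argument can be derived from the Avro schema instead of being typed by hand. The sketch below builds the same `bq load` argument list as the command above without running it. The helper names `inline_schema` and `bq_load_args` are hypothetical, and the type table is an assumption that covers only primitive Avro types written as plain strings.

```python
import json

# Assumed mapping from primitive Avro types to BigQuery inline-schema types.
AVRO_TO_BQ = {
    "int": "integer",
    "long": "integer",
    "float": "float",
    "double": "float",
    "boolean": "boolean",
    "string": "string",
    "bytes": "bytes",
}

# The sample schema from the answer above.
CPX = json.loads(
    '{"type": "record", "name": "cpx", "fields": ['
    '{"name": "re", "type": "long", "doc": "re field doc."}, '
    '{"name": "im", "type": "long", "doc": "im field doc."}]}'
)


def inline_schema(avro_schema, keep):
    """Return a 'name:type,...' string for the fields listed in keep."""
    types_by_name = {f["name"]: f["type"] for f in avro_schema["fields"]}
    parts = []
    for name in keep:
        avro_type = types_by_name[name]  # KeyError if the file lacks the field
        parts.append(f"{name}:{AVRO_TO_BQ[avro_type]}")
    return ",".join(parts)


def bq_load_args(table, path, avro_schema, keep):
    """Build the argument list for the bq load command shown in the answer."""
    return ["bq", "load", "--source_format=AVRO", table, path,
            inline_schema(avro_schema, keep)]


print(" ".join(bq_load_args("dataset.table", "path/to/file.avro", CPX, ["re"])))
```

The list form can be passed to a process launcher as is; records, unions and other complex Avro types would need a fuller mapping than this sketch provides.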