Google BigQuery: BigQueryIO write cannot add new fields, even with "allow field addition" set
I am using Apache Beam's BigQueryIO to load data into BigQuery, but the load job fails with the error:
"message": "Error while reading data, error message: JSON parsing error in row starting at position 0: No such field: Field_name.",
Here is the full configuration of the load job:
"configuration": {
  "jobType": "LOAD",
  "load": {
    "createDisposition": "CREATE_NEVER",
    "destinationTable": {
      "datasetId": "people",
      "projectId": "my_project",
      "tableId": "beam_load_test"
    },
    "ignoreUnknownValues": false,
    "schema": {
      "fields": [
        {
          "mode": "NULLABLE",
          "name": "First_name",
          "type": "STRING"
        },
        {
          "mode": "NULLABLE",
          "name": "Last_name",
          "type": "STRING"
        }
      ]
    },
    "schemaUpdateOptions": [
      "ALLOW_FIELD_ADDITION"
    ],
    "sourceFormat": "NEWLINE_DELIMITED_JSON",
    "sourceUris": [
      "gs://tmp_bucket/BigQueryWriteTemp/beam_load/043518a3-7bae-48ac-8068-f97430c32f58"
    ],
    "useAvroLogicalTypes": false,
    "writeDisposition": "WRITE_APPEND"
  }
}
I can see that the temporary files it creates in GCS look the way they should. A schema is also supplied, and it is being inferred with useBeamSchema().
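For context, the "No such field" error occurs when a row in the newline-delimited JSON temp file contains a key that is not in the destination table's schema and ignoreUnknownValues is false. A hypothetical row illustrating the mismatch (the new field's real name is redacted in the error above; "Middle_name" here is a stand-in):

```json
{"First_name": "Ada", "Last_name": "Lovelace", "Middle_name": "King"}
```

With the two-column schema shown in the load configuration, BigQuery would reject this row at parse time rather than add the extra column.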
Here is my pipeline code that writes to BigQuery:
pipeline.apply(
    "Write data to BQ",
    BigQueryIO
        .<GenericRecord>write()
        .optimizedWrites()
        .useBeamSchema()
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        .withSchemaUpdateOptions(ImmutableSet.of(BigQueryIO.Write.SchemaUpdateOption.ALLOW_FIELD_ADDITION))
        .withCustomGcsTempLocation(options.getGcsTempLocation())
        .withNumFileShards(options.getNumShards().get())
        .withMethod(FILE_LOADS)
        .withTriggeringFrequency(Utils.parseDuration("10s"))
        .to(new TableReference()
            .setProjectId(options.getGcpProjectId().get())
            .setDatasetId(options.getGcpDatasetId().get())
            .setTableId(options.getGcpTableId().get()))
);
Any ideas on why the new fields are not being added?

Could you share the relevant pipeline code that extends the BigQueryIO class body? @mk_sta, I have added the pipeline code that writes to BigQuery.

Did you define the field names? If you specify the schema in a JSON file, you must define the new column in it. If the new column definition is missing, attempting to append data returns this error: "Error while reading data, error message: parsing error in row starting at position <int>: No such field: <field>."

As long as you load the data via the jobs.insert method, @Peter Kim's solution looks reasonable to me. Did you specify it in the input file? @artofdoe, was your issue resolved?