Google BigQuery: BigQueryIO write cannot add new fields, even though ALLOW_FIELD_ADDITION is set


I am using Apache Beam's BigQueryIO to load data into BigQuery, but the load job fails with the error:

"message": "Error while reading data, error message: JSON parsing error in row starting at position 0: No such field: Field_name.",
Here is the full configuration of the load job:

    "configuration": {
      "jobType": "LOAD",
      "load": {
        "createDisposition": "CREATE_NEVER",
        "destinationTable": {
          "datasetId": "people",
          "projectId": "my_project",
          "tableId": "beam_load_test"
        },
        "ignoreUnknownValues": false,
        "schema": {
          "fields": [
            {
              "mode": "NULLABLE",
              "name": "First_name",
              "type": "STRING"
            },
            {
              "mode": "NULLABLE",
              "name": "Last_name",
              "type": "STRING"
            }
          ]
        },
        "schemaUpdateOptions": [
          "ALLOW_FIELD_ADDITION"
        ],
        "sourceFormat": "NEWLINE_DELIMITED_JSON",
        "sourceUris": [
          "gs://tmp_bucket/BigQueryWriteTemp/beam_load/043518a3-7bae-48ac-8068-f97430c32f58"
        ],
        "useAvroLogicalTypes": false,
        "writeDisposition": "WRITE_APPEND"
      }
    }
I can see that the temporary files it creates in GCS look the way they should, and a schema is supplied as well, inferred via useBeamSchema().

Here is my pipeline code that writes to BigQuery:

pipeline.apply(
            "Write data to BQ",
            BigQueryIO
                    .<GenericRecord>write()
                    .optimizedWrites()
                    .useBeamSchema()
                    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
                    .withSchemaUpdateOptions(ImmutableSet.of(BigQueryIO.Write.SchemaUpdateOption.ALLOW_FIELD_ADDITION))
                    .withCustomGcsTempLocation(options.getGcsTempLocation())
                    .withNumFileShards(options.getNumShards().get())
                    .withMethod(FILE_LOADS)
                    .withTriggeringFrequency(Utils.parseDuration("10s"))
                    .to(new TableReference()
                            .setProjectId(options.getGcpProjectId().get())
                            .setDatasetId(options.getGcpDatasetId().get())
                            .setTableId(options.getGcpTableId().get()))
    );
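One thing worth trying (a sketch, not a confirmed fix): instead of relying on useBeamSchema() inference, supply the load schema explicitly with withSchema(), including the new column. The column name below is taken from the error message (Field_name); its STRING type and NULLABLE mode are assumptions.

```java
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Arrays;

public class NewFieldSchema {
    // Build the load schema explicitly, listing the existing columns plus
    // the new one ("Field_name" is the field named in the load error).
    static TableSchema schemaWithNewField() {
        return new TableSchema().setFields(Arrays.asList(
            new TableFieldSchema().setName("First_name").setType("STRING").setMode("NULLABLE"),
            new TableFieldSchema().setName("Last_name").setType("STRING").setMode("NULLABLE"),
            new TableFieldSchema().setName("Field_name").setType("STRING").setMode("NULLABLE")));
    }
}
```

With this, the .useBeamSchema() call in the pipeline would be replaced by .withSchema(NewFieldSchema.schemaWithNewField()), so the load job's schema contains the new column and ALLOW_FIELD_ADDITION can widen the destination table to match.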

Any ideas on why the new field is not being added?

Could you share the relevant pipeline code that builds the BigQueryIO write? @mk_sta, I have added the pipeline code that writes to BigQuery. Did you define the field names in the input file?

If you specify the schema in the load job, you must define the new column in it. If the new column definition is missing, attempting to append the data returns the following error: Error while reading data, error message: JSON parsing error in row starting at position <int>: No such field: <field>. As long as you load the data through the jobs.insert method, @Peter Kim's solution seems reasonable to me. Did you specify it in the input file? @artofdoe, has your issue been solved?