Java Apache Beam BigQuery.fromQuery ClassCastException


java google-bigquery google-cloud-dataflow


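I am trying to run a query against a BigQuery table, extract one column, and write it to a file. The code below throws an exception. I may be wrong, but the process appears to write the intermediate results in Avro format to a temporary location, read the data back from there, and then throw a cast exception.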

pipeLine.apply(
        BigQueryIO.read(
                (SchemaAndRecord elem) -> {
                  GenericRecord record = elem.getRecord();
                  return (String) record.get("column");
                })
                .fromQuery("SELECT column FROM `project.dataset.table`")
                .usingStandardSql()
                .withCoder(AvroCoder.of(String.class)))
        .apply(TextIO.write().to("gs://bucket/test/result/data")
                .withSuffix(TXT_EXT)
                .withCompression(Compression.GZIP));

Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.String
        at xxxxx.xxx.xxx.sampling.dataflow.samplingextractor.service.BigQueryExportService.lambda$export$43268ee4$1(BigQueryExportService.java:137)
        at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:242)
        at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:235)
        at org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:597)
        at org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:209)
        at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:484)
        at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:479)
        at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:249)
        at org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)

I think it is suggesting that you use

.withCoder(AvroCoder.of(org.apache.avro.util.Utf8.class))

because a String cannot be cast directly from the Avro Utf8 class.
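One possible reading of the stack trace is that the failing cast sits in the lambda itself (the top frame is lambda$export at BigQueryExportService.java:137), not in the coder. If that is right, converting the field value with toString() instead of casting it may be enough. The sketch below is only an illustration under that assumption: AvroFieldSketch and asString are hypothetical names, and a StringBuilder stands in for org.apache.avro.util.Utf8 (both are CharSequence implementations that are not java.lang.String), so the example runs without Avro or Beam on the classpath.

```java
// Hypothetical helper for illustration; not part of Beam or Avro.
final class AvroFieldSketch {
    private AvroFieldSketch() {}

    /** Converts a field value to a String without casting; null stays null. */
    static String asString(Object fieldValue) {
        return fieldValue == null ? null : fieldValue.toString();
    }

    public static void main(String[] args) {
        // Assumption: StringBuilder plays the role of org.apache.avro.util.Utf8 here.
        Object fieldValue = new StringBuilder("some value");

        try {
            // Same kind of cast as in the question's lambda.
            String broken = (String) fieldValue;
            System.out.println("unexpected: " + broken);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getClass().getSimpleName());
        }

        // Converting instead of casting works for any CharSequence.
        System.out.println("converted: " + asString(fieldValue));
    }
}
```

In the pipeline from the question, the equivalent change would presumably be to return record.get("column").toString() from the lambda, with a null check if the column is nullable. I have not run this against BigQuery.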

From the looks of it, you just want to use the StringUtf8Coder class:

pipeLine.apply(
        BigQueryIO.read(
                (SchemaAndRecord elem) -> {
                  GenericRecord record = elem.getRecord();
                  return (String) record.get("column");
                })
                .fromQuery("SELECT column FROM `project.dataset.table`")
                .usingStandardSql()
                .withCoder(StringUtf8Coder.of()))
        .apply(TextIO.write().to("gs://bucket/test/result/data")
                .withSuffix(TXT_EXT)
                .withCompression(Compression.GZIP));

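This approach looks correct, @Jay Yoo. Have you tested it?

Did you test @Haris Nadeem's approach? It seems right.

It doesn't work. I tried it before posting this question.

Did you get the same error?

Yes. It threw the same error.

Are you running it on Dataflow or with the DirectRunner?

Could not get it to work with AvroCoder.of(org.apache.avro.util.Utf8.class)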