Google cloud dataflow 运行数据流作业(java)时PubsubIO.readAvrogeneriRecords上出现空指针异常

Google cloud dataflow 运行数据流作业(java)时PubsubIO.readAvrogeneriRecords上出现空指针异常,google-cloud-dataflow,avro,apache-beam,apache-beam-io,Google Cloud Dataflow,Avro,Apache Beam,Apache Beam Io,我有以下apache beam管道: package ch.mycompany.bb8; import ch.mycompany.bb8.transforms.LogRecords; import java.io.File; import java.io.IOException; import org.apache.avro.Schema; import org.apache.beam.sdk.Pipeline; import org.apache.beam.sdk.PipelineRes

我有以下apache beam管道:

package ch.mycompany.bb8;

import ch.mycompany.bb8.transforms.LogRecords;

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.ParDo;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Bb8Pipeline {
    private static final Logger LOG = LoggerFactory.getLogger(Bb8Pipeline.class);

    /**
     * Runs the pipeline with the supplied options.
     *
     * @param options The execution parameters to the pipeline.
     * @return The result of the pipeline execution.
     */
    public static PipelineResult run(CustomOptions options) {

        Pipeline pipeline = Pipeline.create(options);

        String schemaJson = "{"
        +    "\"type\": \"record\","
        +    "\"namespace\": \"com.google.cloud.pso\","
        +    "\"name\": \"User\","
        +    "\"fields\": ["
        +      "{"
        +        "\"name\": \"name\","
        +        "\"type\": \"string\""
        +      "},"
        +      "{"
        +        "\"name\": \"surname\","
        +        "\"type\": \"string\""
        +      "},"
        +      "{"
        +          "\"name\": \"age\","
        +          "\"type\": \"int\""
        +      "},"
        +      "{"
        +          "\"name\": \"retired\","
        +          "\"type\": \"boolean\""
        +      "}"
        +    "]"
        +  "}";

        Schema avroSchema = new Schema.Parser().parse(schemaJson);

        LOG.info(avroSchema.toString());

        pipeline.apply("Read PubSub record strings", 
        PubsubIO.readAvroGenericRecords(avroSchema)
        .fromSubscription(options.getInputSubscription()))
                .apply("Simply log records", ParDo.of(new LogRecords()))
                .apply("Write PubSub records", PubsubIO.writeStrings().to(options.getOutputTopic()));

        return pipeline.run();
    }

    /**
     * Main entry point for executing the pipeline.
     *
     * @param args The command-line arguments to the pipeline.
     */
    public static void main(String[] args) {
        CustomOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(CustomOptions.class);

        options.setStreaming(true);

        run(options);
    }
}
我使用maven运行管道,如下所示:

mvn compile exec:java \
 -Dexec.mainClass=ch.mycompany.bb8.Bb8Pipeline \
      -Dexec.args="--project=t2-prod \
      --stagingLocation=gs://bb-8-staging/staging/  \
      --tempLocation=gs://bb-8-staging/staging/ \
      --runner=DataflowRunner \
      --region=europe-west1 \
      --jobName=bb-8-avro-test \
      --outputTopic=projects/t2-prod/topics/bb-8-output \
      --inputSubscription=projects/t2-prod/subscriptions/bb-8-ingest \
      --maxNumWorkers=1"
我得到以下空指针异常:

INFO: {"type":"record","name":"User","namespace":"com.google.cloud.pso","fields":[{"name":"name","type":"string"},{"name":"surname","type":"string"},{"name":"age","type":"int"},{"name":"retired","type":"boolean"}]}
[WARNING] 
java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get (ConcurrentHashMap.java:936)
    at java.util.concurrent.ConcurrentHashMap.containsKey (ConcurrentHashMap.java:964)
    at org.apache.avro.LogicalTypes.fromSchemaImpl (LogicalTypes.java:73)
    at org.apache.avro.LogicalTypes.fromSchema (LogicalTypes.java:47)
    at org.apache.beam.sdk.schemas.utils.AvroUtils.toFieldType (AvroUtils.java:673)
    at org.apache.beam.sdk.schemas.utils.AvroUtils.toBeamField (AvroUtils.java:290)
    at org.apache.beam.sdk.schemas.utils.AvroUtils.toBeamSchema (AvroUtils.java:313)
    at org.apache.beam.sdk.schemas.utils.AvroUtils.getSchema (AvroUtils.java:415)
    at org.apache.beam.sdk.io.gcp.pubsub.PubsubIO.readAvroGenericRecords (PubsubIO.java:592)
    at ch.mycompany.bb8.Bb8Pipeline.run (Bb8Pipeline.java:68)
    at ch.mycompany.bb8.Bb8Pipeline.main (Bb8Pipeline.java:86)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
    at java.lang.Thread.run (Thread.java:748)
如上面的堆栈跟踪所示,模式按预期记录,因此模式不是空的

是否有人知道如何修复此错误,或者我如何进一步调试

mvn -version
Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3; 2018-10-24T20:41:47+02:00)
Maven home: /opt/apache-maven
Java version: 1.8.0_191, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-8-oracle/jre
Default locale: en_ZA, platform encoding: UTF-8
OS name: "linux", version: "4.15.0-88-generic", arch: "amd64", family: "unix"
光束版本
2.19.0


org.apache.avro版本
1.8.0

这似乎是一个依赖冲突相关的问题:

  • Beam 2.19.0依赖于Avro 1.8.2(),它具有正确的实现(),因此不会导致问题

  • 但是您提到您使用的是AVRO1.8.0,它的实现()不正确,可能引发
    NullPointerException


因此,解决此问题的一个简单方法是将您使用的Avro版本升级到1.8.2

这似乎是一个依赖冲突相关的问题:

  • Beam 2.19.0依赖于Avro 1.8.2(),它具有正确的实现(),因此不会导致问题

  • 但是您提到您使用的是AVRO1.8.0,它的实现()不正确,可能引发
    NullPointerException


因此,解决这个问题的一个简单方法是将您使用的Avro版本升级到1.8.2

您是否尝试过这样构建模式?您是否尝试过这样构建模式?