Google Cloud Dataflow: Could not initialize class org.xerial.snappy.Snappy


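My pipeline reads data from GCS by way of Pub/Sub and then writes it to Redis. At first it seemed to run fine on Dataflow. However, after running for two days, the following exception appeared in my pipeline: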


java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
        org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:97)
        org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:89)
        org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
        org.apache.beam.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:50)
        org.apache.beam.runners.core.construction.WindowingStrategyTranslation.toProto(WindowingStrategyTranslation.java:216)
        org.apache.beam.runners.core.construction.WindowingStrategyTranslation.toProto(WindowingStrategyTranslation.java:294)
        org.apache.beam.runners.core.construction.WindowingStrategyTranslation.toMessageProto(WindowingStrategyTranslation.java:280)
        org.apache.beam.runners.dataflow.worker.graph.RegisterNodeFunction.apply(RegisterNodeFunction.java:205)
        org.apache.beam.runners.dataflow.worker.graph.RegisterNodeFunction.apply(RegisterNodeFunction.java:97)
        java.util.function.Function.lambda$andThen$1(Function.java:88)
        org.apache.beam.runners.dataflow.worker.graph.CreateRegisterFnOperationFunction.apply(CreateRegisterFnOperationFunction.java:207)
        org.apache.beam.runners.dataflow.worker.graph.CreateRegisterFnOperationFunction.apply(CreateRegisterFnOperationFunction.java:74)
        java.util.function.Function.lambda$andThen$1(Function.java:88)
        java.util.function.Function.lambda$andThen$1(Function.java:88)
        org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1172)
        org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
        org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1028)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        java.lang.Thread.run(Thread.java:745)

Is this a Dataflow issue, or a problem with my pipeline?

According to Google support, this issue is caused by insufficient memory. There are several possible solutions:

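  • Reduce the memory requirements of the pipeline.
  • Use VMs with a larger memory allocation.
  • Use streaming autoscaling (not supported by the Apache Beam SDK for Python). [3]
  • Use Streaming Engine [4]. This moves pipeline execution out of the worker VMs and into the Cloud Dataflow service backend, which reduces the CPU, memory, and Persistent Disk storage consumed on the worker VMs.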

So I added
--maxNumWorkers=15 --autoscalingAlgorithm=THROUGHPUT_BASED
when launching the Dataflow job. It now runs fine.
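
A note on the error message itself, as a minimal sketch rather than anything specific to Dataflow: on the JVM, "NoClassDefFoundError: Could not initialize class X" generally means the class was found, but its static initializer had already failed earlier in the same process. The first failure surfaces as ExceptionInInitializerError (or the underlying error), and every later use of the class reports only NoClassDefFoundError. In the sketch below, `FragileResource` is a hypothetical stand-in for `org.xerial.snappy.Snappy`, and the simulated failure is an assumption, not the actual cause on the worker.

```java
public class InitFailureDemo {

    // Hypothetical stand-in for a class such as org.xerial.snappy.Snappy whose
    // static initializer does real work (for example, loading a native library).
    static class FragileResource {
        static {
            // Simulated failure; the real cause on a Dataflow worker is not known here.
            // "if (true)" keeps the compiler from rejecting an initializer that
            // can never complete normally.
            if (true) {
                throw new IllegalStateException("simulated failure during static init");
            }
        }

        static void use() {
            // Never reached: class initialization fails before this can run.
        }
    }

    // Touches the class and returns the simple name of whatever is thrown.
    static String touch() {
        try {
            FragileResource.use();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // First use triggers (and fails) static initialization.
        System.out.println("first use:  " + touch());
        // Later uses find the class already marked as failed.
        System.out.println("second use: " + touch());
    }
}
```

Run in a fresh JVM, the first line reports ExceptionInInitializerError and the second reports NoClassDefFoundError. If the same pattern applies here, the earliest Snappy-related error in the worker logs, before the first NoClassDefFoundError, is where the root cause would show up. That would be consistent with the memory explanation above, though the sketch does not prove it.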
