Google Cloud Dataflow: Could not initialize class org.xerial.snappy.Snappy
My pipeline reads data from GCS via Pub/Sub and then writes it to Redis. At first it seemed to run fine on Dataflow. However, after running for two days, the following exception appeared in my pipeline:
java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:97)
org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:89)
org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
org.apache.beam.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:50)
org.apache.beam.runners.core.construction.WindowingStrategyTranslation.toProto(WindowingStrategyTranslation.java:216)
org.apache.beam.runners.core.construction.WindowingStrategyTranslation.toProto(WindowingStrategyTranslation.java:294)
org.apache.beam.runners.core.construction.WindowingStrategyTranslation.toMessageProto(WindowingStrategyTranslation.java:280)
org.apache.beam.runners.dataflow.worker.graph.RegisterNodeFunction.apply(RegisterNodeFunction.java:205)
org.apache.beam.runners.dataflow.worker.graph.RegisterNodeFunction.apply(RegisterNodeFunction.java:97)
java.util.function.Function.lambda$andThen$1(Function.java:88)
org.apache.beam.runners.dataflow.worker.graph.CreateRegisterFnOperationFunction.apply(CreateRegisterFnOperationFunction.java:207)
org.apache.beam.runners.dataflow.worker.graph.CreateRegisterFnOperationFunction.apply(CreateRegisterFnOperationFunction.java:74)
java.util.function.Function.lambda$andThen$1(Function.java:88)
java.util.function.Function.lambda$andThen$1(Function.java:88)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1172)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1028)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
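A note on reading this trace: on the JVM, "Could not initialize class X" generally means that the static initializer of X already failed earlier in the same process, and the original cause was reported with that first failure (as an ExceptionInInitializerError), not with this later one. The sketch below reproduces that behavior with a hypothetical class named Flaky standing in for org.xerial.snappy.Snappy; the simulated failure reason is an assumption for illustration only and is not taken from the logs above.

```java
public class InitFailureDemo {
    // Hypothetical stand-in for a class whose static initializer fails
    // (for example, because a resource it needs cannot be loaded).
    static class Flaky {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = compute();

        static int compute() {
            throw new IllegalStateException("simulated initialization failure");
        }
    }

    // Touches Flaky and reports which Throwable type the JVM raised.
    static String touch() {
        try {
            return "value " + Flaky.VALUE;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // The first access runs the static initializer and carries the real cause.
        System.out.println("first access:  " + touch());  // ExceptionInInitializerError
        // Every later access only reports that the class is unusable.
        System.out.println("second access: " + touch());  // NoClassDefFoundError
    }
}
```

If the same pattern applies here, the earliest Snappy-related error in the worker logs would be the one that shows the root cause.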
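Is this a Dataflow problem or a problem with my pipeline?
According to Google support, this problem is caused by insufficient memory. They suggested several possible solutions:
- Reduce the memory requirements of the pipeline.
- Use VMs with a higher memory allocation.
- Use streaming autoscaling (not supported by the Apache Beam SDK for Python). [3]
- Use Streaming Engine [4]. This moves pipeline execution out of the worker VMs and into the Cloud Dataflow service backend, which reduces the consumption of CPU, memory, and persistent disk storage on the worker VMs.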
So I added --maxNumWorkers=15 --autoscalingAlgorithm=THROUGHPUT_BASED when launching the Dataflow job. It now runs fine.
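For reference, the remedies suggested by support map onto launch options of the Dataflow runner for Java roughly as follows. This is a sketch of the flags only: the first two values are the ones used in this answer, while the machine type is an assumed example and not a recommendation from support.

```
--maxNumWorkers=15
--autoscalingAlgorithm=THROUGHPUT_BASED
--workerMachineType=n1-highmem-4
--enableStreamingEngine
```

The last two flags correspond to the "higher memory VM" and "Streaming Engine" options; they were not needed in this case, since enabling autoscaling alone was enough.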