NoSuchMethodError when trying to run Gobblin on Dataproc

I am trying to run Gobblin on Google Dataproc, but I am hitting this NoSuchMethodError and cannot figure out how to resolve it.

Waiting for job output...
...
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        ...
Caused by: java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder()Lorg/apache/commons/cli/Option$Builder;
        at gobblin.runtime.cli.CliOption
        ...
The same job (shown below) runs fine on my local Hadoop setup (on my laptop), but it does not run on Dataproc. Has anyone tried running Gobblin on Dataproc?

Here is my Gobblin job file:

job.name=kafka2gcs
job.group=gkafka2gcs
job.description=Gobblin job to read messages from Kafka and save as is on GCS
job.lock.enabled=false

kafka.brokers=mykafka:9092
topic.whitelist=mytopic
bootstrap.with.offset=earliest

source.class=gobblin.source.extractor.extract.kafka.KafkaDeserializerSource
kafka.deserializer.type=BYTE_ARRAY
extract.namespace=nskafka2gcs

writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
writer.destination.type=HDFS
mr.job.max.mappers=2
writer.output.format=txt
data.publisher.type=gobblin.publisher.BaseDataPublisher
metrics.enabled=false

fs.uri=file:///.
writer.fs.uri=${fs.uri}
mr.job.root.dir=gobblin
writer.output.dir=${mr.job.root.dir}/out
writer.staging.dir=${mr.job.root.dir}/stg

fs.gs.project.id=my-test-project
data.publisher.fs.uri=gs://my-bucket
state.store.fs.uri=${data.publisher.fs.uri}
data.publisher.final.dir=gobblin/pub
state.store.dir=gobblin/state

Here are the commands I issue for Dataproc:

gcloud dataproc clusters create myspark \
  --image-version 1.1 \
  --master-machine-type n1-standard-4 \
  --master-boot-disk-size 10 \
  --num-workers 2 \
  --worker-machine-type n1-standard-4 \
  --worker-boot-disk-size 10 
gcloud dataproc jobs submit hadoop --cluster=myspark \
  --class gobblin.runtime.mapreduce.CliMRJobLauncher \
  --jars /opt/gobblin-dist/lib/gobblin-runtime-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-api-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-avro-json-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-codecs-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-core-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-core-base-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-crypto-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-crypto-provider-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-data-management-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metastore-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metrics-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metrics-base-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metadata-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-utility-0.10.0.jar,/opt/gobblin-dist/lib/avro-1.8.1.jar,/opt/gobblin-dist/lib/avro-mapred-1.8.1.jar,/opt/gobblin-dist/lib/commons-lang3-3.4.jar,/opt/gobblin-dist/lib/config-1.2.1.jar,/opt/gobblin-dist/lib/data-2.6.0.jar,/opt/gobblin-dist/lib/gson-2.6.2.jar,/opt/gobblin-dist/lib/guava-15.0.jar,/opt/gobblin-dist/lib/guava-retrying-2.0.0.jar,/opt/gobblin-dist/lib/joda-time-2.9.3.jar,/opt/gobblin-dist/lib/javassist-3.18.2-GA.jar,/opt/gobblin-dist/lib/kafka_2.11-0.8.2.2.jar,/opt/gobblin-dist/lib/kafka-clients-0.8.2.2.jar,/opt/gobblin-dist/lib/metrics-core-2.2.0.jar,/opt/gobblin-dist/lib/metrics-core-3.1.0.jar,/opt/gobblin-dist/lib/metrics-graphite-3.1.0.jar,/opt/gobblin-dist/lib/scala-library-2.11.8.jar,/opt/gobblin-dist/lib/influxdb-java-2.1.jar,/opt/gobblin-dist/lib/okhttp-2.4.0.jar,/opt/gobblin-dist/lib/okio-1.4.0.jar,/opt/gobblin-dist/lib/retrofit-1.9.0.jar,/opt/gobblin-dist/lib/reflections-0.9.10.jar \
  --properties mapreduce.job.user.classpath.first=true \
  -- -jobconfig gs://my-bucket/gobblin-kafka-gcs.job

I have also tried copying all of the Gobblin lib jars into /usr/lib/hadoop/lib on every machine in the Dataproc cluster, but that did not work either.

Any ideas?

gobblin 0.10.0
hadoop 2.7.3
dataproc image 1.1

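The Hadoop distribution is probably leaking its own version of commons-cli onto your classpath, and that version conflicts with the one Gobblin was compiled against. Option.builder() was only added in commons-cli 1.3, so a NoSuchMethodError on that method means an older commons-cli is being loaded first.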

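Normally, if these dependencies came from your own application, you would resolve the conflict in your own build. If you are building Gobblin from source, you can check whether it compiles against commons-cli 1.2, or whether the newer version is actually a hard dependency.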
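Before changing anything, it can help to confirm which commons-cli jar each side actually ships. The following is a minimal sketch, not part of the original answer: `list_commons_cli` is a hypothetical helper name, and it assumes the jars follow the usual `commons-cli-<version>.jar` naming, so it reads the version from the file name only.

```shell
# Hypothetical helper: print "<version> <path>" for every commons-cli jar in a directory.
# The version is taken from the file name, not from the jar contents.
list_commons_cli() {
  lib_dir="${1:-/usr/lib/hadoop/lib}"   # defaults to the Hadoop lib path from the question
  for jar in "$lib_dir"/commons-cli-*.jar; do
    [ -e "$jar" ] || continue           # the glob matched nothing
    name="$(basename "$jar" .jar)"
    echo "${name#commons-cli-} $jar"
  done
}
```

Running `list_commons_cli /usr/lib/hadoop/lib` on a cluster node and `list_commons_cli /opt/gobblin-dist/lib` on the machine holding the Gobblin distribution (assuming the distribution bundles its own copy) should show whether the two versions differ.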

If commons-cli 1.3.1 is fully backward compatible, you can try removing /usr/lib/hadoop/lib/commons-cli-1.2.jar and adding a commons-cli-1.3.1.jar that you download yourself.
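The swap described above could be scripted roughly as follows. This is a sketch under stated assumptions rather than a tested Dataproc procedure: `swap_commons_cli` and the backup directory are hypothetical, the new jar must live outside the lib directory, and the old jar is moved aside instead of deleted so the change can be reverted.

```shell
# Hypothetical helper: move any bundled commons-cli jar out of LIB_DIR and copy NEW_JAR in.
swap_commons_cli() {
  lib_dir="$1"      # e.g. /usr/lib/hadoop/lib
  new_jar="$2"      # e.g. a commons-cli-1.3.1.jar you downloaded (must be outside LIB_DIR)
  backup_dir="$3"   # assumed backup location that is not on the classpath
  [ -f "$new_jar" ] || { echo "missing $new_jar" >&2; return 1; }
  mkdir -p "$backup_dir" || return 1
  for old in "$lib_dir"/commons-cli-*.jar; do
    [ -e "$old" ] || continue           # nothing to move
    mv "$old" "$backup_dir"/ || return 1
  done
  cp "$new_jar" "$lib_dir"/
}
```

On a real cluster this would likely need root privileges and would have to be repeated on every node, for example `swap_commons_cli /usr/lib/hadoop/lib ./commons-cli-1.3.1.jar /tmp/commons-cli-backup`.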

Thank you for the detailed answer. I was able to get past this error by removing commons-cli-1.2 and a few other jars from the path and replacing them with the Gobblin-specific jars. But I still cannot run it successfully on Dataproc :-( (I am still trying).