Integrating Kafka 2.11-0.10.0.1 with Spark Streaming 2.1.1


I am trying to run the KafkaWordCount example with Spark Streaming, using Spark 2.1.1 in standalone cluster mode. The Kafka version on the server I am trying to integrate with is 2.11-0.10.0.1. According to the documentation, there are two separate integration packages: one for Kafka 0.8.2.1 or higher and one for 0.10.0 or higher.

I have added the following jars to the jars folder under my Spark home:

kafka_2.11-0.10.0.1.jar
spark-streaming-kafka-0-10-assembly_2.11-2.1.1.jar
spark-streaming-kafka-0-10_2.11-2.1.1.jar

And I am running this command:

/usr/local/spark/bin/spark-submit --num-executors 1 --executor-memory 20G --total-executor-cores 4 --class org.apache.spark.examples.streaming.KafkaWordCount /usr/local/spark/examples/jars/spark-examples_2.11-2.1.1.jar 10.0.16.96:2181 group_test topic 6
which throws:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$

Are there any other jars I am missing?
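For context on the error: the stock KafkaWordCount example is compiled against the old 0.8 integration (`org.apache.spark.streaming.kafka.KafkaUtils`), a class that is not present in the 0-10 jars listed above; the 0.10 integration lives in a different package, `org.apache.spark.streaming.kafka010`. A minimal sketch of a word count written against the 0.10 direct API (the broker address and topic name here are assumptions mirroring the question; note 0.10 connects to the Kafka brokers, not to ZooKeeper on port 2181):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object Kafka010WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Kafka010WordCount")
    val ssc = new StreamingContext(conf, Seconds(2))

    // The 0.10 consumer talks to brokers directly; 2181 (ZooKeeper) would be wrong here
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "10.0.16.96:9092",   // assumed broker host:port
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "group_test",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Set("topic"), kafkaParams))

    // Classic word count over the message values
    stream.map(_.value)
      .flatMap(_.split(" "))
      .map((_, 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

A job like this would need to be built and submitted as its own jar; the pre-built spark-examples jar cannot be made to use the 0.10 API just by adding jars to the classpath, because its KafkaWordCount references the 0.8 class at compile time.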

Logs:

    /usr/local/spark/bin/spark-submit --num-executors 1 --executor-memory 20G --total-executor-cores 4 --class org.apache.spark.examples.streaming.KafkaWordCount /usr/local/spark/examples/jars/spark-examples_2.11-2.1.1.jar 10.0.16.96:2181 group_test streams 6
Warning: Ignoring non-spark config property: fs.s3.awsAccessKeyId=AKIAIETFDAABYC23XVSQ
Warning: Ignoring non-spark config property: fs.s3.awsSecretAccessKey=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
17/07/11 08:04:31 INFO spark.SparkContext: Running Spark version 2.1.1
17/07/11 08:04:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/11 08:04:31 INFO spark.SecurityManager: Changing view acls to: mahendra
17/07/11 08:04:31 INFO spark.SecurityManager: Changing modify acls to: mahendra
17/07/11 08:04:31 INFO spark.SecurityManager: Changing view acls groups to:
17/07/11 08:04:31 INFO spark.SecurityManager: Changing modify acls groups to:
17/07/11 08:04:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mahendra); groups with view permissions: Set(); users  with modify permissions: Set(mahendra); groups with modify permissions: Set()
17/07/11 08:04:32 INFO util.Utils: Successfully started service 'sparkDriver' on port 38173.
17/07/11 08:04:32 INFO spark.SparkEnv: Registering MapOutputTracker
17/07/11 08:04:32 INFO spark.SparkEnv: Registering BlockManagerMaster
17/07/11 08:04:32 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/07/11 08:04:32 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/07/11 08:04:32 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-241eda29-1cb3-4364-859c-79ba86689fbf
17/07/11 08:04:32 INFO memory.MemoryStore: MemoryStore started with capacity 5.2 GB
17/07/11 08:04:32 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/07/11 08:04:32 INFO util.log: Logging initialized @1581ms
17/07/11 08:04:32 INFO server.Server: jetty-9.2.z-SNAPSHOT
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a7e2d9d{/jobs,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@754777cd{/jobs/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2b52c0d6{/jobs/job,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@372ea2bc{/jobs/job/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4cc76301{/stages,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f08c4b{/stages/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f19b8b3{/stages/stage,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7de0c6ae{/stages/stage/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a486d78{/stages/pool,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@cdc3aae{/stages/pool/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7ef2d7a6{/storage,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5dcbb60{/storage/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4c36250e{/storage/rdd,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@21526f6c{/storage/rdd/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49f5c307{/environment,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@299266e2{/environment/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5471388b{/executors,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66ea1466{/executors/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1601e47{/executors/threadDump,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3bffddff{/executors/threadDump/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66971f6b{/static,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@50687efb{/,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@517bd097{/api,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@142eef62{/jobs/job/kill,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a9cc6cb{/stages/stage/kill,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO server.ServerConnector: Started Spark@6de54b40{HTTP/1.1}{0.0.0.0:4040}
17/07/11 08:04:32 INFO server.Server: Started @1696ms
17/07/11 08:04:32 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/07/11 08:04:32 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.16.15:4040
17/07/11 08:04:32 INFO spark.SparkContext: Added JAR file:/usr/local/spark/examples/jars/spark-examples_2.11-2.1.1.jar at spark://10.0.16.15:38173/jars/spark-examples_2.11-2.1.1.jar with timestamp 1499760272476
17/07/11 08:04:32 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://ip-10-0-16-15.ap-southeast-1.compute.internal:7077...
17/07/11 08:04:32 INFO client.TransportClientFactory: Successfully created connection to ip-10-0-16-15.ap-southeast-1.compute.internal/10.0.16.15:7077 after 27 ms (0 ms spent in bootstraps)
17/07/11 08:04:32 INFO cluster.StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20170711080432-0038
17/07/11 08:04:32 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20170711080432-0038/0 on worker-20170707101056-10.0.16.51-40051 (10.0.16.51:40051) with 4 cores
17/07/11 08:04:32 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20170711080432-0038/0 on hostPort 10.0.16.51:40051 with 4 cores, 20.0 GB RAM
17/07/11 08:04:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35723.
17/07/11 08:04:32 INFO netty.NettyBlockTransferService: Server created on 10.0.16.15:35723
17/07/11 08:04:32 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/07/11 08:04:32 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.16.15, 35723, None)
17/07/11 08:04:32 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20170711080432-0038/0 is now RUNNING
17/07/11 08:04:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.0.16.15:35723 with 5.2 GB RAM, BlockManagerId(driver, 10.0.16.15, 35723, None)
17/07/11 08:04:32 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.16.15, 35723, None)
17/07/11 08:04:32 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.16.15, 35723, None)
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@34448e6c{/metrics/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO cluster.StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/07/11 08:04:33 WARN fs.FileSystem: Cannot load filesystem
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:232)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2631)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2650)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:357)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.spark.streaming.StreamingContext.checkpoint(StreamingContext.scala:238)
    at org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCount.scala:54)
    at org.apache.spark.examples.streaming.KafkaWordCount.main(KafkaWordCount.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StorageStatistics
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
    at java.lang.Class.getConstructor0(Class.java:3075)
    at java.lang.Class.newInstance(Class.java:412)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
    ... 24 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.StorageStatistics
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 29 more
17/07/11 08:04:33 WARN spark.SparkContext: Spark is not running in local mode, therefore the checkpoint directory must not be on the local filesystem. Directory 'file:/home/mahendra/checkpoint' appears to be on the local filesystem.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$
    at org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCount.scala:57)
    at org.apache.spark.examples.streaming.KafkaWordCount.main(KafkaWordCount.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtils$
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 11 more
17/07/11 08:04:33 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/07/11 08:04:33 INFO server.ServerConnector: Stopped Spark@6de54b40{HTTP/1.1}{0.0.0.0:4040}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4a9cc6cb{/stages/stage/kill,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@142eef62{/jobs/job/kill,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@517bd097{/api,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@50687efb{/,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66971f6b{/static,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3bffddff{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1601e47{/executors/threadDump,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66ea1466{/executors/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5471388b{/executors,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@299266e2{/environment/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@49f5c307{/environment,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@21526f6c{/storage/rdd/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4c36250e{/storage/rdd,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5dcbb60{/storage/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7ef2d7a6{/storage,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@cdc3aae{/stages/pool/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a486d78{/stages/pool,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7de0c6ae{/stages/stage/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3f19b8b3{/stages/stage,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2f08c4b{/stages/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4cc76301{/stages,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@372ea2bc{/jobs/job/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2b52c0d6{/jobs/job,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@754777cd{/jobs/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a7e2d9d{/jobs,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO ui.SparkUI: Stopped Spark web UI at http://10.0.16.15:4040
17/07/11 08:04:33 INFO cluster.StandaloneSchedulerBackend: Shutting down all executors
17/07/11 08:04:33 INFO cluster.CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
17/07/11 08:04:33 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/07/11 08:04:33 INFO memory.MemoryStore: MemoryStore cleared
17/07/11 08:04:33 INFO storage.BlockManager: BlockManager stopped
17/07/11 08:04:33 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/07/11 08:04:33 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/07/11 08:04:33 INFO spark.SparkContext: Successfully stopped SparkContext
17/07/11 08:04:33 INFO util.ShutdownHookManager: Shutdown hook called
17/07/11 08:04:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a7875c5c-cdfc-486e-bf7d-7fe0a7cff228

Thanks.

How did you create the uber JAR for spark-examples_2.11-2.1.1.jar? Can you show us your build.sbt?

@YuvalItzchakov I am using the pre-built jar that ships with Spark, located at SPARK_HOME/examples/jars/spark-examples_2.11-2.1.jar. In another attempt I tried building the jar myself: I added the Maven dependency to SPARK_HOME/pom.xml, ran mvn package, and then used SPARK_HOME/target/original-spark-examples_2.11-2.1.jar, but it shows the same result. I have not used Maven or sbt before; I mostly use Spark's Python API, but unfortunately for Kafka 0.10 and later the Python API for Spark Streaming is not available. It would be very helpful if you could explain more about how to properly build these uber jars.

I don't think spark-examples_2.11-2.1.1.jar contains the Kafka jars. You can read online about how to use sbt assembly when building JARs with sbt.
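Following up on the sbt assembly suggestion: a minimal sketch of a build definition for a standalone 0.10 streaming job might look like the following (the project name and versions are assumptions chosen to match the question; Spark core and streaming are marked "provided" so only the Kafka integration gets bundled into the uber jar):

```scala
// build.sbt
// Requires the sbt-assembly plugin in project/plugins.sbt, e.g.:
//   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")
name := "kafka-word-count"
version := "0.1"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // "provided": supplied by the cluster at runtime via spark-submit
  "org.apache.spark" %% "spark-core"      % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.1.1" % "provided",
  // not provided: must be packed into the assembly jar
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.1"
)
```

Running `sbt assembly` should then produce an uber jar under target/scala-2.11/ that can be passed directly to spark-submit, without copying Kafka jars into SPARK_HOME/jars.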