Apache Spark: error when running spark-submit: java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition


I ran the code below, but I don't understand why I'm getting this error. I've spent hours trying to fix it with no luck. I'm using Spark 2.4.4 and Scala 2.13.0. I tried setting spark.executor.memory and spark.driver.memory in the Spark config file, but that didn't solve the problem.
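For reference, my script is roughly the following (a minimal reconstruction based on the traceback further down; everything except the `KafkaUtils.createDirectStream` call and the app name is paraphrased):

```python
# direct_approach.py -- minimal sketch of the failing script (reconstructed)
import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":
    # e.g. spark-submit ... direct_approach.py localhost:9092 new_topic
    brokers, topic = sys.argv[1:]

    sc = SparkContext(appName="PythonStreamingDirectKafkaWordCount")
    ssc = StreamingContext(sc, 2)  # 2-second batch interval (assumed)

    # This is the call that raises the NoClassDefFoundError below
    kvs = KafkaUtils.createDirectStream(
        ssc, [topic], {"metadata.broker.list": brokers}
    )

    kvs.map(lambda x: x[1]).pprint()

    ssc.start()
    ssc.awaitTermination()
```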

Here is the command and the error:

spark-submit --jars spark-streaming-kafka-0-8_2.11-2.4.4.jar direct_approach.py localhost:9092 new_topic
(tutorial-env) (base) harry@harry-badass:~/Desktop/twitter_project$ spark-submit --jars spark-streaming-kafka-0-8_2.11-2.4.4.jar direct_approach.py localhost:9092 new_topic
19/12/14 14:27:23 WARN Utils: Your hostname, harry-badass resolves to a loopback address: 127.0.1.1; using 220.149.84.46 instead (on interface enp4s0)
19/12/14 14:27:23 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/jars/spark-unsafe_2.11-2.4.4.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/12/14 14:27:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/12/14 14:27:24 INFO SparkContext: Running Spark version 2.4.4
19/12/14 14:27:24 INFO SparkContext: Submitted application: PythonStreamingDirectKafkaWordCount
19/12/14 14:27:24 INFO SecurityManager: Changing view acls to: harry
19/12/14 14:27:24 INFO SecurityManager: Changing modify acls to: harry
19/12/14 14:27:24 INFO SecurityManager: Changing view acls groups to: 
19/12/14 14:27:24 INFO SecurityManager: Changing modify acls groups to: 
19/12/14 14:27:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(harry); groups with view permissions: Set(); users  with modify permissions: Set(harry); groups with modify permissions: Set()
19/12/14 14:27:24 INFO Utils: Successfully started service 'sparkDriver' on port 41699.
19/12/14 14:27:24 INFO SparkEnv: Registering MapOutputTracker
19/12/14 14:27:24 INFO SparkEnv: Registering BlockManagerMaster
19/12/14 14:27:24 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/12/14 14:27:24 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/12/14 14:27:24 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-2067d2bb-4b7c-49d8-8f02-f20e8467b21e
19/12/14 14:27:24 INFO MemoryStore: MemoryStore started with capacity 434.4 MB
19/12/14 14:27:24 INFO SparkEnv: Registering OutputCommitCoordinator
19/12/14 14:27:24 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/12/14 14:27:24 INFO Utils: Successfully started service 'SparkUI' on port 4041.
19/12/14 14:27:24 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://220.149.84.46:4041
19/12/14 14:27:24 INFO SparkContext: Added JAR file:///home/harry/Desktop/twitter_project/spark-streaming-kafka-0-8_2.11-2.4.4.jar at spark://220.149.84.46:41699/jars/spark-streaming-kafka-0-8_2.11-2.4.4.jar with timestamp 1576301244901
19/12/14 14:27:24 INFO Executor: Starting executor ID driver on host localhost
19/12/14 14:27:25 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46637.
19/12/14 14:27:25 INFO NettyBlockTransferService: Server created on 220.149.84.46:46637
19/12/14 14:27:25 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/12/14 14:27:25 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 220.149.84.46, 46637, None)
19/12/14 14:27:25 INFO BlockManagerMasterEndpoint: Registering block manager 220.149.84.46:46637 with 434.4 MB RAM, BlockManagerId(driver, 220.149.84.46, 46637, None)
19/12/14 14:27:25 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 220.149.84.46, 46637, None)
19/12/14 14:27:25 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 220.149.84.46, 46637, None)
Exception in thread "Thread-5" java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
    at java.base/java.lang.Class.getDeclaredMethods0(Native Method)
    at java.base/java.lang.Class.privateGetDeclaredMethods(Class.java:3139)
    at java.base/java.lang.Class.privateGetPublicMethods(Class.java:3164)
    at java.base/java.lang.Class.getMethods(Class.java:1861)
    at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:345)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:305)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:466)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:563)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)
    ... 12 more
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
    response = connection.send_command(command)
  File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
  File "/home/harry/Desktop/twitter_project/direct_approach.py", line 9, in <module>
    kvs = KafkaUtils.createDirectStream(ssc, [topic],{"metadata.broker.list": brokers})
  File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 146, in createDirectStream
  File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o24.createDirectStreamWithoutMessageHandler
19/12/14 14:27:25 INFO SparkContext: Invoking stop() from shutdown hook
19/12/14 14:27:25 INFO SparkUI: Stopped Spark web UI at http://220.149.84.46:4041
19/12/14 14:27:25 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/12/14 14:27:25 INFO MemoryStore: MemoryStore cleared
19/12/14 14:27:25 INFO BlockManager: BlockManager stopped
19/12/14 14:27:25 INFO BlockManagerMaster: BlockManagerMaster stopped
19/12/14 14:27:25 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/12/14 14:27:25 INFO SparkContext: Successfully stopped SparkContext
19/12/14 14:27:25 INFO ShutdownHookManager: Shutdown hook called
19/12/14 14:27:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-8e271f94-bec9-4f7e-aad0-1f3b651e9b29
19/12/14 14:27:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-747cc9ca-bca4-42a7-ad82-d6a055727394
19/12/14 14:27:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-747cc9ca-bca4-42a7-ad82-d6a055727394/pyspark-83cc90cc-1aaa-4dea-b364-4b66487be18f
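From what I can tell, `kafka.common.TopicAndPartition` lives in the Kafka client library itself, and the plain `spark-streaming-kafka-0-8_2.11-2.4.4.jar` may not bundle it. I suspect (untested) that one of these would pull the missing classes in, either via the assembly jar or by letting `spark-submit` resolve the dependency from Maven:

```shell
# Option 1 (assumption): use the assembly jar, which bundles the Kafka classes
spark-submit \
  --jars spark-streaming-kafka-0-8-assembly_2.11-2.4.4.jar \
  direct_approach.py localhost:9092 new_topic

# Option 2 (assumption): resolve the dependency via --packages instead of a local jar
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.4.4 \
  direct_approach.py localhost:9092 new_topic
```

Is this the right direction, or is something else (e.g. the Scala 2.13 vs. the `_2.11` jar mismatch) the real problem?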