PySpark: exception when running StreamingContext.start()
An exception occurs when running the following Python code on Windows 10. I am using Apache Kafka and PySpark. This is the Python snippet that reads data from Kafka:
import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="KafkaStreamReader")
ssc = StreamingContext(sc, 60)  # 60-second batch interval

zkQuorum, topic = sys.argv[1:]
kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
lines = kvs.map(lambda x: [x[0], x[1]])
lines.pprint()
lines.foreachRDD(SaveRecord)  # SaveRecord is defined elsewhere in the script

ssc.start()
ssc.awaitTermination()
The exception thrown when running the code:
Exception in thread "streaming-start" java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
at org.apache.spark.streaming.kafka.KafkaReceiver.<init>(KafkaInputDStream.scala:69)
at org.apache.spark.streaming.kafka.KafkaInputDStream.getReceiver(KafkaInputDStream.scala:60)
at org.apache.spark.streaming.scheduler.ReceiverTracker.$anonfun$launchReceivers$1(ReceiverTracker.scala:441)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:237)
at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at org.apache.spark.streaming.scheduler.ReceiverTracker.launchReceivers(ReceiverTracker.scala:440)
at org.apache.spark.streaming.scheduler.ReceiverTracker.start(ReceiverTracker.scala:160)
at org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:102)
at org.apache.spark.streaming.StreamingContext.$anonfun$start$1(StreamingContext.scala:583)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.ThreadUtils$$anon$1.run(ThreadUtils.scala:145)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 16 more
This is most likely a Scala version incompatibility with Spark. Make sure the Scala version in your project configuration matches the one your Spark build supports: Spark 3.x requires Scala 2.12, and support for Scala 2.11 was removed in Spark 3.0.0. Third-party jars (such as dstream-twitter for Twitter streaming applications, or your Kafka streaming jar) may also have been built against a Scala version your Spark does not support. In my case, dstream-twitter_2.11-2.3.0-SNAPSHOT did not work with Spark 3.0 and threw Exception in thread "streaming-start" java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class. Updating the dstream-twitter jar to a Scala 2.12 build fixed the problem.
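A quick way to check for this mismatch is to compare the Scala version reported by your Spark installation against the _2.1x suffix of every streaming jar you pull in. A sketch of the commands (the script name, ZooKeeper address, and topic are placeholders, not taken from the question):

```shell
# Print the Scala version this Spark build was compiled against;
# look for a line like "Using Scala version 2.12.x".
spark-submit --version

# Submit the job with a Kafka integration package whose _2.1x suffix
# matches that Scala version. Note that spark-streaming-kafka-0-8
# (which provides KafkaUtils.createStream) was only ever published
# for Scala 2.11, so it cannot work on a Scala 2.12 / Spark 3.x build.
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.4.5 \
  kafka_stream.py localhost:2181 mytopic
```

If the suffixes disagree, either switch to a Spark build compiled for the jar's Scala version, or (on Spark 3.x) migrate off the removed 0-8 receiver API.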
Make sure all your Scala versions line up.

Were you able to solve this? I am currently facing exactly the same error. If you have fixed it, please let me know, because I am still trying to work out how.