Exception when running StreamingContext.start() in PySpark


An exception occurs when I run my Python code on Windows 10. I am using Apache Kafka and PySpark.

Python snippet that reads data from Kafka:

import sys

from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

ssc = StreamingContext(sc, 60)
zkQuorum, topic = sys.argv[1:]
kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
lines = kvs.map(lambda x: [x[0], x[1]])
lines.pprint()
lines.foreachRDD(SaveRecord)  # SaveRecord is defined elsewhere
ssc.start()
ssc.awaitTermination()
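For context, KafkaUtils.createStream comes from the external spark-streaming-kafka-0-8 connector, which has to be supplied at submit time and must be built for the same Scala version as the Spark installation itself. A hedged sketch of a typical invocation (the script name, ZooKeeper address, and topic are placeholders of mine, not from the question):

```shell
# Hypothetical submit command; kafka_stream.py, localhost:2181 and mytopic
# are placeholders. The _2.11 suffix is the connector's Scala version --
# it must match the Scala version of your Spark build. This connector was
# only ever published for Scala 2.11 (Spark 2.4.x and earlier), so it
# cannot load on a Spark 3.x build, which uses Scala 2.12.
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.4.8 \
  kafka_stream.py localhost:2181 mytopic
```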
Exception raised when running the code:

Exception in thread "streaming-start" java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
            at org.apache.spark.streaming.kafka.KafkaReceiver.<init>(KafkaInputDStream.scala:69)
            at org.apache.spark.streaming.kafka.KafkaInputDStream.getReceiver(KafkaInputDStream.scala:60)
            at org.apache.spark.streaming.scheduler.ReceiverTracker.$anonfun$launchReceivers$1(ReceiverTracker.scala:441)
            at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
            at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
            at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
            at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
            at scala.collection.TraversableLike.map(TraversableLike.scala:237)
            at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
            at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
            at org.apache.spark.streaming.scheduler.ReceiverTracker.launchReceivers(ReceiverTracker.scala:440)
            at org.apache.spark.streaming.scheduler.ReceiverTracker.start(ReceiverTracker.scala:160)
            at org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:102)
            at org.apache.spark.streaming.StreamingContext.$anonfun$start$1(StreamingContext.scala:583)
            at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
            at org.apache.spark.util.ThreadUtils$$anon$1.run(ThreadUtils.scala:145)
    Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class
            at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
            at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
            ... 16 more
This is most likely a Scala version incompatibility with your Spark build. Make sure the Scala version in your project configuration matches the one your Spark version was built against: Spark 3.x requires Scala 2.12, and support for Scala 2.11 was removed in Spark 3.0.0.

Third-party jars (such as dstream-twitter for Twitter streaming applications, or your Kafka streaming jar) may also have been built for a Scala version that your application's Spark build does not support.

In my case, dstream-twitter_2.11-2.3.0-SNAPSHOT did not work with Spark 3.0; it threw the same Exception in thread "streaming-start" java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class. Rebuilding the dstream-twitter jar against Scala 2.12 resolved the issue.
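Mismatches like this are visible directly in the artifact coordinates: by Maven convention, the suffix after the last underscore in the artifact id names the Scala binary version the jar was built for. A minimal sketch of checking it (the helper names `scala_suffix` and `compatible` are my own, not part of any Spark API):

```python
def scala_suffix(artifact_id: str) -> str:
    """Return the Scala binary-version suffix from a Maven artifact id,
    e.g. 'spark-streaming-kafka-0-8_2.11' -> '2.11'."""
    return artifact_id.rsplit("_", 1)[-1]


def compatible(artifact_id: str, spark_scala: str) -> bool:
    """True if the jar's Scala suffix matches the Scala version that
    Spark itself was built with."""
    return scala_suffix(artifact_id) == spark_scala


# Spark 3.x is built with Scala 2.12, so a _2.11 jar will fail to load
# with NoClassDefFoundError, while a _2.12 build is fine:
print(compatible("dstream-twitter_2.11", "2.12"))  # False
print(compatible("dstream-twitter_2.12", "2.12"))  # True
```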


Make sure every Scala version involved is consistent.

Were you able to solve this? I am currently facing exactly the same error. If you have solved it, please let me know, because I am still trying to figure out how to fix it.