
Apache Spark: Pyspark Structured Streaming locally with Kafka in Jupyter

Tags: apache-spark, pyspark, apache-kafka, jupyter-notebook


After reading the other answers, I still don't understand.

I am able to send and receive messages in my notebook using KafkaProducer and KafkaConsumer:

    from kafka import KafkaProducer, KafkaConsumer
    import json
    producer = KafkaProducer(bootstrap_servers=['127.0.0.1:9092'], value_serializer=lambda m: json.dumps(m).encode('ascii'))
    consumer = KafkaConsumer('hr', bootstrap_servers=['127.0.0.1:9092'], group_id='abc')
I tried to connect to the stream using a Spark context and a streaming context:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils
    sc = SparkContext("local[*]", "stream")
    ssc = StreamingContext(sc, 1)
This gives me the following error:

    Spark Streaming's Kafka libraries not found in class path. Try one
    of the following.

    1. Include the Kafka library and its dependencies with in the
       spark-submit command as

    $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8:2.3.2 ...
It seems that I need to add the jar to my notebook:

    !/usr/local/bin/spark-submit   --master local[*]  /usr/local/Cellar/apache-spark/2.3.0/libexec/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.2.jar pyspark-shell
which returns

    Error: No main class set in JAR; please specify one with --class
    Run with --help for usage help or --verbose for debug output
What class should I specify?
How do I get Pyspark to connect to the consumer?

The command you have is trying to run spark-streaming-kafka-0-8-assembly_2.11-2.3.2.jar as the application, and then to find pyspark-shell inside it as a Java class.

As the first error says, you are missing a --packages option after spark-submit, which means you would do

    spark-submit --packages ... someApp.jar com.example.YourClass
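For a PySpark job there is no --class at all: the Python script takes the place of the jar/class pair. A sketch of the corrected invocation, assuming a hypothetical script name your_streaming_app.py and the Spark 2.3.x / Scala 2.11 versions from the question:

```shell
# Options come before the application; for a Python app there is no --class.
# your_streaming_app.py is a placeholder for your own script.
/usr/local/bin/spark-submit \
  --master "local[*]" \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.2 \
  your_streaming_app.py
```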


If you are just running locally in Jupyter, you may want to try kafka-python, for example, instead of PySpark... less overhead, and no Java dependency.
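A minimal kafka-python sketch of that suggestion, reusing the broker address, topic name ('hr'), and JSON serialization from the question. run() assumes a broker is actually listening on 127.0.0.1:9092, so it is defined but not called here:

```python
import json

def encode_value(value):
    # Serialize a dict to ASCII JSON bytes, as the producer in the question does.
    return json.dumps(value).encode("ascii")

def decode_value(raw):
    # Inverse of encode_value: bytes off the wire back to a Python object.
    return json.loads(raw.decode("ascii"))

def run():
    # Imported here so the sketch is readable without kafka-python installed.
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(
        bootstrap_servers=["127.0.0.1:9092"],
        value_serializer=encode_value,
    )
    producer.send("hr", {"event": "hello"})
    producer.flush()

    consumer = KafkaConsumer(
        "hr",
        bootstrap_servers=["127.0.0.1:9092"],
        group_id="abc",
        value_deserializer=decode_value,
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,  # stop iterating when the topic goes quiet
    )
    for message in consumer:
        print(message.value)
```

Call run() once a local broker is up; no Spark or JVM is involved at any point.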

Comments:

Are you providing the package as --packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.2.0?

@mayank agrawal: /usr/local/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.3.2 pyspark-shell returns Exception in thread "main" java.util.NoSuchElementException: key not found: PYSPARK_DRIVER_CONN_INFO_PATH

Not sure, but that is unrelated to Kafka. Is your code perhaps similar to what I saw on a blog?

When I try to run the KafkaUtils.createStream command, I get the "Spark Streaming's Kafka libraries not found in class path. Try one of the following." error.

In response to that: the os.environ['PYSPARK_SUBMIT_ARGS'] line seems to suggest otherwise. It is adding the packages in the code. In your error, you are missing the previously mentioned --packages.
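The os.environ['PYSPARK_SUBMIT_ARGS'] approach mentioned in the comments can be sketched as follows. This is an assumption-laden sketch: the package coordinates follow the question's Spark 2.3.x / Scala 2.11 setup, and build_stream() needs PySpark installed plus a local Kafka broker to actually run:

```python
import os

def configure_kafka_package(spark_version="2.3.2", scala_version="2.11"):
    # Must be set before the SparkContext (and thus the JVM) is created;
    # the versions are assumptions matching the question's installation.
    os.environ["PYSPARK_SUBMIT_ARGS"] = (
        "--packages org.apache.spark:spark-streaming-kafka-0-8_"
        + scala_version + ":" + spark_version + " pyspark-shell"
    )
    return os.environ["PYSPARK_SUBMIT_ARGS"]

def build_stream():
    # Imported lazily so the sketch stays readable without PySpark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    configure_kafka_package()
    sc = SparkContext("local[*]", "stream")
    ssc = StreamingContext(sc, 1)
    stream = KafkaUtils.createDirectStream(
        ssc, ["hr"], {"metadata.broker.list": "127.0.0.1:9092"}
    )
    return ssc, stream
```

Because the flag is injected before the JVM starts, the Kafka jars end up on the classpath without any jar path on the spark-submit command line.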