Apache Spark: Spark Streaming with Kafka (SSL enabled)


I have seen many questions on this topic but could not find an answer. I tried adding comments on those, but have not received any response.

Requirement -> Spark Streaming from Kafka with SSL enabled. Version details below.

I am running Spark locally.

Spark version -> 2.4.6

Kafka version -> 2.2.1

Code snippet:

import sys
import logging
from datetime import datetime

try:
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils
    sc = SparkContext(appName="PythonStreamingDirectKafkaWordCount")
    ssc = StreamingContext(sc, 20)
    brokers = "b1:p,b2:p"  # placeholder broker host:port pairs
    topic = "topic1"
    # old-consumer (0-8) style parameters; uses metadata.broker.list, not bootstrap.servers
    kafkaParams = {"security.protocol": "SSL", "metadata.broker.list": brokers}
    kvs = KafkaUtils.createDirectStream(ssc, [topic], kafkaParams)
    lines = kvs.map(lambda x: x[1])
    print(lines)
    counts = lines.flatMap(lambda line: line.split(" ")) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a+b)
    counts.pprint()
    ssc.start()
    ssc.awaitTermination()
except ImportError as e:
    print("Error importing Spark Modules :", e)
    sys.exit(1)
When I submit it as follows

spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.10:2.0.0 --master local pysparkKafka.py
I used this package because I was following a post on this.

I get the error below

Spark Streaming's Kafka libraries not found in class path. Try one of the following.

  1. Include the Kafka library and its dependencies with in the
     spark-submit command as

     $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8:2.4.6 ...

  2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
     Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-0-8-assembly, Version = 2.4.6.
     Then, include the jar in the spark-submit command as

     $ bin/spark-submit --jars <spark-streaming-kafka-0-8-assembly.jar> ...
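Incidentally, the `--packages` coordinate in the submit command mixes versions: `spark-streaming-kafka-0-10_2.10:2.0.0` targets Scala 2.10 and Spark 2.0, while the local Spark is 2.4.6 (built against Scala 2.11), and `pyspark.streaming.kafka.KafkaUtils` in the snippet is the 0-8 integration. A coordinate consistent with what the error message itself suggests would look like the sketch below (Scala suffix `2.11` is an assumption about the local Spark build; note the 0-8 consumer still has no SSL support):

```shell
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.4.6 \
  --master local \
  pysparkKafka.py
```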

When pointing at the broker's SSL port, I get the following error

20/07/24 11:58:46 INFO VerifiableProperties: Verifying properties
20/07/24 11:58:46 INFO VerifiableProperties: Property group.id is overridden to
20/07/24 11:58:46 WARN VerifiableProperties: Property security.protocol is not valid
20/07/24 11:58:46 INFO VerifiableProperties: Property zookeeper.connect is overridden to
20/07/24 11:58:46 INFO SimpleConsumer: Reconnect due to socket error: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
20/07/24 11:58:47 INFO SimpleConsumer: Reconnect due to socket error: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
Traceback (most recent call last):
Note -> the submission above works fine without SSL, but my requirement is to enable SSL.

Any help is much appreciated. Thanks.
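For context on the `Property security.protocol is not valid` warning: the 0-8 integration used in the snippet is backed by Kafka's old SimpleConsumer, which predates SSL and silently ignores that property, so the plaintext handshake against the SSL port produces the `EOFException`s above. SSL is only honored by the new consumer used in the 0-10 / Structured Streaming integrations. A minimal sketch of the SSL options the new consumer expects, shaped as Structured Streaming reader options (all file paths and passwords here are hypothetical placeholders):

```python
# SSL options for the new Kafka consumer (kafka-clients >= 0.9), shaped as
# Structured Streaming reader options (each Kafka property gets a "kafka." prefix).
# All file paths and passwords below are hypothetical placeholders.
def kafka_ssl_options(bootstrap_servers):
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # replaces metadata.broker.list
        "kafka.security.protocol": "SSL",
        "kafka.ssl.truststore.location": "/path/to/client.truststore.jks",
        "kafka.ssl.truststore.password": "changeit",
        "kafka.ssl.keystore.location": "/path/to/client.keystore.jks",
        "kafka.ssl.keystore.password": "changeit",
        "kafka.ssl.key.password": "changeit",
    }
```

On Spark 2.4.x the PySpark path that accepts these options is Structured Streaming, e.g. `spark.readStream.format("kafka").options(**kafka_ssl_options("b1:p,b2:p")).option("subscribe", "topic1").load()`, submitted with the `spark-sql-kafka-0-10` package.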
