Creating a Kafka DirectStream with PySpark using SSL properties


We have multiple brokers, and connections to them are secured with the SSL protocol. To create a Kafka direct stream I tried passing the SSL information as follows, but it throws an error:

kafkaParams = {"metadata.broker.list": "host1:port,host2:port,host3:port",
               "security.protocol": "ssl",
               "ssl.key.password": "***",
               "ssl.keystore.location": "/path1/file.jks",
               "ssl.keystore.password": "***",
               "ssl.truststore.location": "/path1/file2.jks",
               "ssl.truststore.password": "***"}

directKafkaStream = KafkaUtils.createDirectStream(ssc,["topic"],kafkaParams)
Error:

>>> directKafkaStream = KafkaUtils.createDirectStream(ssc,["topic"],kafkaParams)
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property security.protocol is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.key.password is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.keystore.location is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.keystore.password is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.truststore.location is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.truststore.password is not valid
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/pyspark/streaming/kafka.py", line 146, in createDirectStream
    ssc._jssc, kafkaParams, set(topics), jfromOffsets)
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o10805.createDirectStreamWithoutMessageHandler.
: org.apache.spark.SparkException: java.io.EOFException
java.io.EOFException
java.io.EOFException
        at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:387)
        at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:387)
        at scala.util.Either.fold(Either.scala:98)
        at org.apache.spark.streaming.kafka.KafkaCluster$.checkErrors(KafkaCluster.scala:386)
        at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:223)
        at org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper.createDirectStream(KafkaUtils.scala:721)
        at org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper.createDirectStreamWithoutMessageHandler(KafkaUtils.scala:689)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

Do you have an answer? I am facing the same issue.

The old `KafkaUtils.createDirectStream` API (spark-streaming-kafka-0-8) does not support SSL, which is why every `ssl.*` property is reported as "not valid" and the connection to the SSL-secured port fails with an `EOFException`. Switching to Structured Streaming, whose Kafka source (spark-sql-kafka-0-10) does support SSL, works:
kafkaParams = "host1:port,host2:port,host3:port"  # bootstrap servers
topic = "topic"

from pyspark import SparkFiles  # resolves files shipped to the executors

# The .jks files must be distributed first, e.g. via
# spark-submit --files or spark.sparkContext.addFile().
df = spark.readStream.format("kafka") \
    .option("kafka.bootstrap.servers", kafkaParams) \
    .option("kafka.security.protocol", "SSL") \
    .option("kafka.ssl.truststore.location", SparkFiles.get("file.jks")) \
    .option("kafka.ssl.truststore.password", "***") \
    .option("kafka.ssl.keystore.location", SparkFiles.get("file1.jks")) \
    .option("kafka.ssl.keystore.password", "***") \
    .option("subscribe", topic) \
    .option("startingOffsets", "earliest") \
    .load()

df1 = df.selectExpr("CAST(value AS STRING)", "timestamp")
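Kafka delivers `value` as raw bytes, so the `CAST(value AS STRING)` step decodes it into UTF-8 text before JSON parsing. In plain Python terms (the sample payload here is illustrative, not from the original job):

```python
# A Kafka record value arrives as raw UTF-8 bytes.
raw_value = b'{"cust_id": "42", "name": "alice", "age": "30", "address": "main st"}'

# CAST(value AS STRING) is equivalent to decoding those bytes as UTF-8.
decoded = raw_value.decode("utf-8")
```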
from pyspark.sql.types import StructType, StringType

df_schema = StructType() \
    .add("cust_id", StringType()) \
    .add("name", StringType()) \
    .add("age", StringType()) \
    .add("address", StringType())

from pyspark.sql.functions import from_json, col

df2 = df1.select(from_json(col("value"), df_schema).alias("df_a"), "timestamp")
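`from_json` parses each string value against `df_schema`, producing a struct column `df_a` with the four declared fields, and a null struct where the string is not valid JSON. A plain-Python sketch of the same per-row behavior (the sample record and helper name are illustrative):

```python
import json

SCHEMA_FIELDS = ["cust_id", "name", "age", "address"]

def parse_row(value):
    """Mimic from_json(col("value"), df_schema) for one row: return a dict
    with the schema's fields, or None when the string is malformed JSON."""
    try:
        record = json.loads(value)
    except ValueError:
        return None  # from_json yields a null struct for unparseable input
    # Fields missing from the payload become None, as in the struct column.
    return {f: record.get(f) for f in SCHEMA_FIELDS}

row = parse_row('{"cust_id": "42", "name": "alice", "age": "30", "address": "main st"}')
```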

df_console_write = df2 \
    .writeStream \
    .trigger(processingTime='10 seconds') \
    .option("truncate", "false") \
    .format("console") \
    .start()

df_console_write.awaitTermination()