Apache Spark: "Failed to find leader" for Kafka topics; java.lang.NullPointerException at org.apache.kafka.common.utils.Utils.formatAddress

Tags: apache-spark, pyspark, apache-kafka, spark-streaming-kafka

We are facing the below error while trying to stream data from an SSL-enabled Kafka topic. Could you please help us with this issue?

19/11/07 13:26:54 INFO ConsumerFetcherManager: [ConsumerFetcherManager-1573151189884] Added fetcher for partitions ArrayBuffer()
19/11/07 13:26:54 WARN ConsumerFetcherManager$LeaderFinderThread: [spark-streaming-consumer_dvtcbddc101.corp.cox.com-1573151189725-d40a510f-leader-finder-thread], Failed to find leader for Set([inst_monitor_status_test,2], [inst_monitor_status_test,0], [inst_monitor_status_test,1])
java.lang.NullPointerException
        at org.apache.kafka.common.utils.Utils.formatAddress(Utils.java:408)
        at kafka.cluster.Broker.connectionString(Broker.scala:62)
        at kafka.client.ClientUtils$$anonfun$fetchTopicMetadata$5.apply(ClientUtils.scala:89)
        at kafka.client.ClientUtils$$anonfun$fetchTopicMetadata$5.apply(ClientUtils.scala:89)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:89)
        at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
PySpark code:

from __future__ import print_function

import sys
import json

from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from kafka import SimpleProducer, KafkaClient, KafkaProducer
from operator import add


def handler(message):
    records = message.collect()
    for record in records:
        print(record)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: kafka_wordcount.py <zk> <topic>", file=sys.stderr)
        exit(-1)
    sc = SparkContext(appName="PythonStreamingKafkaWordCount")
    ssc = StreamingContext(sc, 10)

    zkQuorum, topic = sys.argv[1:]
    kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
    lines = kvs.map(lambda x: x[1])
    counts = lines.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
    counts.pprint()
    kvs.foreachRDD(handler)

    ssc.start()
    ssc.awaitTermination()
Spark-submit command:

/usr/hdp/2.6.1.0-129/spark2/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.0,org.apache.spark:spark-sql-kafka-0-8_2.11:2.1.1.0,org.apache.spark:spark_2.11:2.3.0 dsstream2.py host:2181 inst_monitor_status_test

Answer: Thanks for the inputs. I have passed the SSL parameters in the below approach and it is working fine.

from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *

#  Spark session (Structured Streaming does not need a StreamingContext):

spark = SparkSession.builder.appName('PythonStreamingDirectKafkaWordCount').getOrCreate()

#  Kafka Topic Details :

KAFKA_TOPIC_NAME_CONS = "topic_name"
KAFKA_OUTPUT_TOPIC_NAME_CONS = "topic_to_hdfs"
KAFKA_BOOTSTRAP_SERVERS_CONS = 'kafka_server:9093'

#  Creating  readstream DataFrame :

df = spark.readStream \
     .format("kafka") \
     .option("kafka.bootstrap.servers", KAFKA_BOOTSTRAP_SERVERS_CONS) \
     .option("subscribe", KAFKA_TOPIC_NAME_CONS) \
     .option("startingOffsets", "earliest") \
     .option("kafka.security.protocol","SASL_SSL")\
     .option("kafka.client.id" ,"Clinet_id")\
     .option("kafka.sasl.kerberos.service.name","kafka")\
     .option("kafka.ssl.truststore.location", "/home/path/kafka_trust.jks") \
     .option("kafka.ssl.truststore.password", "password_rd") \
     .option("kafka.sasl.kerberos.keytab","/home/path.keytab") \
     .option("kafka.sasl.kerberos.principal","path") \
     .load()

df1 = df.selectExpr( "CAST(value AS STRING)")

#  Creating  Writestream DataFrame :

query = df1.writeStream \
   .option("path", "target_directory") \
   .format("csv") \
   .option("checkpointLocation", "chkpint_directory") \
   .outputMode("append") \
   .start()

query.awaitTermination()
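The answer above writes the raw Kafka value as a plain string. If the payload is JSON, a minimal sketch of parsing it before writing is shown below; the schema and the field names (device, status) are assumptions for illustration, not taken from the original post.

from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

# Hypothetical schema for the Kafka message payload (assumed for illustration).
payload_schema = StructType([
    StructField("device", StringType()),
    StructField("status", StringType()),
])

# Parse the JSON string produced by CAST(value AS STRING) and flatten the fields,
# so the CSV sink receives proper columns instead of one raw JSON string.
parsed_df = df1.withColumn("payload", from_json(col("value"), payload_schema)) \
               .select("payload.device", "payload.status")

parsed_query = parsed_df.writeStream \
    .format("csv") \
    .option("path", "target_directory_parsed") \
    .option("checkpointLocation", "chkpint_directory_parsed") \
    .outputMode("append") \
    .start()

# With several queries running, block with spark.streams.awaitAnyTermination()
# instead of waiting on a single query handle.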

Comments:

"at kafka.cluster.Broker.connectionString ..." sounds like you are not finding the correct address of the cluster. If you print(zkQuorum), is the address correct? Also, do you really need Spark here? You already have "from kafka import KafkaProducer", which is a native Python library. Plus, you seem to be missing any SSL-related settings for Spark.

Thanks for the inputs. zkQuorum has the correct address, but I am still not sure how to pass the SSL-related settings to Spark. Could you let me know your thoughts on this? Sample code would be great!

Have you seen this?

Based on the link above, it seems the SSL-related Spark settings can only be incorporated in Scala or Java code. I don't think we can pass the SSL-related information through kafkaParams in the PySpark code. Please let me know how to handle SSL in the PySpark code to connect to the Kerberos-enabled cluster. When using KafkaUtils.createDirectStream, what are the kafkaParams options? Please let me know if you know.
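Regarding the kafkaParams question in the last comment, a minimal sketch of KafkaUtils.createDirectStream with a kafkaParams dict is below. This is only the plain (non-SSL) direct stream from the spark-streaming-kafka-0-8 integration; as far as I know, that integration's old consumer protocol does not honor SSL settings, which is why the accepted approach above switches to Structured Streaming with the kafka source. The broker address is an assumed placeholder.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="DirectKafkaSketch")
ssc = StreamingContext(sc, 10)

# kafkaParams for the 0-8 direct stream: brokers are listed directly instead of a ZooKeeper quorum.
# "broker1:9092" is a placeholder, not from the original post.
kafkaParams = {"metadata.broker.list": "broker1:9092",
               "auto.offset.reset": "smallest"}

stream = KafkaUtils.createDirectStream(ssc, ["inst_monitor_status_test"], kafkaParams)
stream.map(lambda kv: kv[1]).pprint()

ssc.start()
ssc.awaitTermination()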