Apache Spark: Kafka "Couldn't find leader offsets for Set"

apache-spark

I am using Spark Streaming ('org.apache.spark:spark-streaming_2.10:1.6.1' and 'org.apache.spark:spark-streaming-kafka_2.10:1.6.1') to connect to a Kafka broker, version 0.10.0.1. When I try this code:

def messages = KafkaUtils.createDirectStream(jssc,
            String.class,
            String.class,
            StringDecoder.class,
            StringDecoder.class,
            kafkaParams,
            topicsSet)

I get the following exception:

    INFO consumer.SimpleConsumer: Reconnect due to socket error: java.nio.channels.ClosedChannelException
Exception in thread "main" org.apache.spark.SparkException: java.nio.channels.ClosedChannelException
org.apache.spark.SparkException: Couldn't find leader offsets for Set([stream,0])
    at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366)
    at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366)
    at scala.util.Either.fold(Either.scala:97)
    at org.apache.spark.streaming.kafka.KafkaCluster$.checkErrors(KafkaCluster.scala:365)
    at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:222)
    at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
    at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:607)
    at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
    at org.apache.spark.streaming.kafka.KafkaUtils$createDirectStream.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
    at com.privowny.classification.jobs.StreamingClassification.main(StreamingClassification.groovy:48)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I tried looking for an answer on this site, but there does not seem to be one. Can you give me some advice? The topic stream is not empty.

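I know from experience that one possible cause of this error message is that the Spark driver cannot reach the Kafka brokers using the brokers' advertised hostname (advertised.host.name in server.properties). This is true even if the Spark configuration identifies the Kafka brokers using a different, working address. The advertised hostnames of all brokers must be reachable from the Spark driver.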

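This happened to me because the cluster was running in a separate AWS account, and the brokers identified themselves using internal DNS records, which had to be replicated to the other AWS account. Until that was done, I got this error message because the Spark driver could not contact the brokers to request their latest offsets, even though we were using the brokers' private IP addresses in the Spark configuration.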
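A quick way to test the reachability condition described in this answer is to run a small check on the Spark driver host against the hostnames the brokers advertise. The following is only a minimal sketch: the class BrokerReachability and its methods are hypothetical helpers (not part of Spark or Kafka), and the 3-second timeout is an arbitrary choice. A successful TCP connection shows that the name resolves and the port is open from the driver; it does not prove that the Kafka protocol exchange itself will succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Spark or Kafka.
class BrokerReachability {

    // Splits "host1:port1,host2:port2" into unresolved socket addresses.
    static List<InetSocketAddress> parseBrokerList(String brokerList) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : brokerList.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected host:port, got: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    // Returns true if the host resolves and a TCP connection opens within timeoutMs.
    static boolean canConnect(InetSocketAddress address, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // Build a fresh address so DNS resolution happens here, on this machine.
            socket.connect(new InetSocketAddress(address.getHostString(), address.getPort()), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers unknown hosts, refused connections, and timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: BrokerReachability host1:port1,host2:port2");
            return;
        }
        for (InetSocketAddress broker : parseBrokerList(args[0])) {
            System.out.println(broker.getHostString() + ":" + broker.getPort()
                    + " reachable=" + canConnect(broker, 3000));
        }
    }
}
```

Pass it the advertised hostnames and ports rather than the addresses from the Spark configuration, since the advertised names are what the driver is told to contact when it asks for leader offsets.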

Hope this helps someone.

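I ran into this problem too. To fix it, you have to change some configuration on the Kafka side.

Go to your Kafka configuration and set listeners in the Socket Server Settings section, in the format: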

listeners=PLAINTEXT://[hostname or IP]:[port]

For example:

listeners=PLAINTEXT://192.168.1.24:9092

I run Kafka from HDP, so the default port is 6667 rather than 9092. When I switched the port in bootstrap.servers to :6667, the problem was solved.
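As a rough illustration of the port fix above, the parameters passed to KafkaUtils.createDirectStream could be built along these lines. This is a sketch under assumptions: the class KafkaParamsSketch and the hostname "broker-host" are hypothetical, and the topic name stream is taken from the exception in the question. With the spark-streaming-kafka 1.6.x direct stream, the broker list goes under "metadata.broker.list" or "bootstrap.servers" and must point at Kafka brokers, not at ZooKeeper.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; only builds the arguments, it does not start a stream.
class KafkaParamsSketch {

    // Builds the kafkaParams map for the direct stream from a broker host and port.
    static Map<String, String> kafkaParams(String brokerHost, int brokerPort) {
        Map<String, String> params = new HashMap<>();
        // Must match the port the broker actually listens on (6667 by default on HDP, per the answer above).
        params.put("bootstrap.servers", brokerHost + ":" + brokerPort);
        return params;
    }

    // Builds the topic set passed as the last argument of createDirectStream.
    static Set<String> topics(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        // "broker-host" is a placeholder; use a hostname the driver can resolve.
        System.out.println(kafkaParams("broker-host", 6667));
        System.out.println(topics("stream"));
    }
}
```

The resulting map and set would take the place of kafkaParams and topicsSet in the call shown in the question.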
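
This is usually a sign of a ZooKeeper problem. Reset ZooKeeper and try again.

What could the problem be? I just started the servers from the quickstart documentation!

I have run into synchronization problems between Kafka and ZooKeeper. Resetting resolved it in each case.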