Java - Why does Kafka throw "Received -1 when reading from channel, socket has likely been closed" when Spark Streaming connects to a secured Kafka?


I got this error while trying to stream from Spark (using Java) to a secured Kafka (SASL_PLAINTEXT with the PLAIN mechanism).

The full error message:

17/07/07 14:38:43 INFO SimpleConsumer: Reconnect due to socket error: java.io.EOFException: Received -1 when reading from a channel, the socket has likely been closed.
Exception in thread "main" org.apache.spark.SparkException: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366)
at scala.util.Either.fold(Either.scala:98)
at org.apache.spark.streaming.kafka.KafkaCluster$.checkErrors(KafkaCluster.scala:365)
at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:222)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:607)
at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
at SparkStreaming.main(SparkStreaming.java:41)
Is there some parameter I need to set in kafkaParams, or something else, to make Spark Streaming authenticate to Kafka?

At the time, I had already added the SASL_PLAINTEXT security parameters to the Kafka broker's server.properties:

authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
listeners=SASL_PLAINTEXT://:9092
security.inter.broker.protocol=SASL_PLAINTEXT
sasl.mechanism.inter.broker.protocol=PLAIN
sasl.enabled.mechanisms=PLAIN
super.users=User:admin
Here is my kafka_jaas_server.conf:

KafkaServer {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="admin"
    password="admin1!"
    user_admin="admin1!"
    user_aldys="admin1!";
};
And this is my kafka_jaas_client.conf:

KafkaClient {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="aldys"
    password="admin1!";
};
When starting the Kafka broker I also include my JAAS server config, by editing the last line of kafka-server-start.sh:

exec $base_dir/kafka-run-class.sh $EXTRA_ARGS -Djava.security.auth.login.config=/etc/kafka/kafka_jaas_server.conf kafka.Kafka "$@"
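As an aside, the stock kafka-run-class.sh also appends $KAFKA_OPTS to the JVM arguments, so (assuming the standard script layout) an equivalent way to pass the JAAS file without editing the script would be:

export KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/kafka_jaas_server.conf"
bin/kafka-server-start.sh config/server.properties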
With these settings in place, I can produce to and consume from the topics I previously set ACLs on.

This is my Java code:

import java.util.*;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import kafka.serializer.StringDecoder;
import scala.Tuple2;

public class SparkStreaming {

    public static void main(String args[]) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: SparkStreaming <brokers> <topics>\n" +
                "  <brokers> is a list of one or more Kafka brokers\n" +
                "  <topics> is a list of one or more kafka topics to consume from\n\n");
            System.exit(1);
        }

        String brokers = args[0];
        String topics = args[1];

        Set<String> topicsSet = new HashSet<>(Arrays.asList(topics.split(",")));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("group.id", "group1");
        kafkaParams.put("auto.offset.reset", "smallest");
        kafkaParams.put("security.protocol", "SASL_PLAINTEXT");

        SparkConf sparkConf = new SparkConf()
                            .setAppName("SparkStreaming")
                            .setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));

        JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
            jssc,
            String.class,
            String.class,
            StringDecoder.class,
            StringDecoder.class,
            kafkaParams,
            topicsSet
        );

        messages.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
Here are the dependencies I use in my pom.xml:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka_2.11</artifactId>
        <version>1.6.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.10</artifactId>
        <version>0.10.2.1</version>
    </dependency>
</dependencies>

I have solved the problem by following the Spark Streaming + Kafka 0.10 integration guide.

I replaced spark-streaming-kafka_2.11 in my pom.xml with spark-streaming-kafka-0-10_2.11, version 2.1.1, as shown below.
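For reference, the replacement dependency would look like this (version 2.1.1 chosen here to match the Spark artifacts above):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.1.1</version>
</dependency>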


Based on the error log in the question above, I noticed that the error is thrown by SimpleConsumer, which belongs to Kafka's old consumer API. The old consumer predates Kafka's security support, so it cannot complete the SASL handshake; the broker closes the connection, which is what the "Received -1" EOFException means. So, as described, I replaced the pom dependency and changed my code to follow the Spark Streaming integration guide above. Now I can stream from the SASL PLAIN secured Kafka.
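A minimal sketch of what the updated consumer code might look like with the 0-10 integration (the classes come from the spark-streaming-kafka-0-10 package; the class name SparkStreamingSasl is a placeholder, and the broker address, group id and topic handling are carried over from the question):

import java.util.*;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class SparkStreamingSasl {

    public static void main(String[] args) throws Exception {
        Set<String> topicsSet = new HashSet<>(Arrays.asList(args[1].split(",")));

        // The 0-10 integration uses the new consumer API, which understands
        // the SASL_PLAINTEXT security protocol; note the Map<String, Object> type.
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "group1");
        kafkaParams.put("auto.offset.reset", "earliest"); // new-consumer name for "smallest"
        kafkaParams.put("security.protocol", "SASL_PLAINTEXT");
        kafkaParams.put("sasl.mechanism", "PLAIN");
        // The JAAS client config is still passed to the JVM via
        // -Djava.security.auth.login.config=/etc/kafka/kafka_jaas_client.conf

        SparkConf sparkConf = new SparkConf()
                            .setAppName("SparkStreaming")
                            .setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));

        JavaInputDStream<ConsumerRecord<String, String>> messages = KafkaUtils.createDirectStream(
            jssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(topicsSet, kafkaParams));

        messages.map(ConsumerRecord::value).print();

        jssc.start();
        jssc.awaitTermination();
    }
}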

Are your console producer/consumer working fine? If not, you should check your Kafka server config and JAAS config again.
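For example, a quick smoke test might look like this (a sketch; the topic name "test" is a placeholder, and client.properties is assumed to contain security.protocol=SASL_PLAINTEXT and sasl.mechanism=PLAIN):

export KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/kafka_jaas_client.conf"
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test --producer.config client.properties
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --consumer.config client.properties --from-beginning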

Apart from that, I would like to suggest a few things:

Add the JAAS file to Spark:

.config("spark.driver.extraJavaOptions", "-Djava.security.auth.login.config=/path/to/jaas.conf")
.config("spark.executor.extraJavaOptions", "-Djava.security.auth.login.config=/path/to/jaas.conf")

Or you can add it on spark-submit using --conf:
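For example (a sketch; the jar name and main class are placeholders, and in cluster mode the JAAS path must exist on every executor node or be shipped with --files):

spark-submit \
  --class SparkStreaming \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/path/to/jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/path/to/jaas.conf" \
  your-app.jar localhost:9092 topic1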

Make sure the JAAS file has read permission.

You must also configure the service name, which should match the principal name of the Kafka brokers (this applies when the cluster is secured with Kerberos/GSSAPI rather than PLAIN).

For example: kafka/hostname.com@EXAMPLE.com

Then add:


kafkaParams.put("sasl.kerberos.service.name", "kafka");

The first thing I would do is look at the logs on the Kafka broker and see whether they show anything about your login failing.

I also include my JAAS client config when running Spark Streaming, following this guide.