Apache Spark Structured Streaming Kafka - how to read a specific partition of a topic and manage offsets
I am new to Kafka Structured Streaming and offset management, and I am using spark-streaming-kafka-0-10-2.11. In the consumer, how can I read a specific partition of a topic?
comapany_df = sparkSession
    .readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", applicationProperties.getProperty(BOOTSTRAP_SERVERS_CONFIG))
    .option("subscribe", topicName)
    .load();
I am using something like the above. How do I specify the particular partition to read from?

You can read from specific Kafka partitions using the code block below:
public void processKafka() throws InterruptedException {
    LOG.info("************ SparkStreamingKafka.processKafka start");

    // Create the Spark application
    SparkConf sparkConf = new SparkConf();
    sparkConf.set("spark.executor.cores", "5");

    // To express any Spark Streaming computation, a StreamingContext object needs to be created.
    // This object serves as the main entry point for all Spark Streaming functionality.
    // This creates the Spark streaming context with a 'sparkBatchInterval'-second batch size.
    jssc = new JavaStreamingContext(sparkConf, Durations.seconds(sparkBatchInterval));

    // Kafka consumer parameters
    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", this.getBrokerList());
    kafkaParams.put("client.id", "SpliceSpark");
    kafkaParams.put("group.id", "mynewgroup");
    kafkaParams.put("auto.offset.reset", "earliest");
    kafkaParams.put("enable.auto.commit", false);
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);

    List<TopicPartition> topicPartitions = new ArrayList<TopicPartition>();
    for (int i = 0; i < 5; i++) {
        topicPartitions.add(new TopicPartition("mytopic", i));
    }

    // List of Kafka topics to process
    Collection<String> topics = Arrays.asList(this.getTopicList().split(","));
    JavaInputDStream<ConsumerRecord<String, String>> messages = KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
    );

    // Another version of the attempt: pin the stream to specific partitions
    /*
    JavaInputDStream<ConsumerRecord<String, String>> messages = KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Assign(topicPartitions, kafkaParams)
    );
    */

    messages.foreachRDD(new PrintRDDDetails());

    // Start running the job to receive and transform the data
    jssc.start();

    // Allows the current thread to wait for the termination of the context by stop() or by an exception
    jssc.awaitTermination();
}
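A side note on the code above: it builds `topicPartitions` but then passes `Subscribe`, which still reads every partition of the listed topics; only the commented-out `Assign` strategy actually pins the stream to specific partitions. `Assign` also has an overload in spark-streaming-kafka-0-10 that accepts explicit per-partition starting offsets (a `Map<TopicPartition, Long>`), which is one way to manage offsets yourself. A minimal sketch of that bookkeeping, with plain `String` keys standing in for `TopicPartition` and made-up offset values:

```java
import java.util.HashMap;
import java.util.Map;

public class FromOffsetsSketch {
    // Returns explicit starting offsets per partition, e.g. loaded from your
    // own checkpoint store. Keys stand in for TopicPartition("mytopic", i);
    // the offset values here are made up for illustration.
    static Map<String, Long> fromOffsets() {
        Map<String, Long> offsets = new HashMap<>();
        offsets.put("mytopic-0", 42L); // resume partition 0 at offset 42
        offsets.put("mytopic-1", 0L);  // read partition 1 from the beginning
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(fromOffsets().get("mytopic-0"));
        // With the real spark-streaming-kafka-0-10 API, the map is keyed by
        // TopicPartition and passed as the third argument to Assign:
        // ConsumerStrategies.<String, String>Assign(topicPartitions, kafkaParams, offsets)
    }
}
```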
Could you explain in words what you did? That would help new users understand, rather than just seeing the code.

@RohitYadav Thanks, but I want to use Spark Structured Streaming… not DStream-based streaming.
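Since the question (and the follow-up comment) asks for Structured Streaming rather than DStreams, the equivalent there is the `assign` source option, which takes a JSON string mapping topic names to partition lists; `startingOffsets` controls where reading begins. A minimal sketch, where the topic name, partition ids, and broker address are placeholders:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AssignOption {
    // Builds the JSON value expected by Structured Streaming's "assign"
    // option, e.g. {"mytopic":[0,2]}.
    static String assignJson(String topic, List<Integer> partitions) {
        return "{\"" + topic + "\":["
                + partitions.stream().map(String::valueOf).collect(Collectors.joining(","))
                + "]}";
    }

    public static void main(String[] args) {
        String assign = assignJson("mytopic", Arrays.asList(0, 2));
        System.out.println(assign);
        // The value is then passed to the Structured Streaming reader
        // (in place of "subscribe"; the two options are mutually exclusive):
        // Dataset<Row> df = sparkSession.readStream()
        //     .format("kafka")
        //     .option("kafka.bootstrap.servers", brokers)
        //     .option("assign", assign)
        //     .option("startingOffsets", "earliest")
        //     .load();
    }
}
```

With `assign`, the stream reads only the listed partitions, which is the Structured Streaming counterpart of the `Assign` consumer strategy in the DStream answer above.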