Apache Spark Structured Streaming with Kafka: how to read specific partitions of a topic and manage offsets

I am new to Kafka Structured Streaming and offset management. I am using spark-streaming-kafka-0-10_2.11. As a consumer, how can I read a specific partition of a topic?

Dataset<Row> company_df = sparkSession
                      .readStream()
                      .format("kafka")
                      .option("kafka.bootstrap.servers", applicationProperties.getProperty(BOOTSTRAP_SERVERS_CONFIG))
                      .option("subscribe", topicName)
                      .load();

I am using something like the above. How do I specify the particular partition to read from?

You can read from specific Kafka partitions with the following code block:

public void processKafka() throws InterruptedException {
    LOG.info("************ SparkStreamingKafka.processKafka start");

   // Create the spark application
    SparkConf sparkConf = new SparkConf();
    sparkConf.set("spark.executor.cores", "5");

    //To express any Spark Streaming computation, a StreamingContext object needs to be created. 
    //This object serves as the main entry point for all Spark Streaming functionality.
    //This creates the spark streaming context with a 'numSeconds' second batch size
    jssc = new JavaStreamingContext(sparkConf, Durations.seconds(sparkBatchInterval));


    //List of parameters
    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", this.getBrokerList());
    kafkaParams.put("client.id", "SpliceSpark");
    kafkaParams.put("group.id", "mynewgroup");
    kafkaParams.put("auto.offset.reset", "earliest");
    kafkaParams.put("enable.auto.commit", false);
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);

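    // Explicitly list the partitions to read (here, partitions 0-4 of "mytopic");
    // this list is what the Assign consumer strategy further below expects.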
    List<TopicPartition> topicPartitions = new ArrayList<>();
    for(int i=0; i<5; i++) {
        topicPartitions.add(new TopicPartition("mytopic", i));
    }


    //List of kafka topics to process
    Collection<String> topics = Arrays.asList(this.getTopicList().split(","));


    JavaInputDStream<ConsumerRecord<String, String>> messages = KafkaUtils.createDirectStream(
            jssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
          );

    //To consume only the partitions listed in topicPartitions (instead of every
    //partition of the subscribed topics), use the Assign strategy:
    /*
    JavaInputDStream<ConsumerRecord<String, String>> messages = KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Assign(topicPartitions, kafkaParams)
      );
     */

    messages.foreachRDD(new PrintRDDDetails());


    // Start running the job to receive and transform the data 
    jssc.start();

    //Allows the current thread to wait for the termination of the context by stop() or by an exception
    jssc.awaitTermination();
}
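Note that the stream above is created with the Subscribe strategy, which consumes all partitions of the listed topics; the commented-out Assign variant is the one that restricts the consumer to the TopicPartition list built earlier. Since enable.auto.commit is false, offsets are not committed automatically; with the spark-streaming-kafka-0-10 DStream API you can commit them yourself after processing, e.g. via CanCommitOffsets.commitAsync.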
Good. Could you explain in words what you did? That would help new users understand, rather than just reading a code answer.

@RohitYadav Thanks, but I want to use Spark Structured Streaming… not Spark Streaming.
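For Structured Streaming specifically, the Kafka source from the spark-sql-kafka-0-10 package supports an "assign" option that takes a JSON map of topic to partition array, and "startingOffsets" accepts per-partition offsets in JSON (-2 means earliest, -1 means latest). Below is a minimal sketch under those documented options; the broker address, topic name, partition numbers, starting offsets, and checkpoint path are all placeholders:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class StructuredKafkaAssign {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("StructuredKafkaAssign")
                .getOrCreate();

        // "assign" pins the source to specific partitions; the value is a JSON
        // map of topic -> partition array. It is mutually exclusive with
        // "subscribe" and "subscribePattern".
        // "startingOffsets" may give a per-partition start as JSON:
        // an absolute offset, -2 for earliest, or -1 for latest.
        Dataset<Row> df = spark
                .readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder broker
                .option("assign", "{\"mytopic\":[0,2]}")              // placeholder topic/partitions
                .option("startingOffsets", "{\"mytopic\":{\"0\":23,\"2\":-2}}")
                .load();

        // Structured Streaming tracks consumed offsets in the checkpoint
        // directory itself; it does not use Kafka consumer-group commits.
        StreamingQuery query = df
                .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
                .writeStream()
                .format("console")
                .option("checkpointLocation", "/tmp/checkpoints/mytopic")  // placeholder path
                .start();

        query.awaitTermination();
    }
}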