Scala: Attaching the Kafka offset to each record in foreachRDD

Tags: scala, apache-kafka, spark-streaming, mapr

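I want to retrieve the Kafka offset of each record of my RDD inside the foreachRDD method. My topic has a single partition, so my RDD has a single partition as well. This is basically what I tried: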

dStream.foreachRDD { rdd =>
  if (!rdd.isEmpty) {
    // offset of the first record in this batch (single partition, hence index 0)
    val firstOffset = rdd.asInstanceOf[HasOffsetRanges].offsetRanges(0).fromOffset
    val rddWithOffset = rdd.map(_.value)
      .zipWithIndex()
      .map{ case (v,i) => (v,i + firstOffset)}
  }
}
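
The arithmetic the snippet above relies on can be checked without Spark. This is only a minimal sketch, assuming an in-memory Seq stands in for the single-partition RDD and that the records arrive in offset order with no gaps; `OffsetArithmetic` and `attachOffsets` are hypothetical names that do not appear in the original code.

```scala
object OffsetArithmetic {
  // Pair every value with firstOffset + its position in the batch.
  // Assumption: `values` is already in offset order and offsets are contiguous.
  def attachOffsets[A](values: Seq[A], firstOffset: Long): Seq[(A, Long)] =
    values.zipWithIndex.map { case (v, i) => (v, firstOffset + i) }

  def main(args: Array[String]): Unit = {
    val batch = Seq("johnny", "chloe", "brian", "eliot")
    // prints (johnny,1), (chloe,2), (brian,3), (eliot,4), one per line
    attachOffsets(batch, 1L).foreach(println)
  }
}
```

If the values reach `zipWithIndex` in a different order than they were written to the topic, the computed numbers no longer match the real offsets, since the index only reflects the position within the batch.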
For example, in my producer I send the messages in a loop and put the loop index in a column named position, like this:

+------+-----+--------+
|  name|  age|position|
+------+-----+--------+
|johnny|   26|       1|
| chloe|   42|       2|
| brian|   19|       3|
| eliot|   35|       4|
+------+-----+--------+
Unfortunately, I noticed that the order is not preserved once I add the offset column in my consumer:

+------+-----+--------+------+
|  name|  age|position|offset|
+------+-----+--------+------+
|johnny|   26|       1|     1|
| chloe|   42|       2|     3|
| brian|   19|       3|     4|
| eliot|   35|       4|     2|
+------+-----+--------+------+
It seems I lose the order somewhere in the process. Any ideas? Thanks.

By the way, my Java producer looks like this:

KafkaRestProducer<String, Object> producer = new KafkaRestProducer<>(props);

ArrayList<String> names = new ArrayList<String>();
names.add("johnny");
names.add("chloe");
names.add("brian");
names.add("eliot");

ArrayList<Integer> ages = new ArrayList<Integer>();
ages.add(26);
ages.add(42);
ages.add(19);
ages.add(35);

// send one record per person, in list order
for (int i = 0; i < names.size(); ++i) {

    String name = names.get(i);
    int age = ages.get(i);
    Person person = Person
        .newBuilder()
        .setName(name)
        .setAge(age)
        .setPosition(i)
        .build();

    ProducerRecord<String, Object> record = new ProducerRecord<>("/apps/PERSON/streams:myTopic", name, person);

    producer.send(record, null);
    System.out.println(i);
}

My English is not very good. I use the following code:

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val Array(brokers, topic, groupId) = args
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers, "group.id" -> groupId)
    // start reading partition 0 of the topic at offset 1
    val topicPartition = Map[TopicAndPartition, Long](TopicAndPartition(topic, 0) -> 1L)
    // keep the offset next to each message
    val messageHandler = (mmd: MessageAndMetadata[String, String]) => (mmd.offset, mmd.message)
    val kafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (Long, String)](
        ssc, kafkaParams, topicPartition, messageHandler)

    kafkaStream.foreachRDD(rdd => rdd.foreach(println))
Output: (offset, message line)
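
Where a createDirectStream overload that accepts a messageHandler is not available, a similar result might be obtained from the offset ranges of each batch. The following is only a sketch under stated assumptions: the RDD comes straight from the direct stream (so RDD partition i corresponds to offsetRanges(i)), offsets within a partition are contiguous, and the Spark 1.x Kafka 0.8 integration package `org.apache.spark.streaming.kafka` is on the classpath. `PartitionOffsets` and `attach` are hypothetical names.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.kafka.HasOffsetRanges

object PartitionOffsets {
  // Pair each record with fromOffset(partition) + its position in that partition.
  // Assumption: `rdd` is the untransformed RDD handed to foreachRDD by a direct stream;
  // the cast fails on any RDD that has already been mapped or shuffled.
  def attach[T](rdd: RDD[T]): RDD[(T, Long)] = {
    val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    rdd.mapPartitionsWithIndex { (partition, records) =>
      val first = ranges(partition).fromOffset
      records.zipWithIndex.map { case (record, i) => (record, first + i) }
    }
  }
}
```

Inside foreachRDD this would be called as PartitionOffsets.attach(rdd) before any other transformation. The index is computed per partition with the iterator's own zipWithIndex, so it does not depend on a global ordering across the RDD.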

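What do you mean by "losing the order"? What did you observe, and how does it differ from what you expected?
Thanks for your comment. I edited my question to add an example showing how I lose the order. Any ideas?
How many partitions do you have on the Kafka topic?
I have one partition in my topic.
Can you add the code of your producer?
Hi, thanks for your answer, but I can't find a createDirectStream overload that takes these parameters. Which version of Kafka are you using?
I'm using Spark 1.5.2 and Kafka 0.8.2.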