Java: Sending messages to topics based on the message key using Kafka Streams

java apache-kafka

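I want to send every record in a Kafka stream to a different topic based on the record's key. The stream has names as keys and records as values, and I want to fan these records out to separate topics by key.

Data: (jhon -> {jhonsRecord}), (sean -> {seansRecord}), (mary -> {marysRecord}), (jhon -> {jhonsRecord2})

Expected:

Topic 1 (name: jhon): (jhon -> {jhonsRecord}), (jhon -> {jhonsRecord2})
Topic 2 (name: sean): (sean -> {seansRecord})
Topic 3 (name: mary): (mary -> {marysRecord})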

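Below is how I do this now, but because the list of names is huge, it is slow. Also, even when only a few of the names actually have records, I still have to iterate over the whole list. Please suggest a fix.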

    for( String name : names )
    {
        recordsByName.filterNot(( k, v ) -> k.equalsIgnoreCase(name)).to(name);
    }

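I think what you are looking for is `branch`, as used in the code below.

The following is untested, but it shows the general idea.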

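import java.util.Arrays;
import java.util.List;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Predicate;

// Build one predicate per name; each matches records whose key equals that name.
final List<String> names = Arrays.asList("jhon", "sean", "mary");
@SuppressWarnings("unchecked")
final Predicate<String, Object>[] predicates = names.stream()
    .map(n -> (Predicate<String, Object>) (key, value) -> n.equals(key))
    .toArray(Predicate[]::new);

// example input
final StreamsBuilder builder = new StreamsBuilder();
final KStream<String, Object> stream = builder.stream("names");

// Split the stream; branches[i] receives the records matched by predicates[i].
final KStream<String, Object>[] branches = stream.branch(predicates);
for (int i = 0; i < names.size(); i++) {
    branches[i].to(names.get(i));
}

// KStream branches[0] contains all records whose keys are "jhon"
// KStream branches[1] contains all records whose keys are "sean"
...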

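I think you should use the

KStream::to(final TopicNameExtractor<K, V> topicExtractor)

function. It lets you compute the topic name for each message.

Example code:

final KStream<String, Object> stream = ???;
stream.to((key, value, recordContext) -> key);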
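
To make the difference from the per-name loop concrete, here is a minimal plain-Java sketch of per-record topic selection. It is a model only and does not use Kafka: `KeyRoutingSketch`, `route`, and `sampleRecords` are hypothetical names, and the in-memory map stands in for the output topics. The point it illustrates is that each record is looked at once and its topic is derived from the record itself, so the cost no longer depends on how many names exist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Plain-Java model of per-record topic selection; none of these names are Kafka APIs.
class KeyRoutingSketch {

    // Routes each {key, value} pair to the topic chosen by topicFor, in a single pass.
    static Map<String, List<String>> route(List<String[]> records,
                                           BiFunction<String, String, String> topicFor) {
        Map<String, List<String>> topics = new LinkedHashMap<>();
        for (String[] record : records) {
            String topic = topicFor.apply(record[0], record[1]);
            topics.computeIfAbsent(topic, t -> new ArrayList<>()).add(record[1]);
        }
        return topics;
    }

    // Sample data taken from the question: {key, value} pairs.
    static List<String[]> sampleRecords() {
        return Arrays.asList(
            new String[] {"jhon", "jhonsRecord"},
            new String[] {"sean", "seansRecord"},
            new String[] {"mary", "marysRecord"},
            new String[] {"jhon", "jhonsRecord2"});
    }

    public static void main(String[] args) {
        // The key itself is the topic name, mirroring (key, value, recordContext) -> key.
        Map<String, List<String>> topics = route(sampleRecords(), (key, value) -> key);
        topics.forEach((topic, values) -> System.out.println(topic + " -> " + values));
    }
}
```

Running `main` groups both of jhon's records under the `jhon` entry and one record each under `sean` and `mary`, which matches the expected fan-out in the question.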

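If you need to produce aggregated data for each user, you do not need to write a separate topic per user. You are better off writing an aggregation on the source stream. That way you do not end up with one topic per key, but you can still run operations on each user independently.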

Serde<UserRecord> recordSerde = ...
KStream<String, UserAggregate> aggregateByName = recordsByName
   .groupByKey(Grouped.with(Serdes.String(), recordSerde))
   .aggregate(...)
   .toStream();

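For details, see the Kafka Streams documentation.

This approach scales to millions of users, which you currently cannot achieve with the one-topic-per-user approach.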
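
As a rough illustration of what a per-key aggregation computes, the sketch below models it in plain Java with a map from key to running aggregate. It does not use Kafka: `PerKeyAggregateSketch`, `UserSummary`, and `aggregateByKey` are hypothetical names, and the choice of aggregate (a record count plus the latest value) is an assumption made only for the example. The `EMPTY` constant plays the role of an initializer and `add` plays the role of an aggregator.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a group-by-key aggregation; none of these names are Kafka APIs.
class PerKeyAggregateSketch {

    // Hypothetical aggregate: how many records a user has and the latest value seen.
    static final class UserSummary {
        static final UserSummary EMPTY = new UserSummary(0, null);
        final int count;
        final String latest;

        UserSummary(int count, String latest) {
            this.count = count;
            this.latest = latest;
        }

        // Returns a new summary that includes one more record.
        UserSummary add(String value) {
            return new UserSummary(count + 1, value);
        }

        @Override
        public String toString() {
            return "count=" + count + ", latest=" + latest;
        }
    }

    // Folds every {key, value} pair into the running aggregate for its key, in one pass.
    static Map<String, UserSummary> aggregateByKey(List<String[]> records) {
        Map<String, UserSummary> state = new LinkedHashMap<>();
        for (String[] record : records) {
            UserSummary current = state.getOrDefault(record[0], UserSummary.EMPTY);
            state.put(record[0], current.add(record[1]));
        }
        return state;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
            new String[] {"jhon", "jhonsRecord"},
            new String[] {"sean", "seansRecord"},
            new String[] {"mary", "marysRecord"},
            new String[] {"jhon", "jhonsRecord2"});
        aggregateByKey(records).forEach((name, summary) ->
            System.out.println(name + " -> " + summary));
    }
}
```

All users share one keyed result here instead of one destination per user, which is the property the answer relies on when it says the approach scales.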

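Assuming the output topics don't have any leftover content, I like this better than my answer.

Note: this may create a bunch of topics (with default settings). What is the use case here?

@cricket_007 Thanks for pointing that out, but that is exactly the intent here. Each name has 100 records, and each needs to be processed and aggregated separately.

This is a good idea. The one catch is that the KPIs need to run on both the aggregated data and the raw data, so at any time I may need access to a user's raw data as well as the aggregated data. For example, one job may need to look at a single user's raw data while another job needs that user's aggregated data.

That is still possible. In the case above you can write the aggregation result to another topic, while the original topic and the raw data remain available for any other consumer to read. You can even do this within the same Kafka Streams application, with several different transformations running off the same source topic. In practice, Kafka Streams reads each of your input messages once and then passes it to each of the transformations in turn.

Yes, you are right, the raw data and the aggregated data are then available in just two topics. But when I only want to look at one particular user's data, I have no choice except to subscribe to everything and filter for that user's records. Please let me know if my understanding is incorrect.

That is correct. Kafka just treats each message as a bucket of bytes, so any filtered message is likewise treated as a bucket of bytes.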