Java: Kafka Streams RoundRobinPartitioner (Java, Apache Kafka, Apache Kafka Streams)


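I wrote a Kafka Streams application that uses Kafka client version 2.4 against Kafka broker version 2.2. My topic and the internal topic each have 50 partitions.

My Kafka Streams code has a selectKey() DSL operation, and I have 2 million records that all use the same key. In the streams configuration, I have set

props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, RoundRobinPartitioner.class);

so that records with exactly the same key can be spread across different partitions. If I don't use round-robin, all my messages go to the same partition, as expected.

Everything was fine until now, but I realized that when I use the RoundRobinPartitioner class, my messages go to only about 40 partitions; 10 partitions stay idle. What am I missing? With about 2 million records it should use all 50 of them, right?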
final KStream<String, IdListExportMessage> exportedDeviceIdsStream =
    builder.stream("deviceIds");

// k: appId::deviceId, v: device
final KTable<String, Device> deviceTable = builder.table(
    "device",
    Consumed.with(Serdes.String(), deviceSerde)
);

exportedDeviceIdsStream
    // Some DSL operations
    .join(
        deviceTable,
        (exportedDevice, device) -> {
            exportedDevice.setDevice(device);

            return exportedDevice;
        },
        Joined.with(Serdes.String(), exportedDeviceSerde, deviceSerde)
    )
    // Re-key by device id; this alone does not write to any topic
    .selectKey((deviceId, exportedDevice) -> exportedDevice.getDevice().getId())
    .to("bulk_consumer");

RoundRobinPartitioner.java
public class RoundRobinPartitioner implements Partitioner {
    private final ConcurrentMap<String, AtomicInteger> topicCounterMap = new ConcurrentHashMap();

    public RoundRobinPartitioner() {
    }

    public void configure(Map<String, ?> configs) {
    }

    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        int nextValue = this.nextValue(topic);
        List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
        if (!availablePartitions.isEmpty()) {
            int part = Utils.toPositive(nextValue) % availablePartitions.size();
            return ((PartitionInfo)availablePartitions.get(part)).partition();
        } else {
            return Utils.toPositive(nextValue) % numPartitions;
        }
    }

    private int nextValue(String topic) {
        AtomicInteger counter = (AtomicInteger)this.topicCounterMap.computeIfAbsent(topic, (k) -> {
            return new AtomicInteger(0);
        });
        return counter.getAndIncrement();
    }

    public void close() {
    }
}

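You cannot change the partitioning via the ProducerConfig.PARTITIONER_CLASS_CONFIG configuration; that only works for the plain producer.

In Kafka Streams, you need to implement the StreamPartitioner interface and pass your implementation into the corresponding operators, e.g., to("topic", Produced.streamPartitioner(new MyPartitioner())).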
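As a rough illustration of the round-robin idea behind such a partitioner, here is a minimal sketch that compiles without Kafka on the classpath. `PartitionChooser` is a hypothetical stand-in whose single method mirrors the shape of `StreamPartitioner.partition(topic, key, value, numPartitions)`; `RoundRobinChooser` and `RoundRobinDemo` are likewise illustrative names and not part of Kafka.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in mirroring the shape of Kafka Streams' StreamPartitioner<K, V>,
// so that this sketch can be compiled and run without Kafka dependencies.
interface PartitionChooser<K, V> {
    Integer partition(String topic, K key, V value, int numPartitions);
}

// Ignores the key entirely and cycles through the partitions in order.
class RoundRobinChooser<K, V> implements PartitionChooser<K, V> {
    private final AtomicInteger counter = new AtomicInteger(0);

    @Override
    public Integer partition(String topic, K key, V value, int numPartitions) {
        // floorMod keeps the result non-negative even after the counter overflows.
        return Math.floorMod(counter.getAndIncrement(), numPartitions);
    }
}

class RoundRobinDemo {
    public static void main(String[] args) {
        PartitionChooser<String, String> chooser = new RoundRobinChooser<>();
        int numPartitions = 50;
        int[] hits = new int[numPartitions];

        // Every record carries the same key, as in the question.
        for (int i = 0; i < 2_000_000; i++) {
            hits[chooser.partition("bulk_consumer", "same-key", "value-" + i, numPartitions)]++;
        }

        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int h : hits) {
            min = Math.min(min, h);
            max = Math.max(max, h);
        }
        System.out.println("min per partition = " + min + ", max per partition = " + max);
    }
}
```

In a real topology the class would implement org.apache.kafka.streams.processor.StreamPartitioner instead of the stand-in interface, and an instance would be handed to the sink via Produced.streamPartitioner(...) in the .to("bulk_consumer", ...) call.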
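Can you share the code snippet of the producer that writes data to the Kafka cluster?

Please show your partitioner code.

Hi, I have edited my question.

Thank you sir, but actually, if I use ProducerConfig.PARTITIONER_CLASS_CONFIG, I see that a single key is distributed across the partitions of the internal topic; my only internal topic is the one created by the selectKey() part. If I don't use this property, I can see the lag between partitions. But I can't do what you described at the selectKey part, and at the actual .to() part it wants me to use Produced.with(key, value) :/

My bad, the method name is Produced.streamPartitioner(StreamPartitioner).

Yep, that is fine for to("topic", Produced.streamPartitioner(new MyPartitioner())), but not for .selectKey(). Still, props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, RoundRobinPartitioner.class); works for the selectKey part. I just don't understand why I can't use all partitions but only ~40.

selectKey() does not need it, because selectKey() does not write into a topic. I'm not sure what you mean by "Still, props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, RoundRobinPartitioner.class); works for the selectKey part". Frankly, I am wondering what impact the config has at all: Kafka Streams should ignore it completely and behave the same way with or without it...?

Doesn't selectKey() write into an internal topic?

No. It only "marks the stream" to record that the key was changed. A repartition topic is created only if you run groupByKey() or a join downstream. I.e., you can think of repartitioning as "lazy": it is done only if necessary, and changing the key by itself does not require repartitioning.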
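The "lazy" repartitioning described in that last comment can be pictured with a toy model. This is only a sketch of the idea and not Kafka Streams' API or internals: `ToyStream`, its methods, and the topic name "repartition" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: it imitates the idea of lazy repartitioning and is NOT Kafka Streams code.
class ToyStream {
    private boolean keyChanged = false;
    private final List<String> topicsWritten = new ArrayList<>();

    // Changing the key writes nothing; it only marks the stream.
    ToyStream selectKey() {
        keyChanged = true;
        return this;
    }

    // A key-based operation repartitions first, but only if the key was changed upstream.
    ToyStream groupByKey() {
        if (keyChanged) {
            topicsWritten.add("repartition");
            keyChanged = false;
        }
        return this;
    }

    // Writing to a sink topic does not require a repartition topic.
    ToyStream to(String topic) {
        topicsWritten.add(topic);
        return this;
    }

    List<String> topicsWritten() {
        return topicsWritten;
    }
}

class ToyStreamDemo {
    public static void main(String[] args) {
        // Same shape as the question's topology: selectKey() followed directly by to().
        System.out.println(new ToyStream().selectKey().to("bulk_consumer").topicsWritten());

        // A key-based operation after selectKey() is what triggers the repartition step.
        System.out.println(new ToyStream().selectKey().groupByKey().to("out").topicsWritten());
    }
}
```

In this model the question's pipeline (a selectKey() followed directly by to()) only ever touches the sink topic, which matches the point made in the comment.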