Flink pipeline from Kafka to Redis in Java
I'm using Flink to read from Kafka and write to Redis. For testing, I only want to read the first 10 messages from Kafka, so the stream should end when the counter reaches 10:
AtomicInteger counter = new AtomicInteger(0);
FlinkKafkaConsumer08<String> kafkaConsumer =
    new FlinkKafkaConsumer08<>("my topic",
        new SimpleStringSchema() {
            @Override
            public boolean isEndOfStream(String nextElement) {
                // It should only read 10 Kafka messages
                return counter.getAndIncrement() > 9;
            }
        },
        properties);
When I change the condition to counter.getAndIncrement() > 8, it writes 27 messages to Redis: always three times as many as expected.
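The factor of three suggests the source is running with parallelism 3 (an assumption inferred from the 27 messages, not confirmed in the question): Flink serializes the DeserializationSchema and ships a separate copy to each parallel source subtask, so each subtask ends up with its own AtomicInteger rather than sharing one. A minimal sketch of that arithmetic:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: each parallel source subtask deserializes with its OWN copy of the
// schema, so the counter is per-subtask, not global. With the ">8" condition
// each subtask emits 9 records before isEndOfStream returns true.
public class PerSubtaskCounter {
    static int simulate(int parallelism) {
        int totalEmitted = 0;
        for (int subtask = 0; subtask < parallelism; subtask++) {
            AtomicInteger counter = new AtomicInteger(0); // fresh copy per subtask
            while (!(counter.getAndIncrement() > 8)) {    // same check as isEndOfStream
                totalEmitted++;                           // record reaches the sink
            }
        }
        return totalEmitted;
    }

    public static void main(String[] args) {
        System.out.println(simulate(3)); // 3 subtasks x 9 records = 27
    }
}
```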
Full code:
public class FlinkEntry {
    private final static JedisCluster JEDIS_CLUSTER;

    static {
        Set<HostAndPort> hostAndPorts = new HashSet<>();
        hostAndPorts.add(new HostAndPort("localhost", 7001));
        JEDIS_CLUSTER = new JedisCluster(hostAndPorts);
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        FlinkKafkaConsumer08<String> kafkaConsumer = createKafkaConsumer();
        DataStream<String> dataStream = environment.addSource(kafkaConsumer);

        SinkFunction<String> redisSink = createRedisSink();
        dataStream.addSink(redisSink);

        environment.execute();
    }

    private static FlinkKafkaConsumer08<String> createKafkaConsumer() {
        Properties properties = new Properties();
        //... set kafka property
        AtomicInteger counter = new AtomicInteger(0);
        FlinkKafkaConsumer08<String> kafkaConsumer =
            new FlinkKafkaConsumer08<>("my topic",
                new SimpleStringSchema() {
                    @Override
                    public boolean isEndOfStream(String nextElement) {
                        // It should only read 10 Kafka messages
                        return counter.getAndIncrement() > 9;
                    }
                },
                properties);
        kafkaConsumer.setStartFromLatest();
        return kafkaConsumer;
    }

    private static SinkFunction<String> createRedisSink() {
        return new SinkFunction<String>() {
            @Override
            public void invoke(String value, Context context) {
                JEDIS_CLUSTER.lpush("rtp:example", value);
                JEDIS_CLUSTER.expire("rtp:example", 10 * 60);
            }
        };
    }
}
One way to understand this is to call
env.disableOperatorChaining();
and then look at some metrics, e.g. the source's numRecordsOut and the sink's numRecordsIn. I would also double-check that the whole job is running with the parallelism set to 1.
You need to disable chaining, because otherwise the whole job will collapse into a single task, and no metrics will be collected for the communication between the two operators. Have you tried pinning the Kafka consumer's parallelism to 1? Just a wild guess… @TobiSH How would I do that? Try environment.setParallelism(1)
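Putting the two suggestions from the comments together, a sketch of what the job setup might look like. This is a configuration fragment, not a verified fix: whether parallelism 1 makes the counter behave as intended still depends on whether FlinkKafkaConsumer08 honors isEndOfStream at all.

```java
StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
environment.setParallelism(1);          // one source subtask -> one copy of the counter
environment.disableOperatorChaining();  // keep source and sink as separate tasks, so
                                        // numRecordsOut / numRecordsIn metrics are collected
```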
@DavidAnderson I submit the job through the web UI with the parallelism set to 1; I assume that has the same effect. What makes you think FlinkKafkaConsumer08 honors isEndOfStream? I see that Kafka09Fetcher does, but I don't see it in the older version.