How do connectedStreams work in Apache Flink?
I am following the example in the official Flink documentation to understand how connectedStreams work. Here is an example:
After running this job, I see the following in the log file /log/flink-root-taskexecutor-0-localhost.localdomain.log:
2020-07-25 02:40:30,152 INFO myflink.StreamingJob - flatMap1111111:
2020-07-25 02:40:30,153 INFO myflink.StreamingJob - flatMap1111111:
2020-07-25 02:40:30,174 INFO myflink.StreamingJob - flatMap2222222:
2020-07-25 02:40:30,174 INFO myflink.StreamingJob - flatMap2222222:
2020-07-25 02:40:30,174 INFO myflink.StreamingJob - flatMap2222222:
2020-07-25 02:40:30,174 INFO myflink.StreamingJob - flatMap2222222:
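As you can see, they are all empty.
Am I doing something wrong, or am I misunderstanding how connected streams work?

Look in the *.out files, not the *.log files. print() writes to the task manager's standard output, which ends up in *.out. The expected output is: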
2020-07-24 16:18:21,083 INFO org.apache.flink.runtime.state.heap.HeapKeyedStateBackend - Initializing heap keyed state backend with stream factory.
3> Apache
4> Flink
2020-07-24 16:18:21,126 INFO org.apache.flink.runtime.taskmanager.Task
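Use out.collect() in flatMap2; otherwise print() has nothing to emit in this case. This is the working version, which produces the output above. The stream of words ("Apache", "DROP", "Flink", "IGNORE") is filtered using the shared keyed state blocked. The control stream ("DROP", "IGNORE") updates that blocked state.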
package org.sense.flink.examples.stream.tests;

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

public class ConnectedStreamTest {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> control = env.fromElements("DROP", "IGNORE").keyBy(x -> x);
        DataStream<String> streamOfWords = env.fromElements("Apache", "DROP", "Flink", "IGNORE").keyBy(x -> x);

        control
            .connect(streamOfWords)
            .flatMap(new ControlFunction())
            .print();

        env.execute();
    }

    private static class ControlFunction extends RichCoFlatMapFunction<String, String, String> {
        private ValueState<Boolean> blocked;

        @Override
        public void open(Configuration config) {
            blocked = getRuntimeContext().getState(new ValueStateDescriptor<>("blocked", Boolean.class));
        }

        @Override
        public void flatMap1(String control_value, Collector<String> out) throws Exception {
            blocked.update(Boolean.TRUE);
        }

        @Override
        public void flatMap2(String data_value, Collector<String> out) throws Exception {
            if (blocked.value() == null) {
                out.collect(data_value);
            }
        }
    }
}
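
To see why the job prints only Apache and Flink, here is a minimal plain-Java sketch that imitates the keyed blocked state with a HashMap. It has no Flink dependency and is not how Flink implements keyed state. The class name KeyedControlSketch and the run helper are made up for illustration. In a real job you have no control over the order in which flatMap1 and flatMap2 are called, so the second ordering below shows what can happen when the words arrive before their control elements.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ControlFunction; plain Java, no Flink involved.
public class KeyedControlSketch {
    // Imitates keyed ValueState<Boolean>: one flag per key (the word itself).
    private final Map<String, Boolean> blocked = new HashMap<>();
    // Imitates the Collector: everything added here would reach print().
    private final List<String> out = new ArrayList<>();

    // Mirrors flatMap1: a control element marks its own key as blocked.
    void flatMap1(String controlValue) {
        blocked.put(controlValue, Boolean.TRUE);
    }

    // Mirrors flatMap2: a word is emitted only if its key has no flag yet.
    void flatMap2(String dataValue) {
        if (blocked.get(dataValue) == null) {
            out.add(dataValue);
        }
    }

    // Feeds both inputs in one of two fixed orders and returns what was emitted.
    public static List<String> run(boolean controlFirst) {
        KeyedControlSketch f = new KeyedControlSketch();
        String[] control = {"DROP", "IGNORE"};
        String[] words = {"Apache", "DROP", "Flink", "IGNORE"};
        if (controlFirst) {
            for (String c : control) f.flatMap1(c);
            for (String w : words) f.flatMap2(w);
        } else {
            for (String w : words) f.flatMap2(w);
            for (String c : control) f.flatMap1(c);
        }
        return f.out;
    }

    public static void main(String[] args) {
        System.out.println(run(true));   // prints [Apache, Flink]
        System.out.println(run(false));  // prints [Apache, DROP, Flink, IGNORE]
    }
}
```

Because both inputs are keyed by the word itself, a control element only affects data elements that carry the same key. That is why DROP and IGNORE are filtered while Apache and Flink pass through.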
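Does the modified job (the version with logging) still have a print sink connected to it? Otherwise, ControlFunction will not be executed. If you only added logging to the original code, it should work.
@DavidAnderson Yes. The code control.connect(streamOfWords).flatMap(new ControlFunction()).print(); env.execute("StreamingJob") is still there.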