Apache Flink: how to get the non-matching values from a filter function

I am new to Apache Flink. I am trying to filter the words that start with the letter N, and I am getting that output, but how can I also get the words that do not start with the letter N? Below is the code I am using:
    package DataStream;

    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class WordStream {
        public static void main(String[] args) throws Exception {
            final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            DataStream<String> inputData = env.socketTextStream("localhost", 9999);
            DataStream<String> filterData = inputData.filter(new FilterFunction<String>() {
                private static final long serialVersionUID = 1L;

                @Override
                public boolean filter(String value) throws Exception {
                    return value.startsWith("N");
                }
            });
            DataStream<Tuple2<String, Integer>> tokenize = filterData
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                        out.collect(new Tuple2<String, Integer>(value, Integer.valueOf(1)));
                    }
                });
            DataStream<Tuple2<String, Integer>> counts = tokenize.keyBy(0).sum(1);
            counts.print();
            env.execute("WordStream");
        }
    }
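Could you suggest how to capture the non-matching words in another stream?

I think you can use side outputs to achieve this. Use a ProcessFunction to emit the matching elements to the regular collector and the non-matching elements with a side-output tag, then fetch the side-output elements from the main stream. For example, your code could be changed like this: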
    package datastream;

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    public class WordStream {
        public static void main(String[] args) throws Exception {
            final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            DataStream<String> inputData = env.socketTextStream("localhost", 9999);

            // Initialize side-output tag to collect the un-matched elements
            OutputTag<Tuple2<String, Integer>> unMatchedSideOutput =
                new OutputTag<Tuple2<String, Integer>>("unmatched-side-output") {};

            SingleOutputStreamOperator<Tuple2<String, Integer>> tokenize = inputData
                .process(new ProcessFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void processElement(String value, Context ctx, Collector<Tuple2<String, Integer>> out) {
                        if (value.startsWith("N")) {
                            // Emit the data to actual collector
                            out.collect(new Tuple2<>("Matched=" + value, Integer.valueOf(1)));
                        } else {
                            // Emit the un-matched data to side output
                            ctx.output(unMatchedSideOutput, new Tuple2<>("UnMatched=" + value, Integer.valueOf(1)));
                        }
                    }
                });

            DataStream<Tuple2<String, Integer>> count = tokenize.keyBy(0).sum(1);

            // Fetch the un-matched element using side-output tag and process it
            DataStream<Tuple2<String, Integer>> unMatchedCount =
                tokenize.getSideOutput(unMatchedSideOutput).keyBy(0).sum(1);

            count.print();
            unMatchedCount.print();
            env.execute("WordStream");
        }
    }
I get the following output:
3> (UnMatched=Hello,1)
4> (Matched=Nevermind,1)
3> (UnMatched=Hello,2)
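A simpler solution:

    DataStream<String> nWords = input.filter(s -> s.startsWith("N"));
    DataStream<String> others = input.filter(s -> !s.startsWith("N"));

I believe this is slightly less efficient than the side-output solution, but it will still run in a single task thanks to operator chaining, so it does not incur serialization/deserialization or network overhead either.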
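Don't get me wrong: in general, side outputs are the way to go for splitting streams.

Comment: Is it possible to do the same with a FilterFunction, without using a ProcessFunction?

Reply: Nope, you can't, because FilterFunction does not give you a Context object to emit with yourself. That is why we have to use ProcessFunction, which provides rich objects such as Context and Collector.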
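To make the two-filter idea above concrete without a Flink dependency, here is a minimal plain-Java sketch. It is not Flink code: `SplitSketch`, `keep`, and `startsWithN` are hypothetical names introduced only for illustration, and a `List` stands in for the stream. It shows just the core point, that applying a predicate and its negation to the same input yields two complementary outputs, each of which reads every input element once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java illustration only; none of these names come from the Flink API.
class SplitSketch {

    // Hypothetical helper: the same condition the question uses.
    public static boolean startsWithN(String word) {
        return word.startsWith("N");
    }

    // Keeps the elements for which the predicate returns true,
    // mirroring what a filter does element by element.
    public static List<String> keep(List<String> input, Predicate<String> predicate) {
        List<String> result = new ArrayList<>();
        for (String s : input) {
            if (predicate.test(s)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Nevermind", "Hello", "Hello");
        Predicate<String> matches = SplitSketch::startsWithN;

        // First pass: the words that match.
        List<String> nWords = keep(input, matches);
        // Second pass: the negated predicate gives everything else.
        List<String> others = keep(input, matches.negate());

        System.out.println(nWords);  // prints [Nevermind]
        System.out.println(others);  // prints [Hello, Hello]
    }
}
```

Each element is tested twice here, once per pass, which is the small extra cost the answer mentions when comparing this approach with a single ProcessFunction that routes each element exactly once.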