Apache Flink: processing flow in Flink


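I need to process messages as follows:

  • Each message must pass through all processes unchanged.
  • If a process detects a pattern match, a copy of the message must be passed to another thread.

I tried using OutputTag: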

    OutputTag<SysmonPartial> filtersOutput = new OutputTag<SysmonPartial>("FiltersOutput"){};

    DataStream<SysmonPartial> kafkaSource = env.addSource(consumer);

    DataStream<SysmonPartial> source = kafkaSource.rebalance();

    SingleOutputStreamOperator<SysmonPartial> s = source
            .process(lambda1).name("lambda1").startNewChain()
            .process(lambda2).name("lambda2").startNewChain()
            .process(lambda3).startNewChain()
            .process(lambda4).startNewChain()
            .process(lambda5).startNewChain()
            .process(lambda6).startNewChain()
            .process(lambda7).startNewChain();

    SingleOutputStreamOperator<Any> output = s
            .getSideOutput(filtersOutput)
            .process(filterProcessFunction).setParallelism(1).startNewChain();

    output.addSink(sink).setParallelism(1);

    env.execute(jobName);
    
    
OutputTag is used to create additional tagged messages. Or am I wrong?

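The way you have wired up the job graph means that when you access the side output at the end of the job with

    SingleOutputStreamOperator<Any> output = s
            .getSideOutput(filtersOutput)

you only get what the last process function put on the side output, that is, only the events emitted by lambda7. A side output belongs to the operator that emits it, and s refers to the last operator in the chained expression.

I believe what you intended can be expressed as

    SingleOutputStreamOperator<SysmonPartial> s1 = source.process(lambda1).name("lambda1");
    SingleOutputStreamOperator<SysmonPartial> s2 = s1.process(lambda2).name("lambda2");
    SingleOutputStreamOperator<SysmonPartial> s3 = s2.process(lambda3);
    ...
    DataStream<SysmonPartial> side1 = s1.getSideOutput(filtersOutput);
    DataStream<SysmonPartial> side2 = s2.getSideOutput(filtersOutput);
    DataStream<SysmonPartial> side3 = s3.getSideOutput(filtersOutput);
    ...
    SingleOutputStreamOperator<Any> output = side1.union(side2, side3, ...)
            .process(filterProcessFunction)
            ...

Also, I don't think you should be using startNewChain(). It breaks the chain at every operator it is applied to, which here effectively disables operator chaining. Operator chaining is a valuable optimization that should only be disabled in special cases.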
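To make the difference concrete, here is a library-free sketch of the same idea. It does not use the Flink API: `SideOutputModel`, `Stage`, `run` and `unionOfSides` are hypothetical names invented for illustration, and plain lists stand in for streams. Each stage forwards every message and keeps its own side list, so reading only the last stage's side list misses the matches found by earlier stages, while gathering all of them (the analogue of `side1.union(side2, side3, ...)`) does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Library-free model of per-operator side outputs.
// All names here are hypothetical; none of them are Flink classes.
final class SideOutputModel {

    /** One processing step: forwards every message and records matches on its own side list. */
    static final class Stage {
        final String name;
        final Predicate<String> rule;
        final List<String> side = new ArrayList<>();

        Stage(String name, Predicate<String> rule) {
            this.name = name;
            this.rule = rule;
        }

        List<String> process(List<String> in) {
            List<String> main = new ArrayList<>();
            for (String msg : in) {
                main.add(msg);                   // main output: every message, unchanged
                if (rule.test(msg)) {
                    side.add(name + ":" + msg);  // side output: tagged copy of a match
                }
            }
            return main;
        }
    }

    /** Runs the stages in order and returns the final main output. */
    static List<String> run(List<Stage> stages, List<String> input) {
        List<String> current = input;
        for (Stage stage : stages) {
            current = stage.process(current);
        }
        return current;
    }

    /** Gathers the side lists of all stages, in stage order. */
    static List<String> unionOfSides(List<Stage> stages) {
        List<String> all = new ArrayList<>();
        for (Stage stage : stages) {
            all.addAll(stage.side);
        }
        return all;
    }

    public static void main(String[] args) {
        List<Stage> stages = Arrays.asList(
                new Stage("lambda1", m -> m.contains("a")),
                new Stage("lambda2", m -> m.contains("b")),
                new Stage("lambda3", m -> m.contains("c")));
        List<String> out = run(stages, Arrays.asList("a", "b", "c", "d"));
        System.out.println("main output:     " + out);
        System.out.println("last stage only: " + stages.get(2).side);
        System.out.println("union of sides:  " + unionOfSides(stages));
    }
}
```

In this model the main output still contains all four messages, the last stage's side list holds only its own match, and the union holds one tagged entry per matching stage.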

One of the process functions from the question:

    public class Bypass_WS_01_03 extends FilterTagFuction<SysmonPartial, SysmonPartial, SysmonPartial> {
        private static final Pattern p_1 = Pattern.compile("pattern1");
        private static final Pattern p_0 = Pattern.compile("pattern2");

        @Override
        public void processElement(SysmonPartial t, Context ctx, Collector<SysmonPartial> out) throws Exception {
            // Forward every message on the main output first.
            out.collect(t);
            // Rule: event ID "1", command line matches p_0, image name matches p_1.
            if (
                    "1".equals(t.B_VendorEventID) &&
                            t.CommandLine != null && t.CommandLine.length() != 0 && p_0.matcher(t.CommandLine).find() &&
                            t.ImageName != null && t.ImageName.length() != 0 && p_1.matcher(t.ImageName).find()
            ) {
                // Tag the message with the rule ID and emit it on the side output.
                t.RuleId = "Bypass_WS_01_03";
                ctx.output(getOutputTag(), t);
            }
        }
    }
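
The stated requirement is that messages pass through unchanged and that a copy goes to the side output, while processElement above sets RuleId on the same object it has just collected. Whether that change can become visible downstream may depend on runtime settings such as object reuse, so tagging an explicit copy is the more cautious option. The following is a minimal, library-free sketch of that idea. `RuleSketch`, `Event`, `found` and `tagIfMatch` are hypothetical names, and `Event` is an assumed stand-in for SysmonPartial with only the fields the rule reads.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these types come from the question or from Flink.
final class RuleSketch {

    /** Assumed stand-in for SysmonPartial, reduced to the fields the rule reads. */
    static final class Event {
        final String vendorEventId;
        final String commandLine;
        final String imageName;
        String ruleId;

        Event(String vendorEventId, String commandLine, String imageName) {
            this.vendorEventId = vendorEventId;
            this.commandLine = commandLine;
            this.imageName = imageName;
        }

        /** Field-by-field copy, so the original stays untouched when the copy is tagged. */
        Event copy() {
            Event c = new Event(vendorEventId, commandLine, imageName);
            c.ruleId = ruleId;
            return c;
        }
    }

    // Placeholder patterns, as in the question.
    private static final Pattern IMAGE = Pattern.compile("pattern1");
    private static final Pattern COMMAND = Pattern.compile("pattern2");

    /** Null- and empty-safe regex search. */
    static boolean found(Pattern p, String s) {
        return s != null && !s.isEmpty() && p.matcher(s).find();
    }

    /** Same condition as the rule in processElement. */
    static boolean matches(Event e) {
        return "1".equals(e.vendorEventId)
                && found(COMMAND, e.commandLine)
                && found(IMAGE, e.imageName);
    }

    /** Returns a tagged copy for the side output, or empty when the rule does not match. */
    static Optional<Event> tagIfMatch(Event e) {
        if (!matches(e)) {
            return Optional.empty();
        }
        Event tagged = e.copy();
        tagged.ruleId = "Bypass_WS_01_03";
        return Optional.of(tagged);
    }
}
```

Inside a real process function, the original message would still go to the main output as before, and only the value returned by `tagIfMatch`, when present, would be passed to `ctx.output(...)`.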