Apache Flink: skewed data distribution on a KeyedStream


I have the following Java code in Flink:

env.setParallelism(6);

//Read from Kafka topic with 12 partitions
DataStream<String> line = env.addSource(myConsumer);

//Filter half of the records 
DataStream<Tuple2<String, Integer>> line_Num_Odd = line_Num.filter(new FilterOdd());
DataStream<Tuple3<String, String, Integer>> line_Num_Odd_2 = line_Num_Odd.map(new OddAdder());

//Filter the other half
DataStream<Tuple2<String, Integer>> line_Num_Even = line_Num.filter(new FilterEven());
DataStream<Tuple3<String, String, Integer>> line_Num_Even_2 = line_Num_Even.map(new EvenAdder());

//Join all the data again
DataStream<Tuple3<String, String, Integer>> line_Num_U = line_Num_Odd_2.union(line_Num_Even_2);

//Window
DataStream<Tuple3<String, String, Integer>> windowedLine_Num_U_K = line_Num_U
                .keyBy(1)
                .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
                .reduce(new Reducer());
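
The problem is that the window should be able to run with parallelism = 2, because the second String of the Tuple3 contains two distinct groups of data, with the keys "odd" and "even". Everything else runs with parallelism 6, but the window runs with an effective parallelism of 1. For my requirements I only need it to have parallelism = 2.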

The functions used in the code are the following:

public static class FilterOdd implements FilterFunction<Tuple2<String, Integer>> {

    public boolean filter(Tuple2<String, Integer> line) throws Exception {
        Boolean isOdd = (Long.valueOf(line.f0.split(" ")[0]) % 2) != 0;
        return isOdd;
    }
};


public static class FilterEven implements FilterFunction<Tuple2<String, Integer>> {

    public boolean filter(Tuple2<String, Integer> line) throws Exception {
        Boolean isEven = (Long.valueOf(line.f0.split(" ")[0]) % 2) == 0;
        return isEven;
    }
};

public static class OddAdder implements MapFunction<Tuple2<String, Integer>, Tuple3<String, String, Integer>> {

    public Tuple3<String, String, Integer> map(Tuple2<String, Integer> line) throws Exception {
        Tuple3<String, String, Integer> newLine = new Tuple3<String, String, Integer>(line.f0, "odd", line.f1);
        return newLine;
    }
};


public static class EvenAdder implements MapFunction<Tuple2<String, Integer>, Tuple3<String, String, Integer>> {

    public Tuple3<String, String, Integer> map(Tuple2<String, Integer> line) throws Exception {
        Tuple3<String, String, Integer> newLine = new Tuple3<String, String, Integer>(line.f0, "even", line.f1);
        return newLine;
    }
};

public static class Reducer implements ReduceFunction<Tuple3<String, String, Integer>> {

    public Tuple3<String, String, Integer> reduce(Tuple3<String, String, Integer> line1,
            Tuple3<String, String, Integer> line2) throws Exception {
        Long sum = Long.valueOf(line1.f0.split(" ")[0]) + Long.valueOf(line2.f0.split(" ")[0]);
        Long sumTS = Long.valueOf(line1.f0.split(" ")[1]) + Long.valueOf(line2.f0.split(" ")[1]);
        Tuple3<String, String, Integer> newLine = new Tuple3<String, String, Integer>(String.valueOf(sum) +
                " " + String.valueOf(sumTS), line1.f1, line1.f2 + line2.f2);
        return newLine;
    }
};

Thanks for your help.


Solution: I changed the contents of the keys from "odd" and "even" to "odd0000" and "even1111", and it now works correctly.

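Keys are distributed to the worker threads by hash partitioning. This means the key value is hashed, and the thread is determined by taking that hash modulo the number of workers. With two keys and two threads, there is a good chance that both keys are assigned to the same thread.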


You can try using different key values whose hash values are spread across both threads.
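
The mechanism described above can be illustrated with a small stand-alone sketch. It deliberately uses the simplified model from the answer (Java's `hashCode()` modulo the number of workers); Flink's real assignment adds a further hashing step and goes through key groups, so the actual placement of a given key will differ. The class `HashPartitionSketch` and the helpers `workerFor` and `distinctWorkers` are hypothetical names made up for this illustration and are not part of Flink.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of hash partitioning. NOT Flink's actual algorithm.
class HashPartitionSketch {

    // Worker index = hash(key) mod number of workers.
    // floorMod keeps the result non-negative for negative hash codes.
    public static int workerFor(String key, int numWorkers) {
        return Math.floorMod(key.hashCode(), numWorkers);
    }

    // Counts how many different workers a set of keys ends up on.
    public static int distinctWorkers(List<String> keys, int numWorkers) {
        Set<Integer> used = new HashSet<>();
        for (String key : keys) {
            used.add(workerFor(key, numWorkers));
        }
        return used.size();
    }

    public static void main(String[] args) {
        int numWorkers = 2;
        for (String key : Arrays.asList("a", "b", "c")) {
            System.out.println(key + " -> worker " + workerFor(key, numWorkers));
        }
        // Two distinct keys can still share one worker ("a" and "c" here),
        // which leaves the other worker idle.
        System.out.println(distinctWorkers(Arrays.asList("a", "c"), numWorkers));
        System.out.println(distinctWorkers(Arrays.asList("a", "b"), numWorkers));
    }
}
```

With only two keys, whether both workers get used depends entirely on the hash values of those two keys, which is why renaming the keys can change the outcome.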

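Thanks!! I changed it from "odd"/"even" to "odd0000"/"even1111" and it is working now :D. The only problem is that I have two workers, and both threads end up on the same machine. Is there any way to force each thread onto a different worker?

It depends on your setup. You can start the workers with a single slot each, but depending on whether you run on YARN, Mesos, or something else, you may not be able to control where the workers are started.

I'm running in standalone mode. Yes, I had considered using a single task slot, but I want to keep some performance... haha. I will leave it this way, but the throughput is limited by one of the two workers :(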
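
For the single-slot suggestion in the comments, a minimal configuration sketch might look like the following. It assumes a standalone cluster whose TaskManagers read `flink-conf.yaml`; the file name and location depend on the Flink version and installation.

```yaml
# flink-conf.yaml on each worker machine (standalone mode assumed)
# One slot per TaskManager: two parallel subtasks of the same operator
# cannot share a TaskManager.
taskmanager.numberOfTaskSlots: 1
```

The trade-off is the one raised in the comments: with one slot per TaskManager, a job running at parallelism 6 needs six TaskManagers, so fewer subtasks run on each machine.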