Java Flink Kafka: How do I make my application run in parallel?


java parallel-processing apache-kafka apache-flink


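I am creating an application in Flink that should:

read messages from a topic

do some simple processing on them

write the result to a different topic

My code does work, but it does not run in parallel.
How do I do that?
It seems my code runs on only one thread/block.

On the Flink Web Dashboard:

The application goes into the running state.
However, only one block is shown in the subtask overview,
and bytes received/sent and records received/sent are always zero (never updated).

Here is my code. Please help me understand how to split my application so that it can run in parallel, and whether I am writing it correctly.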

public class SimpleApp {

    public static void main(String[] args) throws Exception {

        // create execution environment INPUT
        StreamExecutionEnvironment env_in =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // event time characteristic
        env_in.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // production Ready (Does NOT Work if greater than 1)
        env_in.setParallelism(Integer.parseInt(args[0]));

        // configure kafka consumer
        Properties properties = new Properties();
        properties.setProperty("zookeeper.connect", "localhost:2181");
        properties.setProperty("bootstrap.servers", "localhost:9092");
        properties.setProperty("auto.offset.reset", "earliest");

        // create a kafka consumer
        final DataStream<String> consumer = env_in
                .addSource(new FlinkKafkaConsumer09<>("test",
                        new SimpleStringSchema(), properties));

        // filter data
        SingleOutputStreamOperator<String> result = consumer.filter(
                new FilterFunction<String>() {
            @Override
            public boolean filter(String s) throws Exception {
                return s.substring(0, 2).contentEquals("PS");
            }
        });

        // Process Data
        // Transform String Records to JSON Objects
        SingleOutputStreamOperator<JSONObject> data = result.map(
                new MapFunction<String, JSONObject>() {
            @Override
            public JSONObject map(String value) throws Exception
            {
                JSONObject jsnobj = new JSONObject();

                if(value.substring(0, 2).contentEquals("PS"))
                {
                    // 1. Raw Data
                    jsnobj.put("Raw_Data", value.substring(0, value.length()-6));

                    // 2. Comment
                    int first_index_comment = value.indexOf("$");
                    int last_index_comment  = value.lastIndexOf("$") + 1;
                    //   - set comment
                    String comment =
                            value.substring(first_index_comment, last_index_comment);
                    comment = comment.substring(0, comment.length()-6);
                    jsnobj.put("Comment", comment);
                }
                else {
                    jsnobj.put("INVALID", value);
                }

                return jsnobj;
            }
        });

        // Write JSON to Kafka Topic
        data.addSink(new FlinkKafkaProducer09<JSONObject>("localhost:9092",
                "FilteredData",
                new SimpleJsonSchema()));

        env_in.execute();
    }
}


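My code does work, but in the web interface it appears to run on a single thread only (one block is shown), and no data seems to pass through, so the bytes sent/received are never updated.

How do I make it run in parallel?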

To run your job in parallel, you can do two things:

Increase the parallelism of your job at the environment level, for example:

StreamExecutionEnvironment env_in = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(4);

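However, this only increases parallelism on the Flink side, after the data has been read. If the source produces data faster than it can be read, the extra parallelism may not be fully utilized.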

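To fully parallelize your job, set up multiple partitions for your Kafka topic, ideally as many as the parallelism you want for your Flink job. So when you create your Kafka topic, you might want to do something like the following (adjust --replication-factor to your setup; it cannot exceed the number of brokers):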

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 4 --topic test
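
To illustrate why the partition count matters, here is a minimal plain-Java sketch. It is a simplified model, not Flink's actual assignment code: it assumes partitions are handed out round-robin to source subtasks, and the class and method names (`PartitionAssignmentSketch`, `assign`, `busySubtasks`) are made up for this example. It shows that with one partition and a parallelism of 4, only one source subtask has anything to read.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionAssignmentSketch {

    // Hypothetical helper: distribute partition ids 0..partitions-1 over
    // `parallelism` source subtasks, round-robin. This is a simplified
    // model for illustration, not Flink's real assignment logic.
    static List<List<Integer>> assign(int partitions, int parallelism) {
        List<List<Integer>> subtasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            subtasks.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            subtasks.get(p % parallelism).add(p);
        }
        return subtasks;
    }

    // Number of subtasks that were given at least one partition.
    static long busySubtasks(int partitions, int parallelism) {
        return assign(partitions, parallelism).stream()
                .filter(s -> !s.isEmpty())
                .count();
    }

    public static void main(String[] args) {
        // One partition, parallelism 4: three subtasks stay idle.
        System.out.println(assign(1, 4)); // prints [[0], [], [], []]
        // Four partitions, parallelism 4: every subtask reads one partition.
        System.out.println(assign(4, 4)); // prints [[0], [1], [2], [3]]
    }
}
```

Under this model, raising the job parallelism beyond the number of partitions adds source subtasks that receive no data, which is why matching the two is suggested above.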

"production Ready (Does NOT Work if greater than 1)": can you elaborate on this? Also, try consuming your output stream with:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic FilteredData --from-beginning

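I figured out why it was crashing (when I entered a parallelism greater than one): I had to change the Flink server's configuration file so that each manager accepts more than one task.

Thanks, I will attach my sink to a topic with multiple partitions.

How do I change the code so that the Flink dashboard shows multiple blocks: one for the source, one for the filter, one for the processing step, and one for the sink? Currently only one block is shown on the dashboard, so it does not track the bytes received and written.

To do that, use: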

env_in.disableOperatorChaining();
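
The crash mentioned in the comments above (parallelism greater than 1 failing until the configuration was changed) comes down to slot arithmetic. Below is a rough plain-Java sketch, assuming Flink's default slot sharing, where a job needs as many task slots as its highest operator parallelism. The setting the asker changed is presumably `taskmanager.numberOfTaskSlots` in `flink-conf.yaml` (an assumption; the comment does not name it). The class and method names here are hypothetical.

```java
public class SlotCheckSketch {

    // Total slots offered by the cluster: number of TaskManagers times
    // the slots configured per TaskManager.
    static int availableSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    // Assumption: with default slot sharing, a job needs as many slots
    // as its highest operator parallelism.
    static boolean canSchedule(int jobParallelism, int taskManagers, int slotsPerTaskManager) {
        return jobParallelism <= availableSlots(taskManagers, slotsPerTaskManager);
    }

    public static void main(String[] args) {
        // One TaskManager with a single slot cannot host a job with parallelism 4.
        System.out.println(canSchedule(4, 1, 1)); // prints false
        // After raising the slot count to 4, the same job fits.
        System.out.println(canSchedule(4, 1, 4)); // prints true
    }
}
```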