Java Flink task manager timeout
As more and more records are processed, my program becomes very slow. I initially thought this was due to excessive memory consumption, since my program is String-intensive (I use Java 11, so compact strings should be used whenever possible), so I increased the JVM heap:
-Xms2048m
-Xmx6144m
I also increased the task manager's memory and the timeout in flink-conf.yaml:
jobmanager.heap.size: 6144m
heartbeat.timeout: 5000000
However, none of this helped. The program still becomes very slow after processing about 3.5 million records, with only about 500,000 records left. As it approaches the 3.5 million mark it slows to a crawl until it finally times out, with a total execution time of about 11 minutes.
I checked memory consumption in VisualVM, but it never exceeded about 700 MB. My Flink pipeline looks like this:
final StreamExecutionEnvironment environment = StreamExecutionEnvironment.createLocalEnvironment(1);
environment.setParallelism(1);
DataStream<Tuple> stream = environment.addSource(new TPCHQuery3Source(filePaths, relations));
stream.process(new TPCHQuery3Process(relations)).addSink(new FDSSink());
environment.execute("FlinkDataService");
Also, here is a screenshot of my VisualVM monitor, taken at the point when things became very slow:
Here is the run loop of my source function:
while (run) {
    readers.forEach(reader -> {
        try {
            String line = reader.readLine();
            if (line != null) {
                Tuple tuple = lineToTuple(line, counter.get() % filePaths.size());
                if (tuple != null && isValidTuple(tuple)) {
                    sourceContext.collect(tuple);
                }
            } else {
                closedReaders.add(reader);
                if (closedReaders.size() == filePaths.size()) {
                    System.out.println("ALL FILES HAVE BEEN STREAMED");
                    cancel();
                }
            }
            counter.getAndIncrement();
        } catch (IOException e) {
            e.printStackTrace();
        }
    });
}
Basically, I read one line from each of the three files I need; based on the file's position in the list, I construct a Tuple object (a custom class of mine representing a row of a table) and emit that tuple if it is valid, i.e. fulfills certain conditions on the date.
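The round-robin read described above can be sketched in plain Java with in-memory readers. This is a simplified stand-in: the prefix `i + ":"` replaces the asker's `lineToTuple`/`isValidTuple` parsing and `sourceContext.collect`, which are not shown in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class RoundRobinSource {
    // One pass reads a single line from each still-open reader; a null line
    // marks that file as exhausted. The loop ends once every file is done,
    // mirroring the cancel() call in the original source function.
    public static List<String> drain(List<BufferedReader> readers) throws IOException {
        List<String> emitted = new ArrayList<>();
        boolean[] done = new boolean[readers.size()];
        int closed = 0;
        while (closed < readers.size()) {
            for (int i = 0; i < readers.size(); i++) {
                if (done[i]) continue;
                String line = readers.get(i).readLine();
                if (line == null) {
                    done[i] = true;
                    closed++;
                } else {
                    emitted.add(i + ":" + line); // stand-in for collect(tuple)
                }
            }
        }
        return emitted;
    }

    public static void main(String[] args) throws IOException {
        List<BufferedReader> readers = List.of(
                new BufferedReader(new StringReader("a1\na2")),
                new BufferedReader(new StringReader("b1")));
        System.out.println(drain(readers)); // [0:a1, 1:b1, 0:a2]
    }
}
```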
I also suggest garbage collection to the JVM at the 1, 1.5, 2, and 2.5 million record marks, like this:
System.gc()
Any ideas on how I can optimize this?

String.intern() saved me. I interned every string before storing it in my maps, and that worked like a charm. These are the properties I changed on my Flink standalone cluster to compute the TPC-H query 03:
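A minimal sketch of the interning trick: `intern()` returns the canonical instance from the JVM string pool, so millions of equal keys parsed from different lines share one object instead of each allocating a fresh String. The class and field names here are illustrative, not from the original code.

```java
import java.util.HashMap;
import java.util.Map;

public class InternDemo {
    // Hypothetical state map, standing in for the maps used in the query.
    static final Map<String, Integer> cache = new HashMap<>();

    static void store(String key, int value) {
        // Interning the key before storing collapses all equal keys onto
        // the single pooled instance, cutting per-record allocations.
        cache.put(key.intern(), value);
    }

    public static void main(String[] args) {
        String a = new String("ORDERKEY-42"); // distinct heap object
        String b = new String("ORDERKEY-42"); // another distinct object
        store(a, 1);
        store(b, 2);
        System.out.println(cache.size());             // 1
        System.out.println(a.intern() == b.intern()); // true
    }
}
```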
jobmanager.memory.process.size: 1600m
heartbeat.timeout: 100000
taskmanager.memory.process.size: 8g # default: 1728m
I implemented the query to stream only the Order table and keep the other tables as state. Also, I compute it as a windowless query, which I think makes more sense and is faster.
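The idea of keeping the non-streamed tables as state can be sketched in plain Java: customer rows sit in a map (standing in for Flink keyed state inside a process function such as the `OrderKeyedByCustomerProcessFunction` below, whose body is not shown), and each streamed order joins against it on arrival, so no window is needed. Field and class names here are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class StatefulJoinSketch {
    // Customer table held as state, keyed by c_custkey (TPC-H naming).
    static final Map<Long, String> customerState = new HashMap<>();

    // Each arriving order is joined immediately against the stored
    // customers; a windowless join, since state persists across records.
    static String joinOrder(long custKey, String order) {
        String customer = customerState.get(custKey);
        return customer == null ? null : customer + "|" + order;
    }

    public static void main(String[] args) {
        customerState.put(7L, "AUTOMOBILE-cust-7");
        System.out.println(joinOrder(7L, "order-42")); // AUTOMOBILE-cust-7|order-42
        System.out.println(joinOrder(9L, "order-43")); // null (no matching customer)
    }
}
```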
public class TPCHQuery03 {
private final String topic = "topic-tpch-query-03";
public TPCHQuery03() {
this(PARAMETER_OUTPUT_LOG, "127.0.0.1", false, false, -1);
}
public TPCHQuery03(String output, String ipAddressSink, boolean disableOperatorChaining, boolean pinningPolicy, long maxCount) {
try {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
if (disableOperatorChaining) {
env.disableOperatorChaining();
}
DataStream<Order> orders = env
.addSource(new OrdersSource(maxCount)).name(OrdersSource.class.getSimpleName()).uid(OrdersSource.class.getSimpleName());
// Filter market segment "AUTOMOBILE"
// customers = customers.filter(new CustomerFilter());
// Filter all Orders with o_orderdate < 12.03.1995
DataStream<Order> ordersFiltered = orders
.filter(new OrderDateFilter("1995-03-12")).name(OrderDateFilter.class.getSimpleName()).uid(OrderDateFilter.class.getSimpleName());
// Join customers with orders and package them into a ShippingPriorityItem
DataStream<ShippingPriorityItem> customerWithOrders = ordersFiltered
.keyBy(new OrderKeySelector())
.process(new OrderKeyedByCustomerProcessFunction(pinningPolicy)).name(OrderKeyedByCustomerProcessFunction.class.getSimpleName()).uid(OrderKeyedByCustomerProcessFunction.class.getSimpleName());
// Join the last join result with Lineitems
DataStream<ShippingPriorityItem> result = customerWithOrders
.keyBy(new ShippingPriorityOrderKeySelector())
.process(new ShippingPriorityKeyedProcessFunction(pinningPolicy)).name(ShippingPriorityKeyedProcessFunction.class.getSimpleName()).uid(ShippingPriorityKeyedProcessFunction.class.getSimpleName());
// Group by l_orderkey, o_orderdate and o_shippriority and compute revenue sum
DataStream<ShippingPriorityItem> resultSum = result
.keyBy(new ShippingPriority3KeySelector())
.reduce(new SumShippingPriorityItem(pinningPolicy)).name(SumShippingPriorityItem.class.getSimpleName()).uid(SumShippingPriorityItem.class.getSimpleName());
// emit result
if (output.equalsIgnoreCase(PARAMETER_OUTPUT_MQTT)) {
resultSum
.map(new ShippingPriorityItemMap(pinningPolicy)).name(ShippingPriorityItemMap.class.getSimpleName()).uid(ShippingPriorityItemMap.class.getSimpleName())
.addSink(new MqttStringPublisher(ipAddressSink, topic, pinningPolicy)).name(OPERATOR_SINK).uid(OPERATOR_SINK);
} else if (output.equalsIgnoreCase(PARAMETER_OUTPUT_LOG)) {
resultSum.print().name(OPERATOR_SINK).uid(OPERATOR_SINK);
} else if (output.equalsIgnoreCase(PARAMETER_OUTPUT_FILE)) {
StreamingFileSink<String> sink = StreamingFileSink
.forRowFormat(new Path(PATH_OUTPUT_FILE), new SimpleStringEncoder<String>("UTF-8"))
.withRollingPolicy(
DefaultRollingPolicy.builder().withRolloverInterval(TimeUnit.MINUTES.toMillis(15))
.withInactivityInterval(TimeUnit.MINUTES.toMillis(5))
.withMaxPartSize(1024 * 1024 * 1024).build())
.build();
resultSum
.map(new ShippingPriorityItemMap(pinningPolicy)).name(ShippingPriorityItemMap.class.getSimpleName()).uid(ShippingPriorityItemMap.class.getSimpleName())
.addSink(sink).name(OPERATOR_SINK).uid(OPERATOR_SINK);
} else {
System.out.println("discarding output");
}
System.out.println("Stream job: " + TPCHQuery03.class.getSimpleName());
System.out.println("Execution plan >>>\n" + env.getExecutionPlan());
env.execute(TPCHQuery03.class.getSimpleName());
} catch (IOException e) {
e.printStackTrace();
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String[] args) throws Exception {
new TPCHQuery03();
}
}