Apache Flink: limiting the number of exposed metrics

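We have a Flink job with roughly 30 operators. When we run this job with a parallelism of 12, it outputs 400,000 metrics in total, which is too many for our metrics platform to handle well.

Looking at these metrics, this does not seem to be a bug or anything similar.

It is just that with many operators running on many TaskManagers and task slots, the number of metrics multiplies up to 400,000 (perhaps job restarts also multiply the number of metrics?).

This is the configuration I use for metrics: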

metrics.reporters: graphite
metrics.reporter.graphite.class: org.apache.flink.metrics.graphite.GraphiteReporter
metrics.reporter.graphite.host: some-host.com
metrics.reporter.graphite.port: 2003
metrics.reporter.graphite.protocol: TCP
metrics.reporter.graphite.interval: 60 SECONDS
metrics.scope.jm: applications.__ENVIRONMENT__.__APPLICATION__.<host>.jobmanager
metrics.scope.jm.job: applications.__ENVIRONMENT__.__APPLICATION__.<host>.jobmanager.<job_name>
metrics.scope.tm: applications.__ENVIRONMENT__.__APPLICATION__.<host>.taskmanager.<tm_id>
metrics.scope.tm.job: applications.__ENVIRONMENT__.__APPLICATION__.<host>.taskmanager.<tm_id>.<job_name>
metrics.scope.task: applications.__ENVIRONMENT__.__APPLICATION__.<host>.taskmanager.<tm_id>.<job_name>.<task_id>.<subtask_index>
metrics.scope.operator: applications.__ENVIRONMENT__.__APPLICATION__.<host>.taskmanager.<tm_id>.<job_name>.<operator_id>.<subtask_index>
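
To see why the count multiplies, consider how the operator scope above turns into Graphite paths: every operator subtask gets its own path because <operator_id> and <subtask_index> are part of the scope, and <tm_id> and <host> add further variation. The Python sketch below is a hypothetical illustration, not Flink's own scope-formatting code. It substitutes made-up values (the job name "my-job", TaskManager ids like "tm0", operator ids like "op0", and a two-metric subset numRecordsIn/numRecordsOut) into the operator scope format and counts the distinct paths. It also assumes that a restart brings the job up on TaskManagers with different <tm_id> values, in which case a backend that keeps old series ends up holding both sets.

```python
# Hypothetical illustration of how the operator scope format expands into
# distinct Graphite paths. Job name, TaskManager ids, operator ids and the
# metric subset are made-up placeholders; this is not Flink's own code.

OPERATOR_SCOPE = (
    "applications.__ENVIRONMENT__.__APPLICATION__.<host>.taskmanager."
    "<tm_id>.<job_name>.<operator_id>.<subtask_index>"
)


def expand_scope(template, variables):
    """Replace every <name> placeholder in the template with its value."""
    result = template
    for name, value in variables.items():
        result = result.replace(f"<{name}>", str(value))
    return result


def operator_metric_paths(num_operators, parallelism, metric_names, tm_of_subtask):
    """Collect one path per (operator, subtask, metric) combination."""
    paths = set()
    for op in range(num_operators):
        for subtask in range(parallelism):
            tm_id = tm_of_subtask(subtask)  # which (hypothetical) TaskManager runs it
            scope = expand_scope(OPERATOR_SCOPE, {
                "host": f"host-{tm_id}",
                "tm_id": tm_id,
                "job_name": "my-job",
                "operator_id": f"op{op}",
                "subtask_index": subtask,
            })
            for metric in metric_names:
                paths.add(f"{scope}.{metric}")
    return paths


if __name__ == "__main__":
    metrics = ["numRecordsIn", "numRecordsOut"]  # small subset, for illustration
    # 12 subtasks spread over 3 TaskManagers (4 slots each), before and after a
    # restart that brings the job up on TaskManagers with different ids.
    before = operator_metric_paths(30, 12, metrics, lambda s: f"tm{s // 4}")
    after = operator_metric_paths(30, 12, metrics, lambda s: f"tm{s // 4 + 3}")
    print(len(before))          # 720 = 30 operators * 12 subtasks * 2 metrics
    print(len(before | after))  # 1440 if the backend keeps the old series
```

Even with only two metrics per operator, 30 operators at parallelism 12 produce 720 paths, and under the restart assumption the backend sees twice as many.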

Since we don't need all 400,000 metrics, is it possible to influence which metrics are exposed?

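You are probably running into the cardinality explosion of latency metrics present in some Flink versions, where latency is tracked from every source subtask to every operator subtask. This was addressed in Flink 1.7. For details, see … and …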
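
The growth described above is multiplicative: with one latency series per (source subtask, operator subtask) pair, raising the parallelism increases both factors at once, so doubling it roughly quadruples the count. The Python sketch below is a back-of-the-envelope estimate under assumed conditions (a hypothetical job with a single source and 30 downstream operators, all at the same parallelism); job_shape and latency_histograms are illustrative names, not Flink APIs. Each histogram is typically reported as several values (count, mean, percentiles and so on), so the number of values actually sent to the reporter is a multiple of these figures.

```python
# Back-of-the-envelope estimate of latency-histogram cardinality when latency is
# tracked from every source subtask to every operator subtask.
# job_shape and latency_histograms are hypothetical helpers for illustration only.


def job_shape(num_sources, num_operators, parallelism):
    """Parallelism lists for a hypothetical job where every operator has the same parallelism."""
    return [parallelism] * num_sources, [parallelism] * num_operators


def latency_histograms(source_parallelisms, operator_parallelisms):
    """One histogram per (source subtask, operator subtask) pair."""
    source_subtasks = sum(source_parallelisms)
    operator_subtasks = sum(operator_parallelisms)
    return source_subtasks * operator_subtasks


if __name__ == "__main__":
    for p in (1, 6, 12, 24):
        sources, operators = job_shape(num_sources=1, num_operators=30, parallelism=p)
        print(f"parallelism {p:>2}: {latency_histograms(sources, operators)} histograms")
```

For parallelism 1, 6, 12 and 24 this prints 30, 1080, 4320 and 17280 histograms respectively, before each histogram is expanded into its individual reported values.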

As a quick fix, you can try disabling latency tracking by setting

metrics.latency.interval: 0

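I should have mentioned that we are using 1.4.2. Looking at the issues you linked, that version shouldn't be affected by the latency problem, right? Although the behavior does seem consistent with what we are seeing.

Earlier versions were also affected by the large number of latency metrics.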