Grafana Telegraf issue: [outputs.influxdb] Metric buffer overflow; 3645 metrics have been dropped


I am trying to use Telegraf to pull metrics from a vCenter with 1100 VMs and store that data in InfluxDB. The metrics are being pulled "successfully" and then shown on dashboards created in Grafana.

However, when I check the status of Telegraf, it reports that 3000-11000 metrics were dropped. I am not sure whether this is a problem with the Telegraf configuration or the InfluxDB configuration.

Telegraf vsphere configuration:

# Read metrics from VMware vCenter
 [[inputs.vsphere]]
    ## List of vCenter URLs to be monitored. These three lines must be uncommented
 ## and edited for the plugin to work.
 vcenters = [ "https://***/sdk" ]
    username = "***"
    password = "***"
 #
 ## VMs
 ## Typical VM metrics (if omitted or empty, all metrics are collected)
 vm_metric_include = [
      "cpu.demand.average",
      "cpu.idle.summation",
      "cpu.latency.average",
      "cpu.readiness.average",
      "cpu.ready.summation",
      "cpu.run.summation",
      "cpu.usagemhz.average",
      "cpu.used.summation",
      "cpu.wait.summation",
      "mem.active.average",
      "mem.granted.average",
      "mem.latency.average",
      "mem.swapin.average",
      "mem.swapinRate.average",
      "mem.swapout.average",
      "mem.swapoutRate.average",
      "mem.usage.average",
      "mem.vmmemctl.average",
      "net.bytesRx.average",
      "net.bytesTx.average",
      "net.droppedRx.summation",
      "net.droppedTx.summation",
      "net.usage.average",
      "power.power.average",
      "virtualDisk.numberReadAveraged.average",
      "virtualDisk.numberWriteAveraged.average",
      "virtualDisk.read.average",
      "virtualDisk.readOIO.latest",
      "virtualDisk.throughput.usage.average",
      "virtualDisk.totalReadLatency.average",
      "virtualDisk.totalWriteLatency.average",
      "virtualDisk.write.average",
      "virtualDisk.writeOIO.latest",
      "sys.uptime.latest",
    ]
 # vm_metric_exclude = [] ## Nothing is excluded by default
 # vm_instances = true ## true by default
 #
 ## Hosts
 ## Typical host metrics (if omitted or empty, all metrics are collected)
 host_metric_include = [
      "cpu.coreUtilization.average",
      "cpu.costop.summation",
      "cpu.demand.average",
      "cpu.idle.summation",
      "cpu.latency.average",
      "cpu.readiness.average",
      "cpu.ready.summation",
      "cpu.swapwait.summation",
      "cpu.usage.average",
      "cpu.usagemhz.average",
      "cpu.used.summation",
      "cpu.utilization.average",
      "cpu.wait.summation",
      "disk.deviceReadLatency.average",
      "disk.deviceWriteLatency.average",
      "disk.kernelReadLatency.average",
      "disk.kernelWriteLatency.average",
      "disk.numberReadAveraged.average",
      "disk.numberWriteAveraged.average",
      "disk.read.average",
      "disk.totalReadLatency.average",
      "disk.totalWriteLatency.average",
      "disk.write.average",
      "mem.active.average",
      "mem.latency.average",
      "mem.state.latest",
      "mem.swapin.average",
      "mem.swapinRate.average",
      "mem.swapout.average",
      "mem.swapoutRate.average",
      "mem.totalCapacity.average",
      "mem.usage.average",
      "mem.vmmemctl.average",
      "net.bytesRx.average",
      "net.bytesTx.average",
      "net.droppedRx.summation", 
      "net.errorsTx.summation",
      "net.usage.average",
      "power.power.average",
      "storageAdapter.numberReadAveraged.average",
      "storageAdapter.numberWriteAveraged.average",
      "storageAdapter.read.average",
      "storageAdapter.write.average",
      "sys.uptime.latest",
    ]
 # host_metric_exclude = [] ## Nothing excluded by default
 # host_instances = true ## true by default
 #
 ## Clusters
 cluster_metric_include = [] ## if omitted or empty, all metrics are collected
 cluster_metric_exclude = [] ## Nothing excluded by default
 # cluster_instances = false ## false by default
 #
 ## Datastores
 datastore_metric_include = [] ## if omitted or empty, all metrics are collected
 # datastore_metric_exclude = [] ## Nothing excluded by default
 # datastore_instances = false ## false by default for Datastores only
 #
 ## Datacenters
 datacenter_metric_include = [] ## if omitted or empty, all metrics are collected
# datacenter_metric_exclude = [ "*" ] ## Datacenters are not collected by default.
 # datacenter_instances = false ## false by default for Datacenters only
 #

   ## Plugin Settings
   ## separator character to use for measurement and field names (default: "_")
   # separator = "_"

   ## number of objects to retrieve per query for realtime resources (vms and hosts)
   ## set to 64 for vCenter 5.5 and 6.0 (default: 256)
   # max_query_objects = 256

   ## number of metrics to retrieve per query for non-realtime resources (clusters and datastores)
   ## set to 64 for vCenter 5.5 and 6.0 (default: 256)
   # max_query_metrics = 256

   ## number of go routines to use for collection and discovery of objects and metrics
   # collect_concurrency = 1
   # discover_concurrency = 1

   ## whether or not to force discovery of new objects on initial gather call before collecting metrics
   ## when true for large environments this may cause errors for time elapsed while collecting metrics
   ## when false (default) the first collection cycle may result in no or limited metrics while objects are discovered
   # force_discover_on_init = false

   ## the interval before (re)discovering objects subject to metrics collection (default: 300s)
   # object_discovery_interval = "300s"

   ## timeout applies to any of the api request made to vcenter
   timeout = "180s"

   ## When set to true, all samples are sent as integers. This makes the output
   ## data types backwards compatible with Telegraf 1.9 or lower. Normally all
   ## samples from vCenter, with the exception of percentages, are integer
   ## values, but under some conditions, some averaging takes place internally in
   ## the plugin. Setting this flag to "false" will send values as floats to
   ## preserve the full precision when averaging takes place.
   # use_int_samples = true

   ## Custom attributes from vCenter can be very useful for queries in order to slice the
   ## metrics along different dimension and for forming ad-hoc relationships. They are disabled
   ## by default, since they can add a considerable amount of tags to the resulting metrics. To
   ## enable, simply set custom_attribute_exclude to [] (empty set) and use custom_attribute_include
   ## to select the attributes you want to include.
   # custom_attribute_include = []
   # custom_attribute_exclude = ["*"]

   ## Optional SSL Config
   # ssl_ca = "/path/to/cafile"
   # ssl_cert = "/path/to/certfile"
   # ssl_key = "/path/to/keyfile"
   ## Use SSL but skip chain & host verification
   insecure_skip_verify = true
Telegraf agent configuration:

# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "180s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 1000

  ## Maximum number of unwritten metrics per output.
  metric_buffer_limit = 1000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. Maximum flush_interval will be
  ## flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## By default or when set to "0s", precision will be set to the same
  ## timestamp order as the collection interval, with the maximum being 1s.
  ##   ie, when interval = "10s", precision will be "1s"

  ## Precision will NOT be used for service inputs. It is up to each individual
  ## service input to set the timestamp at the appropriate precision.
  ## Valid time units are "ns", "us" (or "µs"), "ms", "s".
  precision = ""

  ## Log at debug level.
  # debug = false
  ## Log only error level messages.
  # quiet = false

  ## Log file name, the empty string means to log to stderr.
  # logfile = ""

  ## The logfile will be rotated after the time interval specified.  When set
  ## to 0 no time based rotation is performed.
  # logfile_rotation_interval = "0d"

  ## The logfile will be rotated when it becomes larger than the specified
  ## size.  When set to 0 no size based rotation is performed.
  # logfile_rotation_max_size = "0MB"

  ## Maximum number of rotated archives to keep, any older logs are deleted.
  ## If set to -1, no archives are removed.
  # logfile_rotation_max_archives = 5

  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false
Telegraf InfluxDB output plugin configuration:

# Configuration for sending metrics to InfluxDB
[[outputs.influxdb]]
    urls = ["http://***********"]
    database = "vmware"
    timeout = "0s"
When running systemctl status -l telegraf, I get the following error:

[outputs.influxdb] Metric buffer overflow; 3645 metrics have been dropped

Varying numbers of metrics are dropped continuously.

I also get the following error, which might be a cause or might point to a separate problem:

[agent] input "inputs.vsphere" did not complete within its interval


Not sure where the problem lies.

There are two issues here, and both are spelled out quite plainly by the error messages.

inputs.vsphere
[agent] input "inputs.vsphere" did not complete within its interval

The interval at which inputs.vsphere runs is shorter than the time the plugin needs to collect its data. You may need to dig in and speed up the collection itself. If that alone does not help, you need to find the sweet spot between tuning the concurrency and increasing the interval.

Don't be afraid to do that. In my experience, people overestimate the granularity they actually need; I have seen deployments where the interval was changed from 10 seconds to several minutes. Note that you can also set the interval per input plugin rather than only globally, as sketched below.
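As a minimal sketch of that idea (the interval and concurrency values below are illustrative assumptions, not numbers tuned for your environment):

[[inputs.vsphere]]
  ## Per-plugin override of the agent-wide interval: give the plugin more
  ## time than one full collection cycle actually needs.
  interval = "300s"

  ## Collect and discover objects in parallel to shorten each cycle;
  ## raise these gradually and watch the load on vCenter.
  collect_concurrency = 4
  discover_concurrency = 2

Every Telegraf input accepts such an interval override, so the rest of your inputs can keep the global setting.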

outputs.influxdb
[outputs.influxdb] Metric buffer overflow; 3645 metrics have been dropped

You are collecting more data points than the buffer can hold. Simply increase metric_buffer_limit to the maximum number of dropped metrics you have observed (plus a good safety margin). As with the interval, you can do this agent-wide (the value then applies to every output) or only for outputs.influxdb.
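A minimal sketch of the agent-wide variant (the limit below is an assumption sized from the 3000-11000 drops you reported, plus headroom; adjust it to what you actually observe):

[agent]
  ## Must comfortably exceed the number of metrics produced between flushes;
  ## the current value of 1000 is far below the ~11000 drops seen.
  metric_buffer_limit = 20000

or, to override it only for this output:

[[outputs.influxdb]]
  urls = ["http://***********"]
  database = "vmware"
  ## Per-output override of the agent-wide buffer limit.
  metric_buffer_limit = 20000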

Welcome to StackOverflow! One more note, since I assume you registered just to ask this question: the answer I gave you is based entirely on the basic documentation, which you should read (and understand!) before asking. The reason is that it is not good to expect people to put effort into answering your question when you have not put effort into solving it yourself, or at least have not shown that effort. Keep in mind that this is a platform where you ask your peers for help. Some of those peers may see it as a lack of respect to be asked a question that, frankly, you could easily have answered yourself just by reading the error messages and the documentation and combining that information. In fact, I am fairly sure that is why your question was downvoted.

In that regard, you may find this excellent essay interesting, despite its rather "harsh" title.