
Elasticsearch indexing with the Logstash jdbc input


I'm using Logstash with the jdbc input in order to send data to Elasticsearch. I'm running it as a service on a Windows machine using NSSM. The config file (with some repeated filter code removed) isn't reproduced here, but a sketch of what it could look like follows the list below.

As I understand it:

  • Every minute, Logstash will execute EXEC dbo.MyLogstashProcedure
  • It will perform the following ETL:
      • remove nil values
      • parse the XML column into sub-documents
      • rename fields back to their real names (the JDBC driver lowercases column names)
  • It will index the data into the Elasticsearch cluster
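
A minimal sketch of a config matching that description (the connection string, driver path, credentials, XML column name, and index name are placeholder assumptions, not from the original question):

input {
  jdbc {
    # Placeholder connection details -- assumptions, not from the question
    jdbc_connection_string => "jdbc:sqlserver://dbhost:1433;databaseName=MyDatabase"
    jdbc_user => "logstash"
    jdbc_password => "secret"
    jdbc_driver_library => "C:/drivers/sqljdbc4.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    # Run the stored procedure every minute
    schedule => "* * * * *"
    statement => "EXEC dbo.MyLogstashProcedure"
  }
}

filter {
  # Drop nil-valued fields
  ruby {
    code => "event.to_hash.each { |k, v| event.remove(k) if v.nil? }"
  }
  # Parse an XML column into a sub-document ('relatedcountries' is a hypothetical column name)
  xml {
    source => "relatedcountries"
    target => "RelatedCountries"
  }
  # Rename fields back to their real names; the JDBC driver lowercases column names
  mutate {
    rename => {
      "title"         => "Title"
      "itemtypeid"    => "ItemTypeID"
      "itemsourceid"  => "ItemSourceID"
      "datepublished" => "DatePublished"
    }
  }
}

output {
  elasticsearch {
    host => "localhost"   # placeholder
    protocol => "http"
    index => "reports"    # hypothetical index name
  }
}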
The ETL'd document looks like the one below. Note that the ReportPermissions array can sometimes be very large, holding up to 20,000 entries:

{
  "RelatedCountries": [
    {
      "CountryCode": "MX",
      "CountryName": "Mexico",
      "CountryPermissions": ["4", "122", "11"]
    }
  ],
  "RelatedProducts": [
    {
      "ProductID": "1",
      "ProductName": "Packaged Food"
    },
    {
      "ProductID": "2",
      "ProductName": "Packaged Food",
      "ProductPermissions": ["19", "29", "30", "469"]
    }
  ],
  "Title": "Packaged Food in Mexico",
  "ItemTypeID": "3",
  "ItemSourceID": "2",
  "DatePublished": "2014-11-27T00:00:00",
  "ReportPermissions": ["p19c4", "p19c11", "p19c122", "p29c4", "p29c11", "p29c122", "p30c4", "p30c11", "p30c122", "p281c4", "p281c11", "p281c122", "p285c4", "p285c11", "p285c122", "p286c4", "p286c11", "p286c122", "p292c4", "p292c11", "p292c122", "p294c4", "p294c11", "p294c122", "p295c4", "p295c11", "p295c122", "p297c4", "p297c11", "p297c122", "p298c4", "p298c11", "p298c122", "p299c4", "p299c11", "p299c122", "p469c11", "p515c4", "p515c11", "p515c122", "p516c4", "p516c11", "p516c122", "p517c4", "p517c11", "p517c122", "p518c4", "p518c11", "p518c122", "p519c4", "p519c11", "p519c122", "p520c4", "p520c11", "p520c122", "p521c4", "p521c11", "p521c122"]
}
My stored procedure's execution time depends heavily on the data. Sometimes it executes in 10-15 seconds, sometimes it takes around 40 seconds.

When the stored procedure returns results within about 10-15 seconds, the data is indexed just fine (once a minute). But when execution takes longer, I've noticed Logstash slowing down: it starts sending data only every few minutes, or even stops sending data entirely.
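
If the slowness is on the bulk-indexing side, one knob that could matter here, purely as an assumption on my part: with documents this large, the elasticsearch output's bulk size can be lowered so each bulk request stays small:

output {
  elasticsearch {
    host => "localhost"     # placeholder
    protocol => "http"
    flush_size => 500       # send smaller bulk requests with these large documents
    idle_flush_time => 1    # flush buffered events at least every second
  }
}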

Neither stdout nor stderr shows any errors. There is nothing in the Elasticsearch logs either.

This is running on a Rackspace virtual machine:

  • OS: Windows Server 2012 R2 x64
  • Elasticsearch version: 1.7.3
  • Logstash version: 1.5.5
  • 2 virtual CPUs @ 2.59 GHz
  • 2 GB RAM
  • 40 GB SSD drive (plenty of free space)
  • Java version: 1.8.0_65 x64
Data is being read from within the same private network.

What would be possible solutions to fix the indexing?


Here is the output with --debug:
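
For reference, with Logstash 1.5.x the flag is passed to the agent command, along these lines (the paths are placeholders):

bin\logstash.bat agent -f C:\logstash\conf\logstash.conf --debug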

Could you try running Logstash with the --debug option so we can get some hints from the more verbose output? Also, since you know the stored procedure execution can take too long, why not increase the cron to every two minutes instead of every minute?

@Val I've added the info from --debug, hopefully I did it right. Sure, I can configure the cron to run every two minutes, but if my SP returns data quickly, I'll end up slowing down overall indexing, which pretty much defeats the purpose.
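
The change suggested in the comment would be a one-line tweak to the jdbc input's cron-style schedule, e.g. (a sketch reusing the placeholder input above):

input {
  jdbc {
    # ...same connection settings as above...
    schedule => "*/2 * * * *"   # run every two minutes instead of every minute
    statement => "EXEC dbo.MyLogstashProcedure"
  }
}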