Elasticsearch: Logstash indexing with JDBC input
I'm using Logstash with the JDBC input to send data to Elasticsearch. I'm running it as a service on a Windows machine using NSSM. This is the config file (I removed some duplicated code from the filters):
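As I understand it:
- Every minute, Logstash will run EXECUTE dbo.MyLogstashProcedure
- It will perform the following ETL:
  - Remove nil values
  - Parse the XML columns into sub-documents
  - Rename fields back to their real names (the JDBC driver lowercases column names)
- It will index the data into the Elasticsearch cluster

The document after ETL looks like this: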
{
"RelatedCountries": [
{
"CountryCode": "MX",
"CountryName": "Mexico",
"CountryPermissions": ["4", "122", "11"]
}
],
"RelatedProducts": [
{
"ProductID": "1",
"ProductName": "Packaged Food"
},
{
"ProductID": "2",
"ProductName": "Packaged Food",
"ProductPermissions": ["19", "29", "30", "469"]
}
],
"Title": "Packaged Food in Mexico",
"ItemTypeID": "3",
"ItemSourceID": "2",
"DatePublished": "2014-11-27T00:00:00",
"ReportPermissions": ["p19c4", "p19c11", "p19c122", "p29c4", "p29c11", "p29c122", "p30c4", "p30c11", "p30c122", "p281c4", "p281c11", "p281c122", "p285c4", "p285c11", "p285c122", "p286c4", "p286c11", "p286c122", "p292c4", "p292c11", "p292c122", "p294c4", "p294c11", "p294c122", "p295c4", "p295c11", "p295c122", "p297c4", "p297c11", "p297c122", "p298c4", "p298c11", "p298c122", "p299c4", "p299c11", "p299c122", "p469c11", "p515c4", "p515c11", "p515c122", "p516c4", "p516c11", "p516c122", "p517c4", "p517c11", "p517c122", "p518c4", "p518c11", "p518c122", "p519c4", "p519c11", "p519c122", "p520c4", "p520c11", "p520c122", "p521c4", "p521c11", "p521c122"]
}
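To make the three ETL steps listed above concrete, here is a rough Python sketch of what they do to a single row. It is not the Logstash filter config (which is not reproduced here), only an illustration. The names RENAMES, XML_COLUMNS, xml_to_subdocs and transform are hypothetical, the XML layout of the column is an assumption, and the null datepublished value is made up to show the nil-removal step.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the lowercased column names returned by the
# JDBC driver back to the original column names.
RENAMES = {
    "title": "Title",
    "itemtypeid": "ItemTypeID",
    "relatedcountries": "RelatedCountries",
}

# Hypothetical set of columns whose values are XML strings.
XML_COLUMNS = {"relatedcountries"}


def xml_to_subdocs(xml_text):
    """Convert <root><item><A>1</A></item>...</root> into [{"A": "1"}, ...].

    Assumes one level of repeated items, each holding flat child elements.
    """
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in item} for item in root]


def transform(row):
    """Apply the three ETL steps to one row (a dict of column -> value)."""
    doc = {}
    for column, value in row.items():
        if value is None:              # step 1: drop nil values
            continue
        if column in XML_COLUMNS:      # step 2: XML column -> sub-documents
            value = xml_to_subdocs(value)
        doc[RENAMES.get(column, column)] = value  # step 3: restore real names
    return doc


# Sample row; the XML structure and the None value are assumptions.
row = {
    "title": "Packaged Food in Mexico",
    "itemtypeid": "3",
    "datepublished": None,
    "relatedcountries": (
        "<Countries><Country>"
        "<CountryCode>MX</CountryCode><CountryName>Mexico</CountryName>"
        "</Country></Countries>"
    ),
}

print(transform(row))
```

In the sketch, columns without an entry in RENAMES keep their lowercased name, so a missing mapping shows up in the output instead of silently dropping the field.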
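My stored procedure's execution time depends heavily on the data. Sometimes it finishes in 10-15 seconds, sometimes in around 40 seconds.
If the stored procedure returns results in around 10-15 seconds, the data is indexed fine (once a minute). When it takes longer to execute, I've noticed that Logstash slows down and starts sending data only every few minutes, or even stops sending data entirely.
Neither stdout nor stderr shows any errors. There is nothing in the Elasticsearch logs either.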
This is running on a Rackspace virtual machine:
- OS: Windows Server 2012 R2 x64
- Elasticsearch version: 1.7.3
- Logstash version: 1.5.5
- 2 virtual CPUs @ 2.59 GHz
- 2 GB RAM
- 40 GB SSD drive (plenty of free space)
- Java version: 1.8.0_65 x64
This is the output with --debug:

Can you try running Logstash with the --debug option so we can get some hints from the more verbose output? Also, since you know the stored procedure might take too long to execute, why not increase the cron interval to every two minutes instead of every minute?

@Val I've added the info from --debug, hopefully I did it right. Of course I could configure the cron to run every two minutes, but if my SP returns data quickly, I'd end up slowing down overall indexing. That's pretty much it.