
How to avoid duplicate documents in Elasticsearch


How can I avoid duplicate documents in Elasticsearch? The index document count (20,010,253) does not match the log line count (13,411,790).

nifi:

Log line count on the ELK server:

wc -l /mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log
13,411,790 total 
Elasticsearch index document count:

curl -XGET 'ip:9200/_cat/indices?v&pretty'
docs.count = 20,010,253 
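
To compare against a single index rather than the whole cluster, the _count API can also be queried directly (this assumes the index name test_4 used in the config below):

curl -XGET 'ip:9200/test_4/_count?pretty'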
Logstash input config file:

cat /mnt/elk/logstash/input_conf_files/test_4.conf
input {
  file {
    path => "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log"
    type => "test_4"
    sincedb_path => "/mnt/elk/logstash/scripts/sincedb/test_4"
  }
}
filter {
  if [type] == "test_4" {
    grok {
      match => {
        "message" => "%{DATE:date} %{TIME:time} %{WORD:EventType} %{GREEDYDATA:EventText}"
      }
    }
  }
}
output {
  if [type] == "test_4" {
    elasticsearch {
      hosts => "ip:9200"
      index => "test_4"
    }
  } else {
    stdout {
      codec => rubydebug
    }
  }
}

You can use the fingerprint filter plugin:

For example, this can be used to create consistent document IDs when inserting events into Elasticsearch, allowing an event in Logstash to update an existing document rather than create a new one.
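
Below is a minimal sketch of how the fingerprint filter could be added to the config above. The fingerprint and elasticsearch output options shown are standard plugin settings; the HMAC key is an arbitrary placeholder. The filter hashes the raw message, and the elasticsearch output reuses that hash as the document_id, so re-ingesting the same log line overwrites the existing document instead of indexing a duplicate:

filter {
  if [type] == "test_4" {
    grok {
      match => {
        "message" => "%{DATE:date} %{TIME:time} %{WORD:EventType} %{GREEDYDATA:EventText}"
      }
    }
    # Hash the original log line so identical lines always yield the same id
    fingerprint {
      source => "message"
      target => "[@metadata][fingerprint]"
      method => "SHA1"
      key => "any-constant-string"   # arbitrary HMAC key; the value itself does not matter
    }
  }
}
output {
  if [type] == "test_4" {
    elasticsearch {
      hosts => "ip:9200"
      index => "test_4"
      # Use the fingerprint as the document id: a re-read line updates the
      # existing document instead of creating a duplicate
      document_id => "%{[@metadata][fingerprint]}"
    }
  }
}

Note that two genuinely different events with identical message text would also collapse into one document, so if exact duplicate lines are legitimate in the source logs, additional fields (for example the file path) should be concatenated into the fingerprint source.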
