Updating complex nested Elasticsearch documents using Logstash and JDBC
Let's assume an Oracle schema has the following tables and columns:

Country
    country_id; (Primary Key)
    country_name;
Department
    department_id; (Primary Key)
    department_name;
    country_id; (Foreign key to Country:country_id)
Employee
    employee_id; (Primary Key)
    employee_name;
    department_id; (Foreign key to Department:department_id)

I have an Elasticsearch document where the root element is a country. It contains all departments in that country, which in turn contain all employees of the respective departments. The document structure looks like this:

{
  "mappings": {
    "country": {
      "properties": {
        "country_id": { "type": "string" },
        "country_name": { "type": "string" },
        "department": {
          "type": "nested",
          "properties": {
            "department_id": { "type": "string" },
            "department_name": { "type": "string" },
            "employee": {
              "type": "nested",
              "properties": {
                "employee_id": { "type": "string" },
                "employee_name": { "type": "string" }
              }
            }
          }
        }
      }
    }
  }
}

I want to be able to run a separate jdbc input query on each table, and these queries should create/update/delete data in the Elasticsearch document whenever data in the base tables is added/updated/deleted.

This is an example problem; the actual tables and data structure are more complex, so I am not looking for a solution limited to this case.

Is there a way to achieve this?

Tags: jdbc, elasticsearch, logstash
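For level one, this is straightforward using the aggregate filter. The rows need a common id between them so they can be grouped together.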
filter {
  aggregate {
    task_id => "%{id}"
    code => "
      map['id'] = event.get('id')
      map['department'] ||= []
      # append the current row as one nested department entry
      map['department'] << event.to_hash
    "
    push_previous_map_as_event => true
    timeout => 150000
    timeout_tags => ['aggregated']
  }
  # keep only the aggregated events, drop the raw rows
  if "aggregated" not in [tags] {
    drop {}
  }
}
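To make the grouping concrete, here is a minimal plain-Ruby sketch of what the aggregate filter above does, run outside Logstash. The helper name `group_departments` and the sample rows are hypothetical. It assumes rows arrive sorted by the common `id`, which is what `push_previous_map_as_event` relies on. In Logstash the last map is flushed by the timeout, whereas this sketch simply flushes it at the end of the loop.

```ruby
# Plain-Ruby illustration of the grouping done by the aggregate filter.
# Hypothetical helper and sample data; not part of Logstash itself.
def group_departments(rows)
  docs = []
  current = nil
  rows.each do |row|
    # A new id means the previous map is complete and can be emitted.
    if current.nil? || current['id'] != row['id']
      docs << current unless current.nil?
      current = { 'id' => row['id'], 'department' => [] }
    end
    # Same idea as: map['department'] << event.to_hash
    current['department'] << row
  end
  # Flush the last group (done by the timeout in Logstash).
  docs << current unless current.nil?
  docs
end

rows = [
  { 'id' => 1, 'country_name' => 'France', 'department_id' => 10, 'department_name' => 'Sales' },
  { 'id' => 1, 'country_name' => 'France', 'department_id' => 11, 'department_name' => 'R&D' },
  { 'id' => 2, 'country_name' => 'Spain',  'department_id' => 20, 'department_name' => 'Support' }
]

docs = group_departments(rows)
puts docs.length
puts docs[0]['department'].map { |d| d['department_name'] }.inspect
```

If the rows are not ordered by `id`, the same country would be emitted more than once, so the JDBC query should include a matching ORDER BY.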
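One way to solve level 2 is to query the already indexed document and update it with the nested record.
Again, the documents should have a common id so that you can look up the right document and insert into it.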
filter {
  # get the document from elastic based on id and store it in 'emp'
  elasticsearch {
    hosts => ["${ELASTICSEARCH_HOST}/${INDEX_NAME}/${INDEX_TYPE}"]
    query => "id:%{id}"
    fields => { "employee" => "emp" }
  }
  aggregate {
    task_id => "%{id}"
    code => "
      map['id'] = event.get('id')
      map['employee'] = []
      employeeArr = []
      temp_emp = {}
      event.to_hash.each do |key,value|
        temp_emp[key] = value
      end
      # push the objects into an array
      employeeArr.push(temp_emp)
      # fall back to an empty list when the lookup found nothing
      empArr = event.get('emp') || []
      for emp in empArr
        emp['employee'] = employeeArr
        map['employee'].push(emp)
      end
    "
    push_previous_map_as_event => true
    timeout => 150000
    timeout_tags => ['aggregated']
  }
  if "aggregated" not in [tags] {
    drop {}
  }
}
output {
  elasticsearch {
    action => "update" #important
    ...
  }
}
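The merge step can be easier to reason about outside Logstash. The following plain-Ruby sketch is a variation on the filter above, not the same code: it attaches an incoming employee row to the nested department whose `department_id` matches. The helper name `attach_employee`, the matching on `department_id`, and the sample data are all assumptions made for illustration.

```ruby
# Hypothetical helper: attach an employee row to the matching nested department.
# Returns a new array; the input departments are not modified.
def attach_employee(departments, employee_row)
  departments.map do |dept|
    # Leave departments that do not own this employee untouched.
    next dept unless dept['department_id'] == employee_row['department_id']

    # Drop any older copy of the same employee so re-runs stay idempotent.
    employees = (dept['employee'] || []).reject do |e|
      e['employee_id'] == employee_row['employee_id']
    end
    employees << {
      'employee_id'   => employee_row['employee_id'],
      'employee_name' => employee_row['employee_name']
    }
    dept.merge('employee' => employees)
  end
end

# Departments as they might come back from the indexed country document.
existing = [
  { 'department_id' => 10, 'department_name' => 'Sales',
    'employee' => [{ 'employee_id' => 100, 'employee_name' => 'Ann' }] },
  { 'department_id' => 11, 'department_name' => 'R&D' }
]
row = { 'employee_id' => 101, 'employee_name' => 'Bob', 'department_id' => 11 }

updated = attach_employee(existing, row)
updated_again = attach_employee(updated, row)
puts updated[1]['employee'].inspect
```

Replacing an existing entry by `employee_id` rather than blindly appending means the same row can be processed twice without duplicating the employee.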
Also, to debug the Ruby code, add a stdout block to the output, like the one at the end of this page.
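I'm guessing you may have solved this already, but could you use an Oracle view to combine the required data into the document structure (country, department, employee) and read it with a single JDBC query? That way you could use the lowest unique level (the employee id in this case) as the Elasticsearch document id and manage changes there.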
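As a rough illustration of that idea, the plain-Ruby sketch below treats a Hash as a stand-in for the index and uses `employee_id` as the document id, so a changed row overwrites its own document instead of requiring a nested update. The helper name `upsert_rows`, the in-memory index, and the sample view rows are hypothetical; a real pipeline would do this through the Elasticsearch output with the employee id as the document id.

```ruby
# In-memory stand-in for an Elasticsearch index: document id => document body.
# Hypothetical helper for illustration only.
def upsert_rows(index, view_rows)
  view_rows.each do |row|
    # The lowest unique level (employee_id) is used as the document id.
    doc_id = row['employee_id'].to_s
    index[doc_id] = (index[doc_id] || {}).merge(row)
  end
  index
end

# Rows as a flattened country/department/employee view might return them.
first_run = [
  { 'employee_id' => 100, 'employee_name' => 'Ann',
    'department_id' => 10, 'department_name' => 'Sales',
    'country_id' => 1, 'country_name' => 'France' },
  { 'employee_id' => 101, 'employee_name' => 'Bob',
    'department_id' => 11, 'department_name' => 'R&D',
    'country_id' => 1, 'country_name' => 'France' }
]
# Later run: Bob has moved to the Sales department.
second_run = [
  { 'employee_id' => 101, 'employee_name' => 'Bob',
    'department_id' => 10, 'department_name' => 'Sales',
    'country_id' => 1, 'country_name' => 'France' }
]

index = {}
upsert_rows(index, first_run)
upsert_rows(index, second_run)
puts index.keys.inspect
puts index['101']['department_name']
```

The trade-off is that the index holds one flat document per employee rather than one nested document per country, so country-level views have to be built at query time.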
output {
  stdout { codec => dots }
}