Elasticsearch/Python: how to reindex data after changing the mapping?
I am struggling a bit with how to reindex data in Elasticsearch after a mapping or data type change. According to the Elasticsearch documentation:

"Use a scrolled search to pull documents from the old index, and index them into the new index with the bulk API. Many of the client APIs provide a reindex() method that will do all of this for you. Once you are done, you can delete the old index."

Here is my old mapping:
{
  "test-index2": {
    "mappings": {
      "business": {
        "properties": {
          "address": {
            "type": "nested",
            "properties": {
              "country": {
                "type": "string"
              },
              "full_address": {
                "type": "string"
              }
            }
          }
        }
      }
    }
  }
}
Here is the new index mapping, in which I am renaming full_address to location_address:
{
  "test-index2": {
    "mappings": {
      "business": {
        "properties": {
          "address": {
            "type": "nested",
            "properties": {
              "country": {
                "type": "string"
              },
              "location_address": {
                "type": "string"
              }
            }
          }
        }
      }
    }
  }
}
I am using the Python client for Elasticsearch. However, its reindex() helper simply transfers data from one index to another.
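How can I use it to change the mapping (data types, etc.) in the case above?

The reindex API "moves" documents from one index to another. It cannot detect or infer that the field named full_address in the old index's documents should become location_address in the new index's documents. I doubt that the APIs provided by the standard Elasticsearch clients can do what you need. The only way I can think of to achieve this is additional custom logic on the client side: maintain a dictionary that maps old field names to new field names, read the documents from the old index, and index the corresponding documents into the new index using the new field names taken from that dictionary.

This is quite simple if you use the scan & scroll and Bulk API helpers that are already implemented in the Python client for Elasticsearch: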
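1. Fetch all documents with the scan and scroll method.
2. Loop through them and make the necessary modifications to each document.
3. Insert the modified documents into the new index with the Bulk API.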
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

# Use the scan & scroll method to fetch all documents from your old index
res = helpers.scan(
    es,
    query={"query": {"match_all": {}}},
    index="old_index",
    size=1000,
)

new_insert_data = []

# Change the mapping and everything else by looping through all your documents
for x in res:
    x['_index'] = 'new_index'
    # Change "address" to "location_address" (adapt the keys to your own mapping)
    x['_source']['location_address'] = x['_source'].pop('address')
    # This is a useless field
    x.pop('_score', None)
    # Add the new data into a list
    new_insert_data.append(x)

print(new_insert_data)

# Use the Bulk API to insert the list of your modified documents into the database
helpers.bulk(es, new_insert_data)

# Refresh the new index afterwards so the inserted documents become searchable
es.indices.refresh(index="new_index")
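The field-name dictionary suggested in the first answer can be combined with the scan and bulk approach above. What follows is only a minimal sketch and not part of the original answers: FIELD_RENAMES, rename_fields and to_bulk_action are hypothetical names, and it assumes that the renamed field lives inside the nested "address" object from the question's mapping, which may hold a single object or a list of objects.

```python
# Hypothetical old-name -> new-name dictionary for fields inside "address".
FIELD_RENAMES = {"full_address": "location_address"}


def rename_fields(obj, renames):
    """Return a copy of a dict with its keys renamed according to `renames`."""
    return {renames.get(key, key): value for key, value in obj.items()}


def to_bulk_action(hit, target_index, renames=FIELD_RENAMES):
    """Turn one hit from helpers.scan() into an index action for helpers.bulk()."""
    # Shallow copy so the original hit is left untouched.
    source = dict(hit["_source"])
    address = source.get("address")
    # A nested field may contain one object or a list of objects.
    if isinstance(address, list):
        source["address"] = [
            rename_fields(item, renames) if isinstance(item, dict) else item
            for item in address
        ]
    elif isinstance(address, dict):
        source["address"] = rename_fields(address, renames)

    action = {"_index": target_index, "_id": hit["_id"], "_source": source}
    # Older clusters return a mapping type (e.g. "business"); keep it if present.
    if "_type" in hit:
        action["_type"] = hit["_type"]
    return action
```

Passing a generator such as (to_bulk_action(hit, "new_index") for hit in helpers.scan(es, index="old_index")) to helpers.bulk should avoid holding every document in memory at once, which matters for large indices.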
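After updating the mapping, this can also be done by updating the existing documents with the Bulk API:

POST /_bulk
{"update":{"_id":"59519","_type":"asset","_index":"asset"}}
{"doc":{"facility_id":491},"detect_noop":false}

Note: "detect_noop" controls the detection of no-op updates. Setting it to false means each document is rewritten even when the update changes nothing.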
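With the Python client, the same kind of bulk update could be expressed as action dictionaries for helpers.bulk. This is a sketch under assumptions: build_update_action is a hypothetical helper, the index, type, id and field values are simply the ones from the request above, and it relies on my understanding that helpers.bulk sends the non-metadata keys of an "update" action ("doc", "detect_noop") as the update body.

```python
def build_update_action(index, doc_id, partial_doc, doc_type=None):
    """Build one partial-update action in the dict format used by helpers.bulk()."""
    action = {
        "_op_type": "update",   # tells the bulk helper to issue an update
        "_index": index,
        "_id": doc_id,
        "doc": partial_doc,     # fields to merge into the existing document
        "detect_noop": False,   # rewrite the document even if nothing changed
    }
    # Mapping types only exist on older clusters, so the type is optional.
    if doc_type is not None:
        action["_type"] = doc_type
    return action


actions = [
    build_update_action("asset", "59519", {"facility_id": 491}, doc_type="asset"),
]

# Sending the actions needs a reachable cluster, so it is left commented out:
# from elasticsearch import Elasticsearch, helpers
# helpers.bulk(Elasticsearch(), actions)
```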