
Elasticsearch/Python - Re-index data after changing the mapping?


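I'm a bit stuck on how to re-index data in Elasticsearch after a mapping or data type has changed.

According to the Elasticsearch docs:

Use a scrolled search to pull the documents from your old index and use the bulk API to index them into the new index. Many of the client APIs provide a reindex() method which will do all of this for you. Once you are done, you can delete the old index.

Here is my old mapping: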

{
  "test-index2": {
    "mappings": {
      "business": {
        "properties": {
          "address": {
            "type": "nested",
            "properties": {
              "country": {
                "type": "string"
              },
              "full_address": {
                "type": "string"
              }
            }
          }
        }
      }
    }
  }
}
And here is the new index mapping, where I am renaming full_address -> location_address:

{
  "test-index2": {
    "mappings": {
      "business": {
        "properties": {
          "address": {
            "type": "nested",
            "properties": {
              "country": {
                "type": "string"
              },
              "location_address": {
                "type": "string"
              }
            }
          }
        }
      }
    }
  }
}
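I'm using the Python client for Elasticsearch.

However, this only transfers the data from one index to another.


How can I use it to change the mapping (data types, etc.) for the case above?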

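The API "moves" documents from one index to another. It has no way to detect or infer that the field named full_address in the old index's documents should become location_address in the new index's documents. I doubt the API provided by the standard Elasticsearch clients can do what you need. The only way I can think of to achieve this is additional custom logic on the client side: maintain a dictionary that maps field names in the old index to field names in the new index, read each document from the old index, and index the corresponding document into the new index using the new field names taken from that dictionary.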
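The field-name dictionary described above could be sketched roughly as follows. This is only an illustration in plain Python: FIELD_RENAMES, rename_fields and the dotted-path convention are hypothetical names chosen here, not part of the Elasticsearch client, and the sketch assumes a nested field may hold either a single object or a list of objects.

```python
import copy

# Hypothetical mapping: dotted path in the old index -> new name of the last key
FIELD_RENAMES = {"address.full_address": "location_address"}


def _rename_at(node, parents, old_key, new_name):
    """Walk down `parents` and rename `old_key` wherever it is found."""
    if isinstance(node, list):
        # A nested field may contain several objects; handle each one
        for item in node:
            _rename_at(item, parents, old_key, new_name)
    elif isinstance(node, dict):
        if parents:
            if parents[0] in node:
                _rename_at(node[parents[0]], parents[1:], old_key, new_name)
        elif old_key in node:
            node[new_name] = node.pop(old_key)


def rename_fields(source, renames):
    """Return a copy of a document's _source with the renames applied."""
    doc = copy.deepcopy(source)
    for old_path, new_name in renames.items():
        *parents, old_key = old_path.split(".")
        _rename_at(doc, parents, old_key, new_name)
    return doc


old_source = {"address": [{"country": "US", "full_address": "1 Main St"}]}
new_source = rename_fields(old_source, FIELD_RENAMES)
print(new_source)
```

Each document read from the old index would have its _source passed through rename_fields before being indexed into the new index; the original document is left untouched because the function works on a copy.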

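This is quite simple if you use the scan & scroll and Bulk API helpers that are already implemented in the Python client for Elasticsearch.

First, fetch all the documents with the scan & scroll method.

Loop through them and make the necessary changes to each document.

Insert the modified documents into the new index with the Bulk API.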

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

# Use the scan & scroll helper to fetch all documents from the old index
res = helpers.scan(es, query={
  "query": {
    "match_all": {}
  }
}, index="old_index", size=1000)

new_insert_data = []

# Point each document at the new index and rename the field
for x in res:
    x['_index'] = 'new_index'
    # "address" is nested, so it may hold one object or a list of objects
    addresses = x['_source'].get('address', [])
    if isinstance(addresses, dict):
        addresses = [addresses]
    # Rename "full_address" to "location_address" in every address object
    for address in addresses:
        if 'full_address' in address:
            address['location_address'] = address.pop('full_address')
    # The score is of no use when re-indexing
    x.pop('_score', None)

    # Add the modified document to the list
    new_insert_data.append(x)

print(new_insert_data)

# Use the Bulk API to insert the list of modified documents into the new index
helpers.bulk(es, new_insert_data)

# Refresh once, after the bulk insert, so the new documents become searchable
es.indices.refresh(index="new_index")

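After updating the mapping, you can also do this by updating the existing documents with the Bulk API:

POST /_bulk
{"update": {"_id": "59519", "_type": "asset", "_index": "asset"}}
{"doc": {"facility_id": 491}, "detect_noop": false}

Note: "detect_noop" controls whether no-op updates are detected. Setting it to false applies the update even when the document would not change.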
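With the Python client, the same kind of request could be assembled as a list of update actions. The sketch below only builds the action dictionaries; build_update_actions is a hypothetical helper written for this example, the index, type and ID values are taken from the request above, and it assumes the bulk helper forwards the remaining keys ("doc", "detect_noop") as the update body.

```python
def build_update_actions(index, doc_type, updates, detect_noop=False):
    """Yield one bulk "update" action per (document id, partial document) pair."""
    for doc_id, partial_doc in updates.items():
        yield {
            "_op_type": "update",   # tells the bulk helper to update, not index
            "_index": index,
            "_type": doc_type,      # only meaningful on versions that still use types
            "_id": doc_id,
            "doc": partial_doc,     # fields to merge into the existing document
            "detect_noop": detect_noop,
        }


actions = list(
    build_update_actions("asset", "asset", {"59519": {"facility_id": 491}})
)
print(actions)
```

With a reachable cluster, these actions could then be sent with helpers.bulk(es, actions), where es is an Elasticsearch client instance.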
