
Java: Problem creating an index with a custom analyzer using Jest


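Jest provides a brilliant asynchronous API for elasticsearch, and we find it very useful. However, sometimes the resulting requests turn out to be slightly different from what we would expect.

Usually we don't care, since everything works fine, but in this case it did not.

I want to create an index with a custom ngram analyzer. When I do this following the elasticsearch REST API docs, I call the following: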

curl -XPUT 'localhost:9200/test' --data '
{
  "settings": {
    "number_of_shards": 3,
    "analysis": {
      "filter": {
        "keyword_search": {
          "type":     "edge_ngram",
          "min_gram": 3,
          "max_gram": 15
        }
      },
      "analyzer": {
        "keyword": {
          "type":      "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "keyword_search"
          ]
        }
      }
    }
  }
}'
Then I confirm that the analyzer is configured correctly using:

curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens'
In response I receive multiple tokens such as exp, expe, expec and so on.
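
To make the expected tokens concrete, here is a minimal plain-Java sketch of what the analyzer above should produce: split on whitespace, lowercase, then emit prefixes between min_gram and max_gram characters long. The class and method names are made up for illustration, and this only approximates Elasticsearch's own filter (it ignores token positions and offsets, for example).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class EdgeNgramSketch {

    // Splits on whitespace, lowercases each word, then emits every prefix
    // whose length lies between minGram and maxGram (inclusive).
    // Hypothetical helper, not part of Elasticsearch or Jest.
    static List<String> tokens(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            String lower = word.toLowerCase(Locale.ROOT);
            int longest = Math.min(maxGram, lower.length());
            for (int len = minGram; len <= longest; len++) {
                out.add(lower.substring(0, len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same text and gram sizes as in the curl calls above.
        System.out.println(tokens("Expecting many tokens", 3, 15));
    }
}
```

With these inputs the sketch yields exp, expe, expec, ... up to expecting, then man, many, and tok through tokens, which is the kind of response I get from the REST-created index.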

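Now, using the Jest client, I put the config JSON in a file on my classpath; its content is exactly the same as the body of the PUT request above. I execute a Jest action constructed like this: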

new CreateIndex.Builder(name)
            .settings(
                    ImmutableSettings.builder()
                            .loadFromClasspath(
                                    "settings.json"
                            ).build().getAsMap()
            ).build();
As a result:

  • Primo: checking with tcpdump, what is actually posted to elasticsearch is (pretty-printed):

  • Secundo: the resulting index settings are:

    {
      "test": {
        "settings": {
          "index": {
            "settings": {
              "analysis": {
                "filter": {
                  "keyword_search": {
                    "type": "edge_ngram",
                    "min_gram": "3",
                    "max_gram": "15"
                  }
                },
                "analyzer": {
                  "keyword": {
                    "filter": [
                      "lowercase",
                      "keyword_search"
                    ],
                    "type": "custom",
                    "tokenizer": "whitespace"
                  }
                }
              },
              "number_of_shards": "3"   <-- the only difference from the one created with rest call
            },
            "number_of_shards": "3",
            "number_of_replicas": "0",
            "version": {"created": "1030499"},
            "uuid": "Glqf6FMuTWG5EH2jarVRWA"
          }
        }
      }
    }
    
Glad you found Jest useful; please see my answers below.

Question 1. What is the reason that Jest does not post my original settings JSON, but a processed one?

It is not Jest but Elasticsearch's ImmutableSettings that does this; see:

        Map test = ImmutableSettings.builder()
                .loadFromSource("{\n" +
                        "  \"settings\": {\n" +
                        "    \"number_of_shards\": 3,\n" +
                        "    \"analysis\": {\n" +
                        "      \"filter\": {\n" +
                        "        \"keyword_search\": {\n" +
                        "          \"type\":     \"edge_ngram\",\n" +
                        "          \"min_gram\": 3,\n" +
                        "          \"max_gram\": 15\n" +
                        "        }\n" +
                        "      },\n" +
                        "      \"analyzer\": {\n" +
                        "        \"keyword\": {\n" +
                        "          \"type\":      \"custom\",\n" +
                        "          \"tokenizer\": \"whitespace\",\n" +
                        "          \"filter\": [\n" +
                        "            \"lowercase\",\n" +
                        "            \"keyword_search\"\n" +
                        "          ]\n" +
                        "        }\n" +
                        "      }\n" +
                        "    }\n" +
                        "  }\n" +
                        "}").build().getAsMap();
        System.out.println("test = " + test);
    
Output:

    test = {
        settings.analysis.filter.keyword_search.type=edge_ngram,
        settings.number_of_shards=3,
        settings.analysis.analyzer.keyword.filter.0=lowercase,
        settings.analysis.analyzer.keyword.filter.1=keyword_search,
        settings.analysis.analyzer.keyword.type=custom,
        settings.analysis.analyzer.keyword.tokenizer=whitespace,
        settings.analysis.filter.keyword_search.max_gram=15,
        settings.analysis.filter.keyword_search.min_gram=3
    }
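
If you want to see the flattening without pulling in Elasticsearch, the following is a rough plain-Java sketch of the same idea: nested keys are joined with dots, list items get their index as a key, and every value becomes a string. The class and method names are invented for illustration; this is not the actual ImmutableSettings implementation, only something that yields the same keys for this input.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenSketch {

    // Flattens a nested map into dotted keys with string values.
    // Hypothetical helper that mimics the output shown above.
    static Map<String, String> flatten(Map<String, Object> nested) {
        Map<String, String> flat = new LinkedHashMap<>();
        flattenInto(flat, "", nested);
        return flat;
    }

    private static void flattenInto(Map<String, String> flat, String prefix, Object value) {
        if (value instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                String key = prefix.isEmpty() ? String.valueOf(e.getKey()) : prefix + "." + e.getKey();
                flattenInto(flat, key, e.getValue());
            }
        } else if (value instanceof List) {
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                flattenInto(flat, prefix + "." + i, list.get(i));
            }
        } else {
            // Numbers such as 3 end up as the string "3".
            flat.put(prefix, String.valueOf(value));
        }
    }

    // The settings JSON from the question, built by hand as nested maps.
    static Map<String, Object> sample() {
        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put("type", "edge_ngram");
        filter.put("min_gram", 3);
        filter.put("max_gram", 15);
        Map<String, Object> filters = new LinkedHashMap<>();
        filters.put("keyword_search", filter);

        Map<String, Object> analyzer = new LinkedHashMap<>();
        analyzer.put("type", "custom");
        analyzer.put("tokenizer", "whitespace");
        analyzer.put("filter", Arrays.asList("lowercase", "keyword_search"));
        Map<String, Object> analyzers = new LinkedHashMap<>();
        analyzers.put("keyword", analyzer);

        Map<String, Object> analysis = new LinkedHashMap<>();
        analysis.put("filter", filters);
        analysis.put("analyzer", analyzers);

        Map<String, Object> settings = new LinkedHashMap<>();
        settings.put("number_of_shards", 3);
        settings.put("analysis", analysis);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("settings", settings);
        return root;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : flatten(sample()).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Note that every key keeps the leading "settings." segment, which matters for the second question.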
    
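Question 2. Why do the settings generated by Jest not work?

Because your usage of the settings JSON/map is not the intended case. To reproduce your case I created the test listed at the end of this post (it's a bit long, but bear with me).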

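When you run it, you will see that in the case where settingsAsMap is used the actual settings are completely wrong (settings contains another settings, which is your JSON, whereas the two should have been merged), and so the analysis fails.

Why is this not the intended usage?

Simply because that is how Elasticsearch behaves in this situation. If the settings data is flattened (as the ImmutableSettings class does by default), then it should not have the top-level element settings; if the data is not flattened, it can have that top-level element (which is why the test case with settingsAsString works).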
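
To illustrate where the nested settings comes from, here is a small plain-Java sketch. It assumes, based on the settings response shown in the question, that a flattened key gets the "index." prefix when it does not already have it; the class and method names are hypothetical and this is not Elasticsearch's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IndexPrefixSketch {

    // Adds "index." in front of every key that does not already start with it.
    // Assumption: this approximates how flattened index settings are normalized.
    static Map<String, String> withIndexPrefix(Map<String, String> flat) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : flat.entrySet()) {
            String key = e.getKey();
            out.put(key.startsWith("index.") ? key : "index." + key, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> flat = new LinkedHashMap<>();
        // Key as produced from the JSON with the top-level "settings" element.
        flat.put("settings.analysis.analyzer.keyword.type", "custom");
        // Key as produced from the JSON without that element.
        flat.put("analysis.analyzer.keyword.type", "custom");
        for (String key : withIndexPrefix(flat).keySet()) {
            System.out.println(key);
        }
    }
}
```

Under that assumption the first key becomes index.settings.analysis..., which is not where the analyzer definitions are looked up, while the second becomes index.analysis... as intended.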

    tl;dr:


Your settings JSON should not include the top-level "settings" element (if you run it through ImmutableSettings).
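
The simplest fix is to delete the wrapper from the JSON file itself. If the file has to stay as it is, one possible workaround is to strip the leading "settings." segment from the flattened map before handing it over. Below is a minimal sketch in plain Java; the helper name is made up, and I have not verified it against every Jest version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StripSettingsPrefix {

    // Returns a copy of the flattened map with "<topLevel>." removed from the
    // start of every key that has it; other keys are copied unchanged.
    // Hypothetical helper, not part of Jest or Elasticsearch.
    static Map<String, String> stripTopLevel(Map<String, String> flat, String topLevel) {
        String prefix = topLevel + ".";
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : flat.entrySet()) {
            String key = e.getKey();
            out.put(key.startsWith(prefix) ? key.substring(prefix.length()) : key, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("settings.number_of_shards", "3");
        flat.put("settings.analysis.filter.keyword_search.type", "edge_ngram");
        // Each printed key no longer starts with "settings."
        for (String key : stripTopLevel(flat, "settings").keySet()) {
            System.out.println(key);
        }
    }
}
```

The resulting map has the same keys you would get from loading a JSON file without the top-level element.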

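Thanks for the effort you put into answering my question, it must have taken some time! I applied your suggestion and removed the top-level settings element, and it works great.

No problem! Keep in mind that you can also use the raw string as the settings source.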
    
    
        @Test
        public void createIndexTemp() throws IOException {
            String index = "so_q_26949195";
    
            String settingsAsString = "{\n" +
                    "  \"settings\": {\n" +
                    "    \"number_of_shards\": 3,\n" +
                    "    \"analysis\": {\n" +
                    "      \"filter\": {\n" +
                    "        \"keyword_search\": {\n" +
                    "          \"type\":     \"edge_ngram\",\n" +
                    "          \"min_gram\": 3,\n" +
                    "          \"max_gram\": 15\n" +
                    "        }\n" +
                    "      },\n" +
                    "      \"analyzer\": {\n" +
                    "        \"keyword\": {\n" +
                    "          \"type\":      \"custom\",\n" +
                    "          \"tokenizer\": \"whitespace\",\n" +
                    "          \"filter\": [\n" +
                    "            \"lowercase\",\n" +
                    "            \"keyword_search\"\n" +
                    "          ]\n" +
                    "        }\n" +
                    "      }\n" +
                    "    }\n" +
                    "  }\n" +
                    "}";
            Map settingsAsMap = ImmutableSettings.builder()
                    .loadFromSource(settingsAsString).build().getAsMap();
    
            // Case 1: send the settings as the raw JSON string (not flattened).
            CreateIndex createIndex = new CreateIndex.Builder(index)
                    .settings(settingsAsString)
                    .build();
    
            JestResult result = client.execute(createIndex);
            assertTrue(result.getErrorMessage(), result.isSucceeded());
    
            GetSettings getSettings = new GetSettings.Builder().addIndex(index).build();
            result = client.execute(getSettings);
            assertTrue(result.getErrorMessage(), result.isSucceeded());
            System.out.println("SETTINGS SENT AS STRING settingsResponse = " + result.getJsonString());
    
            Analyze analyze = new Analyze.Builder()
                    .index(index)
                    .analyzer("keyword")
                    .source("Expecting many tokens")
                    .build();
            result = client.execute(analyze);
            assertTrue(result.getErrorMessage(), result.isSucceeded());
            Integer actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
            assertTrue("Expected multiple tokens but got " + actualTokens, actualTokens > 1);
    
            // Control: with no index given, "keyword" is the built-in keyword analyzer.
            analyze = new Analyze.Builder()
                    .analyzer("keyword")
                    .source("Expecting single token")
                    .build();
            result = client.execute(analyze);
            assertTrue(result.getErrorMessage(), result.isSucceeded());
            actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
            assertTrue("Expected single token but got " + actualTokens, actualTokens == 1);
    
            admin().indices().delete(new DeleteIndexRequest(index)).actionGet();
    
            // Case 2: send the map flattened by ImmutableSettings (reproduces the problem).
            createIndex = new CreateIndex.Builder(index)
                    .settings(settingsAsMap)
                    .build();
    
            result = client.execute(createIndex);
            assertTrue(result.getErrorMessage(), result.isSucceeded());
    
            getSettings = new GetSettings.Builder().addIndex(index).build();
            result = client.execute(getSettings);
            assertTrue(result.getErrorMessage(), result.isSucceeded());
            System.out.println("SETTINGS AS MAP settingsResponse = " + result.getJsonString());
    
            analyze = new Analyze.Builder()
                    .index(index)
                    .analyzer("keyword")
                    .source("Expecting many tokens")
                    .build();
            result = client.execute(analyze);
            assertTrue(result.getErrorMessage(), result.isSucceeded());
            actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
            assertTrue("Expected multiple tokens but got " + actualTokens, actualTokens > 1);
        }