
Elasticsearch: cannot search for tokens equal to *abc


Suppose I have indexed documents like these: 1: abc, 2: *abc, 3: abc def, 4: def*abc, 5: 1abc

I want searches to behave like this:

search = abc   results = 1, 2, 3, 4, 5
search = *abc  results = 2, 4

I use a custom analyzer defined like this:

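Add("myAnalyzer", new CustomAnalyzer
        {
          // names must match the ones the components are registered under
          Tokenizer = "ipTokenizer",
          Filter = new[]
          {
            "ipAsciiFolding",
            "lowercase",
            "ipPattern"   // custom filter; its definition is not shown
          }
        })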
with the tokenizer defined like this:

Add("ipTokenizer", new PatternTokenizer
              {
                Pattern = @"\W+"
              })
and then ASCII folding like this:

Add("ipAsciiFolding", new AsciiFoldingTokenFilter
            {
              PreserveOriginal = true
            })
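In practice, the first search works, but the second one (with "*") returns the same results as the first. Is there a way to specify multiple tokenizers to get the behavior I expect?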

Any ideas?

Thanks,

To get this behavior:

search = abc   results = 1, 2, 3, 4, 5
search = *abc  results = 2, 4

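Since you are searching inside strings (finding "abc" within "*abc") and you don't want a search for "*abc" to match "*def abc", I would tokenize the data with this: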

curl -XPUT 'localhost:9200/test' -H 'Content-Type: application/json' -d '
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_ngram_analyzer" : {
                    "tokenizer" : "my_ngram_tokenizer"
                }
            },
            "tokenizer" : {
                "my_ngram_tokenizer" : {
                    "type" : "nGram",
                    "min_gram" : "2",
                    "max_gram" : "5",
                    "token_chars": [ "letter", "digit", "punctuation", "symbol" ]
                }
            }
        }
    }
}'
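To make the effect of token_chars concrete, here is a rough Python model of the nGram tokenizer configured above (min_gram 2, max_gram 5, token_chars letter, digit, punctuation, symbol). It is a sketch, not Elasticsearch code: token_chars are approximated with Unicode general categories, and the function names are hypothetical. Because "*" is punctuation it stays inside the grams, while whitespace (not listed in token_chars) splits the input.

```python
import unicodedata

# Approximation of the nGram tokenizer settings from the index above.
MIN_GRAM, MAX_GRAM = 2, 5

def is_token_char(ch):
    # letter (L*), digit (Nd), punctuation (P*), symbol (S*)
    cat = unicodedata.category(ch)
    return cat.startswith(("L", "P", "S")) or cat == "Nd"

def segments(text):
    # Characters outside token_chars (e.g. whitespace) split the text.
    seg = ""
    for ch in text:
        if is_token_char(ch):
            seg += ch
        elif seg:
            yield seg
            seg = ""
    if seg:
        yield seg

def ngrams(text):
    # For each start position, emit grams of length MIN_GRAM..MAX_GRAM.
    grams = []
    for seg in segments(text):
        for start in range(len(seg)):
            for n in range(MIN_GRAM, MAX_GRAM + 1):
                if start + n <= len(seg):
                    grams.append(seg[start:start + n])
    return grams

print(ngrams("*abc"))     # ['*a', '*ab', '*abc', 'ab', 'abc', 'bc']
print(ngrams("abc def"))  # ['ab', 'abc', 'bc', 'de', 'def', 'ef']
```

Note that no gram of "abc def" contains "*", so an exact lookup of *abc cannot hit that document.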
If your terms (*abc, etc.) are all 5 characters or shorter, I would use a term query (i.e., you will find an exact matching term in the index).


If your terms are longer than 5 characters, I would use a query with default_operator set to AND.
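The two strategies can be compared on the example documents with a self-contained Python model. This is a sketch under assumptions: the n-gram function approximates the tokenizer configured above, term_search and and_search are hypothetical names, and query-parser syntax is ignored. term_search looks the search string up unchanged as a single token (exact-term lookup, for terms of 5 characters or fewer); and_search analyzes the search string into n-grams and requires all of them, which is the effect of default_operator AND.

```python
import unicodedata

MIN_GRAM, MAX_GRAM = 2, 5

def is_token_char(ch):
    # letter (L*), digit (Nd), punctuation (P*), symbol (S*)
    cat = unicodedata.category(ch)
    return cat.startswith(("L", "P", "S")) or cat == "Nd"

def ngrams(text):
    # Split on characters outside token_chars, then collect 2..5-grams.
    segs, cur = [], ""
    for ch in text:
        if is_token_char(ch):
            cur += ch
        else:
            if cur:
                segs.append(cur)
            cur = ""
    if cur:
        segs.append(cur)
    return {seg[i:i + n] for seg in segs
            for i in range(len(seg))
            for n in range(MIN_GRAM, MAX_GRAM + 1)
            if i + n <= len(seg)}

docs = {1: "abc", 2: "*abc", 3: "abc def", 4: "def*abc", 5: "1abc"}
index = {doc_id: ngrams(text) for doc_id, text in docs.items()}

def term_search(term):
    # Exact-term lookup: the term is not analyzed and must exist as a token.
    return sorted(d for d, grams in index.items() if term in grams)

def and_search(text):
    # Analyzed query with AND: every n-gram of the query must be present.
    query_grams = ngrams(text)
    return sorted(d for d, grams in index.items()
                  if query_grams and query_grams <= grams)

print(term_search("abc"))     # [1, 2, 3, 4, 5]
print(term_search("*abc"))    # [2, 4]
print(and_search("def*abc"))  # [4]
```

One trade-off of the AND approach: a document can contain every n-gram of a long query without containing the query as one contiguous string, so occasional false positives are possible.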

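What analyzer are you using in your mapping? If you want * to be treated as data rather than ignored, you may need to switch to a different one.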
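A minimal sketch of why both searches in the question return the same hits. It assumes the question's analyzer behaves like a \W+ split followed by lowercasing (ASCII folding and the undefined ipPattern filter are ignored) and that the search is an OR-style full-text query, which the question does not state. The names analyze, index and search are hypothetical. Because \W+ treats "*" as a separator, "*abc" and "abc" both analyze to the single token abc. Under this model document 5 (1abc) matches neither search, because 1abc stays one token.

```python
import re

# Rough model of the question's analyzer: pattern tokenizer (\W+) plus
# lowercase. For these ASCII inputs Python's \W behaves like the Java \W
# used by Elasticsearch's pattern tokenizer.
def analyze(text):
    # re.split leaves empty strings at the edges ("*abc" -> ["", "abc"]);
    # drop them, since the tokenizer does not emit empty tokens.
    return [tok.lower() for tok in re.split(r"\W+", text) if tok]

docs = {1: "abc", 2: "*abc", 3: "abc def", 4: "def*abc", 5: "1abc"}
index = {doc_id: set(analyze(text)) for doc_id, text in docs.items()}

def search(query):
    # Assumed OR semantics: a document matches if it shares any query token.
    query_tokens = set(analyze(query))
    return sorted(d for d, toks in index.items() if toks & query_tokens)

print(analyze("*abc"))  # ['abc']: the "*" was treated as a separator
print(search("abc"))    # [1, 2, 3, 4]
print(search("*abc"))   # [1, 2, 3, 4]
```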