Django Haystack - How to force an exact attribute match without stemming?
I'm using Django 1.5, Django Haystack 2.0 and the Elasticsearch backend. I'm trying to search by an exact attribute match. However, even though I use both the __exact operator and the Exact() class, I still get "similar" results. How can I prevent this behavior?
For example:
# models.py
from django.db import models

class Person(models.Model):
    name = models.TextField()

# search_indexes.py
from haystack import indexes

class PersonIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    name = indexes.CharField(model_attr="name")

    def get_model(self):
        return Person

    def index_queryset(self, using=None):
        return self.get_model().objects.all()

# templates/search/indexes/people/person_text.txt
{{ object.name }}

>>> p1 = Person(name="Simon")
>>> p1.save()
>>> p2 = Person(name="Simons")
>>> p2.save()

$ ./manage.py rebuild_index

>>> from haystack.inputs import Exact
>>> from haystack.query import SearchQuerySet
>>> person_sqs = SearchQuerySet().models(Person)
>>> person_sqs.filter(name__exact="Simons")
[<SearchResult: people.person (name=u'Simon')>,
 <SearchResult: people.person (name=u'Simons')>]
>>> person_sqs.filter(name=Exact("Simons", clean=True))
[<SearchResult: people.person (name=u'Simon')>,
 <SearchResult: people.person (name=u'Simons')>]
I only want search results for "Simons"; the "Simon" result should not appear.

Don't use CharField; use EdgeNgramField instead:
# search_indexes.py
class PersonIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    name = indexes.EdgeNgramField(model_attr="name")

    def get_model(self):
        return Person

    def index_queryset(self, using=None):
        return self.get_model().objects.all()
And instead of filter, use autocomplete:
person_sqs = SearchQuerySet().models(Person)
person_sqs.autocomplete(name="Simons")

I faced a similar problem. If you change the settings of Haystack's Elasticsearch backend like this:
DEFAULT_SETTINGS = {
    'settings': {
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["haystack_ngram", "lowercase"]
                },
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["haystack_edgengram", "lowercase"]
                }
            },
            "tokenizer": {
                "haystack_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 6,
                    "max_gram": 15,
                },
                "haystack_edgengram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": 6,
                    "max_gram": 15,
                    "side": "front"
                }
            },
            "filter": {
                "haystack_ngram": {
                    "type": "nGram",
                    "min_gram": 6,
                    "max_gram": 15
                },
                "haystack_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 6,
                    "max_gram": 15
                }
            }
        }
    }
}
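Then it will only tokenize when the query is at least 6 characters long (min_gram is 6).
If you want results such as "xyzsimonsxyz" to match, you need to use the ngram analyzer instead of EdgeNGram, or you can use both depending on your requirements. EdgeNGram only generates tokens starting from the beginning of a term.
With NGram, and assuming max_gram >= 6, "simons" will be one of the tokens generated for the term xyzsimonsxyz, so you will get the expected result. The search analyzer needs to be different from the index analyzer, otherwise you will get strange results.
Also, if you have a lot of text, the index size with ngram can become quite large.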
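The difference between the two token types can be illustrated in plain Python. This is a simplified sketch of the idea, not Elasticsearch's implementation: the helper names `edge_ngrams` and `ngrams` are hypothetical, and lowercasing and word splitting are left out.

```python
def edge_ngrams(term, min_gram, max_gram):
    """Prefixes of `term` whose length is between min_gram and max_gram."""
    longest = min(max_gram, len(term))
    return [term[:size] for size in range(min_gram, longest + 1)]


def ngrams(term, min_gram, max_gram):
    """All substrings of `term` whose length is between min_gram and max_gram."""
    grams = []
    longest = min(max_gram, len(term))
    for size in range(min_gram, longest + 1):
        for start in range(len(term) - size + 1):
            grams.append(term[start:start + size])
    return grams


term = "xyzsimonsxyz"
# Edge n-grams only ever start at the first character of the term.
print("simons" in edge_ngrams(term, 6, 15))
# Plain n-grams cover every position, so the inner "simons" is included.
print("simons" in ngrams(term, 6, 15))
# A term shorter than min_gram produces no grams at all.
print(edge_ngrams("simon", 6, 15))
```

With min_gram set to 6, the five-letter "simon" yields no tokens, which is why it stops matching a query for "Simons" under these settings.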

TL;DR: define a custom tokenizer (not a filter).
Long explanation:
a) Use EdgeNgramField:
# search_indexes.py
class PersonIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.EdgeNgramField(document=True, use_template=True)
    ...
b) Template:
# templates/search/indexes/people/person_text.txt
{{ object.name }}
c) Create a custom search backend:
# backends.py
from django.conf import settings
from haystack.backends.elasticsearch_backend import (
    ElasticsearchSearchBackend,
    ElasticsearchSearchEngine,
)

class CustomElasticsearchSearchBackend(ElasticsearchSearchBackend):
    def __init__(self, connection_alias, **connection_options):
        super(CustomElasticsearchSearchBackend, self).__init__(
            connection_alias, **connection_options)
        # Replace Haystack's built-in index settings with the project's own.
        setattr(self, 'DEFAULT_SETTINGS', settings.ELASTICSEARCH_INDEX_SETTINGS)

class CustomElasticsearchSearchEngine(ElasticsearchSearchEngine):
    backend = CustomElasticsearchSearchBackend
d) Define a custom tokenizer (not a filter!):
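A minimal sketch of what such settings might look like, assuming they live in settings.py under the ELASTICSEARCH_INDEX_SETTINGS name that the backend in step c) reads. The tokenizer names, the gram sizes, token_chars, the myapp.backends module path, the URL and the index name are illustrative assumptions, not values taken from the original answer:

```python
# settings.py (sketch; every concrete value below is an assumption)
ELASTICSEARCH_INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Analyzer names match the ones used by Haystack's
                # ngram and edge-ngram fields shown earlier in the thread.
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "custom_ngram_tokenizer",
                    "filter": ["lowercase"],
                },
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "custom_edgengram_tokenizer",
                    "filter": ["lowercase"],
                },
            },
            # The n-gram logic is defined as tokenizers, not token filters.
            "tokenizer": {
                "custom_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 6,
                    "max_gram": 15,
                    "token_chars": ["letter", "digit"],
                },
                "custom_edgengram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": 6,
                    "max_gram": 15,
                    "token_chars": ["letter", "digit"],
                },
            },
        }
    }
}

HAYSTACK_CONNECTIONS = {
    "default": {
        # Hypothetical dotted path to the engine class from step c).
        "ENGINE": "myapp.backends.CustomElasticsearchSearchEngine",
        "URL": "http://127.0.0.1:9200/",
        "INDEX_NAME": "haystack",
    }
}
```

The gram sizes control how short a term can be and still produce tokens, so they should be chosen to fit the shortest value that must match.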
e) Use AutoQuery (more general):
# views.py
from haystack.inputs import AutoQuery
from haystack.query import SearchQuerySet

search_value = 'Simons'
...
person_sqs = \
    SearchQuerySet().models(Person).filter(
        content=AutoQuery(search_value)
    )
f) Reindex after making changes:
$ ./manage.py rebuild_index