Wagtail default search doesn't work with non-English fields

I'm using the default database backend for the search feature in my project:

from __future__ import absolute_import, unicode_literals

from django.core.paginator import EmptyPage, PageNotAnInteger, Paginator
from django.shortcuts import render

from home.models import BlogPage, get_all_tags
from wagtail.wagtailsearch.models import Query


def search(request):
    search_query = request.GET.get('query', None)
    page = request.GET.get('page', 1)

    # Search
    if search_query:
        search_results = BlogPage.objects.live().search(search_query)
        query = Query.get(search_query)

        # Record hit
        query.add_hit()
    else:
        search_results = BlogPage.objects.none()

    # Pagination
    paginator = Paginator(search_results, 10)
    try:
        search_results = paginator.page(page)
    except PageNotAnInteger:
        search_results = paginator.page(1)
    except EmptyPage:
        search_results = paginator.page(paginator.num_pages)

    return render(request, 'search/search.html', {
        'search_query': search_query,
        'blogpages': search_results,
        'tags': get_all_tags()
    })

The BlogPage model:

class BlogPage(Page):
    date = models.DateField("Post date")
    intro = models.CharField(max_length=250)
    body = StreamField([
        ('heading', blocks.CharBlock(classname="full title")),
        ('paragraph', blocks.RichTextBlock()),
        ('image', ImageChooserBlock()),
        ('code', CodeBlock()),
    ])
    tags = ClusterTaggableManager(through=BlogPageTag, blank=True)

    search_fields = Page.search_fields + [
        index.SearchField('intro'),
        index.SearchField('body'),
    ]
    ...

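Search only works when the body field of the BlogPage model is in English. If I put some Russian words in the body field, the search finds nothing. I looked in the database and saw that the body field of BlogPage is stored like this: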

[{"value": "\u0442\u0435\u0441\u0442\u043e\u0432\u044b\u0439", "id": "3343151a-edbc-4165-89f2-ce766922d68e", "type": "heading"}, {"value": "<p>\u0442\u0435\u0441\u0442\u0438\u043f\u0440</p>", "id": "22d3818d-8c69-4d72-967e-7c1f807e80b2", "type": "paragraph"}]

So the problem is that Wagtail saves the StreamField contents as escaped Unicode ("\uXXXX" sequences). If I manually change the value in phpMyAdmin to:

[{"value": "Тест", "id": "3343151a-edbc-4165-89f2-ce766922d68e", "type": "heading"}, {"value": "<p>Тестовый</p>", "id": "22d3818d-8c69-4d72-967e-7c1f807e80b2", "type": "paragraph"}]

then search starts working. So maybe someone knows how to prevent Wagtail from saving StreamField contents as escaped Unicode?

I hate this workaround, but I decided to just add two more fields, search_body and search_intro, and search on those instead:

class BlogPage(Page):
    date = models.DateField("Post date")
    intro = models.CharField(max_length=250)
    body = StreamField([
        ('heading', blocks.CharBlock(classname="full title")),
        ('paragraph', blocks.RichTextBlock()),
        ('image', ImageChooserBlock()),
        ('code', CodeBlock()),
    ])
    search_intro = models.CharField(max_length=250)
    search_body = models.CharField(max_length=50000)
    tags = ClusterTaggableManager(through=BlogPageTag, blank=True)

    def main_image(self):
        gallery_item = self.gallery_images.first()
        if gallery_item:
            return gallery_item.image
        else:
            return None

    def get_context(self, request):
        context = super(BlogPage, self).get_context(request)
        context['tags'] = get_all_tags()
        context['page_url'] = urllib.parse.urljoin(BASE_URL, self.url)
        return context

    def save(self, *args, **kwargs):
        if self.body.stream_data and isinstance(
                self.body.stream_data[0], tuple):
            self.search_body = ''
            for block in self.body.stream_data:
                if len(block) >= 2:
                    self.search_body += str(block[1])
        self.search_intro = self.intro.lower()
        self.search_body = self.search_body.lower()
        return super().save(*args, **kwargs)

    search_fields = Page.search_fields + [
        index.SearchField('search_intro'),
        index.SearchField('search_body'),
    ]
    ...

search/views.py:

def search(request):
    search_query = request.GET.get('query', None)
    page = request.GET.get('page', 1)

    # Search
    if search_query:
        search_results = BlogPage.objects.live().search(search_query.lower())
        query = Query.get(search_query)
    ...

Alex, thank you.

However, the save method gets called twice for me, so I had to use the following code:

def save(self, *args, **kwargs):
    search_body = ''
    if self.blog_post_body.stream_data and isinstance(
            self.blog_post_body.stream_data[0], dict):
        for block in self.blog_post_body.stream_data:
            if block.get('type', '') in ('some_header', 'some_text'):
                search_body += str(block['value'])
        self.search_body = search_body
    super(BlogPost, self).save(*args, **kwargs)
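To make the idea easier to try outside of Wagtail, here is a minimal sketch of the same extraction step as a plain function. It assumes the stream data arrives as a list of dicts with "type" and "value" keys, as in the JSON from the question. The names build_search_text and block_types are made up for this illustration and are not part of Wagtail.

```python
def build_search_text(stream_data, block_types=("heading", "paragraph")):
    """Concatenate the values of the selected block types into one lowercase string.

    Hypothetical helper: mirrors the loop in the save() override above.
    """
    parts = []
    for block in stream_data:
        # Only raw dict blocks are handled; anything else is skipped.
        if isinstance(block, dict) and block.get("type", "") in block_types:
            parts.append(str(block["value"]))
    # Joining with a space keeps words from adjacent blocks apart.
    return " ".join(parts).lower()


raw_blocks = [
    {"type": "heading", "value": "Тест", "id": "a"},
    {"type": "paragraph", "value": "<p>Тестовый</p>", "id": "b"},
    {"type": "image", "value": 42, "id": "c"},
]
search_text = build_search_text(raw_blocks)
print(search_text)
```

Note that rich text values still contain their HTML tags here, just as in the save() override. Stripping the tags would be a separate step.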

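StreamField encodes its JSON with DjangoJSONEncoder, which leaves ensure_ascii=True in effect. That is why the Unicode text shows up as "\u..." escapes. The default db search backend only does database text matching, so queries with non-ASCII keywords fail.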
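The effect can be reproduced with the standard json module alone. This is only a sketch: the plain substring check below stands in for the database's text matching, which is an assumption about how the db backend behaves, not its actual query.

```python
import json

blocks = [{"type": "heading", "value": "тестовый"}]

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
stored_escaped = json.dumps(blocks)
# With ensure_ascii=False the characters are written as they are.
stored_unicode = json.dumps(blocks, ensure_ascii=False)

query = "тестовый"
print(stored_escaped)
print(query in stored_escaped)   # the escaped text does not contain the query
print(query in stored_unicode)   # the unescaped text does
```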

For reference, this is StreamField.get_prep_value:

def get_prep_value(self, value):
    if isinstance(value, StreamValue) and not(value) and value.raw_text is not None:
        # An empty StreamValue with a nonempty raw_text attribute should have that
        # raw_text attribute written back to the db. (This is probably only useful
        # for reverse migrations that convert StreamField data back into plain text
        # fields.)
        return value.raw_text
    else:
        return json.dumps(self.stream_block.get_prep_value(value), cls=DjangoJSONEncoder)

You would need to subclass StreamField and supply a custom JSONEncoder that sets ensure_ascii=False. You also need to make sure your database handles UTF-8 strings by default. (That should be fine with PostgreSQL.)
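A rough sketch of such an encoder is below. It uses the standard library's json.JSONEncoder so that it runs without Django. In a real project it would presumably derive from DjangoJSONEncoder and be passed as cls in an overridden get_prep_value of a StreamField subclass. The names UnicodeJSONEncoder and dumps_for_db are hypothetical.

```python
import json


class UnicodeJSONEncoder(json.JSONEncoder):
    """Encoder that always writes non-ASCII characters unescaped."""

    def __init__(self, *args, **kwargs):
        # json.dumps passes ensure_ascii=True to the encoder by default,
        # so it has to be overridden here.
        kwargs["ensure_ascii"] = False
        super().__init__(*args, **kwargs)


def dumps_for_db(prepared_value):
    # Stand-in for the json.dumps call inside get_prep_value.
    return json.dumps(prepared_value, cls=UnicodeJSONEncoder)


stored = dumps_for_db([{"type": "heading", "value": "Тест"}])
print(stored)
```

Overriding the constructor means callers do not have to remember to pass ensure_ascii=False themselves, which matters when the json.dumps call lives in library code.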

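If you switch to another backend, such as the PG search backend, it extracts the text from the StreamField when building the index, so you won't have this problem.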
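As a hedged example, switching to the PostgreSQL search backend might look like the settings fragment below. It assumes a Wagtail release that ships wagtail.contrib.postgres_search and a project that already runs on PostgreSQL. The module path differs between Wagtail versions, so check it against the version you have installed.

```python
# settings.py fragment (sketch, not a complete settings file)

INSTALLED_APPS = [
    # ... your existing apps go here ...
    "wagtail.contrib.postgres_search",
]

WAGTAILSEARCH_BACKENDS = {
    "default": {
        # Assumed module path; it depends on the Wagtail version.
        "BACKEND": "wagtail.contrib.postgres_search.backend",
    },
}
```

After changing the backend, the index has to be rebuilt with the update_index management command (python manage.py update_index).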
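
You didn't mention which search backend you are using. Are you using Elasticsearch? I have successfully set up German-language search with Elasticsearch.

It looks like you aren't. Or did you just leave the search_fields declaration out of BlogPage?

I have specified search_fields (I added those lines to the question), and I think I'm using the default database backend for search. What should I do to switch to Elasticsearch? Should I change my database to Elasticsearch and change the wagtailsearch configuration?

You should take a look at ... first to get started. The PostgreSQL backend is easier to work with.

Yes, maybe your code is better, but I think this problem is related to the SQLite db. When I switched to Postgres the problem went away, so I think it's better not to use this approach.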