Wagtail default search does not work with non-English fields

I am using the default database backend for search in my project:

from __future__ import absolute_import, unicode_literals

from django.core.paginator import EmptyPage, PageNotAnInteger, Paginator
from django.shortcuts import render

from home.models import BlogPage, get_all_tags
from wagtail.wagtailsearch.models import Query


def search(request):
    search_query = request.GET.get('query', None)
    page = request.GET.get('page', 1)

    # Search
    if search_query:
        search_results = BlogPage.objects.live().search(search_query)
        query = Query.get(search_query)

        # Record hit
        query.add_hit()
    else:
        search_results = BlogPage.objects.none()

    # Pagination
    paginator = Paginator(search_results, 10)
    try:
        search_results = paginator.page(page)
    except PageNotAnInteger:
        search_results = paginator.page(1)
    except EmptyPage:
        search_results = paginator.page(paginator.num_pages)

    return render(request, 'search/search.html', {
        'search_query': search_query,
        'blogpages': search_results,
        'tags': get_all_tags()
    })
The BlogPage model:

class BlogPage(Page):
    date = models.DateField("Post date")
    intro = models.CharField(max_length=250)
    body = StreamField([
        ('heading', blocks.CharBlock(classname="full title")),
        ('paragraph', blocks.RichTextBlock()),
        ('image', ImageChooserBlock()),
        ('code', CodeBlock()),
    ])
    tags = ClusterTaggableManager(through=BlogPageTag, blank=True)

    search_fields = Page.search_fields + [
        index.SearchField('intro'),
        index.SearchField('body'),
    ]
    ...
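Search works correctly only when the body field of the BlogPage model is in English. If I use Russian words in the body field, the search returns nothing. I looked in the database and found that the BlogPage body field is stored like this: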

[{"value": "\u0442\u0435\u0441\u0442\u043e\u0432\u044b\u0439", "id": "3343151a-edbc-4165-89f2-ce766922d68e", "type": "heading"}, {"value": "<p>\u0442\u0435\u0441\u0442\u0438\u043f\u0440</p>", "id": "22d3818d-8c69-4d72-967e-7c1f807e80b2", "type": "paragraph"}]
So the problem is that Wagtail saves StreamField content as Unicode escape sequences. If I manually change the value in phpMyAdmin to:

[{"value": "Тест", "id": "3343151a-edbc-4165-89f2-ce766922d68e", "type": "heading"}, {"value": "<p>Тестовый</p>", "id": "22d3818d-8c69-4d72-967e-7c1f807e80b2", "type": "paragraph"}]

then search starts working. Does anyone know how to prevent Wagtail from saving StreamField content as Unicode escapes?

I hate this workaround, but I decided to add two more fields, search_body and search_intro, and search on those instead:

class BlogPage(Page):
    date = models.DateField("Post date")
    intro = models.CharField(max_length=250)
    body = StreamField([
        ('heading', blocks.CharBlock(classname="full title")),
        ('paragraph', blocks.RichTextBlock()),
        ('image', ImageChooserBlock()),
        ('code', CodeBlock()),
    ])
    search_intro = models.CharField(max_length=250)
    search_body = models.CharField(max_length=50000)
    tags = ClusterTaggableManager(through=BlogPageTag, blank=True)

    def main_image(self):
        gallery_item = self.gallery_images.first()
        if gallery_item:
            return gallery_item.image
        else:
            return None

    def get_context(self, request):
        context = super(BlogPage, self).get_context(request)
        context['tags'] = get_all_tags()
        context['page_url'] = urllib.parse.urljoin(BASE_URL, self.url)
        return context

    def save(self, *args, **kwargs):
        if self.body.stream_data and isinstance(
                self.body.stream_data[0], tuple):
            self.search_body = ''
            for block in self.body.stream_data:
                if len(block) >= 2:
                    self.search_body += str(block[1])
        self.search_intro = self.intro.lower()
        self.search_body = self.search_body.lower()
        return super().save(*args, **kwargs)

    search_fields = Page.search_fields + [
        index.SearchField('search_intro'),
        index.SearchField('search_body'),
    ]
    ...
search/views.py:

def search(request):
    search_query = request.GET.get('query', None)
    page = request.GET.get('page', 1)

    # Search
    if search_query:
        search_results = BlogPage.objects.live().search(search_query.lower())
        query = Query.get(search_query)
    ...
Thank you, Alex.

However, I was getting two calls to the save method.

I should use the following code:

def save(self, *args, **kwargs):
    search_body = ''
    if self.blog_post_body.stream_data and isinstance(
            self.blog_post_body.stream_data[0], dict):
        for block in self.blog_post_body.stream_data:
            if block.get('type', '') in ('some_header', 'some_text'):
                search_body += str(block['value'])
    self.search_body = search_body
    super(BlogPost, self).save(*args, **kwargs)
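
The text-flattening step in both save() overrides can be sketched on its own, outside of Wagtail. This is only an illustration under assumptions: extract_search_text and its text_types parameter are hypothetical names, the stream data is assumed to be a list of dicts with "type" and "value" keys (as in the stored JSON shown in the question), and the tag stripping is a rough regular expression, not a real HTML parser.

```python
import re

# Rough tag matcher; good enough for simple rich-text markup like <p>...</p>.
TAG_RE = re.compile(r"<[^>]+>")


def extract_search_text(stream_data, text_types=("heading", "paragraph")):
    """Flatten text-like blocks into one lowercase string (hypothetical helper)."""
    parts = []
    for block in stream_data:
        # Skip anything that is not a dict-style block of a textual type.
        if isinstance(block, dict) and block.get("type") in text_types:
            parts.append(TAG_RE.sub(" ", str(block.get("value", ""))))
    # Collapse whitespace left behind by removed tags, then lowercase.
    return " ".join(" ".join(parts).split()).lower()


stream_data = [
    {"type": "heading", "value": "Тест"},
    {"type": "paragraph", "value": "<p>Тестовый</p>"},
    {"type": "image", "value": 42},  # ignored: not a text block
]
print(extract_search_text(stream_data))
```

Inside a page model, the result could be assigned to self.search_body before calling the parent save(), which keeps the override short.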

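StreamField encodes its JSON with DjangoJSONEncoder, which leaves ensure_ascii=True. That is why non-ASCII characters are stored as "\u…" escapes. The default database search backend only does plain text matching in the database, so queries with non-ASCII keywords fail.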

    def get_prep_value(self, value):
        if isinstance(value, StreamValue) and not(value) and value.raw_text is not None:
            # An empty StreamValue with a nonempty raw_text attribute should have that
            # raw_text attribute written back to the db. (This is probably only useful
            # for reverse migrations that convert StreamField data back into plain text
            # fields.)
            return value.raw_text
        else:
            return json.dumps(self.stream_block.get_prep_value(value), cls=DjangoJSONEncoder)
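
The effect of ensure_ascii can be reproduced with the standard library alone. The sketch below uses plain json.dumps rather than Wagtail itself, and the sample block is made up for illustration; it only shows why a substring match against the stored text misses a Cyrillic keyword.

```python
import json

# A made-up block list shaped like the stored StreamField JSON.
blocks = [{"type": "heading", "value": "тестовый"}]

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
escaped = json.dumps(blocks)
# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(blocks, ensure_ascii=False)

keyword = "тест"
print(keyword in escaped)   # False: the text is stored as \uXXXX escapes
print(keyword in readable)  # True: the raw characters are present

# Both forms decode to the same data, so only the stored text differs.
assert json.loads(escaped) == json.loads(readable) == blocks
```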
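You would need to subclass StreamField and supply a custom JSONEncoder that sets ensure_ascii=False. You also need to make sure your database handles UTF-8 strings by default. (This should be fine for PostgreSQL.)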
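
A minimal sketch of such an encoder, using the standard library's json.JSONEncoder as a stand-in for DjangoJSONEncoder so that it runs without Django. NonAsciiJSONEncoder is a hypothetical name, and wiring it into a StreamField subclass's get_prep_value is not shown. Note that json.dumps passes ensure_ascii=True to the encoder class explicitly, so the override has to happen in __init__.

```python
import json


class NonAsciiJSONEncoder(json.JSONEncoder):
    """Encoder that always writes non-ASCII characters unescaped (hypothetical)."""

    def __init__(self, *args, **kwargs):
        # json.dumps(..., cls=...) forwards its own ensure_ascii value,
        # so force it off here regardless of what was passed in.
        kwargs["ensure_ascii"] = False
        super().__init__(*args, **kwargs)


data = [{"type": "heading", "value": "тест"}]
encoded = json.dumps(data, cls=NonAsciiJSONEncoder)
print(encoded)
```

In a real project the base class would presumably be DjangoJSONEncoder, so that dates and decimals keep serializing the way StreamField expects.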


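If you switch to another backend, such as the PostgreSQL search backend, it extracts the text from the StreamField when building the index (introduced by …), so you will not have this problem.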
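
As a rough settings sketch, assuming a Wagtail release that ships the wagtail.contrib.postgres_search app and a project already running on PostgreSQL. Later releases reorganized the search backends, so check the documentation for your version before copying this.

```python
# settings.py fragment (assumption: Wagtail with wagtail.contrib.postgres_search)

INSTALLED_APPS = [
    # ... the project's existing apps go here ...
    "wagtail.contrib.postgres_search",
]

WAGTAILSEARCH_BACKENDS = {
    "default": {
        # Replaces the default database backend for all search queries.
        "BACKEND": "wagtail.contrib.postgres_search.backend",
    },
}
```

After changing the backend, the index has to be rebuilt with the update_index management command.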

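You didn't mention which search backend you are using. Are you using Elasticsearch? I have successfully set up German search with Elasticsearch.

It looks like you aren't. Or did you just leave the search_fields declaration out of BlogPage?

I have specified search_fields (I added those lines to the question), and I think I am using the default database backend for search. What should I do to switch to Elasticsearch? Do I need to change my database to Elasticsearch and change the wagtailsearch configuration?

You should take a look at … first to get started. The PostgreSQL backend is easier to work with.

Yes, maybe your code is better, but I think this problem is related to the SQLite db. When I switched to Postgres the problem went away, so I think it is better not to use this approach.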