Python: returning the entire dataset from a Google App Engine index search

python google-app-engine


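Is there a way to fetch the entire dataset from an App Engine Search index? The search below takes an integer limit through QueryOptions, and a limit always has to be present. I can't tell whether there is some special flag that bypasses this limit and returns the whole result set. If the query has no QueryOptions, the result set is limited to 20.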

_INDEX = search.Index(name=constants.SEARCH_INDEX)
_INDEX.search(query=search.Query(
  query,
  options=search.QueryOptions(
      limit=limit,
      sort_options=search.SortOptions(...))))

Any ideas?

First, if you look closely at the constructor of QueryOptions, it answers your question of why only 20 results come back:

def __init__(self, limit=20, number_found_accuracy=None, cursor=None,
               offset=None, sort_options=None, returned_fields=None,
               ids_only=False, snippeted_fields=None,
               returned_expressions=None):

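I think the API does this to avoid fetching results unnecessarily. If you need to fetch more results in response to a user action, rather than always fetching everything, you should use an offset.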
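As an alternative to offsets, the result set can also be walked page by page with cursors. The following is a minimal sketch, not a definitive implementation: `iter_all_results` and `make_query` are hypothetical names introduced here, and the code assumes that the results object returned by `Index.search` is iterable and exposes a `cursor` attribute that becomes `None` after the last page (which is how the Search API behaves when a `search.Cursor()` is passed in `QueryOptions`, as far as I know). The index is passed in as a parameter so the function itself does not depend on the App Engine SDK.

```python
def iter_all_results(index, make_query, first_cursor):
    """Yield every document matching a query by following result cursors.

    index        -- object with a search(query) method, e.g. search.Index
    make_query   -- hypothetical callable: takes a cursor, returns a query
    first_cursor -- cursor for the first page, e.g. search.Cursor()
    """
    cursor = first_cursor
    while cursor is not None:
        results = index.search(make_query(cursor))
        for document in results:
            yield document
        # Assumed to be None once the last page has been returned.
        cursor = results.cursor


# Intended use on App Engine (not executed here; needs the SDK):
#
#   from google.appengine.api import search
#   index = search.Index(name=constants.SEARCH_INDEX)
#   make_query = lambda cursor: search.Query(
#       query, options=search.QueryOptions(limit=1000, cursor=cursor))
#   documents = list(iter_all_results(index, make_query, search.Cursor()))
```

The `limit=1000` in the usage comment reflects my understanding that a single search call is capped at 1000 documents, so "some huge number" as the limit would not return everything in one call; treat that figure as an assumption and check it against the SDK you are running.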

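If you really want every document in the index, rather than every result of a query, you can adapt the delete-all example.

So you might end up with something like this: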

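start_id = None
while True:
    # Page forward from the last ID seen; without start_id, get_range would
    # return the same first batch forever because nothing is deleted here.
    document_ids = [document.doc_id
                    for document in doc_index.get_range(
                        start_id=start_id,
                        include_start_object=start_id is None,
                        ids_only=True)]
    if not document_ids:
        break
    start_id = document_ids[-1]
    # Fetch each document and do something with it
    for doc_id in document_ids:
        document = doc_index.get(doc_id)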

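You would probably want to fetch the documents themselves into the list, rather than fetching the IDs and then getting each document by ID, but you get the idea.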
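Fetching the documents directly, as suggested above, could look roughly like the sketch below. `iter_all_documents` is a hypothetical helper name, and the code assumes `Index.get_range` returns documents in ID order and accepts the `start_id`, `include_start_object`, and `limit` keyword arguments. Each batch starts just after the last document ID already seen, so the loop advances through the index instead of re-reading the first batch.

```python
def iter_all_documents(index, batch_size=100):
    """Yield every document in a search index, one get_range batch at a time.

    index is expected to behave like search.Index: get_range(...) returns an
    iterable of documents, each with a doc_id attribute, in ID order.
    """
    start_id = None
    while True:
        batch = list(index.get_range(
            start_id=start_id,
            # Include the start document only on the first call; afterwards
            # start_id is a document that has already been yielded.
            include_start_object=start_id is None,
            limit=batch_size))
        if not batch:
            return
        for document in batch:
            yield document
        start_id = batch[-1].doc_id


# Intended use on App Engine (not executed here; needs the SDK):
#
#   from google.appengine.api import search
#   doc_index = search.Index(name=index_name)
#   for document in iter_all_documents(doc_index):
#       ...  # render a row on the stats page
```

Because this is a generator, a stats page can stream rows as batches arrive rather than holding tens of thousands of documents in memory at once.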

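Yes, I've read that, but I need all the results for a backend stats page. No pagination is needed.

I don't quite understand what you're trying to do on the backend stats page. It might help if you described your use case. My guess is that you could pass some huge number as the limit. After all, if you have a datastore full of content that all matches the search query, I would question how useful either the data or the search query is.

What I need is basically to dump the search results onto a plain HTML page. I figure that if nothing else works, pagination is something I can look into. At the moment, though, the same stats page gets its list of documents from NDB. But the datastore has no properties for some new information that the stats page needs. The search index, as designed, holds that extra data along with the document information, which makes looking up results very convenient. Otherwise I would have to get the same data from one or two other tables and join them somehow.

@abhink Did you find a way to force fetching all results, or did you go with pagination? I have a similar use case that needs all the results.

@StephanCelis We had to resort to pagination. To generate the view we needed, though, we simply fetched every entity from the relevant datastore models. The data isn't indexed, but it is all still there. I remember using tasklets to handle this.

Great, I'll give this a try. If the number of results reaches the tens of thousands, how well do you think this technique scales?

I guess that depends on how quickly you need to regenerate the results.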

The delete-all example referred to in the answer:

from google.appengine.api import search

def delete_all_in_index(index_name):
    """Delete all the docs in the given index."""
    doc_index = search.Index(name=index_name)

    # looping because get_range by default returns up to 100 documents at a time
    while True:
        # Get a list of documents populating only the doc_id field and extract the ids.
        document_ids = [document.doc_id
                        for document in doc_index.get_range(ids_only=True)]
        if not document_ids:
            break
        # Delete the documents for the given ids from the Index.
        doc_index.delete(document_ids)
