加速调用大量实体，并获得唯一值，谷歌应用程序引擎python_Python_Google App Engine

加速调用大量实体，并获得唯一值，谷歌应用程序引擎python

python google-app-engine

加速调用大量实体，并获得唯一值，谷歌应用程序引擎python,python,google-app-engine,Python,Google App Engine,好的，这是一个由两部分组成的问题，我已经看到并搜索了几种方法来获得一个类的唯一值列表，到目前为止，我对任何方法都不满意。因此，任何人都有一个简单的示例代码来获取唯一的值，例如这段代码。这是我的超级慢的例子 class LinkRating2(db.Model): user = db.StringProperty() link = db.StringProperty() rating2 = db.FloatProperty() def uniqueLinkGet(tab

好的，这是一个由两部分组成的问题，我已经看到并搜索了几种方法来获得一个类的唯一值列表，到目前为止，我对任何方法都不满意。
因此，任何人都有一个简单的示例代码来获取唯一的值，例如这段代码。这是我的超级慢的例子

class LinkRating2(db.Model):
    user = db.StringProperty()
    link = db.StringProperty()
    rating2 = db.FloatProperty()

def uniqueLinkGet(tabl):
    start = time.time()
    dic = {}
    query = tabl.all()
    for obj in query:
        dic[obj.link]=1
    end = time.time()
    print end-start
    return dic

我的第二个问题是调用一个迭代器，而不是较慢的获取？有没有更快的方法来完成下面的代码？特别是如果调用的元素数大于1000

query = LinkRating2.all()
link1 = 'some random string'
a = query.filter('link = ', link1)
adic ={}
for itema in a:
    adic[itema.user]=itema.rating2

1）快速查询的一个技巧是对数据进行非规范化。具体来说，创建另一个模型，该模型仅存储链接作为键。然后，只需阅读该表中的所有内容，就可以得到一个独特链接的列表。假设每个链接都有许多

LinkRating2

实体，那么这将为您节省大量时间。例如：

class Link(db.Model):
    pass  # the only data in this model will be stored in its key

# Whenever a link is added, you can try to add it to the datastore.  If it already
# exists, then this is functionally a no-op - it will just overwrite the old copy of
# the same link.  Using link as the key_name ensures there will be no duplicates.
Link(key_name=link).put()

# Get all the unique links by simply retrieving all of its entities and extracting
# the link field.  You'll need to use cursors if you have >1,000 entities.
unique_links = [x.key().name() for Link.all().fetch(1000)]

另一个想法是：如果您需要经常执行此查询，那么在memcache中保留一份结果副本，这样您就不必一直从数据存储中读取所有这些数据。单个memcache条目只能存储1MB的数据，因此您可能必须将链接数据拆分为块，以将其存储在memcache中

2）使用

fetch（）

比使用迭代器更快。迭代器使实体被提取进来——每个“小批量”都会导致数据存储的往返以获取更多数据。如果使用

fetch（）

，则只需一次往返数据存储，即可一次获取所有数据。简而言之，如果您知道需要大量结果，请使用

fetch（）