More efficient loops in Python

python django performance

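I have a situation where I need to loop over two lists of objects, find the equal ones, then loop over their fields and change some attributes. It looks like this: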

for new_product in products_and_articles['products']:
  for old_product in products_for_update:
    if new_product.article == old_product.article:
      for old_field in old_product._meta.get_all_field_names():
        for new_field in new_product._meta.get_all_field_names():
          if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))

Obviously this is far from good, or even acceptable. So I'm looking for advice on how to avoid so many loops and improve the algorithm.

Instead of looping over both lists and checking for equality, you can use a set to find the intersection:

set(products_and_articles['products']).intersection(set(products_for_update))

For example:

>>> l=[1,2,3]
>>> a=[2,3,4]
>>> set(l).intersection(set(a))
set([2, 3])
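
With model instances rather than numbers, the intersection has to be taken over a hashable key. A minimal sketch, assuming `article` is a hashable value that identifies a product; the `Product` namedtuple is a hypothetical stand-in for the real model:

```python
from collections import namedtuple

# Hypothetical stand-in for the real model; only `article` matters here.
Product = namedtuple("Product", ["article", "name"])

new_products = [Product("A1", "new one"), Product("A2", "new two")]
old_products = [Product("A2", "old two"), Product("A3", "old three")]

# Intersect the keys, not the objects themselves.
new_articles = {p.article for p in new_products}
old_articles = {p.article for p in old_products}
common_articles = new_articles & old_articles

# The set only says which articles match; the objects still have to be
# looked up afterwards before anything can be copied between them.
matching_old = [p for p in old_products if p.article in common_articles]
```

Note that duplicates collapse in a set, so this only fits if each article should be handled once.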

The first two loops can be changed to:

from itertools import product


for new_product, old_product in product(list1, list2):
    # logic and other loops

You can do the same for the two inner loops:
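
A minimal sketch of that flattening applied to the field loops. The `copy_common_fields` helper and its plain lists of field names are assumptions made so the example runs without Django. `product` only removes a level of nesting; it still visits every (old_field, new_field) pair:

```python
from itertools import product


def copy_common_fields(old_obj, new_obj, old_fields, new_fields,
                       exclusions=("id", "slug")):
    """Copy every field named in both lists from new_obj to old_obj."""
    # product() yields each (old_field, new_field) pair exactly once,
    # the same pairs the two nested loops would visit.
    for old_field, new_field in product(old_fields, new_fields):
        if old_field == new_field and old_field not in exclusions:
            setattr(old_obj, old_field, getattr(new_obj, old_field))
```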

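We start with four loops and a complexity of O(n^2*k^2), where n is the number of items and k is the number of attributes. Let's see what we can do.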

First, get rid of the inner new_field loop; you don't need it. Change this:

for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():
        if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))

To:
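
A sketch of what the single-loop version could look like, assuming both objects are instances of the same model, so every field name on the old product also exists on the new one. The `copy_fields` name is hypothetical:

```python
def copy_fields(old_product, new_product, exclusions=("id", "slug")):
    """Copy all non-excluded fields from new_product onto old_product."""
    # One pass over the old product's field names; no inner loop needed.
    # get_all_field_names() is the call used in the question; it was
    # removed in later Django versions.
    for field in old_product._meta.get_all_field_names():
        if field not in exclusions:
            setattr(old_product, field, getattr(new_product, field))
```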

That gets it down to O(n^2*k). Now for the product lookup part.

First sort both lists, then walk them the same way merge sort merges two lists:

a = sorted(products_and_articles['products'], key=lambda x: x.article)
b = sorted(products_for_update, key=lambda x: x.article)
i = j = 0
while i < len(a) and j < len(b):
    if a[i].article < b[j].article:
        i += 1
        continue
    if a[i].article > b[j].article:
        j += 1
        continue
    # ...logic...
    i += 1  # Maybe you want to get rid of this one, I'm not sure..
    j += 1

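Depending on the size of your data this may or may not be good enough, since it requires creating new sorted lists. It isn't heavy on memory (the lists only hold references anyway), but if you have very long lists and limited space, the big efficiency win may not make up for it.

Sorting brings the product lookup down from O(n^2) to O(n log n), and that's the best I can do. You could probably get it lower with dictionaries, but that requires you to change your DB, so it takes more time and effort.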
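
The sorted merge walk can also be packaged as a generator. This is a sketch that assumes each `article` appears at most once per list; the `matching_pairs` name is hypothetical:

```python
def matching_pairs(new_products, old_products, key=lambda p: p.article):
    """Yield (new, old) pairs whose keys are equal, via a sorted merge walk.

    Assumes each key appears at most once per list.
    """
    a = sorted(new_products, key=key)
    b = sorted(old_products, key=key)
    i = j = 0
    while i < len(a) and j < len(b):
        ka, kb = key(a[i]), key(b[j])
        if ka < kb:
            i += 1      # a[i] has no partner in b
        elif ka > kb:
            j += 1      # b[j] has no partner in a
        else:
            yield a[i], b[j]
            i += 1
            j += 1
```

The field-copying logic then goes in the body of `for new_product, old_product in matching_pairs(...)`.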

It helps if you break the process down into logical, reusable parts.

for new_product in products_and_articles['products']:
  for old_product in products_for_update:
    if new_product.article == old_product.article:
      …

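For example, what you're doing here is finding the product that matches a particular article. Since the article is unique, we could write something like this: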

def find_products_by_article(products, article):
  '''Find the product that matches the given article.  Returns
  either a product or 'None' if it doesn't exist.'''
  for product in products:
    if product.article == article:
      return product
  return None

and then call it like this:

for old_product in products_for_update:
  new_products = find_products_by_article(
                   products_and_articles['products'],
                   old_product.article)
  …

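However, this could be much more efficient if we take advantage of a data structure that is optimized for lookups, namely a dict (constant instead of linear complexity). So what we can do instead is: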

# build a dictionary that stores products indexed by article
products_by_article = dict((product.article, product) for product in
                           products_and_articles['products'])

for old_product in products_for_update:
  try:
    # look up product in the dictionary
    new_product = products_by_article[old_product.article]
  except KeyError:
    # silently ignore products that don't exist
    continue
  …

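If you do lookups like this often, it is better to reuse the products_by_article dictionary elsewhere too, rather than building one from scratch every time. But be careful: if you keep multiple representations of the product records, you need to make sure they always stay in sync.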
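
The dictionary above keeps only the last product seen for each article. If `article` might not be unique, one option is to index lists of products instead. A minimal sketch, with a hypothetical `index_by_article` helper:

```python
from collections import defaultdict


def index_by_article(products):
    """Map each article to the list of products that carry it."""
    index = defaultdict(list)
    for product in products:
        index[product.article].append(product)
    return index
```

Looking up with `index.get(article, [])` returns every match and does not insert a key on a miss.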

For the inner loops, notice that new_field is only used here to check whether the field exists:

…
  for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():
      if old_field == new_field and old_field != 'id' and old_field != 'slug':
        setattr(old_product, old_field, getattr(new_product, old_field))

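Note that this is a little suspicious: any new fields that don't already exist in the old product are silently discarded. Is this intentional?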
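
If the dropped fields matter, they are easy to surface before copying. A small sketch, assuming the field names are available as plain lists; `fields_only_in_new` is a hypothetical helper:

```python
def fields_only_in_new(old_fields, new_fields):
    """Return the field names that exist on the new record but not the old."""
    return sorted(set(new_fields) - set(old_fields))
```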

This can be repackaged as follows:

def transfer_fields(old, new, exclusions=('id', 'slug')):
  '''Update all pre-existing fields in the old record to have
  the same values as the new record.  The 'exclusions' parameter
  can be used to exclude certain fields from being updated.'''
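  # use a set here for efficiency reasons; it must be a mutable set,
  # because frozenset has no in-place update methods
  fields = set(old._meta.get_all_field_names())
  # keep only the fields that also exist on the new record
  fields.intersection_update(new._meta.get_all_field_names())
  fields.difference_update(exclusions)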
  for field in fields:
    setattr(old, field, getattr(new, field))

Putting it all together:

# dictionary of products indexed by article
products_by_article = dict((product.article, product) for product in
                           products_and_articles['products'])

for old_product in products_for_update:
  try:
    new_product = products_by_article[old_product.article]
  except KeyError:
    continue          # ignore non-existent products
  transfer_fields(old_product, new_product)

The final code has a time complexity of O(n × k), where n is the number of products and k is the number of fields.
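
To try the whole approach without a Django project, here is a self-contained sketch. `FakeMeta`, `FakeProduct` and `update_products` are hypothetical stand-ins rather than Django APIs, and `transfer_fields` here copies only the fields that both records share:

```python
class FakeMeta:
    """Hypothetical stand-in for a Django model's _meta object."""

    def __init__(self, names):
        self._names = names

    def get_all_field_names(self):
        return list(self._names)


class FakeProduct:
    """Hypothetical stand-in for the Product model."""
    _meta = FakeMeta(["id", "slug", "article", "name", "price"])

    def __init__(self, id, slug, article, name, price):
        self.id, self.slug, self.article = id, slug, article
        self.name, self.price = name, price


def transfer_fields(old, new, exclusions=("id", "slug")):
    """Copy fields present on both records from new to old."""
    fields = set(old._meta.get_all_field_names())
    fields.intersection_update(new._meta.get_all_field_names())
    fields.difference_update(exclusions)
    for field in fields:
        setattr(old, field, getattr(new, field))


def update_products(products_for_update, new_products):
    """Update old products in place from new products with the same article."""
    # Building the index is one pass over the new products.
    products_by_article = {p.article: p for p in new_products}
    for old_product in products_for_update:
        # Average constant-time lookup instead of an inner loop.
        new_product = products_by_article.get(old_product.article)
        if new_product is not None:
            transfer_fields(old_product, new_product)  # O(k) per match


old_items = [FakeProduct(1, "a", "A1", "Old A", 1.0),
             FakeProduct(2, "b", "B1", "Old B", 2.0)]
new_items = [FakeProduct(10, "x", "A1", "New A", 9.5)]
update_products(old_items, new_items)
```

After the call, the first old product carries the new name and price but keeps its own `id` and `slug`; the second has no match and is left untouched.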

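Drop the new_field loop? You don't use new_field anyway. Also, sort the two lists and you get n log n instead of n^2.

Can you give a quick example of the two input lists, and a simple example of the output?

What I'm doing here is searching both lists for equal products, then searching the matched objects for equal model fields, then updating the old object's fields with the new data.

This looks like Django; if so, let the database do the heavy lifting.

This would only double the time. I don't think there is much difference between product and the for loops, and you'll find they both end up over the same lists.

My point is that sets are not a good approach here, because each list may contain duplicate items and a set would remove them.

@Urb Yes, but since it doesn't change the main lists and the OP only wants the intersection, I think it's fine!

Please check my comment above. Sorry, but it's more complicated than that.

The objects you're dealing with aren't numbers, they're objects, so you would have to build a set of .article values or implement a hash method on the objects. Even then, to transfer attributes from one object to the other you need references to both objects; you need to search in a loop, there's no way around it.

Note that you're only organizing it nicely; it's still O(n^2*k).

Yes, the efficiency can be improved by redesigning the data structures, but I left that as an exercise for the OP. Plus, I don't know whether article is a unique key; that would make some difference in how to do this.

Could you write up your suggestions for the redesign? I'm hoping that some day management will say "make this right".

Done. You don't have to redesign anything per se; you can just use a new data structure locally for this part of the code.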
