Python: optimizing querysets in a loop with indexes and better SQL

I have a view that returns some statistics about email list growth. The models involved are:

models.py

class Contact(models.Model):
    email_list = models.ForeignKey(EmailList, related_name='contacts')
    customer = models.ForeignKey('Customer', related_name='contacts')
    status = models.CharField(max_length=8)
    create_date = models.DateTimeField(auto_now_add=True)


class EmailList(models.Model):
    customers = models.ManyToManyField('Customer',
        related_name='lists',
        through='Contact')


class Customer(models.Model):
    is_unsubscribed = models.BooleanField(default=False, db_index=True)
    unsubscribe_date = models.DateTimeField(null=True, blank=True, db_index=True)
In the view, I iterate over all EmailList objects and collect some metrics, as follows:

view.py

class ListHealthView(View):
    def get(self, request, *args, **kwargs):
        start_date, end_date = get_dates_from_querystring(request)

        data = []
        for email_list in EmailList.objects.all():
            # historic data up to start_date
            past_contacts = email_list.contacts.filter(
                status='active',
                create_date__lt=start_date).count()
            past_unsubscribes = email_list.customers.filter(
                is_unsubscribed=True,
                unsubscribe_date__lt=start_date,
                contacts__status='active').count()
            past_deleted = email_list.contacts.filter(
                status='deleted',
                modify_date__lt=start_date).count()
            # data for the given timeframe
            new_contacts = email_list.contacts.filter(
                status='active',
                create_date__range=(start_date, end_date)).count()
            new_unsubscribes = email_list.customers.filter(
                is_unsubscribed=True,
                unsubscribe_date__range=(start_date, end_date),
                contacts__status='active').count()
            new_deleted = email_list.contacts.filter(
                status='deleted',
                modify_date__range=(start_date, end_date)).count()

            data.append({
                'new_contacts': new_contacts,
                'new_unsubscribes': new_unsubscribes,
                'new_deleted': new_deleted,
                'past_contacts': past_contacts,
                'past_unsubscribes': past_unsubscribes,
                'past_deleted': past_deleted,
            })
        return Response({'data': data})
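This works fine for now, but as my database has started to grow, the response time of this view exceeds 1s, and it occasionally causes long-running queries in the database. I think the most obvious improvement would be to index EmailList.customers, but I think it might need a composite index? Also, is there a better way to do this, perhaps using aggregates?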
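On the composite index question, here is a minimal sketch of how one could check whether such an index is actually picked up for the "contacts per list" filter. It uses the standard-library sqlite3 module purely so that it is self-contained; the table layout, the index name idx_contact_list_status_created and the column order are assumptions made for illustration, and a different database (for example PostgreSQL) may plan the same query differently. In Django, a similar index would presumably be declared through Meta.indexes with models.Index(fields=[...]) on Contact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-in for the Contact table (assumed layout).
    CREATE TABLE contact (
        id INTEGER PRIMARY KEY,
        email_list_id INTEGER NOT NULL,
        customer_id INTEGER NOT NULL,
        status TEXT NOT NULL,
        create_date TEXT NOT NULL
    );
    -- Hypothetical composite index: equality columns first, range column last.
    CREATE INDEX idx_contact_list_status_created
        ON contact (email_list_id, status, create_date);
""")


def query_plan(sql, params=()):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


plan = query_plan(
    "SELECT COUNT(*) FROM contact "
    "WHERE email_list_id = ? AND status = 'active' AND create_date < ?",
    (1, "2020-01-01"),
)
print(plan)
```

The printed plan should name the index if the planner decides to use it. The same kind of check on the real database (EXPLAIN on the SQL that Django generates) is what would tell whether a composite index helps there.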

Edit

After @bdoubleu's answer, I tried the following:

data = (
    EmailList.objects.annotate(
        past_contacts=Count(Subquery(
            Contact.objects.values('id').filter(
                email_list=F('pk'),
                status='active',
                create_date__lt=start_date)
        )),
        past_deleted=Count(Subquery(
            Contact.objects.values('id').filter(
                email_list=F('pk'),
                status='deleted',
                modify_date__lt=start_date)
        )),
    )
    .values(
        'past_contacts', 'past_deleted',
    )
)
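
I had to switch to using F instead of OuterRef, because I realized that my EmailList model has id = HashidAutoField(primary_key=True, salt='…'), which was causing ProgrammingError: more than one row returned by a subquery used as an expression, but I'm not entirely sure.

Now the query works, but sadly all counts are returned as 0.

Your code generates 6 queries for each EmailList instance. For 100 instances, that's at least 600 queries, which slows things down.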

You can optimize this by using Subquery expressions and .values().
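
As a rough illustration of what "fewer queries" can look like at the SQL level, the sketch below computes two of the metrics for every list in a single grouped statement instead of one statement per metric per list. It uses the standard-library sqlite3 module with an assumed, simplified contact table and made-up sample rows; it is not the ORM code from the answer. In the Django ORM, a similar shape can usually be expressed with conditional aggregation, for example Count with a filter=Q(...) argument (available since Django 2.0).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-in for the Contact table (assumed layout).
    CREATE TABLE contact (
        id INTEGER PRIMARY KEY,
        email_list_id INTEGER NOT NULL,
        status TEXT NOT NULL,
        create_date TEXT NOT NULL
    );
    -- Made-up sample rows, for illustration only.
    INSERT INTO contact (email_list_id, status, create_date) VALUES
        (1, 'active',  '2019-12-01'),
        (1, 'active',  '2020-01-15'),
        (1, 'deleted', '2020-01-20'),
        (2, 'active',  '2019-11-05');
""")

start_date, end_date = "2020-01-01", "2020-02-01"

# One statement for all lists: each SUM(CASE ...) is a conditional count.
rows = conn.execute(
    """
    SELECT email_list_id,
           SUM(CASE WHEN status = 'active' AND create_date < :start_date
                    THEN 1 ELSE 0 END) AS past_contacts,
           SUM(CASE WHEN status = 'active'
                     AND create_date BETWEEN :start_date AND :end_date
                    THEN 1 ELSE 0 END) AS new_contacts
    FROM contact
    GROUP BY email_list_id
    ORDER BY email_list_id
    """,
    {"start_date": start_date, "end_date": end_date},
).fetchall()

data = [
    {"email_list_id": r[0], "past_contacts": r[1], "new_contacts": r[2]}
    for r in rows
]
print(data)
# [{'email_list_id': 1, 'past_contacts': 1, 'new_contacts': 1},
#  {'email_list_id': 2, 'past_contacts': 1, 'new_contacts': 0}]
```

The deleted-contact counts would follow the same pattern. The unsubscribe counts additionally need a join to the customer table, which this sketch leaves out.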

Update: for older versions of Django, your subquery may need to look like this:

customers = (
    Customer.objects
    .annotate(
        template_count=Subquery(
            CustomerTemplate.objects
            .filter(customer=OuterRef('pk'))
            .values('customer')
            .annotate(count=Count('*')).values('count')
        )
    ).values('name', 'template_count')
)
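
The snippet above relies on a correlated subquery: the inner filter refers to the outer row through OuterRef('pk'). One plausible explanation for the zero counts in the question's edit, assuming F('pk') resolves against the inner Contact queryset rather than the outer EmailList row, is that the inner filter ends up comparing two columns of the same contact row. The sketch below shows that difference in plain SQL using the standard-library sqlite3 module; the tables and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for EmailList and Contact (assumed layout).
    CREATE TABLE email_list (id INTEGER PRIMARY KEY);
    CREATE TABLE contact (
        id INTEGER PRIMARY KEY,
        email_list_id INTEGER NOT NULL,
        status TEXT NOT NULL
    );
    INSERT INTO email_list (id) VALUES (10), (20);
    INSERT INTO contact (id, email_list_id, status) VALUES
        (1, 10, 'active'),
        (2, 10, 'active'),
        (3, 20, 'active');
""")

# Correlated: the inner filter compares against the OUTER row (e.id).
correlated = conn.execute("""
    SELECT e.id,
           (SELECT COUNT(*) FROM contact c
             WHERE c.email_list_id = e.id AND c.status = 'active')
    FROM email_list e
    ORDER BY e.id
""").fetchall()

# Not correlated: the inner filter compares two columns of the INNER row.
uncorrelated = conn.execute("""
    SELECT e.id,
           (SELECT COUNT(*) FROM contact c
             WHERE c.email_list_id = c.id AND c.status = 'active')
    FROM email_list e
    ORDER BY e.id
""").fetchall()

print(correlated)    # [(10, 2), (20, 1)]
print(uncorrelated)  # [(10, 0), (20, 0)]
```

Both scalar subqueries return exactly one row and one column because COUNT(*) collapses the inner rows. A subquery used as an expression that yields several rows or several columns is what the database rejects.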

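I'm not a Django expert, but it seems you are executing multiple queries in the loop body (email_list.contacts.filter, email_list.customers.filter), though I'm not sure.

Subquery cannot work with F; it has to be OuterRef.

This is great, but I get: ProgrammingError: subquery must return only one column. If I add something like Contact.objects.values('id').filter(…), I get: ProgrammingError: more than one row returned by a subquery used as an expression

I'm "very, very sure" of the right answer here [concept…]. If possible, use a SQL monitor so you can see what the queries Django generates actually look like. Then use subqueries, or whatever else you have to use, so that the SQL server does the job with as few queries as possible. What is killing you right now is that you are crossing the communication link 600 times.

That was my mistake: I forgot to add .values('id') at the end of the subquery (it must come after the filter). It should be good to go now.

@bdoubleu I thought values('id') would return a single row, but I still get: ProgrammingError: more than one row returned by a subquery used as an expression?

@pepperonipa Is it possible that you missed the Count()? If not, please paste your code.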