Q: Optimizing a Django ORM query with multiple joins

Stack Overflow user

Asked 2022-08-11 09:41:04

Answers: 4, Views: 268, Followers: 0, Votes: 4

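In my app, I can describe an Entity using different Protocols. Each Protocol is a collection of Traits, and each Trait allows two or more Classes. A Description is therefore a collection of Expressions. For example, I want to describe an entity "John" with the Protocol "X", which comprises the following two Traits and Classes:

Protocol ABC

Trait 1: Height

Available Classes: a. Short b. Medium c. Tall

Trait 2: Weight

Available Classes: a. Light b. Medium c. Heavy

John's Description: Expression 1: c. Tall, Expression 2: b. Medium

My model specifications (simplified to the barebone essentials):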

class Protocol(models.Model):
    """
    A Protocol is a collection of Traits
    """
    name = models.CharField()

class Trait(models.Model):
    """
    Stores the Traits. Each Trait can have multiple Classes
    """

    name = models.CharField()
    protocol = models.ForeignKey(
        Protocol,
        help_text="The reference protocol of the trait",
    )

class Class(models.Model):
    """
    Stores the different Classes related to a Trait.
    """

    name = models.CharField()
    trait = models.ForeignKey(Trait)

class Description(models.Model):
    """
    Stores the Descriptions. A description is a collection of Expressions.
    """

    name = models.CharField()
    protocol = models.ForeignKey(
        Protocol,
        help_text="reference to the protocol used to make the description;\
            this will define which Traits will be available",
    )
    entity = models.ForeignKey(
        Entity,
        help_text="the Entity to which the description refers to",
    )

class Expression(models.Model):
    """
    Stores the expressions of entities related to a specific
    Description. It refers to one particular Class (which is
    then associated with a specific Trait)
    """

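    # NOTE: schematic only. `class` is a reserved word in Python, so a real
    # model needs a different attribute name for this field.
    class = models.ForeignKey(Class)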
    description = models.ForeignKey(Description)

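Following the previous example, let's say I want to find all the entities that are medium or tall (Trait 1) and heavy (Trait 2). The query I'm using now is the following: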

# This is the filter returned by the HTML form, which list
# all the available Classes for each Trait of the selected Protocol
filters = [
  {'trait': 1, 'class': [2, 3]},
  {'trait': 2, 'class': [6,]},
]

queryset = Description.objects.all()

for filter in filters:
  queryset = queryset.filter(expression_set__class__in=filter["class"])

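The problem is that the query is slow (I currently have 1000 Descriptions, described with a Protocol of 40 Traits, each Trait having 2 to 5 Classes). It takes about 2 seconds to return the results even when filtering by only 5-6 Expressions. I tried using prefetch_related("expression_set") or prefetch_related("expression_set__class"), but with no significant improvement.

The question is: can you suggest a way to improve the performance, or is this simply the reality of searching through so many tables?

Thank you very much for your time.

Edit: below is the query generated by the Manager when nine filters (see the previous code snippet) are applied.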

SELECT "describe_description"."id",
       "describe_description"."name",
       "describe_description"."protocol_id"
  FROM "describe_description"
 INNER JOIN "describe_expression"
    ON ("describe_description"."id" = "describe_expression"."description_id")
 INNER JOIN "describe_expression" T4
    ON ("describe_description"."id" = T4."description_id")
 INNER JOIN "describe_expression" T6
    ON ("describe_description"."id" = T6."description_id")
 INNER JOIN "describe_expression" T8
    ON ("describe_description"."id" = T8."description_id")
 INNER JOIN "describe_expression" T10
    ON ("describe_description"."id" = T10."description_id")
 INNER JOIN "describe_expression" T12
    ON ("describe_description"."id" = T12."description_id")
 INNER JOIN "describe_expression" T14
    ON ("describe_description"."id" = T14."description_id")
 INNER JOIN "describe_expression" T16
    ON ("describe_description"."id" = T16."description_id")
 INNER JOIN "describe_expression" T18
    ON ("describe_description"."id" = T18."description_id")
 WHERE ("describe_expression"."class_id" IN (732)
        AND T4."class_id" IN (740)
        AND T6."class_id" IN (760)
        AND T8."class_id" IN (783)
        AND T10."class_id" IN (794)
        AND T12."class_id" IN (851)
        AND T14."class_id" IN (857)
        AND T16."class_id" IN (860)
        AND T18."class_id" IN (874))

Tags: python, django, django-models, django-orm

4 Answers

Stack Overflow user

Accepted answer

Answered 2022-08-18 11:50:47

First, you should avoid the multiple joins by aggregating the desired filters upfront:

filters = [
  {'trait': 1, 'class': [2, 3]},
  {'trait': 2, 'class': [6,]},
]

queryset = Description.objects.all()
class_filter = []
for filter_entry in filters:
    # extend (not append) so class_filter stays a flat list of ids
    class_filter.extend(filter_entry["class"])
queryset = queryset.filter(expression_set__class__in=class_filter)

The second problem is scanning text values. Use db_index=True on your Class.name field.
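A note on the first point: a single `__in` lookup over all class ids matches a description that has any of the listed classes, whereas the chained filters in the question require one match per trait. A minimal sketch of one way to keep the per-trait requirement with a single join is to group by description and count the distinct traits that matched. It uses the standard-library `sqlite3` module rather than the ORM, and the table names (`trait_class`, `expression`) and sample rows are made up for illustration. It assumes each filter entry targets a different trait.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trait_class (id INTEGER PRIMARY KEY, trait_id INTEGER NOT NULL);
    CREATE TABLE expression (description_id INTEGER NOT NULL, class_id INTEGER NOT NULL);
    -- Trait 1 owns classes 1-3, trait 2 owns classes 4-6 (hypothetical data).
    INSERT INTO trait_class VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2);
    -- Description 10: classes 3 and 6; 11: classes 3 and 4; 12: classes 1 and 6.
    INSERT INTO expression VALUES (10, 3), (10, 6), (11, 3), (11, 4), (12, 1), (12, 6);
""")

filters = [
    {"trait": 1, "class": [2, 3]},
    {"trait": 2, "class": [6]},
]
class_ids = [class_id for entry in filters for class_id in entry["class"]]
placeholders = ", ".join("?" for _ in class_ids)

# Plain IN: a description qualifies if ANY of its expressions is in the list.
any_query = f"""
    SELECT DISTINCT description_id FROM expression
    WHERE class_id IN ({placeholders})
    ORDER BY description_id
"""
any_match = [row[0] for row in conn.execute(any_query, class_ids)]

# One join plus GROUP BY/HAVING: a description qualifies only if its matching
# expressions cover as many distinct traits as there are filter entries.
all_query = f"""
    SELECT e.description_id
    FROM expression AS e
    JOIN trait_class AS c ON c.id = e.class_id
    WHERE e.class_id IN ({placeholders})
    GROUP BY e.description_id
    HAVING COUNT(DISTINCT c.trait_id) = ?
    ORDER BY e.description_id
"""
all_match = [row[0] for row in conn.execute(all_query, [*class_ids, len(filters)])]

print(any_match)  # [10, 11, 12]
print(all_match)  # [10]
```

In the ORM this would roughly correspond to one `filter()` with `__in`, followed by `annotate()` with `Count(..., distinct=True)` on the trait and a filter on that count. That mapping is not tested here.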

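Edit: There is a difference between chaining filters on the same table and using Q objects. Chained filters do not act as if the conditions applied to the same related object. This may seem counterintuitive, but as you can see in the SQL, with multiple joins each join effectively duplicates the descriptions (which is why it becomes slow). It is best explained in the Django docs or in this article.

A quick excerpt from the docs:

To select all blogs that contain at least one entry from 2008 having "Lennon" in its headline (the same entry satisfying both conditions), we would write: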

Blog.objects.filter(entry__headline__contains='Lennon', entry__pub_date__year=2008)

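Otherwise, to perform a more permissive query selecting any blogs with merely some entry with "Lennon" in its headline and some entry from 2008, we would write: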

Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)

Edit 2: a schematic example from this answer:

Blog.objects.filter(entry__headline__contains='Lennon',
entry__pub_date__year=2008)

matches only Blog 1

Blog.objects.filter(entry__headline__contains='Lennon').filter(
entry__pub_date__year=2008)

matches Blogs 1 and 2
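The two behaviours can be imitated in plain Python, without Django, to make the difference concrete. This is only a sketch of the semantics: the blog data is invented so that it reproduces the "Blog 1" versus "Blogs 1 and 2" outcome, and the helper names `single_filter` and `chained_filter` are hypothetical.

```python
# Blog id -> list of its entries (hypothetical sample data).
blogs = {
    1: [{"headline": "Lennon interview", "year": 2008}],
    2: [
        {"headline": "Lennon retrospective", "year": 2007},
        {"headline": "Best albums", "year": 2008},
    ],
}


def single_filter(entries):
    # filter(cond_a, cond_b): one and the same entry must satisfy both conditions.
    return any("Lennon" in e["headline"] and e["year"] == 2008 for e in entries)


def chained_filter(entries):
    # filter(cond_a).filter(cond_b): each condition may be met by a different entry.
    has_lennon = any("Lennon" in e["headline"] for e in entries)
    has_2008 = any(e["year"] == 2008 for e in entries)
    return has_lennon and has_2008


single = [blog_id for blog_id, entries in blogs.items() if single_filter(entries)]
chained = [blog_id for blog_id, entries in blogs.items() if chained_filter(entries)]

print(single)   # [1]
print(chained)  # [1, 2]
```

Blog 2 passes the chained version because its 2007 "Lennon" entry and its unrelated 2008 entry each satisfy one condition.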

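Votes: 2

Stack Overflow user

Answered 2022-08-14 11:33:09

I think using multiple functions is slightly better. It runs at the same speed as using classes, if not faster. Take a look at this question. Once you start using functions, you can try @cached_property(func, name=None).

It's common to have to call a class instance's method more than once. If that function is expensive, then doing so can be wasteful. Using the cached_property decorator saves the value returned by a property; the next time the function is called on that instance, it will return the saved value rather than re-computing it. Note that this only works on methods that take self as their only argument and that it changes the method to a property.

Consider a typical case, where a view might need to call a model's method to perform some computation before placing the model instance into the context, where the template might invoke the method once more: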

# the model
class Person(models.Model):

    def friends(self):
        # expensive computation
        ...
        return friends

# in the view:
if person.friends():
    ...

And in the template you would have:

{% for friend in person.friends %}

Here, friends() will be called twice. Since the instance person in the view and the template are the same, decorating the friends() method with @cached_property can avoid that:

from django.utils.functional import cached_property

class Person(models.Model):

    @cached_property
    def friends(self):
        ...
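The caching effect can be checked without a Django project. The sketch below swaps in `functools.cached_property` from the standard library (Python 3.8+) as a stand-in for `django.utils.functional.cached_property`; the `Person` class here is a plain class, not a model, and the `computations` counter is added only to make the number of evaluations visible.

```python
from functools import cached_property


class Person:
    def __init__(self, name):
        self.name = name
        self.computations = 0  # counts how often the expensive body runs

    @cached_property
    def friends(self):
        # Stand-in for an expensive computation or query.
        self.computations += 1
        return [f"friend of {self.name}"]


person = Person("John")

# "View": friends is now an attribute, so it is read without parentheses.
if person.friends:
    pass

# "Template": the second access reuses the value stored on the instance.
rendered = [friend for friend in person.friends]

print(person.computations)  # 1
```

Because the decorator turns the method into a property, the view code has to read `person.friends` instead of calling `person.friends()`.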

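Stack Overflow user

Answered 2022-08-16 17:51:35

To learn more about the queries, you can use the Django Debug Toolbar. This is useful because if we cannot measure the current state, it is hard to know what to improve (as is the case here).

Django has a page dedicated to database access optimization. In it, for example, one can read that QuerySets are lazy.

Since OP explored the Django ORM without getting good results, OP might try raw SQL queries to improve performance. In other words, write your own SQL to retrieve the data. According to the documentation:

Django gives you two ways of performing raw SQL queries: you can use Manager.raw() to perform raw queries and return model instances, or you can avoid the model layer entirely and execute custom SQL directly.

Another way to speed up queries is to add indexes. Not having them can slow the queries down.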
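To see what an index changes, the query plan can be compared before and after creating one. This is a minimal sketch using the standard-library `sqlite3` module; the table `trait_class`, its columns, and the index name are hypothetical, and the exact wording of the plan depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trait_class (id INTEGER PRIMARY KEY, name TEXT, trait_id INTEGER)"
)


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)


lookup = "SELECT id FROM trait_class WHERE name = 'Tall'"

before = plan(lookup)  # no index on name yet: the table is scanned
conn.execute("CREATE INDEX idx_trait_class_name ON trait_class (name)")
after = plan(lookup)   # the plan now refers to idx_trait_class_name

print(before)
print(after)
```

Django already creates an index for ForeignKey columns by default, so this mostly matters for other columns used in lookups, such as a name field.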

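Also, OP should consider using some caching, such as Memcached. According to Alex Xu,

The cache is a temporary storage area that stores the result of expensive responses or frequently accessed data in memory so that subsequent requests are served more quickly. (...) The cache tier is a temporary data store layer, much faster than the database. The benefits of having a separate cache tier include better system performance, the ability to reduce database workloads, and the ability to scale the cache tier independently.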
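The idea can be sketched with an in-process dictionary standing in for a real cache server such as Memcached. Everything here is hypothetical: `TTLCache`, `slow_search`, and the cache key are illustration names, and a real deployment would also need invalidation when descriptions change.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expiry time, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the expensive call
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


calls = {"db": 0}


def slow_search():
    # Stand-in for the slow filtered query; returns matching description ids.
    calls["db"] += 1
    return [10]


cache = TTLCache(ttl_seconds=60)
key = ("class_ids", (2, 3, 6))  # derive the key from the filter values

first = cache.get_or_compute(key, slow_search)
second = cache.get_or_compute(key, slow_search)  # served from the cache

print(calls["db"])  # 1
```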

Votes: 1

Original content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/73318552
