Python SQLAlchemy emits a cross join for no apparent reason


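I have a query set up in SQLAlchemy that was running a bit slow, and I tried to optimize it. For some unknown reason, the result uses an implicit cross join, which not only slows it down considerably but also produces completely wrong results. I've anonymized the table names and parameters but made no other changes. Does anyone know where this is coming from?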

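To make it easier to spot: the difference between the new and the old emitted SQL is that the new SQL has a longer SELECT and lists all three tables in the FROM clause before any of the JOINs.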

Original code:

cust_name = u'Bob'
proj_name = u'job1'
item_color = u'blue'
query = (db.session.query(Item.name)
                       .join(Project, Customer)
                       .filter(Customer.name == cust_name,
                               Project.name == proj_name)
                       .distinct(Item.name))

# some conditionals determining last filter, resolving to this one:
query = query.filter(Item.color == item_color)

result = query.all()
Original emitted SQL, as recorded by flask_sqlalchemy.get_debug_queries:

QUERY: SELECT DISTINCT ON (items.name) items.name AS items_name
FROM items JOIN projects ON projects.id = items._project_id JOIN customers ON customers.id = projects._customer_id
WHERE customers.name = %(name_1)s AND projects.name = %(name_2)s AND items.color = %(color_1)s
Parameters: {'name_2': u'job1', 'state_1': u'blue', 'name_1': u'Bob'}

New code:

cust_name = u'Bob'
proj_name = u'job1'
item_color = u'blue'
query = (db.session.query(Item)
                     .options(Load(Item).load_only('name', 'color'),
                                joinedload(Item.project, innerjoin=True).load_only('name').
                                joinedload(Project.customer, innerjoin=True).load_only('name'))
                     .filter(Customer.name == cust_name,
                                 Project.name == proj_name)
                     .distinct(Item.name))

# some conditionals determining last filter, resolving to this one:
query = query.filter(Item.color == item_color)

result = query.all()
New emitted SQL, as recorded by flask_sqlalchemy.get_debug_queries:

QUERY: SELECT DISTINCT ON (items.nygc_id) items.id AS items_id, items.name AS items_name, items.color AS items_color, items._project_id AS items__project_id, customers_1.id AS customers_1_id, customers_1.name AS customers_1_name, projects_1.id AS projects_1_id, projects_1.name AS projects_1_name
FROM customers, projects, items JOIN projects AS projects_1 ON projects_1.id = items._project_id JOIN customers AS customers_1 ON customers_1.id = projects_1._customer_id
WHERE customers.name = %(name_1)s AND projects.name = %(name_2)s AND items.color = %(color_1)s
Parameters: {'state_1': u'blue', 'name_2': u'job1', 'name_1': u'Bob'}

If it matters, the underlying database is PostgreSQL.
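The wrong results follow directly from the shape of the second statement: the customers and projects named in the WHERE clause are separate copies from the aliased projects_1/customers_1 that are actually joined to items, so the name filters no longer restrict which items come back. A minimal sketch with Python's built-in sqlite3 module, assuming hypothetical tables and sample rows shaped like the anonymized ones above (SQLite has no DISTINCT ON, so plain DISTINCT stands in for it):

```python
import sqlite3

# Hypothetical minimal schema and data mirroring the anonymized tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT,
                       _customer_id INTEGER REFERENCES customers(id));
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, color TEXT,
                    _project_id INTEGER REFERENCES projects(id));
INSERT INTO customers VALUES (1, 'Bob'), (2, 'Alice');
INSERT INTO projects VALUES (1, 'job1', 1), (2, 'job2', 2);
-- 'widget' belongs to Bob/job1, 'gadget' belongs to Alice/job2.
INSERT INTO items VALUES (1, 'widget', 'blue', 1), (2, 'gadget', 'blue', 2);
""")

params = {"cust": "Bob", "proj": "job1", "color": "blue"}

# Shape of the original statement: the filters apply to the joined tables.
joined_sql = """
SELECT DISTINCT items.name
FROM items
JOIN projects ON projects.id = items._project_id
JOIN customers ON customers.id = projects._customer_id
WHERE customers.name = :cust AND projects.name = :proj AND items.color = :color
"""

# Shape of the new statement: the filters hit an unrelated second copy of
# customers/projects in the FROM list (a cartesian product with items).
cross_sql = """
SELECT DISTINCT items.name
FROM customers, projects, items
JOIN projects AS projects_1 ON projects_1.id = items._project_id
JOIN customers AS customers_1 ON customers_1.id = projects_1._customer_id
WHERE customers.name = :cust AND projects.name = :proj AND items.color = :color
"""

joined_rows = sorted(row[0] for row in conn.execute(joined_sql, params))
cross_rows = sorted(row[0] for row in conn.execute(cross_sql, params))
print(joined_rows)  # ['widget']
print(cross_rows)   # ['gadget', 'widget']
```

Every blue item is returned by the second shape as long as some customer is named Bob and some project is named job1, whichever customer the item actually belongs to. Without DISTINCT, the row count is also multiplied by the number of matching customers and projects, which is where the slowdown comes from.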


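The original intent of the query only needs Item.name. The longer I think about it, the less the optimization attempt looks like it actually helps, but I'd still like to know where the cross join comes from, in case it happens again somewhere that adding joinedload, load_only, etc. would actually help.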

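Not sure what you're trying to achieve, but it looks like you want an inner join between the tables while selecting only specific columns.

So I think you need to do something like: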

cust_name = u'Bob'
proj_name = u'job1'
item_color = u'blue'
query = (db.session.query(Item.name)
                       .join(Project, Customer)
                       .filter(Customer.name == cust_name,
                               Project.name == proj_name)
                       .distinct(Item.name))

# Select the loaded columns
query = query.add_columns(Item.name, Item.color, Project.name, Customer.name)

# some conditionals determining last filter, resolving to this one:
query = query.filter(Item.color == item_color)

result = query.all()

FWIW, I don't think this will give your query any significant optimization.

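This is because joinedload is not the same as join. The joinedload-ed entities are effectively anonymous (aliased), so the filters you apply afterwards refer to different instances of the same tables, and customers and projects end up joined twice.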
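A minimal sketch of this effect, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The Customer/Project/Item models, their columns, and the sample rows are hypothetical stand-ins for the anonymized schema in the question, and DISTINCT ON is left out because SQLite does not support it. The eager joins render with aliases (projects_1, customers_1), while the filter on Customer.name pulls the plain customers table into the FROM list, so an item belonging to another customer comes back:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()


# Hypothetical models mirroring the anonymized tables in the question.
class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Project(Base):
    __tablename__ = "projects"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    _customer_id = Column(Integer, ForeignKey("customers.id"))
    customer = relationship(Customer)


class Item(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    color = Column(String)
    _project_id = Column(Integer, ForeignKey("projects.id"))
    project = relationship(Project)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# Hypothetical data: one blue item for Bob/job1, one blue item for Alice/job2.
with Session(engine) as session:
    session.add_all([
        Item(name="widget", color="blue",
             project=Project(name="job1", customer=Customer(name="Bob"))),
        Item(name="gadget", color="blue",
             project=Project(name="job2", customer=Customer(name="Alice"))),
    ])
    session.commit()

with Session(engine) as session:
    # joinedload only controls how Item.project / Project.customer are
    # populated; its JOINs use anonymous aliases the filters cannot see.
    query = (session.query(Item)
             .options(joinedload(Item.project, innerjoin=True)
                      .joinedload(Project.customer, innerjoin=True))
             .filter(Customer.name == "Bob",
                     Project.name == "job1",
                     Item.color == "blue"))
    sql = str(query)
    names = sorted(item.name for item in query.all())

print("customers_1" in sql)  # True: the eager join is aliased
print(names)                 # ['gadget', 'widget']: Alice's item leaks in
```

Recent SQLAlchemy versions may also emit a warning about a cartesian product when this query runs, which is a useful hint that a filter refers to a table that is not part of any JOIN.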

What you should do is perform the join as before, but use contains_eager to make your join behave like a joinedload:
query = (session.query(Item)
                .join(Item.project)
                .join(Project.customer)
                .options(Load(Item).load_only('name', 'color'),
                         Load(Item).contains_eager("project").load_only('name'),
                         Load(Item).contains_eager("project").contains_eager("customer").load_only('name'))
                .filter(Customer.name == cust_name,
                        Project.name == proj_name)
                .distinct(Item.name))
This gives you the query:

SELECT DISTINCT ON (items.name) customers.id AS customers_id, customers.name AS customers_name, projects.id AS projects_id, projects.name AS projects_name, items.id AS items_id, items.name AS items_name, items.color AS items_color 
FROM items JOIN projects ON projects.id = items._project_id JOIN customers ON customers.id = projects._customer_id 
WHERE customers.name = %(name_1)s AND projects.name = %(name_2)s AND items.color = %(color_1)s
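The same approach as a self-contained sketch, assuming SQLAlchemy 1.4 or later, SQLite, and the same hypothetical models and rows as above. It uses class-bound attributes in load_only/contains_eager instead of the string names in the answer, since SQLAlchemy 2.0 no longer accepts strings in loader options. The filters now target the very tables that are joined, so only Bob's item is returned, and its project and customer are populated from that same JOIN:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import (Session, contains_eager, declarative_base,
                            load_only, relationship)

Base = declarative_base()


# Hypothetical models mirroring the anonymized tables in the question.
class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Project(Base):
    __tablename__ = "projects"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    _customer_id = Column(Integer, ForeignKey("customers.id"))
    customer = relationship(Customer)


class Item(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    color = Column(String)
    _project_id = Column(Integer, ForeignKey("projects.id"))
    project = relationship(Project)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# Hypothetical data: one blue item for Bob/job1, one blue item for Alice/job2.
with Session(engine) as session:
    session.add_all([
        Item(name="widget", color="blue",
             project=Project(name="job1", customer=Customer(name="Bob"))),
        Item(name="gadget", color="blue",
             project=Project(name="job2", customer=Customer(name="Alice"))),
    ])
    session.commit()

with Session(engine) as session:
    # Explicit JOINs that the filters refer to; contains_eager tells the ORM
    # to populate Item.project / Project.customer from those same JOINs.
    query = (session.query(Item)
             .join(Item.project)
             .join(Project.customer)
             .options(load_only(Item.name, Item.color),
                      contains_eager(Item.project).load_only(Project.name),
                      contains_eager(Item.project)
                      .contains_eager(Project.customer)
                      .load_only(Customer.name))
             .filter(Customer.name == "Bob",
                     Project.name == "job1",
                     Item.color == "blue"))
    sql = str(query)
    items = query.all()
    names = sorted(item.name for item in items)
    # Already populated by contains_eager, so no extra lazy-load query here.
    owners = sorted(item.project.customer.name for item in items)

print("customers_1" in sql)  # False: no aliased second copy of customers
print(names)                 # ['widget']
print(owners)                # ['Bob']
```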

You're also using .distinct(Item.name), though, so I guess I was wrong. It would help if you could explain what you're trying to optimize and how :)

The only column I want is Item.name; the load_only calls were an attempt to speed up the query by making sure nothing extra gets loaded. Looking further, I'm not so sure that's actually an optimization, but I'd still like to know where this cross join came from, in case it shows up somewhere the optimization would help. I've edited the question accordingly.

That worked, yes. Good to know what was going on, even if it makes no noticeable difference in speed.