Python: How can I load a limited number of items of a collection in SQLAlchemy?
I have two tables. Using SQLAlchemy, I map them to two classes:
class A(base):
    ...
    id = Column(BigInteger, primary_key=True, autoincrement=True)

class B(base):
    ...
    id = Column(BigInteger, primary_key=True, autoincrement=True)
    a_id = Column(BigInteger, ForeignKey(A.id))
    timestamp = Column(DateTime)
    a = relationship(A, backref="b_s")
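I can use A.b_s to get the collection of B objects whose foreign key equals A's primary key, and that is easy with either lazy or eager loading. But now I have a problem: I don't want to load all the B objects, only the first N ordered by timestamp. In other words, A.b_s should load only some of the related B objects. How can I achieve this with SQLAlchemy?

Thanks a lot.

Answer:

What you want to achieve does not fit a relationship (this is not an SA limitation, but the correct way of handling relationships and taking care of referential integrity): a relationship attribute stands for *all* related rows.

However, a simple query wrapped in a method does the job nicely: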
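from sqlalchemy.orm import object_session

class A(Base):
    # ...
    def get_children(self, offset, count):
        # @todo: might need to handle some border cases
        qry = (object_session(self).query(B)
               .with_parent(self)
               .order_by(B.timestamp))
        # or, if B has a query_property (e.g. Flask-SQLAlchemy): qry = B.query.with_parent(self)
        # slicing the query adds LIMIT/OFFSET to the SQL
        return qry[offset:offset + count]

my_a = session.get(A, a_id)  # SQLAlchemy 1.4+; older versions: session.query(A).get(a_id)
print(my_a.get_children(0, 10))   # print first 10 children
print(my_a.get_children(10, 10))  # print second 10 children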
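To make the SQL behind get_children concrete, here is a minimal sketch that uses only the standard-library sqlite3 module instead of SQLAlchemy. The table names a and b, the text-typed timestamp column and the sample rows are assumptions for the demo, and get_children_ids is a hypothetical helper: it runs the kind of ORDER BY ... LIMIT ... OFFSET statement that slicing a SQLAlchemy Query produces, returning one page of B ids per call.

```python
import sqlite3

# Hypothetical in-memory schema mirroring the question's tables a and b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY,
                    a_id INTEGER REFERENCES a(id),
                    timestamp TEXT);
""")
conn.execute("INSERT INTO a (id) VALUES (1)")
# 25 children of A #1; ISO-formatted text timestamps sort chronologically.
conn.executemany(
    "INSERT INTO b (id, a_id, timestamp) VALUES (?, 1, ?)",
    [(i, f"2024-01-{i:02d}") for i in range(1, 26)],
)

def get_children_ids(conn, a_id, offset, count):
    """Return the ids of one page of Bs belonging to a_id, ordered by timestamp."""
    rows = conn.execute(
        "SELECT id FROM b WHERE a_id = ? ORDER BY timestamp LIMIT ? OFFSET ?",
        (a_id, count, offset),
    ).fetchall()
    return [row[0] for row in rows]

print(get_children_ids(conn, 1, 0, 10))   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(get_children_ids(conn, 1, 10, 10))  # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```

Each call is one statement, which is why this approach costs one extra query per A when it is used for many As.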
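edit-1: doing it with only 1-2 SQL statements

It is also possible to do this with just 1 or 2 SQL statements in total, instead of one extra statement per A.

First, we need a way to get the identifiers of the top N Bs of each A. For this we compose a subquery using the sqlalchemy.sql.expression.over function: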
from sqlalchemy import func

# @note: this is the subquery using the *sqlalchemy.over* window function to number the Bs of each A;
# this subquery is used for both queries below
# @note: the code below sorts Bs by id, but you can change it in order_by (e.g. B.timestamp)
subq = (session.query(
            B.id.label("b_id"),
            func.row_number().over(partition_by=B.a_id, order_by=B.id).label("rownum")
        ).subquery())
# this produces the following SQL (@note: the RDBMS must support window functions, i.e. OVER ...;
# for MySQL that means 8.0 or later)
# >> SELECT b.id AS b_id, row_number() OVER (PARTITION BY b.a_id ORDER BY b.id) AS rownum FROM b
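The window function can be tried outside SQLAlchemy with the standard-library sqlite3 module; this assumes the bundled SQLite is 3.25 or newer, the first release with window functions. The sketch is not the answer's code: top_n_b_ids and the sample rows are hypothetical, and it orders by timestamp rather than id, as the question asks. It numbers the Bs within each a_id partition and keeps only those with rownum <= n, which is the role the b_limit condition plays in the versions that follow.

```python
import sqlite3

# Hypothetical sample data: A #1 has three Bs, A #2 has two, A #3 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, timestamp TEXT);
    INSERT INTO b (id, a_id, timestamp) VALUES
        (1, 1, '2024-01-03'), (2, 1, '2024-01-01'), (3, 1, '2024-01-02'),
        (4, 2, '2024-01-05'), (5, 2, '2024-01-04'),
        (6, 3, '2024-01-06');
""")

def top_n_b_ids(conn, n):
    """Return (a_id, b_id, rownum) for the first n Bs of every A, by timestamp."""
    sql = """
        SELECT a_id, b_id, rownum FROM (
            SELECT a_id,
                   id AS b_id,
                   -- numbering restarts at 1 for every a_id
                   row_number() OVER (PARTITION BY a_id ORDER BY timestamp) AS rownum
            FROM b
        ) AS subq
        WHERE rownum <= ?
        ORDER BY a_id, rownum
    """
    return conn.execute(sql, (n,)).fetchall()

print(top_n_b_ids(conn, 2))
# [(1, 2, 1), (1, 3, 2), (2, 5, 1), (2, 4, 2), (3, 6, 1)]
```

The filter on rownum has to sit outside the subquery: window functions are evaluated after WHERE, so rownum cannot be filtered in the same SELECT that computes it.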
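Version 1:

This version uses two statements: the first loads the As, the second loads the Bs. The function returns a dictionary with the As as keys and lists of their Bs as values: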
from collections import defaultdict
from sqlalchemy import and_

def get_A_with_Bs_in_batch(b_limit=10):
    """
    @return: dict(A, [list of top *b_limit* A.b_s])
    @note: uses 2 SQL statements, but does not screw up the relationship.
    @note: if the relationship is accessed via a_instance.b_s, a new SQL statement will be
        issued to load *all* related objects
    """
    qry_a = session.query(A)
    qry_b = (session.query(B)
             .join(subq, and_(subq.c.b_id == B.id, subq.c.rownum <= b_limit))
             .order_by(B.a_id, subq.c.rownum)
             )
    a_s = qry_a.all()
    b_s = qry_b.all()
    # group the Bs by foreign key instead of scanning all Bs for every A
    bs_by_a_id = defaultdict(list)
    for b in b_s:
        bs_by_a_id[b.a_id].append(b)
    res = dict((a, bs_by_a_id[a.id]) for a in a_s)
    return res
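The two-statement idea of Version 1 can also be sketched in plain SQL with sqlite3, again assuming SQLite 3.25+ and hypothetical tables a and b. The hypothetical get_a_with_bs_in_batch returns A ids mapped to lists of B ids rather than ORM objects, and it orders by timestamp as the question asks. Pre-seeding the dictionary with every A id means an A without Bs still appears, with an empty list.

```python
import sqlite3

# Hypothetical sample data: A #3 has no Bs at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id), timestamp TEXT);
    INSERT INTO a (id) VALUES (1), (2), (3);
    INSERT INTO b (id, a_id, timestamp) VALUES
        (1, 1, '2024-01-03'), (2, 1, '2024-01-01'), (3, 1, '2024-01-02'),
        (4, 2, '2024-01-05'), (5, 2, '2024-01-04');
""")

def get_a_with_bs_in_batch(conn, b_limit=2):
    """Two statements in total: one for all As, one for the top b_limit Bs of every A."""
    a_ids = [row[0] for row in conn.execute("SELECT id FROM a ORDER BY id")]
    b_rows = conn.execute(
        """
        SELECT b.a_id, b.id
        FROM b
        JOIN (SELECT id AS b_id,
                     row_number() OVER (PARTITION BY a_id ORDER BY timestamp) AS rownum
              FROM b) AS subq
          ON subq.b_id = b.id AND subq.rownum <= ?
        ORDER BY b.a_id, subq.rownum
        """,
        (b_limit,),
    ).fetchall()
    # Group in Python; every A starts with an empty list.
    result = {a_id: [] for a_id in a_ids}
    for a_id, b_id in b_rows:
        result[a_id].append(b_id)
    return result

print(get_a_with_bs_in_batch(conn))  # {1: [2, 3], 2: [5, 4], 3: []}
```

Because the grouping happens in Python, the relationship attribute itself is never populated, which is why this variant leaves A.b_s untouched.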
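Version 2:

This version loads the limited Bs directly into the relationship A.b_s with a single statement:

from sqlalchemy import and_, or_
from sqlalchemy.orm import contains_eager

def get_A_with_Bs_hack_relation(b_limit=10):
    """
    @return: dict(A, [list of top *b_limit* A.b_s])
    @note: the Bs are loaded as relationship A.b_s, but with the limit.
    """
    qry = (session.query(A)
           .outerjoin(B)
           # @note: the next line tricks SA into loading the joined Bs as if they were *all*
           # the objects of relationship A.b_s. This is a @hack: one should discard/reset
           # the session after this kind of hacky query!!!
           .options(contains_eager(A.b_s))
           .outerjoin(subq, and_(subq.c.b_id == B.id, subq.c.rownum <= b_limit))
           # @note: the next line is required to make both *outerjoins* play well together
           # and produce the right result: keep As without any B, and otherwise only the top Bs
           .filter(or_(B.id.is_(None), and_(B.id.isnot(None), subq.c.b_id.isnot(None))))
           )
    res = dict((a, a.b_s) for a in qry.all())
    return res

In summary, Version 2 is probably the most direct answer to your question. Use it at your own risk, though: here you are tricking SA into treating a partial collection as the complete one, and if you modify the relationship attribute in any way you may get "Kaboom!".

Comments:

- Your approach is correct, thanks. But I would end up using multiple queries to get the result. Is it possible to do this kind of work with a single SQL statement?
- Using this method generates only one SQL statement per call. You can see this if you enable SQL logging by setting echo=True on the engine... or do I misunderstand your comment?
- First, I collect the As. If I ignore the Bs, I can do that with one simple SQL statement. But then I load a collection of Bs for each A, which issues a new SQL statement for every A. Is it possible to load all the data with only one or two SQL statements? From your explanation, it may be hard to do.
- @flypen: see the extension to the answer with the 1- and 2-statement options.
- Great answer. For ease of maintenance I prefer the first approach with get_children(). Thanks!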