Python: How do I load a collection with a limited number of items in SQLAlchemy?

Tags: python, mysql, orm, sqlalchemy

I have two tables. Using SQLAlchemy, I map them to two classes:

class A(base):
  ...
  id = Column(BigInteger, primary_key=True, autoincrement=True)

class B(base):
  ...
  id = Column(BigInteger, primary_key=True, autoincrement=True)
  a_id = Column(BigInteger, ForeignKey(A.id))
  timestamp = Column(DateTime)

  a = relationship(A, backref="b_s")
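I can use A.b_s to get the collection of B objects whose foreign key matches A's primary key. That is easy with either lazy loading or eager loading. But now I have a problem: I don't want to load all B objects. I only want to load the first N objects ordered by timestamp. In other words, A.b_s should load only some of the related B objects. How can I do this with SQLAlchemy?

Thanks a lot!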

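What you are trying to achieve is not something a relationship does (this is not an SA limitation, but rather the proper way to handle relationships and preserve referential integrity).
However, a simple query (wrapped in a method) will do the trick just fine: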

class A(Base):
    # ...
    def get_children(self, offset, count):
        # @todo: might need to handle some border cases
        # @note: without an order_by the "first N" rows are arbitrary; add e.g. .order_by(B.timestamp)
        qry = B.query.with_parent(self)
        #or: qry = object_session(self).query(B).with_parent(self)
        return qry[offset:offset+count]

my_a = session.query(A).get(a_id)
print(my_a.get_children( 0, 10))  # print first 10 children
print(my_a.get_children(10, 10))  # print second 10 children
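
A self-contained variation on get_children that also applies the timestamp ordering the question asks for might look like the sketch below. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database; the helper name latest_children, its explicit session argument, and the sample data are illustrative and not part of the original answer.

```python
from datetime import datetime, timedelta

from sqlalchemy import Column, DateTime, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class A(Base):
    __tablename__ = "a"
    id = Column(Integer, primary_key=True)

    def latest_children(self, session, count, offset=0):
        """Hypothetical helper: up to `count` related Bs, newest first."""
        return (
            session.query(B)
            .filter(B.a_id == self.id)
            # B.id breaks ties so that paging stays deterministic.
            .order_by(B.timestamp.desc(), B.id.desc())
            .offset(offset)
            .limit(count)
            .all()
        )


class B(Base):
    __tablename__ = "b"
    id = Column(Integer, primary_key=True)
    a_id = Column(Integer, ForeignKey("a.id"))
    timestamp = Column(DateTime)
    a = relationship(A, backref="b_s")


engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)

with Session(engine) as session:
    parent = A()
    start = datetime(2020, 1, 1)
    # Five children dated 2020-01-01 through 2020-01-05.
    parent.b_s = [B(timestamp=start + timedelta(days=i)) for i in range(5)]
    session.add(parent)
    session.commit()

    newest = parent.latest_children(session, count=2)
    newest_days = [b.timestamp.day for b in newest]
```

Each call issues one SELECT with ORDER BY, LIMIT and OFFSET, and the relationship A.b_s itself is left untouched.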

edit-1: This can also be done using only 1-2 SQL statements.
First, we need a way to get the identifiers of the top N Bs for each A. For this we compose a subquery with the sqlalchemy.sql.expression.over function:

from sqlalchemy import and_, func, or_, over
from sqlalchemy.orm import contains_eager

# @note: this is the subquery using the *sqlalchemy.sql.expression.over* function to limit the number of rows
# this subquery is used for both queries below
# @note: the code below sorts Bs by id, but you can change it in order_by
# @note: column objects are used instead of bare strings, which newer SA versions may reject
subq = (session.query(
            B.__table__.c.id.label("b_id"),
            over(func.row_number(), partition_by=B.a_id, order_by=B.id).label("rownum")
       ).subquery())
# this produces the following SQL (@note: the RDBMS should support the OVER...)
# >> SELECT b.id AS b_id, row_number() OVER (PARTITION BY b.a_id ORDER BY b.id) AS rownum FROM b
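
To see what the row_number() OVER (PARTITION BY ...) numbering does before wiring it into the ORM, the SQL can be tried directly. The sketch below uses Python's built-in sqlite3 module and assumes the bundled SQLite is 3.25 or newer (the first release with window functions; MySQL likewise only supports them from 8.0). The table contents and the TOP_N name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER)")
# Parent 1 has four children (ids 1-4), parent 2 has two (ids 5-6).
conn.executemany(
    "INSERT INTO b (id, a_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2)],
)

TOP_N = 2  # illustrative limit per parent

# The inner SELECT numbers the rows within each a_id group;
# the outer WHERE keeps only the first TOP_N rows of every group.
rows = conn.execute(
    """
    SELECT b_id, a_id, rownum
    FROM (
        SELECT b.id AS b_id,
               b.a_id AS a_id,
               row_number() OVER (PARTITION BY a_id ORDER BY id) AS rownum
        FROM b
    )
    WHERE rownum <= ?
    ORDER BY a_id, rownum
    """,
    (TOP_N,),
).fetchall()

print(rows)  # [(1, 1, 1), (2, 1, 2), (5, 2, 1), (6, 2, 2)]
```

The numbering restarts at 1 for every a_id, which is why a single rownum <= N condition limits each parent's children independently.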
Version-1: the first query loads the As and the second query loads the Bs. The function returns a dictionary with As as keys and lists of Bs as values:

def get_A_with_Bs_in_batch(b_limit=10):
    """ 
    @return: dict(A, [list of top *b_limit* A.b_s])  
    @note: uses 2 SQL statements, but does not screw up relationship.
    @note: if the relationship is requested via a_instance.b_s, the new SQL statement will be
    issued to load *all* related objects
    """
    qry_a = session.query(A)
    qry_b = (session.query(B)
            .join(subq, and_(subq.c.b_id == B.id, subq.c.rownum <= b_limit))
            )
    a_s = qry_a.all()
    b_s = qry_b.all()
    res = dict((a, [b for b in b_s if b.a == a]) for a in a_s)
    return res
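
The dictionary comprehension in get_A_with_Bs_in_batch rescans the whole list of Bs for every A. If that ever becomes noticeable, the grouping could be done in one pass over the Bs, keyed by the foreign key. The sketch below is plain Python with namedtuple stand-ins instead of mapped classes; group_children, FakeA and FakeB are hypothetical names used only for illustration.

```python
from collections import defaultdict, namedtuple

# Stand-ins for mapped instances; only the attributes used below matter.
FakeA = namedtuple("FakeA", "id")
FakeB = namedtuple("FakeB", "id a_id")


def group_children(a_s, b_s):
    """Map each parent to the list of children whose a_id matches its id."""
    by_parent_id = defaultdict(list)
    for b in b_s:
        by_parent_id[b.a_id].append(b)
    # Parents without children get an empty list.
    return {a: by_parent_id.get(a.id, []) for a in a_s}


a_s = [FakeA(1), FakeA(2), FakeA(3)]
b_s = [FakeB(10, 1), FakeB(11, 1), FakeB(12, 2)]
grouped = group_children(a_s, b_s)
```

With real mapped objects the same function would work on the results of qry_a.all() and qry_b.all(), since it only reads the id and a_id attributes.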

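To conclude, Version-2 (the get_A_with_Bs_hack_relation function at the end of this page) is probably the most direct answer to your question. Use it at your own risk, though: you are tricking SA here, and if you modify the relationship property in any way you may experience a "Kaboom!"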

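Comments:

Your approach is correct. Thanks. But I would be using multiple queries to get the result. Is it possible to do this with a single SQL statement?

This method generates only one SQL statement per call. You can see this if you enable SQL logging by setting echo=True on the engine... or am I misunderstanding your comment?

First, I fetch a collection of As. If I ignore B, I can do that with one simple SQL statement. Now I want to load a collection of Bs for each A, so a new SQL statement is issued for every A. Is it possible to load all the data with only one or two SQL statements? From your explanation, it might be hard to do.

@flypen: see the extension to the answer covering the 1-2 SQL statement option.

The answer is excellent. For ease of maintenance, I prefer the first approach with get_children(). Thanks!

Version-2: the Bs are loaded through the relationship A.b_s itself, but with the limit applied:
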
def get_A_with_Bs_hack_relation(b_limit=10):
    """ 
    @return: dict(A, [list of top *b_limit* A.b_s])
    @note: the Bs are loaded as relationship A.b_s, but with the limit.
    """
    qry = (session.query(A)
            .outerjoin(B)
            # @note: next line will trick SA to load joined Bs as if they were *all* objects
            # of relationship A.b_s. this is a @hack: and one should discard/reset a session after this
            # kind of hacky query!!!
            .options(contains_eager(A.b_s))
            .outerjoin(subq, and_(subq.c.b_id == B.id, subq.c.rownum <= b_limit))
            # @note: next line is required to make both *outerjoins* to play well together 
            # in order produce the right result
            .filter(or_(B.id == None, and_(B.id != None, subq.c.b_id != None)))
            )
    res = dict((a, a.b_s) for a in qry.all())
    return res