Python: overhead when iterating over 1k+ rows in Postgres with peewee

python python-3.x postgresql psycopg2 peewee


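I'm seeing an inexplicably large overhead when iterating over a Postgres table with peewee.

I profiled the code and also ran a smoke test with SQLAlchemy to make sure the cause isn't a slow connection or the underlying driver, psycopg2.

The script below runs against a Postgres table with about 1 million records, but fetches only a small fraction of them.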

import time

import peewee
import sqlalchemy
from playhouse import postgres_ext
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.engine.url import URL as AlchemyURL
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker as alchemy_sessionmaker

user = 'XXX'
password = 'XXX'
database = 'XXX'
host = 'XXX'
port = 5432

table = 'person'
limit = 1000

peewee_db = postgres_ext.PostgresqlExtDatabase(
    database=database,
    host=host, port=port,
    user=user, password=password,
    use_speedups=True,
    server_side_cursors=True,
    register_hstore=False,
)

alchemy_engine = sqlalchemy.create_engine(AlchemyURL('postgresql', username=user, password=password,
                                                     database=database, host=host, port=port))
alchemy_session = alchemy_sessionmaker(bind=alchemy_engine)()


class PeeweePerson(peewee.Model):
    class Meta:
        database = peewee_db
        db_table = table

    id = peewee.CharField(primary_key=True, max_length=64)
    data = postgres_ext.BinaryJSONField(index=True, index_type='GIN')


class SQLAlchemyPerson(declarative_base()):
    __tablename__ = table

    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    data = sqlalchemy.Column(JSONB)


def run_raw_query():
    ids = list(peewee_db.execute_sql(f"SELECT id from {table} order by id desc limit {limit}"))
    return ids


def run_peewee_query():
    query = PeeweePerson.select(PeeweePerson.id).order_by(PeeweePerson.id.desc()).limit(limit)
    ids = list(query.tuples())
    return ids


def run_sqlalchemy_query():
    query = alchemy_session.query(SQLAlchemyPerson.id).order_by(sqlalchemy.desc(SQLAlchemyPerson.id)).limit(limit)
    ids = list(query)
    return ids


if __name__ == '__main__':
    t0 = time.time()
    raw_result = run_raw_query()
    t1 = time.time()
    print(f'Raw: {t1 - t0}')

    t2 = time.time()
    sqlalchemy_result = run_sqlalchemy_query()
    t3 = time.time()
    print(f'SQLAlchemy: {t3 - t2}')

    t4 = time.time()
    peewee_result = run_peewee_query()
    t5 = time.time()
    print(f'peewee: {t5 - t4}')

    assert raw_result == sqlalchemy_result == peewee_result

With limit = 1000 (times in seconds):

Raw: 0.02643609046936035
SQLAlchemy: 0.03697466850280762
peewee: 1.050987474820709229

With limit = 10000:

Raw: 0.15931344032287598
SQLAlchemy: 0.0722904205322656
peewee: 10.82826042175293

Both examples use server-side cursors.

I profiled this briefly, and it looks like more than 95% of the time is spent in calls to cursor.fetchone.
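For anyone who wants to collect a similar profile, here is a minimal sketch using cProfile from the standard library. It uses an in-memory sqlite3 database as a stand-in so it runs without a Postgres server, which means it will not reproduce the timings above. The helper name fetch_one_by_one is made up for illustration and is not part of peewee.

```python
import cProfile
import io
import pstats
import sqlite3


def fetch_one_by_one(cursor):
    """Drain a DB-API cursor with one fetchone() call per row."""
    rows = []
    while True:
        row = cursor.fetchone()
        if row is None:
            break
        rows.append(row)
    return rows


# Stand-in data: sqlite3 instead of Postgres (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO person (id) VALUES (?)", [(i,) for i in range(5000)])

profiler = cProfile.Profile()
profiler.enable()
rows = fetch_one_by_one(
    conn.execute("SELECT id FROM person ORDER BY id DESC LIMIT 1000")
)
profiler.disable()

# Sort by cumulative time so the most expensive call paths come first.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

In the real script, the same enable/disable pair would wrap the call to run_peewee_query().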

Any ideas?

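This appears to be caused by an inefficient implementation of server-side cursors in peewee 2.x. Specifically, I think it is because peewee's cursor wrapper uses the DB-API .fetchone method rather than fetching many rows at a time. 3.0a has a new implementation that should be faster.

Also, client-side cursors in 2.x do not have this inefficiency, so they can serve as a workaround for now.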
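To illustrate the difference between per-row and batched fetching, here is a minimal sketch of a generator that drains any DB-API cursor with fetchmany. This is not peewee's actual code; the names iter_rows and batch_size are made up for the example, and sqlite3 is used only as a stand-in so the snippet runs without a server. With a server-side (named) cursor in psycopg2, each fetch call would typically cost a network round trip, which is why fewer, larger fetches should help.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows from a DB-API cursor, fetching batch_size rows per call."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            # An empty list means the result set is exhausted.
            return
        yield from rows


# Stand-in data: sqlite3 instead of Postgres (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO person (id) VALUES (?)",
    [(f"{i:05d}",) for i in range(2500)],
)

cur = conn.execute("SELECT id FROM person ORDER BY id DESC LIMIT 1000")
# 1000 rows in batches of 256 means 4 non-empty fetchmany calls
# instead of 1000 fetchone calls.
ids = list(iter_rows(cur, batch_size=256))
```

The calling code still iterates row by row; only the number of fetch calls against the cursor changes.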