Python: improving query performance

python mysql postgresql sqlalchemy

I need to read and join a lot of rows (~500k) from a PostgreSQL database and then write them to a MySQL database.

My naive approach looks like this:

    entrys = Entry.query.yield_per(500)

    for entry in entrys:
        for location in entry.locations:
            mysql_location = MySQLLocation(entry.url)
            mysql_location.id = location.id
            mysql_location.entry_id = entry.id

            [...]

            mysql_location.city = location.city.name
            mysql_location.county = location.county.name
            mysql_location.state = location.state.name
            mysql_location.country = location.country.name

            db.session.add(mysql_location)

    db.session.commit()

Each entry has about 1 to 100 locations.

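This script has now been running for about 20 hours and already consumes more than 4GB of memory, because everything is kept in memory until the session is committed.

When I tried committing earlier, partway through the loop, I ran into problems.

How can I improve the query performance? It needs to get a lot faster, because the number of rows will grow to about 25000K in the coming months.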

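Your naive approach is flawed for the reason you already know: what is eating your memory is the model objects sitting in memory, waiting to be flushed to MySQL.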

The easiest way would be to not use the ORM at all for the conversion. Use SQLAlchemy Table objects directly, as they are also much faster.
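A minimal sketch of the Table-object approach, assuming SQLAlchemy 1.4 or later. The table layouts, the column names and the `copy_locations` helper are hypothetical, and in-memory SQLite engines stand in for the real PostgreSQL and MySQL engines so the example runs on its own. The join is done in SQL, rows are fetched in batches, and each batch is written with one multi-row insert in its own transaction.

```python
from sqlalchemy import (Column, ForeignKey, Integer, MetaData, String, Table,
                        create_engine, func, select)

# Assumption: SQLite stand-ins; real code would use postgresql:// and mysql:// URLs.
source_engine = create_engine("sqlite://")
target_engine = create_engine("sqlite://")

# Hypothetical source schema (much smaller than the one in the question).
source_meta = MetaData()
entry = Table("entry", source_meta,
              Column("id", Integer, primary_key=True),
              Column("url", String(255)))
location = Table("location", source_meta,
                 Column("id", Integer, primary_key=True),
                 Column("entry_id", Integer, ForeignKey("entry.id")),
                 Column("city", String(100)))

# Hypothetical target schema: one denormalized row per location.
target_meta = MetaData()
mysql_location = Table("mysql_location", target_meta,
                       Column("id", Integer, primary_key=True),
                       Column("entry_id", Integer),
                       Column("url", String(255)),
                       Column("city", String(100)))

source_meta.create_all(source_engine)
target_meta.create_all(target_engine)

# Demo data only.
with source_engine.begin() as conn:
    conn.execute(entry.insert(), [{"id": 1, "url": "http://a.example"},
                                  {"id": 2, "url": "http://b.example"}])
    conn.execute(location.insert(),
                 [{"id": i, "entry_id": 1 + i % 2, "city": f"city-{i}"}
                  for i in range(1, 8)])


def copy_locations(batch_size=500):
    """Copy joined rows to the target in batches, without ORM objects."""
    query = (select(location.c.id, location.c.entry_id,
                    entry.c.url, location.c.city)
             .select_from(location.join(entry))  # ON clause comes from the FK
             .order_by(location.c.id))
    copied = 0
    with source_engine.connect() as src:
        result = src.execute(query)
        while True:
            rows = result.fetchmany(batch_size)
            if not rows:
                break
            payload = [{"id": r.id, "entry_id": r.entry_id,
                        "url": r.url, "city": r.city} for r in rows]
            # One transaction and one multi-row insert per batch.
            with target_engine.begin() as dst:
                dst.execute(mysql_location.insert(), payload)
            copied += len(payload)
    return copied


copied = copy_locations(batch_size=3)
```

Only one batch of plain row tuples is alive at any time, so memory use should stay roughly flat no matter how many rows are copied.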

Also, you could create 2 sessions and bind the 2 engines to the separate sessions! Then you can commit the MySQL session for each batch.
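The two-session idea could look roughly like the sketch below, assuming SQLAlchemy 1.4 or later. The models are cut down to two hypothetical columns, and in-memory SQLite engines again stand in for PostgreSQL and MySQL. Because the read cursor belongs to a different session, committing the MySQL session in the middle of the loop does not touch it.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

# Separate declarative bases so each model set maps to its own database.
SourceBase = declarative_base()
TargetBase = declarative_base()


class Location(SourceBase):            # hypothetical, simplified source model
    __tablename__ = "location"
    id = Column(Integer, primary_key=True)
    city = Column(String(100))


class MySQLLocation(TargetBase):       # hypothetical, simplified target model
    __tablename__ = "mysql_location"
    id = Column(Integer, primary_key=True)
    city = Column(String(100))


# Assumption: SQLite stand-ins for the PostgreSQL and MySQL engines.
pg_engine = create_engine("sqlite://")
mysql_engine = create_engine("sqlite://")
SourceBase.metadata.create_all(pg_engine)
TargetBase.metadata.create_all(mysql_engine)

# One session per engine.
pg_session = sessionmaker(bind=pg_engine)()
mysql_session = sessionmaker(bind=mysql_engine)()

# Demo data only.
pg_session.add_all([Location(id=i, city=f"city-{i}") for i in range(1, 11)])
pg_session.commit()

BATCH_SIZE = 4
pending = 0
for loc in pg_session.query(Location).yield_per(BATCH_SIZE):
    mysql_session.add(MySQLLocation(id=loc.id, city=loc.city))
    pending += 1
    if pending == BATCH_SIZE:
        mysql_session.commit()        # flush and commit this batch only
        mysql_session.expunge_all()   # drop references to the written objects
        pending = 0

mysql_session.commit()                # commit the final partial batch
```

The batch size is a tuning knob: larger batches mean fewer commits, smaller batches mean fewer pending objects held in memory.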

Why not use a plain dump and load? Basically

    pg_dump dbname | mysql dbname

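@JochenRitzel, I am joining multiple rows from multiple tables into a single row in MySQL. I don't see how pg_dump would help.

Have you tried extracting the data from Postgres to CSV and loading the CSV into MySQL?

I second the option of two separate sessions, with one of the sessions committing on every batch to clear them out. Also, the problem you (@dbanck) ran into would be solved by using range queries instead of yield_per.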
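For the range-query suggestion, one possible shape is keyset pagination on the primary key, sketched here under the assumption of SQLAlchemy 1.4 or later. The `entry` table layout and the `iter_in_ranges` helper are hypothetical, and an in-memory SQLite engine stands in for PostgreSQL. Each batch is a separate, fully fetched query, so no cursor stays open between batches that a commit could invalidate.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, select)

# Assumption: SQLite stand-in for the PostgreSQL source.
engine = create_engine("sqlite://")
meta = MetaData()
entry = Table("entry", meta,
              Column("id", Integer, primary_key=True),
              Column("url", String(255)))
meta.create_all(engine)

# Demo data only.
with engine.begin() as conn:
    conn.execute(entry.insert(),
                 [{"id": i, "url": f"http://example.com/{i}"}
                  for i in range(1, 12)])


def iter_in_ranges(conn, table, batch_size):
    """Yield lists of rows, paging by primary key instead of one long cursor."""
    last_id = 0  # assumes positive integer ids
    while True:
        rows = conn.execute(
            select(table)
            .where(table.c.id > last_id)
            .order_by(table.c.id)
            .limit(batch_size)
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1].id  # the next range starts after the last row seen


with engine.connect() as conn:
    batch_sizes = [len(batch)
                   for batch in iter_in_ranges(conn, entry, batch_size=5)]
```

With 11 demo rows and a batch size of 5, this produces batches of 5, 5 and 1 rows. Writing and committing to MySQL would go inside the loop over the batches.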