How can I profile a SQLAlchemy-powered Python application?
Does anyone have experience profiling a Python/SQLAlchemy application? What is the best way to find bottlenecks and design flaws?

We have a Python application whose database layer is handled by SQLAlchemy. The application uses a batch design, so many database requests are made sequentially within a limited time window. It currently runs a bit too long, so some optimization is needed. We do not use the ORM functionality, and the database is PostgreSQL.

I have had some success using cProfile and viewing the results in RunSnakeRun. That at least told me which functions and calls were taking a long time, and whether the database was the problem. You will need wxPython for RunSnakeRun. It is a good place to start.
It is as easy as:
import cProfile
command = """foo.run()"""
cProfile.runctx(command, globals(), locals(), filename="output.profile")
Then:
python runsnake.py output.profile
If you want to optimize the queries themselves, you will need PostgreSQL-side query profiling.

Logging the queries is also worthwhile, but as far as I know there is no parser for those logs that extracts the long-running queries (and it is not useful for concurrent requests, either).

Also make sure echo=True is set on the create_engine statement.
When I did this, it turned out that my own code was the main problem, so the cProfile work is what helped me.

Sometimes just plain SQL logging (enabled via Python's logging module or via the echo=True argument on create_engine()) can give you an idea of how long things are taking. For example, if you log something right after a SQL operation, you will see something like this in your log:
17:37:48,325 INFO [sqlalchemy.engine.base.Engine.0x...048c] SELECT ...
17:37:48,326 INFO [sqlalchemy.engine.base.Engine.0x...048c] {<params>}
17:37:48,660 DEBUG [myapp.somemessage]
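Either route feeds the same logger. A minimal sketch of the pure-logging approach, assuming the standard `sqlalchemy.engine` logger name (echo=True on create_engine() configures this same logger for you):

```python
import logging

# SQL logging can be enabled without echo=True by configuring the
# 'sqlalchemy.engine' logger directly at INFO level; statements and
# bound parameters will then appear in the application's log output.
logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
```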
To profile a section of code, place it in a function wrapped with a profiling decorator:
@profile
def go():
    return Session.query(FooClass).filter(FooClass.somevalue == 8).all()
myfoos = go()
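The `@profile` decorator itself is not shown above; a minimal stand-in using only the standard library (the name `profile` and the stats cutoff of 10 entries are my own choices, not from the original answer):

```python
import cProfile
import functools
import io
import pstats

def profile(fn):
    """Run the wrapped function under cProfile and print the top
    cumulative-time entries; the function's own result is returned."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        pr = cProfile.Profile()
        result = pr.runcall(fn, *args, **kwargs)
        out = io.StringIO()
        pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(10)
        print(out.getvalue())
        return result
    return wrapper
```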
The output of the profiling can be used to get an idea of where time is being spent. For example, if you see all the time spent inside cursor.execute(), that is the low-level DBAPI call to the database, and it means your query should be optimized, either by adding indexes or by restructuring the query and/or the underlying schema. For that task I recommend pgAdmin and its graphical EXPLAIN utility to see what kind of work the query is doing.
If you see many thousands of calls related to fetching rows, it may mean your query is returning more rows than expected; a cartesian product caused by an incomplete join can produce this. Yet another issue is time spent within type handling: a SQLAlchemy type such as Unicode performs string encoding/decoding on bind parameters and result columns, which may not be needed in all cases.
The output of a profile can be a little daunting, but after some practice it is very easy to read. There was once someone on the mailing list claiming slowness, and after having him post the results of a profile, I was able to demonstrate that the speed problems were due to network latency: the time spent within cursor.execute() and all the Python methods was very small, while the majority of time was spent on socket.receive().
If you are feeling ambitious, there is also a more involved example of SQLAlchemy profiling within the SQLAlchemy unit tests, if you poke around. There, tests use a decorator that asserts a maximum number of method calls for particular operations, so that if something inefficient gets checked in, the tests reveal it (it is important to note that in Python, function calls have the highest overhead of any operation, and the number of calls is more often than not nearly proportional to the time spent). Of note are the "zoomark" tests, which use a fancy "SQL capturing" scheme that cuts the DBAPI overhead out of the equation, although that technique is not really necessary for garden-variety profiling.

There is a very useful profiling recipe for this, which I reproduce here with some minor modifications:
from sqlalchemy import event
from sqlalchemy.engine import Engine
import time
import logging

logging.basicConfig()
logger = logging.getLogger("myapp.sqltime")
logger.setLevel(logging.DEBUG)

@event.listens_for(Engine, "before_cursor_execute")
def before_cursor_execute(conn, cursor, statement,
                          parameters, context, executemany):
    context._query_start_time = time.time()
    logger.debug("Start Query:\n%s" % statement)
    # Modification for StackOverflow answer:
    # Show parameters, which might be too verbose, depending on usage..
    logger.debug("Parameters:\n%r" % (parameters,))

@event.listens_for(Engine, "after_cursor_execute")
def after_cursor_execute(conn, cursor, statement,
                         parameters, context, executemany):
    total = time.time() - context._query_start_time
    logger.debug("Query Complete!")
    # Modification for StackOverflow: times in milliseconds
    logger.debug("Total Time: %.02fms" % (total * 1000))

if __name__ == '__main__':
    from sqlalchemy import *

    engine = create_engine('sqlite://')

    m1 = MetaData(engine)
    t1 = Table("sometable", m1,
               Column("id", Integer, primary_key=True),
               Column("data", String(255), nullable=False),
               )

    conn = engine.connect()
    m1.create_all(conn)

    conn.execute(
        t1.insert(),
        [{"data": "entry %d" % x} for x in range(100000)]
    )

    conn.execute(
        t1.select().where(t1.c.data.between("entry 25", "entry 7800")).order_by(desc(t1.c.data))
    )
The output looks something like:
DEBUG:myapp.sqltime:Start Query:
SELECT sometable.id, sometable.data
FROM sometable
WHERE sometable.data BETWEEN ? AND ? ORDER BY sometable.data DESC
DEBUG:myapp.sqltime:Parameters:
('entry 25', 'entry 7800')
DEBUG:myapp.sqltime:Query Complete!
DEBUG:myapp.sqltime:Total Time: 410.46ms
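Turning one of those logged statements into an EXPLAIN ANALYZE run (as the next paragraph describes) can be sketched with plain string formatting; the statement, parameters, and the naive quoting below are illustrative only, assuming psycopg2-style %s placeholders:

```python
# Hypothetical logged statement and parameters (psycopg2 uses %s placeholders).
statement = ("SELECT sometable.id, sometable.data FROM sometable "
             "WHERE sometable.data BETWEEN %s AND %s")
parameters = ("entry 25", "entry 7800")

# Interpolate the parameters (naive quoting, for diagnostics only) and
# prefix with EXPLAIN ANALYZE; feed the result to psql or pgAdmin.
explain = "EXPLAIN ANALYZE " + statement % tuple("'%s'" % p for p in parameters)
print(explain)
```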
Then, if you find an unusually slow query, you can take the query string, format in the parameters (this can be done with the % string-formatting operator, at least for psycopg2), prefix it with "EXPLAIN ANALYZE", and push the query-plan output into a graphical explain viewer such as the one in pgAdmin.

I just found the library sqltap. It produces nicely styled HTML pages that help with inspecting and profiling the SQL queries generated by SQLAlchemy.
An example of its usage:
profiler = sqltap.start()
run_some_queries()
statistics = profiler.collect()
sqltap.report(statistics, "report.html")
The library has not been updated in two years; however, it seemed to work just fine when I tested it with my application earlier today.

If you simply want to profile the query times, you can use a context manager to log every query executed in a particular context:
"""SQLAlchemy Query profiler and logger."""
import logging
import time
import traceback
import sqlalchemy
class QueryProfiler:
"""Log query duration and SQL as a context manager."""
def __init__(self,
engine: sqlalchemy.engine.Engine,
logger: logging.Logger,
path: str):
"""
Initialize for an engine and logger and filepath.
engine: The sqlalchemy engine for which events should be logged.
You can pass the class `sqlalchemy.engine.Engine` to capture all engines
logger: The logger that should capture the query
path: Only log the stacktrace for files in this path, use `'/'` to log all files
"""
self.engine = engine
self.logger = logger
self.path = path
def _before_cursor_execute(self, conn, cursor, statement, parameters, context, executemany):
"""Set the time on the connection to measure query duration."""
conn._sqla_query_start_time = time.time()
def _after_cursor_execute(self, conn, cursor, statement, parameters, context, executemany):
"""Listen for the 'after_cursor_execute' event and log sqlstatement and time."""
end_time = time.time()
start_time = getattr(conn, '_sqla_query_start_time', end_time)
elapsed_time = round((end_time-start_time) * 1000)
# only include the files in self.path in the stacktrace to reduce the noise
stack = [frame for frame in traceback.extract_stack()[:-1] if frame.filename.startswith(self.path)]
self.logger.debug('Query `%s` took %s ms. Stack: %s', statement, elapsed_time, traceback.format_list(stack))
def __enter__(self, *args, **kwargs):
"""Context manager."""
if isinstance(self.engine, sqlalchemy.engine.Engine):
sqlalchemy.event.listen(self.engine, "before_cursor_execute", self._before_cursor_execute)
sqlalchemy.event.listen(self.engine, "after_cursor_execute", self._after_cursor_execute)
return self
def __exit__(self, *args, **kwargs) -> None:
"""Context manager."""
if isinstance(self.engine, sqlalchemy.engine.Engine):
sqlalchemy.event.remove(self.engine, "before_cursor_execute", self._before_cursor_execute)
sqlalchemy.event.remove(self.engine, "after_cursor_execute", self._after_cursor_execute)
Usage and test:
"""Test SQLAlchemy Query profiler and logger."""
import logging
import os
import sqlalchemy
from .sqlaprofiler import QueryProfiler
def test_sqlite_query(caplog):
"""Create logger and sqllite engine and profile the queries."""
logging.basicConfig()
logger = logging.getLogger(f'{__name__}')
logger.setLevel(logging.DEBUG)
caplog.set_level(logging.DEBUG, logger=f'{__name__}')
path = os.path.dirname(os.path.realpath(__file__))
engine = sqlalchemy.create_engine('sqlite://')
metadata = sqlalchemy.MetaData(engine)
table1 = sqlalchemy.Table(
"sometable", metadata,
sqlalchemy.Column("id", sqlalchemy.Integer, primary_key=True),
sqlalchemy.Column("data", sqlalchemy.String(255), nullable=False),
)
conn = engine.connect()
metadata.create_all(conn)
with QueryProfiler(engine, logger, path):
conn.execute(
table1.insert(),
[{"data": f"entry {i}"} for i in range(100000)]
)
conn.execute(
table1.select()
.where(table1.c.data.between("entry 25", "entry 7800"))
.order_by(sqlalchemy.desc(table1.c.data))
)
assert caplog.messages[0].startswith('Query `INSERT INTO sometable (data) VALUES (?)` took')
assert caplog.messages[1].startswith('Query `SELECT sometable.id, sometable.data \n'
'FROM sometable \n'
'WHERE sometable.data BETWEEN ? AND ? '
'ORDER BY sometable.data DESC` took ')
If you are using Flask-SQLAlchemy, add SQLALCHEMY_ECHO = True to your application's config instead. I believe this will include the time queries spend in the callback queue (gevent/eventlet/etc.).
"""SQLAlchemy Query profiler and logger."""
import logging
import time
import traceback
import sqlalchemy
class QueryProfiler:
"""Log query duration and SQL as a context manager."""
def __init__(self,
engine: sqlalchemy.engine.Engine,
logger: logging.Logger,
path: str):
"""
Initialize for an engine and logger and filepath.
engine: The sqlalchemy engine for which events should be logged.
You can pass the class `sqlalchemy.engine.Engine` to capture all engines
logger: The logger that should capture the query
path: Only log the stacktrace for files in this path, use `'/'` to log all files
"""
self.engine = engine
self.logger = logger
self.path = path
def _before_cursor_execute(self, conn, cursor, statement, parameters, context, executemany):
"""Set the time on the connection to measure query duration."""
conn._sqla_query_start_time = time.time()
def _after_cursor_execute(self, conn, cursor, statement, parameters, context, executemany):
"""Listen for the 'after_cursor_execute' event and log sqlstatement and time."""
end_time = time.time()
start_time = getattr(conn, '_sqla_query_start_time', end_time)
elapsed_time = round((end_time-start_time) * 1000)
# only include the files in self.path in the stacktrace to reduce the noise
stack = [frame for frame in traceback.extract_stack()[:-1] if frame.filename.startswith(self.path)]
self.logger.debug('Query `%s` took %s ms. Stack: %s', statement, elapsed_time, traceback.format_list(stack))
def __enter__(self, *args, **kwargs):
"""Context manager."""
if isinstance(self.engine, sqlalchemy.engine.Engine):
sqlalchemy.event.listen(self.engine, "before_cursor_execute", self._before_cursor_execute)
sqlalchemy.event.listen(self.engine, "after_cursor_execute", self._after_cursor_execute)
return self
def __exit__(self, *args, **kwargs) -> None:
"""Context manager."""
if isinstance(self.engine, sqlalchemy.engine.Engine):
sqlalchemy.event.remove(self.engine, "before_cursor_execute", self._before_cursor_execute)
sqlalchemy.event.remove(self.engine, "after_cursor_execute", self._after_cursor_execute)
"""Test SQLAlchemy Query profiler and logger."""
import logging
import os
import sqlalchemy
from .sqlaprofiler import QueryProfiler
def test_sqlite_query(caplog):
"""Create logger and sqllite engine and profile the queries."""
logging.basicConfig()
logger = logging.getLogger(f'{__name__}')
logger.setLevel(logging.DEBUG)
caplog.set_level(logging.DEBUG, logger=f'{__name__}')
path = os.path.dirname(os.path.realpath(__file__))
engine = sqlalchemy.create_engine('sqlite://')
metadata = sqlalchemy.MetaData(engine)
table1 = sqlalchemy.Table(
"sometable", metadata,
sqlalchemy.Column("id", sqlalchemy.Integer, primary_key=True),
sqlalchemy.Column("data", sqlalchemy.String(255), nullable=False),
)
conn = engine.connect()
metadata.create_all(conn)
with QueryProfiler(engine, logger, path):
conn.execute(
table1.insert(),
[{"data": f"entry {i}"} for i in range(100000)]
)
conn.execute(
table1.select()
.where(table1.c.data.between("entry 25", "entry 7800"))
.order_by(sqlalchemy.desc(table1.c.data))
)
assert caplog.messages[0].startswith('Query `INSERT INTO sometable (data) VALUES (?)` took')
assert caplog.messages[1].startswith('Query `SELECT sometable.id, sometable.data \n'
'FROM sometable \n'
'WHERE sometable.data BETWEEN ? AND ? '
'ORDER BY sometable.data DESC` took ')