Is there any way to optimize SQLAlchemy bulk insert execution time in Python?
I am trying to insert 60k rows into a SQL database using bulk_insert_mappings(), but it takes so long that I have never seen it finish. Inserting 1k rows does work, but it takes about 20 minutes.

I have looked at this thread on timing comparisons and at the SQLAlchemy documentation on bulk inserts, but I still don't understand why my solution takes so long.

I created a performance test table in a SQL Server database (no foreign keys), mirrored by the class PerformanceTest, with item_id as an INT IDENTITY(1,1) PRIMARY KEY NOT NULL plus the columns below. I also have an XDatabaseConnector to establish the connection and session:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, Date, Float
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import pyodbc
import time
# (I also import dynaconf SecretManager here)

Base = declarative_base()

class PerformanceTest(Base):
    __tablename__ = "performance_test"
    item_id = Column('item_id', Integer, primary_key=True, autoincrement=True)
    date = Column('date', Date)
    geography_id = Column('geography_id', Integer)
    concept_id = Column('concept_id', Integer)
    sector_id = Column('sector_id', Integer)
    value = Column('value', Float, nullable=True)

class DatabaseConnector:
    def __init__(self):
        self.connection_params = None
        self.query_params = None
        self.query = None
        self.engine = create_engine('mssql+pyodbc://', creator=self.connection, echo=True)

    def connection(self):
        return pyodbc.connect(self.connection_params)

    def set_connection_params(self, connection_params: tuple):
        self.connection_params = 'DRIVER={SQL Server};' + \
                                 'SERVER={};' \
                                 'DATABASE={};' \
                                 'UID={};' \
                                 'PWD={}'.format(*connection_params)

    def set_query_params(self, params_list: list):
        self.query_params = params_list

class XDatabaseConnector(DatabaseConnector):
    def __init__(self, environment):
        super().__init__()
        self.set_connection_params(
            SecretManager(environment).load_secrets()
        )
        self.session = sessionmaker(bind=self.engine)()
Here are my first 10 rows of data:
series = [
{'date': '2006-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 3964.041},
{'date': '2007-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 4723.085},
{'date': '2008-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 5987.735},
{'date': '2009-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 5594.184},
{'date': '2010-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 6645.0},
{'date': '2011-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 7223.332},
{'date': '2012-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 7237.736},
{'date': '2013-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 8302.54},
{'date': '2014-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 8630.425},
{'date': '2015-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 8621.436}
]
Finally, here is the insert_mapping function that performs the insert. I split the data into chunks of 10,000 because I thought that might help. (For the 1k-row test I did not use chunks.)
def insert_mapping(series):
    t0 = time.time()
    connector = XDatabaseConnector(environment='development')  # environment is just for dynaconf secrets
    for i in range(0, len(series), 10000):
        subset = series[i:i+10000]
        connector.session.bulk_insert_mappings(PerformanceTest, subset)
        connector.session.commit()
    connector.session.close()
    t1 = time.time() - t0
    print(f"{len(series)} inserted in {t1} seconds")

insert_mapping(series=series)
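As a point of comparison (my addition, not from the original post): running the same chunked bulk_insert_mappings pattern against an in-memory SQLite database, as a stand-in for SQL Server, finishes 60k rows in a second or two. That suggests the ORM side is not the bottleneck, and the time is going into the per-row round trips the pyodbc driver makes:

```python
import datetime
import time

from sqlalchemy import create_engine, Column, Integer, Date, Float
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

# Same shape as the PerformanceTest table from the question.
class PerformanceTest(Base):
    __tablename__ = "performance_test"
    item_id = Column(Integer, primary_key=True, autoincrement=True)
    date = Column(Date)
    geography_id = Column(Integer)
    concept_id = Column(Integer)
    sector_id = Column(Integer)
    value = Column(Float, nullable=True)

# In-memory SQLite stand-in, so this sketch runs without a SQL Server.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# 60k synthetic rows mirroring the question's data shape.
rows = [
    {"date": datetime.date(2006, 1, 1), "geography_id": 38,
     "concept_id": 26, "sector_id": 14, "value": float(i)}
    for i in range(60000)
]

t0 = time.time()
# Same chunked pattern as insert_mapping() in the question.
for i in range(0, len(rows), 10000):
    session.bulk_insert_mappings(PerformanceTest, rows[i:i + 10000])
session.commit()
print(f"{len(rows)} rows inserted in {time.time() - t0:.2f} seconds")
```

On a local SQLite database this is fast, so a 20-minute run for 1k rows points at the database round trips rather than at bulk_insert_mappings itself.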
Any optimization suggestions would be greatly appreciated.
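One common fix for exactly this setup (a suggestion of mine, not from the post) is to enable fast_executemany on the mssql+pyodbc dialect, available since SQLAlchemy 1.3. Without it, pyodbc sends one INSERT round trip per row; with it, the whole parameter batch goes to SQL Server in one call. A minimal sketch, assuming the same connection factory as DatabaseConnector above (connection_factory here is a hypothetical name for it):

```python
from sqlalchemy import create_engine

# fast_executemany=True tells the mssql+pyodbc dialect to set
# cursor.fast_executemany, batching all parameters in one round trip.
engine = create_engine(
    'mssql+pyodbc://',
    creator=connection_factory,  # hypothetical: your existing pyodbc connect callable
    fast_executemany=True,
    echo=False,  # echo=True logs every statement and noticeably slows large inserts
)
```

Two related things worth checking: echo=True in DatabaseConnector adds logging overhead on every statement, and the legacy 'DRIVER={SQL Server}' driver is quite old; 'ODBC Driver 17 for SQL Server' (if installed) is the usual choice for fast_executemany.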