
Python: Is there a way to optimize SQLAlchemy bulk insert execution time?

Tags: python, sql, performance, sqlalchemy, insert


I'm trying to insert 60k rows into a SQL database using bulk_insert_mappings(), but it takes so long that I never saw it finish. Inserting just 1k rows took about 20 minutes.

I've looked at a thread comparing insert timings and at the SQLAlchemy documentation on bulk inserts, but I still don't understand why my solution takes this long.

I created a performance-test table in a SQL Server DB (no FKs), mirrored by the class PerformanceTest, with item_id as an INT IDENTITY(1,1) PRIMARY KEY NOT NULL plus the columns below. I also have an XDatabaseConnector to establish the connection and session:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, Date, Float
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import time
import pyodbc  # used by DatabaseConnector.connection() below
# I also import dynaconf's SecretManager here
Base = declarative_base()


class PerformanceTest(Base):
    __tablename__ = "performance_test"

    item_id = Column('item_id', Integer, primary_key=True, autoincrement=True)
    date = Column('date', Date)
    geography_id = Column('geography_id', Integer)
    concept_id = Column('concept_id', Integer)
    sector_id = Column('sector_id', Integer)
    value = Column('value', Float, nullable=True)

class DatabaseConnector:
    def __init__(self):
        self.connection_params = None

        self.query_params = None

        self.query = None

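        # note: echo=True makes the engine log every emitted SQL statement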
        self.engine = create_engine('mssql+pyodbc://', creator=self.connection, echo=True)

    def connection(self):
        return pyodbc.connect(self.connection_params)

    def set_connection_params(self, connection_params: tuple):
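        # note: the adjacent string literals below are concatenated first and
        # .format() is applied only to that combined literal; the {SQL Server}
        # part is prepended afterwards via +, so it is not treated as a
        # format placeholder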
        self.connection_params = 'DRIVER={SQL Server};' + \
                                 'SERVER={};' \
                                 'DATABASE={};' \
                                 'UID={};' \
                                 'PWD={}'.format(*connection_params)

    def set_query_params(self, params_list: list):
        self.query_params = params_list


class XDatabaseConnector(DatabaseConnector):
    def __init__(self, environment):
        super().__init__()
        self.set_connection_params(
            SecretManager(environment).load_secrets()
        )

        self.session = sessionmaker(bind=self.engine)()
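A side note on the engine above: plain pyodbc turns an executemany into one round trip per row, which is often the bottleneck in bulk inserts. Below is a minimal sketch of the fast_executemany flag that the mssql+pyodbc dialect accepts; the URL-style connection string and the ODBC Driver 17 name are assumptions for illustration, not my actual setup:

from sqlalchemy import create_engine

# Sketch only: fast_executemany asks pyodbc to send parameter batches in a
# single round trip. It requires a Microsoft ODBC driver; the legacy
# 'SQL Server' driver used above may not benefit. The URL and driver name
# here are hypothetical.
fast_engine = create_engine(
    'mssql+pyodbc://user:password@server/dbname'
    '?driver=ODBC+Driver+17+for+SQL+Server',
    fast_executemany=True,
)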
Here are the first 10 rows of my data:

series = [
    {'date': '2006-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 3964.041},
    {'date': '2007-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 4723.085},
    {'date': '2008-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 5987.735},
    {'date': '2009-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 5594.184},
    {'date': '2010-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 6645.0},
    {'date': '2011-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 7223.332},
    {'date': '2012-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 7237.736},
    {'date': '2013-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 8302.54},
    {'date': '2014-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 8630.425},
    {'date': '2015-01-01', 'geography_id': 38, 'concept_id': 26, 'sector_id': 14, 'value': 8621.436}
]
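To reproduce the 60k-row case from this sample, one hypothetical way to blow it up (illustration only; the real data set is of course different):

series_60k = [dict(row) for row in series for _ in range(6000)]  # hypothetical: 10 * 6000 = 60k rows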
Finally, here is the insert_mapping function that performs the insert. I split the data into 10k-row chunks because I thought that might help. (For the 1k-row run I did not use chunks.)

def insert_mapping(series):
    t0 = time.time()
    connector = XDatabaseConnector(environment='development') #environment is just for dynaconf secrets
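    # insert in 10k-row batches; a single commit at the end covers all of them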
    for i in range(0, len(series), 10000):
        subset = series[i:i+10000]
        connector.session.bulk_insert_mappings(PerformanceTest, subset)
    connector.session.commit()
    connector.session.close()
    t1 = time.time() - t0
    print(f"{len(series)} inserted in {t1} seconds")

insert_mapping(series=series)
Any optimization suggestions would be greatly appreciated.
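For comparison, here is a minimal sketch of the same insert done through SQLAlchemy Core instead of the ORM session, reusing PerformanceTest and XDatabaseConnector from above (illustrative only; whether bypassing the ORM unit of work is acceptable here is an assumption):

def insert_core(series):
    # sketch: Core executemany inside a single transaction; engine.begin()
    # commits automatically when the block exits without an error
    connector = XDatabaseConnector(environment='development')
    with connector.engine.begin() as conn:
        conn.execute(PerformanceTest.__table__.insert(), series)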
