Python: Why does writing with psycopg2 take so long?

python postgresql

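I wrote a script that updates a table. Because I could not find a way to "batch" the updates, my script updates the table one row at a time. I assumed that for a set of 100,000 rows the update would take a few seconds.

It does not. Each write operation takes about 100 milliseconds, so the whole write takes (100000 * 100) / 1000 / 60 / 60 ≈ 2.78 hours. Why does writing take so long?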
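As a quick check of that estimate, here is a small sketch of the arithmetic (the function name is made up for illustration):

```python
def total_hours(row_count, ms_per_row):
    """Estimate the wall-clock hours needed to write rows one at a time."""
    total_seconds = row_count * ms_per_row / 1000.0
    return total_seconds / 3600.0


# 100,000 rows at roughly 100 ms each
print(round(total_hours(100000, 100), 2))  # 2.78
```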

Here is the code I am using:

import psycopg2
...
entries = get_all_entries()
conn = psycopg2.connect(params)
try:
    for entry in entries:
        cursor = conn.cursor()
        cursor.execute(UPDATE_QUERY.format(entry.field1, entry.field2))
        cursor.close()
finally:
    conn.close()

What am I doing wrong?

Have you tried:

cursor = conn.cursor()
for entry in entries:
    cursor.execute(UPDATE_QUERY.format(entry.field1, entry.field2))

cursor.close()
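A minimal sketch of the same idea with one reused cursor, bound parameters instead of str.format, and a single commit at the end. The table name `my_table`, its columns, and the function name are hypothetical, since the original UPDATE_QUERY is not shown; `conn` is assumed to be an open psycopg2 connection.

```python
def update_rows(conn, entries):
    """Update rows one at a time, reusing one cursor and committing once.

    `entries` is any iterable of (new_value, row_id) pairs.
    Returns the number of statements executed.
    """
    # Hypothetical statement; the driver fills in the %s placeholders,
    # which avoids building SQL with str.format.
    sql = "UPDATE my_table SET field2 = %s WHERE field1 = %s"
    cur = conn.cursor()
    try:
        count = 0
        for new_value, row_id in entries:
            cur.execute(sql, (new_value, row_id))
            count += 1
        # Without a commit the changes are discarded when the connection closes.
        conn.commit()
        return count
    finally:
        cur.close()
```

This still sends one statement per row, so it mostly saves the cursor churn; the round trip per row remains.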

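You can profile this code.

Instead of updating the table row by row from the client side, you can use the copy_from method to upload the data into a server-side temporary table and then update the table with a single SQL statement.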

Here is an artificial example:

#!/usr/bin/env python3

import time
from io import StringIO
from random import random

import psycopg2

CRowCount = 100000

# An empty DSN falls back to libpq defaults and the PG* environment variables
conn = psycopg2.connect('')
conn.autocommit = False

print('Prepare playground...')
cur = conn.cursor()
cur.execute("""
    drop table if exists foo;
    create table foo(i int primary key, x float);
    insert into foo select i, 0 from generate_series(1,%s) as i;
""", (CRowCount,))
print('Done.')
cur.close()
conn.commit()

print('\nTest update row by row...')
tstart = time.time()
cur = conn.cursor()
# One UPDATE statement (and one round trip) per row
for i in range(1, CRowCount + 1):
    cur.execute('update foo set x = %s where i = %s', (random(), i))
conn.commit()
cur.close()
print('Done in %s s.' % (time.time() - tstart))

print('\nTest batch update...')
tstart = time.time()
cur = conn.cursor()
# Create temporary table to hold our data
cur.execute('create temp table t(i int, x float) on commit drop')
# Create and fill the buffer from which data will be uploaded
buf = StringIO()
for i in range(1, CRowCount + 1):
    buf.write('%s\t%s\n' % (i, random()))
buf.seek(0)
# Upload data from the buffer to the temporary table
cur.copy_from(buf, 't')
# Update test table using data previously uploaded
cur.execute('update foo set x = t.x from t where foo.i = t.i')
cur.close()
conn.commit()
print('Done in %s s.' % (time.time() - tstart))

Output:

Prepare playground...
Done.

Test update row by row...
Done in 62.1189928055 s.

Test batch update...
Done in 3.95668387413 s.

As you can see, the second method is roughly 16 times faster (62.1 s versus 3.96 s).
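The buffer-filling step can be looked at on its own. Below is a rough sketch of building the tab-separated text that copy_from reads by default (one line per row, columns separated by tabs). The function name is hypothetical, and the sketch assumes the values contain no tabs, newlines, backslashes, or NULLs, since it does no escaping.

```python
from io import StringIO


def build_copy_buffer(rows):
    """Return a file-like object holding one tab-separated line per row."""
    buf = StringIO()
    for row in rows:
        buf.write("\t".join(str(value) for value in row) + "\n")
    buf.seek(0)  # rewind so the reader starts at the first line
    return buf


buf = build_copy_buffer([(1, 0.25), (2, 0.5)])
print(repr(buf.getvalue()))  # '1\t0.25\n2\t0.5\n'
```

The resulting object can then be passed to cur.copy_from(buf, 't') as in the example above.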

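You can keep the transaction from growing too large (which slows the queries down) by running conn.commit() every so many queries (2500?), assuming you can do without the usual safety of a single transaction.

One way to speed this up: upload the data into a temporary table, then update the table with a single SQL statement.

@Abelisto Interesting. Could you post an answer showing how you would do this?

Thanks! Can you explain why this is faster?

@dopatraman Mainly because you are executing only 3 SQL statements instead of 100,000.

Can you explain these lines specifically: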

cur.execute('create temp table t(i int, x float) on commit drop')
buf = StringIO()
for i in range(1, CRowCount + 1):
    buf.write('%s\t%s\n' % (i, random()))
buf.seek(0)
cur.copy_from(buf, 't')

This is not much faster.
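For the suggestion above of committing every few thousand statements, a rough sketch might look like this. The function name and the `commit_every` parameter are made up for illustration, and `conn` is assumed to be an open psycopg2 connection with autocommit off.

```python
def update_with_periodic_commit(conn, sql, param_rows, commit_every=2500):
    """Run `sql` once per parameter tuple, committing every `commit_every` rows."""
    cur = conn.cursor()
    try:
        pending = 0
        for params in param_rows:
            cur.execute(sql, params)
            pending += 1
            if pending >= commit_every:
                conn.commit()  # end the current transaction; the next execute starts a new one
                pending = 0
        if pending:
            conn.commit()  # commit the final partial batch
    finally:
        cur.close()
```

The trade-off is the one already mentioned: if the script fails halfway, the batches committed so far stay in the table.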