Python 3.x: Python for loop slows down because the list is too large

python-3.x list for-loop


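I currently have a for loop that causes my Python program to die, with the program reporting "Killed". It slows down about 6000 items in and dies at around 6852 list items. How do I fix this?

I assume it is because the list is too large.

I have tried splitting the list in two at around 6000. Maybe it is due to memory management or something similar. Any help would be appreciated.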

    for id in listofids:
        connection = psycopg2.connect(user = "username", password = "password", host = "localhost", port = "5432", database = "darkwebscraper")

        cursor = connection.cursor()
        cursor.execute("select darkweb.site_id, darkweb.site_title, darkweb.sitetext from darkweb where darkweb.online='true' AND darkweb.site_id = %s", ([id]))
        print(len(listoftexts))

        try:
            row = cursor.fetchone()
        except:
            print("failed to fetch one")
        try:
            listoftexts.append(row[2])
            cursor.close()
            connection.close()
        except:
            print("failed to print")

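Right, this may be because the list is getting bigger: a Python list occupies a contiguous block of memory. Each time you append to the list, Python checks whether there is room for the next slot, and if there is not, it relocates the whole array to a place with enough space. The bigger the array, the more Python has to relocate.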
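To get a feel for how often such a relocation can actually happen, here is a minimal sketch. It assumes CPython, where `sys.getsizeof` reflects the capacity currently allocated for a list and where lists are over-allocated so that not every append has to resize. The function name `count_resizes` is made up for illustration.

```python
import sys

def count_resizes(n):
    """Append n items to a list and count how often its allocation grows."""
    items = []
    resizes = 0
    last_size = sys.getsizeof(items)
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:  # the allocated capacity changed on this append
            resizes += 1
            last_size = size
    return resizes

n = 6852  # the list length mentioned in the question
resizes = count_resizes(n)
print(f"{resizes} reallocations for {n} appends")
```

If the count is small compared with the number of appends, resizing alone is unlikely to explain a process being killed, and the total memory held by the stored texts is worth checking too.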

One way around this is to create an array of the right size in advance.
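With a plain Python list, pre-sizing could look roughly like the sketch below. The helper names `collect_texts` and `fake_fetch_text` are hypothetical, and `fake_fetch_text` only stands in for the database lookup in the question.

```python
def collect_texts(ids, fetch_text):
    """Fill a pre-sized list instead of appending one item at a time."""
    # Allocate every slot up front so the list never grows inside the loop.
    texts = [None] * len(ids)
    for position, site_id in enumerate(ids):
        texts[position] = fetch_text(site_id)
    return texts

def fake_fetch_text(site_id):
    # Placeholder for the real query; returns a dummy string per id.
    return f"text for site {site_id}"

result = collect_texts([3, 1, 2], fake_fetch_text)
print(result)
```

This only avoids growing the list itself. The strings it refers to still take up the same amount of memory.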

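Edit: to make sure I am being clear, I put together an example to illustrate my point. I wrote two functions. The first appends the stringified index (to make the items bigger) to a list on every iteration; the other simply fills a numpy array: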

import numpy as np
import matplotlib.pyplot as plt
from time import time

def test_bigList(N):
    L = []
    times = np.zeros(N,dtype=np.float32)

    for i in range(N):
        t0 = time()
        L.append(str(i))
        times[i] = time()-t0

    return times

def test_bigList_numpy(N):
    L = np.empty(N,dtype="<U32")
    times = np.zeros(N,dtype=np.float32)

    for i in range(N):
        t0 = time()
        L[i] = str(i)
        times[i] = time()-t0
    return times

N = int(1e7)
res1 = test_bigList(N)
res2 = test_bigList_numpy(N)

plt.plot(res1,label="list")
plt.plot(res2,label="numpy array")
plt.xlabel("Iteration")
plt.ylabel("Running time")
plt.legend()
plt.title("Evolution of iteration time with the size of an array")
plt.show()

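Are you sure you need to close and reopen the connection every time through the loop?

khelwood, I did that to check whether the cursor object was being overloaded; the same problem occurred before.

Try moving the db connection out of the loop and using it in a context manager.

The db connection is not the problem, I am pretty sure.

OK, so how would I do that? Set up an array of size "11533", add the text items to it, and then convert it to a list?

It will definitely work with a numpy array (since you use "…

I have pre-created a list, but the problem persists. I am not sure your suggestion works.

I made an example just to be sure I was clear. But if that does not work, then I am not sure what it could be.

You may be right. However, I solved the problem another way, by moving my experiment to a machine with far more memory.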
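The suggestion in the comments to move the connection out of the loop could be sketched as follows. This is an illustration under assumptions, not the asker's final code: the function name `fetch_site_texts` is made up, the table and column names are taken from the question, and the sketch fetches all texts in a single query instead of one query per id. It relies on two psycopg2 behaviours: a cursor used in a `with` block is closed on exit, and a Python list parameter is adapted to a PostgreSQL array, which is what makes `= any(%s)` work.

```python
def fetch_site_texts(connection, site_ids):
    """Return sitetext for every online site in site_ids with one query."""
    query = (
        "select darkweb.sitetext from darkweb "
        "where darkweb.online = 'true' and darkweb.site_id = any(%s)"
    )
    # The cursor is closed automatically when the with-block exits.
    with connection.cursor() as cursor:
        # A Python list is passed as a single array parameter.
        cursor.execute(query, (list(site_ids),))
        return [row[0] for row in cursor.fetchall()]

# Hypothetical usage, opening the connection once instead of per id:
#
#   import psycopg2
#   connection = psycopg2.connect(user="username", password="password",
#                                 host="localhost", port="5432",
#                                 database="darkwebscraper")
#   try:
#       listoftexts = fetch_site_texts(connection, listofids)
#   finally:
#       connection.close()
```

The rows come back in whatever order the database returns them, not necessarily the order of `site_ids`. All texts still end up in memory at once, so this reduces connection overhead but would not by itself avoid the memory limit the asker eventually ran into.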