
Performance problem in Python

python performance


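I'm having a performance problem in Python. The code below has 4 nested loops. They iterate over an OrderedDict, matrix_col, which contains 11000 items. Another iteration goes over a defaultdict, trans, which also contains about 11000 items. This process takes far too long to run. I'd be grateful if anyone could suggest how to improve its performance.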

# Python 2 code, as posted
import string
from collections import namedtuple
from collections import defaultdict
from collections import OrderedDict
import time

import numpy as np

trans = defaultdict(dict)
...
# Sort matrix_col by key so the row/column labels have a consistent order.
matrix_col = OrderedDict(sorted(matrix_col.items(), key=lambda t: t[0]))
trans_mat = []
counter = 0

for u1, v1 in matrix_col.items():
    print counter, time.ctime()  # progress: one line per outer iteration
    counter += 1
    for u2, v2 in matrix_col.items():
        flag = True
        # The inner loop reads trans[u1] regardless of w1, so for every
        # (u1, u2) pair the same scan is repeated len(trans) times.
        for w1 in trans.keys():
            for w2, c in trans[u1].items():
                if u1 == str(w1) and u2 == str(w2):
                    trans_mat.append([c])
                    flag = False
        if flag:
            trans_mat.append([0])

trans_mat = np.asarray(trans_mat)
trans_mat = np.reshape(trans_mat, (11000, 11000))

This is how it currently performs: it processes roughly 2 items per minute. At this rate, building the matrix trans_mat would take more than 5 days:

0 Tue Oct  6 11:31:18 2015
1 Tue Oct  6 11:31:46 2015
2 Tue Oct  6 11:32:19 2015
3 Tue Oct  6 11:32:52 2015
4 Tue Oct  6 11:33:19 2015
5 Tue Oct  6 11:33:46 2015

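Without more context it's hard to follow the logic and what you are trying to achieve, but you should consider changing the algorithm to iterate over trans first and then check membership in matrix_col. Something like: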

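trans_mat = []
for w1, t_val in trans.items():
    # One O(1) membership test per outer key instead of a scan over all keys.
    w1_is_in_matrix_col = str(w1) in matrix_col
    for w2, c in t_val.items():
        if w1_is_in_matrix_col and str(w2) in matrix_col:
            trans_mat.append([c])
        else:
            trans_mat.append([0])
# Note: this yields one entry per (w1, w2) pair stored in trans, not one per
# cell of the len(matrix_col) x len(matrix_col) matrix, so pairs absent from
# trans get no 0 and the result generally cannot be reshaped to (11000, 11000).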

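In principle you could use a list comprehension here, which would also give you some speedup (but that is negligible compared with the current inefficiency).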
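The iterate-over-trans idea can also be made to fill the complete matrix, with a 0 for every pair that trans lacks. A minimal sketch, assuming (as the question's str(w1) comparisons suggest) that str() of a trans key equals the matching matrix_col key: map each label to its row/column position once, preallocate an n x n NumPy array of zeros, then write only the entries that trans actually contains. The function name build_trans_mat is hypothetical.

```python
from collections import OrderedDict, defaultdict

import numpy as np


def build_trans_mat(matrix_col, trans):
    """Return an n x n array whose cell (i, j) holds trans[w1][w2] for the
    i-th and j-th labels of matrix_col, and 0 where trans has no entry."""
    # Label -> position, computed once; each later lookup is an average O(1) dict access.
    index = {label: i for i, label in enumerate(matrix_col)}
    n = len(index)
    mat = np.zeros((n, n))
    # Visit only the entries stored in trans instead of all n * n pairs.
    for w1, row in trans.items():
        i = index.get(str(w1))
        if i is None:
            continue  # row label is not in matrix_col
        for w2, c in row.items():
            j = index.get(str(w2))
            if j is not None:
                mat[i, j] = c
    return mat


# Tiny example with made-up labels and values.
matrix_col = OrderedDict((k, None) for k in sorted(["a", "b", "c"]))
trans = defaultdict(dict)
trans["a"]["b"] = 3
trans["c"]["a"] = 1
trans["z"]["a"] = 7  # skipped: "z" is not a matrix_col label
print(build_trans_mat(matrix_col, trans))
```

Memory is worth checking at this size: a dense 11000 x 11000 float64 array needs about 0.97 GB, and passing dtype=np.int32 to np.zeros halves that if the values are integers.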

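You are not taking advantage of the fast lookups that dictionaries provide. Finding a key in a dict is O(1) on average. To fix this, you only need to change the algorithm so that you don't iterate over all the keys to find the one you want: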

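from itertools import product

# Assumes trans and matrix_col use keys of the same type; product(..., repeat=2)
# yields (u1, u2) pairs in the same order as the two nested loops.
trans_mat = [[trans[u1][u2]] if (u1 in trans) and (u2 in trans[u1]) else [0]
             for u1, u2 in product(matrix_col, repeat=2)]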
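The same lookup can be written with chained dict.get calls, which also shows how to obtain c without iterating over trans[u1].items(): index the inner dict directly. A sketch under the same assumption that both dictionaries use keys of the same type; the helper name lookup_rows is hypothetical. Calling .get on a defaultdict does not insert missing keys, whereas trans[u1] would add an empty dict for every absent u1.

```python
from collections import OrderedDict, defaultdict
from itertools import product


def lookup_rows(matrix_col, trans):
    """Flatten the n x n lookup into n * n single-element rows, in the same
    u1-major order as the question's nested loops."""
    empty = {}
    # trans.get(u1, empty) returns the inner dict (or an empty one) without
    # adding u1 to the defaultdict; .get(u2, 0) then yields c or 0 directly.
    return [[trans.get(u1, empty).get(u2, 0)]
            for u1, u2 in product(matrix_col, repeat=2)]


matrix_col = OrderedDict((k, None) for k in ["a", "b"])
trans = defaultdict(dict)
trans["a"]["b"] = 5
rows = lookup_rows(matrix_col, trans)
print(rows)  # [[0], [5], [0], [0]]
```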

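So why are you looping over all the items in trans and trans[u1] when you can simply test whether u1 is a key in trans and u2 is a key in trans[u1]? Can you provide a sample of the data, and show what happens in the ellipsis?

Hi Martijn, if I don't iterate over trans[u1].items(), how do I get the value "c" in the last step?

If you are processing so much numerical data that performance becomes an issue, use numpy and/or pandas (pandas may serve you better if you are using the OrderedDict to hold a set of rows with different fields in a particular order).

If he also needs to get the empty values, then yes, this is a good solution.

Hi Chad, thanks for the response. The matrix is a set of sorted items of type "State". Here is an example of a "State": 33.594994, -84.727939978.2280,60270 1**

Hi, thanks for the reply. The reason I do the string conversion is that I need a consistently sorted set of row and column labels for the matrix operations. So I first convert from "other types" to strings to make them sortable, then sort by key and create the OrderedDict. Without converting to strings, I basically can't sort.

Use the same type for the keys and for sorting; if you think you must, make the keys strings as well. Then use key lookups. As it stands, you are making things hard for yourself. Why aren't the State items sortable?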