Speeding up an algorithm that uses Python's decimal library

python performance python-2.7


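I am trying to run a function similar to Google's PageRank algorithm (for non-commercial purposes, of course). The Python code is below; note that a[0] is the only thing that matters here. a[0] contains an nxn matrix, for example [[0,1,1],[1,0,1],[1,1,0]]. You can also find where I got this code from:

When I run this implementation (on matrices much larger than 3x3, n.b.), it does not produce enough precision in the ranks for me to compare them usefully. So I tried this: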

from decimal import *

getcontext().prec = 5

def GetNodeRanks(a):        # graph, names, size
    numIterations = 10
    adjacencyMatrix = copy.deepcopy(a[0])
    b = [Decimal(1)]*len(adjacencyMatrix)
    tmp = [Decimal(0)]*len(adjacencyMatrix)
    for i in range(numIterations):
        for j in range(len(adjacencyMatrix)):
            tmp[j] = Decimal(0)
            for k in range(len(adjacencyMatrix)):
                tmp[j] = Decimal(tmp[j] + adjacencyMatrix[j][k] * b[k])
        norm_sq = Decimal(0)
        for j in range(len(adjacencyMatrix)):
            norm_sq = Decimal(norm_sq + tmp[j]*tmp[j])
        norm = Decimal(norm_sq).sqrt
        for j in range(len(b)):
            b[j] = Decimal(tmp[j] / norm)
    print b
    return b

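Even at this unhelpfully low precision the code is extremely slow; it never finished in the time I sat waiting for it. Previously the code was fast, but not precise enough.

Is there a reasonable/simple way to make the code run both fast and accurately?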

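Some tips for speeding things up:

Optimize the code inside loops
Move everything you can out of the inner loop
Do not recompute values that are already known
Skip work that is not necessary
Consider using list comprehensions; they are usually a bit faster
Stop optimizing once the speed is acceptable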
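As an illustration of the first few tips, here is a minimal sketch of a matrix-vector product with the loop-invariant work moved out of the loops. The function name mat_vec and its local names are hypothetical and chosen only for this example; the input is assumed to be a plain list of lists.

```python
from decimal import Decimal

def mat_vec(matrix, vec):
    """Multiply a square list-of-lists matrix by a vector of Decimals."""
    n = len(matrix)        # computed once, not in every range() call
    indices = range(n)     # reused by both loops
    zero = Decimal(0)      # built once instead of once per row
    result = []
    for j in indices:
        row = matrix[j]    # row lookup hoisted out of the inner loop
        acc = zero
        for k in indices:
            acc += row[k] * vec[k]   # int * Decimal already yields a Decimal
        result.append(acc)
    return result
```

Called with the 3x3 example matrix and a vector of three Decimal(1) values, every entry of the result is Decimal(2).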

Walking through your code:

from decimal import *

getcontext().prec = 5

def GetNodeRanks(a):        # graph, names, size
    # opt: pass in directly a[0], you do not use the rest
    numIterations = 10
    adjacencyMatrix = copy.deepcopy(a[0])
    # opt: why copy.deepcopy? You do not modify adjacencyMatrix
    b = [Decimal(1)]*len(adjacencyMatrix)
    # opt: You often call Decimal(1) and Decimal(0), it takes some time
    # do it only once like
    # dec_zero = Decimal(0)
    # dec_one = Decimal(1)
    # prepare also other, repeatedly used data structures
    # len_adjacencyMatrix = len(adjacencyMatrix)
    # adjacencyMatrix_range = range(len_adjacencyMatrix)
    # Replace code with pre-calculated variables yourself

    tmp = [Decimal(0)]*len(adjacencyMatrix)
    for i in range(numIterations):
        for j in range(len(adjacencyMatrix)):
            tmp[j] = Decimal(0)
            for k in range(len(adjacencyMatrix)):
                tmp[j] = Decimal(tmp[j] + adjacencyMatrix[j][k] * b[k])
        norm_sq = Decimal(0)
        for j in range(len(adjacencyMatrix)):
            norm_sq = Decimal(norm_sq + tmp[j]*tmp[j])
        norm = Decimal(norm_sq).sqrt # is this correct? I would expect .sqrt()
        for j in range(len(b)):
            b[j] = Decimal(tmp[j] / norm)
    print b
    return b

Now a few examples of how to optimize list processing in Python.

Using sum, change:

        norm_sq = Decimal(0)
        for j in range(len(adjacencyMatrix)):
            norm_sq = Decimal(norm_sq + tmp[j]*tmp[j])

to:
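The replacement snippet is not shown at this point. A minimal sketch of what a sum-based version could look like, in line with the sum suggestion made in the comments, is given here; the sample values in tmp are made up for illustration.

```python
from decimal import Decimal

tmp = [Decimal(2), Decimal(2), Decimal(2)]   # example values only

# Explicit loop, equivalent to the original code
norm_sq_loop = Decimal(0)
for j in range(len(tmp)):
    norm_sq_loop = norm_sq_loop + tmp[j] * tmp[j]

# Same result with sum() over a generator expression
norm_sq = sum(x * x for x in tmp)
```

Both variables end up holding the same Decimal, so the loop and the temporary Decimal(0) can be dropped.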

A bit of list comprehension:

Change:

        for j in range(len(b)):
            b[j] = Decimal(tmp[j] / norm)

to:

    b = [Decimal(tmp_itm / norm) for tmp_itm in tmp]

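If you adopt this style of coding, you will also be able to optimize the initial loop, and you will probably find that some of the precomputed variables are no longer needed.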
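Putting these suggestions together, the whole function might look roughly like the sketch below. This is one possible rewrite and not the asker's final code: the name get_node_ranks, the num_iterations parameter, and the precision of 28 are assumptions, and the matrix is passed in directly as a non-empty plain list of lists.

```python
from decimal import Decimal, getcontext

getcontext().prec = 28   # assumed value; choose the precision you need

def get_node_ranks(adjacency_matrix, num_iterations=10):
    """Power iteration on a square list-of-lists matrix, using Decimal."""
    b = [Decimal(1)] * len(adjacency_matrix)
    for _ in range(num_iterations):
        # Matrix-vector product: one sum() per row
        tmp = [sum(a_jk * b_k for a_jk, b_k in zip(row, b))
               for row in adjacency_matrix]
        # Euclidean norm; sqrt is a method and has to be called
        norm = sum(x * x for x in tmp).sqrt()
        b = [x / norm for x in tmp]
    return b
```

No deepcopy is needed because the matrix is never modified, and no explicit Decimal() wrappers are needed inside the loops because every product already involves a Decimal.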

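What is in a? Optimizing the code is basically impossible, because you have given neither the expected input nor the expected output.

It contains an nxn adjacency matrix. For example, a[0] might contain: [[0,1,1],[1,0,1],[1,1,0]]

Edit that into your question as sample input. Is it a plain list of lists, or is it created with a library such as numpy?

It is a plain list of lists. I would like to use numpy; would that help? (About to edit.)

This sped things up a lot! Thanks. Now my problem is an overflow error in the "reduce" line of code. I'll see if I can figure it out.

Are you sure the reduce code is correct? It may be my imagination, but when I try the code it seems to give me a different eigenvector result.

@PhilipWhite You may be right. I think the reduce code should be

    norm_sq = reduce(lambda a, b: a + b*b, tmp, Decimal(0))

otherwise it squares the running sum each time. Try it, and if it works, please correct it in my answer.

When you are done, please consider adding your final code to the end of your question.

You can use sum instead of reduce and let Python take care of the addition for you:

    norm_sq = sum(tmp[j]*tmp[j] for j in range(len(adjacencyMatrix)))

You do not even need to start from a Decimal instance, because 0 + Decimal(whatever) will be a Decimal. In many other places you probably do not need to call Decimal either. If either operand is already a Decimal, you can simply operate on it and the result will also be a Decimal.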
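The points raised in these comments can be checked with a short sketch. The sample values in tmp are made up; the reduce call follows the corrected argument order (accumulator first, element second).

```python
from decimal import Decimal
from functools import reduce   # also a builtin on Python 2

tmp = [Decimal(1), Decimal(2), Decimal(3)]   # example values only

# Accumulator first, element second: adds the square of each element
norm_sq_reduce = reduce(lambda acc, x: acc + x * x, tmp, Decimal(0))

# Equivalent sum(); the int start value 0 is promoted to Decimal
norm_sq_sum = sum(x * x for x in tmp)

# Mixing int and Decimal yields a Decimal, so extra Decimal() wrappers are redundant
mixed = 0 + Decimal("1.5")
```

Here norm_sq_reduce and norm_sq_sum are both Decimal(14), and mixed is a Decimal as well.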
