Why is this script so slow in Python?

python matlab optimization

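I have written a script that trains 1D Kohonen networks in MATLAB, and it works like a charm. I then tried to translate it to Python 2.7, a language I am very new to, and the script takes forever to run.

I'll explain what I am doing to see if anyone here can shed some light on the matter. I have a given dataset in a matrix y, and I want to train different SOMs with it. The SOMs are one-dimensional (a line), and their number of neurons varies. I first train a SOM of size N=2 and finish with one of size N=NMax, giving a total of NMax-2+1 SOMs. For each SOM, I want to store the weights once training is over before moving on to the next SOM.

In MATLAB, for NMax=5 and iterMax=50, it takes 9.74 seconds. In Python, it takes 54.04 seconds. This difference is huge, and the actual dataset, the number of SOMs and the number of iterations are even bigger, so the Python code takes forever to finish.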

My current code is as follows:

import numpy as np
import time
y = np.random.rand(2500,3) # Create random dataset to test
def A(d,s): # Neighborhood function
    return np.exp(-d**2 / (2*s**2))
sigma_0 = float(5) # Initial standard deviation for A
eta_0 = float(1) # Initial learning rate
iterMax = 250 # Maximum number of iterations
NMax = 10 # Maximum number of neurons
w = range(NMax - 1) # Initialize the size of the weight matrix (it will store NMax-2+1 sets of weights, each of varying size depending on the value of N)
#%% KOHONEN 1D
t = time.time() # Start time
for N in np.arange(2,NMax + 1): # Size of the network
    w[N - 2] = np.random.uniform(0,1,(N,np.size(y,axis=1))) - 0.5 # Initial weights
    iterCount = 1; # Iteration counter
    while iterCount < iterMax:
        # Mix the datapoints to choose them in random order
        mixInputs = y[np.random.permutation(np.size(y,axis = 0)),:]
        # Decrease the value of the variance and the learning rate
        sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount
        eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount
        for kk in range(np.size(mixInputs,axis = 0)): # Picking one datapoint at a time
            selectedInput = mixInputs[kk,:]
            # These two lines calculate the weight that is the nearest to the datapoint selected
            aux = np.absolute(np.array(np.kron(np.ones((N,1)),selectedInput)) - np.array(w[N - 2]))
            aux = np.sum(np.abs(aux)**2,axis=-1)
            ii = np.argmin(aux) # The node ii is the winner
            for jj in range(N):
                dist = min(np.absolute(ii-jj) , np.absolute(np.absolute(ii-jj)-N)) # Centering the neighborhood function in the winner
                w[N - 2][jj,:] = w[N - 2][jj,:] + eta * A(dist,sigma) * (selectedInput - w[N - 2][jj,:]) # Updating the weights
        print(N,iterCount)
        iterCount = iterCount + 1 

elapsedTime = time.time() - t

Use cProfile to find out which functions are being called and how long they take.

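In the output below, the columns have the following meaning:

ncalls: the number of calls.

tottime: the total time spent in the given function (excluding time spent in calls to sub-functions).

percall: the quotient of tottime divided by ncalls.

cumtime: the cumulative time spent in this function and all sub-functions (from invocation until exit). This figure is accurate even for recursive functions.

percall: the quotient of cumtime divided by primitive calls.

filename:lineno(function): provides the respective data of each function.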
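The same kind of report can also be produced from inside a script. The following is a minimal sketch using the standard-library cProfile and pstats modules; the functions neighborhood() and workload() are hypothetical stand-ins for the training loop, not code from the question, and the sketch is written for Python 3.

```python
import cProfile
import io
import pstats

import numpy as np


def neighborhood(d, s):
    # Same Gaussian form as A() in the question
    return np.exp(-d**2 / (2 * s**2))


def workload(n_calls=20000):
    # Hypothetical stand-in for the training loop: many tiny function calls
    total = 0.0
    for k in range(n_calls):
        total += neighborhood(k % 5, 5.0)
    return total


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Collect the report in a string, sorted by time spent inside each function
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf).sort_stats("tottime")
stats.print_stats(5)  # limit the report to the five most expensive entries
report = buf.getvalue()
print(report)
```

Sorting by tottime puts the functions whose own bodies consume the most time at the top, which is usually the quickest way to spot a hot inner loop.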

It looks like you are calling A() many times, often with the same value:

python2.7 -m cProfile -s tottime ${YOUR_SCRIPT}

         5481855 function calls (5481734 primitive calls) in 4.986 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.572    1.572    4.986    4.986 x.py:1(<module>)
   214500    0.533    0.000    0.533    0.000 x.py:8(A)
   107251    0.462    0.000    1.986    0.000 shape_base.py:686(kron)
   107251    0.345    0.000    0.456    0.000 numeric.py:1015(outer)
   214502    0.266    0.000    0.563    0.000 {sorted}
...

Now we see:
         6206113 function calls (6205992 primitive calls) in 4.986 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.727    1.727    4.986    4.986 x.py:1(<module>)
   121451    0.491    0.000    2.180    0.000 shape_base.py:686(kron)
   121451    0.371    0.000    0.496    0.000 numeric.py:1015(outer)
   242902    0.293    0.000    0.621    0.000 {sorted}
   121451    0.265    0.000    0.265    0.000 {method 'reduce' of 'numpy.ufunc' objects}
...
   242900    0.091    0.000    0.091    0.000 x.py:7(A)
...

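This identifies f() as taking the longest time.

Here are some quick attempts at speeding this up. I think the output is the same, but it would take some time to double-check: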
import numpy as np
import time

np.random.seed(1234)
y = np.random.rand(2500,3) # Create random dataset to test

sigma_0 = float(5) # Initial standard deviation for A
eta_0 = float(1) # Initial learning rate
iterMax = 10 # Maximum number of iterations
NMax = 10 # Maximum number of neurons
w = {} # Initialize the size of the weight matrix (it will store NMax-2+1 sets of weights, each of varying size depending on the value of N)
#%% KOHONEN 1D
t = time.time() # Start time
for N in np.arange(2,NMax + 1): # Size of the network
    w[N - 2] = np.random.uniform(0,1,(N,np.size(y,axis=1))) - 0.5 # Initial weights
    iterCount = 1; # Iteration counter
    while iterCount < iterMax:
        # Mix the datapoints to choose them in random order
        mixInputs = y[np.random.permutation(np.size(y,axis = 0)),:]
        # Decrease the value of the variance and the learning rate
        sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount
        s2 = 2*sigma**2
        eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount
        for kk in range(np.size(mixInputs,axis = 0)): # Picking one datapoint at a time
            selectedInput = mixInputs[kk,:]
            # These two lines calculate the weight that is the nearest to the datapoint selected
            aux = np.sum((selectedInput - np.array(w[N - 2]))**2, axis = -1)
            ii = np.argmin(aux)
            jjs = np.abs(ii - np.arange(N))
            dists = np.min(np.vstack([jjs , np.absolute(jjs-N)]), axis = 0)
            w[N - 2] = w[N - 2] + eta * np.exp((-dists**2)/s2).T[:,np.newaxis] * (selectedInput - w[N - 2]) # Updating the weights

        print(N,iterCount)
        iterCount = iterCount + 1 

elapsedTime = time.time() - t

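The kron-based line is replaced with:
aux = np.abs(selectedInput - np.array(w[N - 2]))

(I then folded this further into the next few steps.) NumPy broadcasting gives the same result without having to use the Kronecker product.
For example: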
np.kron(np.ones((3,1)), np.array([6,5,4])) - np.arange(-9,0).reshape(3,3)

gives the same output as:
np.array([6,5,4]) - np.arange(-9,0).reshape(3,3)

np.kron(np.ones((N,1)), x) gives an array of shape (N, x.shape[0]) containing N copies of x. Broadcasting handles this in a much cheaper way.
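The equivalence can be checked directly. This is a small sketch that reuses the values from the example above; the variable names are chosen for illustration.

```python
import numpy as np

x = np.array([6.0, 5.0, 4.0])            # one datapoint
w = np.arange(-9.0, 0.0).reshape(3, 3)   # three weight vectors, one per row
N = w.shape[0]

# kron materializes N stacked copies of x before subtracting
tiled = np.kron(np.ones((N, 1)), x)
via_kron = tiled - w

# broadcasting subtracts x from every row without building the copies
via_broadcast = x - w

identical = np.array_equal(via_kron, via_broadcast)
print(tiled.shape, identical)
```

The broadcast version skips both the allocation of the tiled array and the extra function calls that showed up in the profile (kron and outer).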
The other major speedup comes from reducing:
for jj in range(N):

to a matrix operation. We compute 2*sigma**2 once per loop, replace the A function with a native numpy call, and vectorize the rest.
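To make the loop-to-matrix step concrete, here is a minimal sketch comparing a per-neuron update with a vectorized one for a single datapoint. The function names update_loop() and update_vectorized() and the test values are assumptions for illustration; the arithmetic follows the two scripts above.

```python
import numpy as np


def update_loop(w, x, ii, eta, sigma):
    # Per-neuron update, as in the original script
    w = w.copy()
    N = w.shape[0]
    for jj in range(N):
        d = abs(ii - jj)
        dist = min(d, abs(d - N))  # circular distance to the winner
        h = np.exp(-dist**2 / (2 * sigma**2))
        w[jj, :] = w[jj, :] + eta * h * (x - w[jj, :])
    return w


def update_vectorized(w, x, ii, eta, sigma):
    # Same update for all neurons at once
    N = w.shape[0]
    d = np.abs(ii - np.arange(N))
    dists = np.minimum(d, np.abs(d - N))
    h = np.exp(-dists**2 / (2.0 * sigma**2))
    # h has shape (N,); add an axis so it scales each row of (x - w)
    return w + eta * h[:, np.newaxis] * (x - w)


rng = np.random.RandomState(0)
w0 = rng.uniform(0, 1, (6, 3)) - 0.5
x = rng.rand(3)
ii = int(np.argmin(np.sum((x - w0) ** 2, axis=-1)))  # winning neuron

same = np.allclose(update_loop(w0, x, ii, 0.5, 2.0),
                   update_vectorized(w0, x, ii, 0.5, 2.0))
print(same)
```

Each row's update depends only on that row, so replacing the loop with one array expression does not change the result; it only removes N Python-level iterations per datapoint.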
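What versions are you using? I'm on Python 3.4.5 and get the error Traceback (most recent call last): File "so.py", line 19, w[N-2] = np.random.uniform(0,1,(N,np.size(y,axis=1))) - 0.5 # Initial weights

@Prune I forgot to write that down, sorry. I'm using Python 2.7. I have added this information to the question, thanks!

Have you instrumented/profiled it?

I'm running it now; if I can, I'll provide the profile in an answer.

@Artie: No, I haven't (I didn't even know those words existed!). I'll look at that link and see what I can add to the question, thanks!

@Cleb, I added it!

What is the difference between tottime and cumtime? Why does the module's tottime go up with your improvement? Why does cumtime stay the same?

Interesting question - I updated the answer. I suspect the distribution of time shifted because of the modification, but I didn't expect…

Hi Artie, thanks for your answer. As I commented on jeremycg's answer, doing what you recommend here was very helpful. However, the MATLAB version is still much faster, even though it also calls a lot of functions (the script is exactly the same, only the syntax changes). Why is that? For this particular piece of code, is there a way to reach MATLAB's performance in Python?

@Tendero: As far as I know, Python's performance is far from MATLAB's. As long as everything is vectorized in Python you probably won't notice much difference, but with for loops and the like you will. If you want raw performance, you need to write compiled code that is called from Python (e.g. look at Cython or Pybind11), or use a scripting language with a good JIT (e.g. look at Julia).

I can't comment on MATLAB, but I think it may compile to something more efficient than Python, with optimizations possibly applied during compilation. If you are after raw efficiency, Python isn't really the right platform for you. Python can be extended with C functions fairly easily, which may give you the speedup you're after…

Interesting... can you quantify the speedup? I'm also interested in the simplified aux assignment; could you explain it a bit more?

Removing kron() gives me a nice boost… for me it takes about a third of the original time. See the edit for an explanation of kron.

Hi, thanks for the detailed answer. I'm not getting the same result, though. In the line ii = np.argmin(np.sum(selectedInput - np.array(w[N-2])**2, axis=-1)), the argument of np.argmin() does not return the aux I had in my script. Still, the boost is huge! I get 21 seconds instead of the 54 I had. However, I still don't understand why this is so slow compared with MATLAB (9 seconds), even with the improved version. Is there any explanation for this difference?

Good catch, I was missing a parenthesis. With the new edit I get exactly the same answer using the same seed. Not sure about the 9 versus 21 seconds; maybe a complete rewrite would make it better. You can see that numpy has some tricks - maybe you know the MATLAB ones.

Thanks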