How to improve cascaded for loops in Python


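I have code that runs very fast in Matlab (a few seconds). I had to port it to Python to make it more open-source friendly. The code has a horribly slow nested "for" loop with cascaded updates, and in Python the same computation takes about 45 minutes. I have not been able to speed these loops up by rearranging the computation, or even by trying multiprocessing, since I only have a Ryzen 5 laptop. Below is the snippet where the code spends most of its time. Can anyone suggest an optimization technique to speed it up?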

Python code:

from scipy import signal
from numpy import zeros, array, arange
import time

fs = 16000
npoints = 16000
Ntem = 30

# create a log-sine-sweep
f0 = 10                       # sweep start frequency
f1 = fs/2                     # sweep end frequency
t1 = arange(npoints)/fs       # sample times
stimulus = signal.chirp(t1, f0, t1[-1], f1, method='logarithmic', phi=-90)


W0 = zeros((Ntem, npoints))    
W1 = zeros((Ntem, npoints))    
BM = zeros((Ntem, npoints))    
BM_d = zeros((Ntem, npoints))

a0rf = array([-0.721112,-0.532501,-0.335426,-0.143986,0.0333392,0.192211,0.331123,
0.45037,0.551292,0.635753,0.705805,0.763481,0.810679,0.849101,0.880237,0.905366,
0.925569,0.941754,0.954673,0.964947,0.973084,0.979502,0.984539,0.988471,0.99152,
0.993866,0.995654,0.997002,0.998002,0.998729])

c0rf = array([0.692818,0.84643,0.942067,0.98958,0.999444,0.981354,0.943588,0.892842,
0.834312,0.771893,0.708406,0.64583,0.585491,0.528231,0.474535,0.424633,0.378578,
0.336301,0.297656,0.262447,0.230451,0.201436,0.175166,0.151413,0.129957,0.110592,
0.0931255,0.0773789,0.0631885,0.0504047])

gf = array([0.879889,0.813389,0.75372,0.703499,0.662554,0.629685,0.603486,0.582661,
0.566118,0.552971,0.542517,0.534197,0.527574,0.522299,0.518099,0.514757,0.512099,
0.509988,0.508315,0.50699,0.505945,0.505123,0.504478,0.503976,0.503586,0.503286,
0.503056,0.502883,0.502754,0.50266])

ghf = array([0.692818,0.84643,0.942067,0.98958,0.999444,0.981354,0.943588,0.892842,
0.834312,0.771893,0.708406,0.64583,0.585491,0.528231,0.474535,0.424633,0.378578,
0.336301,0.297656,0.262447,0.230451,0.201436,0.175166,0.151413,0.129957,0.110592,
0.0931255,0.0773789,0.0631885,0.0504047])

start_time = time.time()

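# start at t = 1 (Matlab's t = 2) so that t-1 never wraps around to the last column
for t in range(1, npoints):
    for s in range(Ntem):
        if s == 0:
            stim = stimulus[t]     # first stage is driven by the input signal
        else:
            stim = BM[s-1, t]      # later stages are driven by the previous stage's output
        W0[s, t] = stim + a0rf[s]*W0[s, t-1] - c0rf[s]*W1[s, t-1]
        W1[s, t] = a0rf[s]*W1[s, t-1] + c0rf[s]*W0[s, t-1]
        BM[s, t] = gf[s]*stim + ghf[s]*W1[s, t]
        BM_d[s, t] = BM[s, t] - BM[s, t-1]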

elapsed_time = time.time() - start_time

print('time cost = ',elapsed_time)
This code takes 2.8142993450164795 seconds.
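One approach the thread does not try (a sketch under stated assumptions, not the poster's or the answerer's method): for a fixed stage s the W0/W1/BM updates are linear with constant coefficients, so each stage is a second-order IIR filter. Taking z-transforms of the three update equations, with a = a0rf[s], c = c0rf[s], g = gf[s], h = ghf[s], gives H_s(z) = (g + (h*c - 2*a*g) z^-1 + g*(a^2 + c^2) z^-2) / (1 - 2*a z^-1 + (a^2 + c^2) z^-2). Stage s only needs the complete output of stage s-1, so the 30 stages can run one after another, each as a single scipy.signal.lfilter call over all samples, which moves the time loop into compiled code. The function names cascade_loop and cascade_lfilter are hypothetical; the sketch assumes zero initial state and starts at t = 1 like the Matlab loop. Because a^2 + c^2 is close to 1 here (poles near the unit circle), the two versions should agree to floating-point tolerance rather than bit for bit.

```python
import numpy as np
from scipy import signal


def cascade_loop(stimulus, a0rf, c0rf, gf, ghf):
    """Reference: the question's nested loop, starting at t = 1 like the Matlab code."""
    ntem, npoints = len(a0rf), len(stimulus)
    W0 = np.zeros((ntem, npoints))
    W1 = np.zeros((ntem, npoints))
    BM = np.zeros((ntem, npoints))
    BM_d = np.zeros((ntem, npoints))
    for t in range(1, npoints):
        for s in range(ntem):
            stim = stimulus[t] if s == 0 else BM[s - 1, t]
            W0[s, t] = stim + a0rf[s] * W0[s, t - 1] - c0rf[s] * W1[s, t - 1]
            W1[s, t] = a0rf[s] * W1[s, t - 1] + c0rf[s] * W0[s, t - 1]
            BM[s, t] = gf[s] * stim + ghf[s] * W1[s, t]
            BM_d[s, t] = BM[s, t] - BM[s, t - 1]
    return W0, W1, BM, BM_d


def cascade_lfilter(stimulus, a0rf, c0rf, gf, ghf):
    """Same recurrences, one second-order lfilter call per stage (hypothetical helper)."""
    ntem, npoints = len(a0rf), len(stimulus)
    W0 = np.zeros((ntem, npoints))
    W1 = np.zeros((ntem, npoints))
    BM = np.zeros((ntem, npoints))
    x = np.array(stimulus, dtype=float)  # copy so the caller's array is untouched
    x[0] = 0.0  # the loop never processes column 0, so its input there is effectively zero
    for s in range(ntem):
        a, c, g, h = a0rf[s], c0rf[s], gf[s], ghf[s]
        r2 = a * a + c * c
        den = [1.0, -2.0 * a, r2]  # shared denominator 1 - 2a z^-1 + (a^2 + c^2) z^-2
        W0[s] = signal.lfilter([1.0, -a], den, x)
        W1[s] = signal.lfilter([0.0, c], den, x)
        BM[s] = signal.lfilter([g, h * c - 2.0 * a * g, g * r2], den, x)
        x = BM[s]  # output of stage s drives stage s + 1
    BM_d = np.zeros_like(BM)
    BM_d[:, 1:] = np.diff(BM, axis=1)  # BM[s, t] - BM[s, t-1] for t >= 1
    return W0, W1, BM, BM_d
```

Calling cascade_lfilter(stimulus, a0rf, c0rf, gf, ghf) with the arrays from the question should return the same four arrays as the loop above; only the loop over the 30 stages remains in Python.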

Matlab code:

fs = 16000.0      ;            % sample frequency
npoints = 16000    ;        % stimulus length
Ntem = 30;

f0 = 10         ;              % sweep start frequency
f1 = fs/2       ;              % sweep end frequency
t1 = 1:npoints ;
t1= t1./fs   ;    % sample times
gain = 0.1;
stimulus = gain*chirp(t1, f0, t1(npoints), f1, 'logarithmic', -90);
stimulus = (stimulus); 

W0 = zeros(Ntem,npoints);                         
W1 = zeros(Ntem,npoints);                          
BM = zeros(Ntem,npoints);                    
BM_d = zeros(Ntem,npoints); 
stim=0;

a0rf = [-0.721112 -0.532501 -0.335426 -0.143986 0.0333392 0.192211 0.331123 ...
0.45037 0.551292 0.635753 0.705805 0.763481 0.810679 0.849101 0.880237 ...
0.905366 0.925569 0.941754 0.954673 0.964947 0.973084 0.979502 0.984539 ...
0.988471 0.99152 0.993866 0.995654 0.997002 0.998002 0.998729];

c0rf = [0.692818 0.84643 0.942067 0.98958 0.999444 0.981354 0.943588 0.892842 ...
0.834312 0.771893 0.708406 0.64583 0.585491 0.528231 0.474535 0.424633 0.378578 ...
0.336301 0.297656 0.262447 0.230451 0.201436 0.175166 0.151413 0.129957 0.110592 ...
0.0931255 0.0773789 0.0631885 0.0504047];

gf = [0.879889 0.813389 0.75372 0.703499 0.662554 0.629685 0.603486 0.582661 ...
0.566118 0.552971 0.542517 0.534197 0.527574 0.522299 0.518099 0.514757 0.512099 ...
0.509988 0.508315 0.50699 0.505945 0.505123 0.504478 0.503976 0.503586 0.503286 ...
0.503056 0.502883 0.502754 0.50266];

ghf = [0.692818 0.84643 0.942067 0.98958 0.999444 0.981354 0.943588 0.892842 ...
0.834312 0.771893 0.708406 0.64583 0.585491 0.528231 0.474535 0.424633 0.378578 ...
0.336301 0.297656 0.262447 0.230451 0.201436 0.175166 0.151413 0.129957 0.110592 ...
0.0931255 0.0773789 0.0631885 0.0504047];

tic
for t=2:(npoints)
    for s=1:(Ntem)
        if (s==1)
            stim = stimulus(t);
        else
            stim = BM(s-1,t);
        end
        W0(s,t) = stim + a0rf(s)*W0(s,t-1) - c0rf(s)*W1(s,t-1);
        W1(s,t) = a0rf(s)*W1(s,t-1) + c0rf(s)*W0(s,t-1);
        BM(s,t) = gf(s)*stim + ghf(s)*W1(s,t);
        BM_d(s,t) = BM(s,t) - BM(s,t-1);
    end
    
end
toc
This code takes 0.091413 seconds.


The stimulus here is fixed. In my original code I have an array of 160 stimulus inputs, which multiplies the run time and makes the Python version's slowdown even worse.
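If the 160 stimuli are processed independently (an assumption; the thread only says the stimulus changes on each of 160 outer iterations), the outer loop need not be a Python loop either: scipy.signal.lfilter filters every row of a 2-D array along the chosen axis in one call. The sketch treats each cascade stage as the second-order IIR filter obtained from the z-transform of the W0/W1/BM updates, H_s(z) = (g + (h*c - 2*a*g) z^-1 + g*(a^2 + c^2) z^-2) / (1 - 2*a z^-1 + (a^2 + c^2) z^-2) with a = a0rf[s], c = c0rf[s], g = gf[s], h = ghf[s]. The name cascade_bm_batch is hypothetical, and only BM is returned to keep it short.

```python
import numpy as np
from scipy import signal


def cascade_bm_batch(stimuli, a0rf, c0rf, gf, ghf):
    """Run every stimulus (one per row) through the resonator cascade at once.

    Returns BM with shape (n_stimuli, Ntem, npoints). Assumes zero initial state
    and, like the Matlab loop starting at t = 2, leaves column 0 at zero.
    """
    x = np.array(stimuli, dtype=float)  # copy, shape (n_stimuli, npoints)
    x[:, 0] = 0.0  # column 0 is never processed by the loop
    n_stim, npoints = x.shape
    ntem = len(a0rf)
    BM = np.zeros((n_stim, ntem, npoints))
    for s in range(ntem):
        a, c, g, h = a0rf[s], c0rf[s], gf[s], ghf[s]
        r2 = a * a + c * c
        b = [g, h * c - 2.0 * a * g, g * r2]  # numerator of H_s(z)
        den = [1.0, -2.0 * a, r2]             # denominator of H_s(z)
        # one call filters all stimuli along the time axis
        x = signal.lfilter(b, den, x, axis=-1)
        BM[:, s, :] = x  # stage s output, which also feeds stage s + 1
    return BM
```

Keeping BM for all 160 stimuli needs 160 × 30 × 16000 × 8 bytes, about 614 MB, so in practice it may be better to keep only the stages you need or to pass the stimuli in chunks.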

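You need to use optimized numpy vectorization. Below is an example showing how much it improves the run time. Your equations use t-1 rather than t, which makes this approach difficult to apply, so the example below uses t simply to illustrate how fast it is. I'll leave it to you to implement it with your exact equations, since I don't know the details behind them. For a more detailed explanation of why this is faster, see this link.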

This prints the following to the console:

time cost for FOR loop =  2.5937564373016357
time cost for optimal numpy vectorization =  0.005982160568237305
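The point about t-1 can be seen on a one-line recurrence. An expression that only uses index t, such as x[t] + a*x[t], vectorizes directly as x + a*x, but y[t] = x[t] + a*y[t-1] needs the previous output, so a plain NumPy array expression cannot reproduce it. Linear recurrences of this kind can, however, be handed to scipy.signal.lfilter, which runs the loop in compiled code. A minimal sketch; the function names are hypothetical:

```python
import numpy as np
from scipy import signal


def first_order_loop(x, a):
    # y[t] = x[t] + a * y[t-1]: each output depends on the previous output
    y = np.zeros_like(x, dtype=float)
    prev = 0.0
    for t in range(len(x)):
        prev = x[t] + a * prev
        y[t] = prev
    return y


def first_order_lfilter(x, a):
    # the same recurrence written as an IIR filter: Y(z) * (1 - a z^-1) = X(z)
    return signal.lfilter([1.0], [1.0, -a], x)
```

For x = [1, 0, 0, 2] and a = 0.5 both functions give 1.0, 0.5, 0.25, 2.125. The cascade in the question is the same idea with a second-order recurrence per stage.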

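I really don't know why this exact code runs so fast in Matlab; it should be about the same. The nested for loops are very large. Can you think of a way to remove one of them?

The code above also sits inside an outer loop of 160 iterations, with the stimulus input changing each time. That makes it run 160 × 16000 × 30 times, which takes about 20 seconds in Matlab but about 45 minutes in Python. I can't remove any of the code here; this is actually a more efficient version of the original code.

Can you post complete reproducible code, i.e. everything needed to run it? Your comment suggests 3 for loops, 2 of them nested, but this code shows 2 for loops, 1 nested.

Hi Justin, I've added both reproducible versions here.

Are you sure the two versions run the same code? In the Python code, the else block doesn't end until the last line of the for loop, whereas in the Matlab code, else ends right after the next line.

Running this code produces different output from the vectorized code and from the normal code. This completely changes the functionality.

Yes, I know. I mentioned that it differs from your code's algorithm; it is an example of how fast this approach is. I'm leaving it to you to adapt your code to this method, because I don't have any background on your algorithm. But I think you can figure it out. Good luck.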