NumPy: how do I use Conda Accelerate / benchmarks?

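I am trying to use Conda Accelerate to speed up some data preprocessing, but initial benchmarks suggest that either I am not using it correctly or it has no effect on the execution time of FFTs and linear algebra in numpy and librosa. Rereading the documentation: does this mean I should decorate and recode every ndarray operation, as in NumbaPro? I assumed that simply installing it would make numpy faster, but that does not seem to be the case.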

Below are the benchmarks and the code. I installed Accelerate with conda install accelerate, and I also import it.

Thanks

Results: the difference before and after conda install accelerate is negligible.

Total time was 25.356
Total load time was 1.6743
Total math time was 22.1599
Total save time was 1.5139
Total stft math time was 12.9219
Total other numpy math time was 9.1886
Relevant code:

loads, maths, saves = [], [], []
stfts, nps = [], []
# now we have a dict of all source files grouped by voice
for i in range(30):
    v0_fn = v0_list[i]
    v1_fn = v1_list[i]
    tl0 = time.time()
    # Process v0 & v1 file
    v0_fn = signal_dir+v0_fn
    v0, fs_s =  librosa.load(v0_fn, sr=None)
    v1_fn = signal_dir+v1_fn
    v1, fs_s =  librosa.load(v1_fn, sr=None)
    tl1 = time.time()
    loads.append((tl1-tl0))
    mix = v0 + v1
    # Capture the magnitude and phase of signal and signal + noise
    tm0 = time.time()
    v0_stft = librosa.stft(v0, int(frame_size*fs), int(step_size*fs)).transpose()
    tm1 = time.time()
    v0_mag = (v0_stft.real**2 + v0_stft.imag**2)**0.5
    v0_pha = np.arctan2(v0_stft.imag, v0_stft.real)
    v0_rtheta = np.stack((v0_mag, v0_pha), axis=0)
    tm2 = time.time()
    v1_stft = librosa.stft(v1, int(frame_size*fs), int(step_size*fs)).transpose()
    tm3 = time.time()
    v1_mag = (v1_stft.real**2 + v1_stft.imag**2)**0.5
    v1_pha = np.arctan2(v1_stft.imag, v1_stft.real)
    v1_rtheta = np.stack((v1_mag, v1_pha), axis=0)
    tm4 = time.time()
    mix_stft = librosa.stft(mix, int(frame_size*fs), int(step_size*fs)).transpose()
    tm5 = time.time()
    mix_mag = (mix_stft.real**2 + mix_stft.imag**2)**0.5
    mix_pha = np.arctan2(mix_stft.imag, mix_stft.real)
    mix_rtheta = np.stack((mix_mag, mix_pha), axis=0)
    tm6 = time.time()
    stfts += [tm1-tm0, tm3-tm2, tm5-tm4]
    nps += [tm2-tm1, tm4-tm3, tm6-tm5]
    data['sig_rtheta'] = v0_rtheta
    data['noi_rtheta'] = v1_rtheta
    data['mix_rtheta'] = mix_rtheta
    tl2 = time.time()
    maths.append(tl2-tl1)
    # protocol=-1 selects a binary pickle format, so open the file in binary mode
    with open(write_name, 'wb') as f:
        cPickle.dump(all_info, f, protocol=-1)
    tl3 = time.time()
    saves.append(tl3-tl2)

t1 = time.time()
print 'Total time was %.3f' % (t1-t0)
print 'Total load time was %.4f' % np.sum(loads)
print 'Total math time was %.4f' % np.sum(maths)
print 'Total save time was %.4f' % np.sum(saves)
print 'Total stft math time was %.4f' % np.sum(stfts)
print 'Total other numpy math time was %.4f' % np.sum(nps)
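
The manual time.time() bookkeeping above could be condensed with a small helper. What follows is only a sketch in Python 3 syntax, not the original poster's code: phase_timer and totals are hypothetical names, and a random complex array stands in for the STFT output. It also checks that np.abs and np.angle give the same magnitude and phase as the explicit expressions used in the loop.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

import numpy as np

# Hypothetical accumulator: phase name -> total seconds spent in that phase.
totals = defaultdict(float)


@contextmanager
def phase_timer(name):
    """Add the wall-clock time of the enclosed block to totals[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[name] += time.perf_counter() - start


# Synthetic complex data standing in for an STFT matrix (an assumption).
rng = np.random.default_rng(0)
spec = rng.standard_normal((200, 513)) + 1j * rng.standard_normal((200, 513))

with phase_timer("numpy"):
    mag = np.abs(spec)      # magnitude, same as (real**2 + imag**2)**0.5
    pha = np.angle(spec)    # phase, same as np.arctan2(imag, real)
    rtheta = np.stack((mag, pha), axis=0)

# Cross-check against the explicit formulas from the question.
manual_mag = (spec.real**2 + spec.imag**2) ** 0.5
manual_pha = np.arctan2(spec.imag, spec.real)
same_mag = np.allclose(mag, manual_mag)
same_pha = np.allclose(pha, manual_pha)

for name, seconds in totals.items():
    print("Total %s time was %.4f" % (name, seconds))
```

Wrapping each phase in its own with block keeps the per-phase totals without the tm0 ... tm6 variables.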

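Can this acceleration approach speed up code imported from other modules, or does it only work on the basic Python and numpy code you write yourself?

Imported functions, especially ones that call compiled code, are likely to be "black boxes" that the acceleration cannot touch.

That's a good point, and I suspect you are right about Accelerate's access to librosa. That said, it doesn't seem to speed up the NumPy operations it can access either :P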
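
One way to separate the two questions raised in these comments is to time plain NumPy calls on their own, outside librosa, once before and once after installing the package. The sketch below is written in Python 3 and assumes nothing about Accelerate's own API; time_call and the array sizes are illustrative choices, not taken from the question. np.show_config() prints the BLAS/LAPACK libraries that NumPy was built against, which indicates whether MKL is in use at all.

```python
import timeit

import numpy as np


def time_call(fn, repeat=3, number=5):
    """Return the best per-call wall time in seconds for fn()."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


# Synthetic inputs; the sizes are arbitrary (an assumption).
rng = np.random.default_rng(0)
signal = rng.standard_normal(2**16)
matrix = rng.standard_normal((300, 300))

results = {
    "fft": time_call(lambda: np.fft.rfft(signal)),
    "matmul": time_call(lambda: matrix.dot(matrix)),
    "arctan2": time_call(lambda: np.arctan2(signal, signal[::-1])),
}

# Show which BLAS/LAPACK build NumPy is linked against.
np.show_config()

for name, seconds in results.items():
    print("%-8s %.6f s per call" % (name, seconds))
```

If these numbers do not change between the two environments, the installation alone is probably not altering how NumPy executes these operations.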