Python: why is einsum faster than numpy.correlate on floats, but slower on complex numbers?
I'm trying to find the fastest way to correlate some complex-valued data. The data is a 1-D array of length ~1e5, and the matched-filter kernel has length ~20. `inner1d` can't accept complex inputs, and FFT convolution is inefficient at this kernel size, so the two methods I tested are `np.correlate` and `einsum`. For floats, `einsum` combined with a sliding window built via `as_strided` is a bit faster than `np.correlate`:
data = np.ones(100000, dtype='float64')
kernel = np.ones(20, dtype='float64')
%timeit xc1 = np.correlate(data, kernel)
100 loops, best of 3: 2.13 ms per loop
%timeit xc2 = np.einsum("ij,j->i", as_strided(data, shape=(data.shape[0]-(kernel.shape[0]-1), kernel.shape[0]), strides=data.strides * 2), kernel)
1000 loops, best of 3: 1.35 ms per loop
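For reference, the sliding-window trick in the timings above can be written as a self-contained snippet (a sketch: `as_strided` is imported from `numpy.lib.stride_tricks`, and the array size is written as an integer, since modern NumPy rejects a float like `1e5`):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

data = np.ones(100_000, dtype='float64')
kernel = np.ones(20, dtype='float64')

# "valid"-mode correlation as a strided matrix-vector product:
# row i of `windows` is data[i : i + len(kernel)], built without copying
n = data.shape[0] - kernel.shape[0] + 1
windows = as_strided(data, shape=(n, kernel.shape[0]),
                     strides=data.strides * 2)
xc2 = np.einsum("ij,j->i", windows, kernel)

# matches np.correlate's default "valid" mode
assert np.allclose(xc2, np.correlate(data, kernel))
```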
But for complex numbers, einsum is noticeably slower than np.correlate:
data = np.ones(100000, dtype='complex128')
kernel = np.ones(20, dtype='complex128')
data_conj = np.conj(data)
%timeit xc1 = np.correlate(data, kernel)
100 loops, best of 3: 2.21 ms per loop
%timeit xc2 = np.einsum("ij,j->i", as_strided(data_conj, shape=(data_conj.shape[0]-(kernel.shape[0]-1), kernel.shape[0]), strides=data_conj.strides * 2), kernel)
100 loops, best of 3: 5.78 ms per loop
I think I understand why the einsum takes longer than before, but how come np.correlate barely takes any longer than it does for floats? And is there a complex-arithmetic trick I might be able to exploit with einsum?
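One such decomposition works entirely in real arithmetic: since `np.correlate(a, v)` computes `sum(a[n+k] * conj(v[n]))`, writing data as `a + ib` and kernel as `c + id` gives a real part `a·c + b·d` and an imaginary part `b·c - a·d`, i.e. four real einsum calls. A sketch (the `win` helper and variable names are mine; whether this beats `np.correlate` will depend on your NumPy build):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def win(x, m):
    # non-copying view of all length-m sliding windows of a 1-D array
    n = x.shape[0] - m + 1
    return as_strided(x, shape=(n, m), strides=x.strides * 2)

rng = np.random.default_rng(0)
data = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)
kernel = rng.standard_normal(20) + 1j * rng.standard_normal(20)
m = kernel.shape[0]

# contiguous real copies (the .real/.imag views of a complex array are strided)
a, b = data.real.copy(), data.imag.copy()
c, d = kernel.real.copy(), kernel.imag.copy()

# (a + ib) correlated with conj(c + id): re = a.c + b.d, im = b.c - a.d
re = np.einsum("ij,j->i", win(a, m), c) + np.einsum("ij,j->i", win(b, m), d)
im = np.einsum("ij,j->i", win(b, m), c) - np.einsum("ij,j->i", win(a, m), d)
xc = re + 1j * im

assert np.allclose(xc, np.correlate(data, kernel))
```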
For what it's worth, splitting the einsum correlation into real and imaginary parts is faster than the complex einsum, but still slower than np.correlate:
%timeit xc3 = np.einsum("ij,j->i", as_strided(data_Re, shape=(data_Re.shape[0]-(kernel_Re.shape[0]-1), kernel_Re.shape[0]), strides=data_Re.strides * 2), kernel_Re) + np.einsum("ij,j->i", as_strided(data_Im, shape=(data_Im.shape[0]-(kernel_Im.shape[0]-1), kernel_Im.shape[0]), strides=data_Im.strides * 2), kernel_Im)
100 loops, best of 3: 4.21 ms per loop