
Python: fancy indexing much faster than numpy.take?


I have read in many different places that numpy.take is a faster way to index an array than fancy indexing, for example here and here.

However, I've found that this is not the case at all. Here is an example from fiddling with some code while debugging:

knn_idx
Out[2]: 
array([ 3290,  5847,  7682,  6957, 22660,  5482, 22661, 10965,     7,
        1477,  7681,     3, 17541, 15717,  9139,  1475, 14251,  4400,
        7680,  9140,  4758, 22289,  7679,  8407, 20101, 15718, 15716,
        8405, 15710, 20829, 22662], dtype=uint32)
%timeit X.take(knn_idx, axis=0)
100 loops, best of 3: 3.14 ms per loop
%timeit X[knn_idx]
The slowest run took 60.61 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.48 µs per loop
X.shape
Out[5]: 
(23011, 30)
X.dtype
Out[6]: 
dtype('float64')
This suggests that fancy indexing is much faster! I get similar results when generating the indices with numpy.arange:

idx = np.arange(0, len(X), 100)
%timeit X.take(idx, axis=0)
100 loops, best of 3: 3.04 ms per loop
%timeit X[idx]
The slowest run took 9.41 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 20.7 µs per loop
Why is fancy indexing so much faster than numpy.take? Am I hitting some kind of edge case?
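For what it's worth, a comparison along these lines can be rerun with the timeit module instead of the %timeit magic. This is only a sketch: the array shape and dtype are copied from the session above, but the index values are made up, so absolute timings are illustrative only.

```python
import timeit
import numpy as np

# Shape and dtype copied from the session above; index values are made up.
rng = np.random.RandomState(0)
X = rng.rand(23011, 30)
idx = rng.randint(0, len(X), size=31).astype(np.intp)  # intp: native index type

# Best-of-5 timing for each approach, 1000 calls per measurement.
t_take = min(timeit.repeat(lambda: X.take(idx, axis=0), number=1000, repeat=5))
t_fancy = min(timeit.repeat(lambda: X[idx], number=1000, repeat=5))

print("take : %.2f us per call" % (t_take / 1000 * 1e6))
print("fancy: %.2f us per call" % (t_fancy / 1000 * 1e6))
```

Both calls select the same rows, so any gap between the two numbers is pure indexing overhead.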

I'm using Python 3.6 through Anaconda; in case it's relevant, here is my numpy configuration:

np.__version__
Out[11]: 
'1.11.3'
np.__config__.show()
blas_mkl_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\include']
blas_opt_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\include']
openblas_lapack_info:
  NOT AVAILABLE
lapack_mkl_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\include']
lapack_opt_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\include']

In my tests, take is slightly faster; but given how short the times are, plus the "caching" warning, I wouldn't read too much into the difference:

In [192]: timeit X.take(idx2, axis=0).shape
The slowest run took 23.29 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.66 µs per loop
In [193]: timeit X[idx2,:].shape
The slowest run took 16.75 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.58 µs per loop
But your index array is uint32. That's fine for indexing, but take gave me a casting error, so my idx2 is astype(int).
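The dtype workaround can be sketched as follows; whether take actually rejects uint32 depends on the numpy build, so treat the cast as defensive rather than always required.

```python
import numpy as np

X = np.arange(12.0).reshape(4, 3)
idx_u32 = np.array([0, 2, 3], dtype=np.uint32)

# Fancy indexing accepts the uint32 array directly.
a = X[idx_u32]

# For take, casting to np.intp (the platform's native index type) sidesteps
# any unsafe-cast complaint on builds where uint32 -> intp is not "safe".
b = X.take(idx_u32.astype(np.intp), axis=0)

assert np.array_equal(a, b)
```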

With the arange idx, the times were 11.5 µs and 16 µs.

Note that I'm including .shape in the timed expression; I'm not entirely sure what difference that makes.

I don't know why you're getting take times in the ms range. That feels more like a timing artifact than an actual difference in take.

I don't think the libraries, BLAS, etc. make a difference. Fundamentally the underlying task is the same either way: step through the data buffer and copy the selected bytes. No complicated calculation is required. But I haven't studied take's C code.
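Consistent with that "copy the selected bytes" picture, both operations return fresh copies of the selected rows rather than views into the original buffer. A quick check (my own sketch, not part of the original answer):

```python
import numpy as np

X = np.random.rand(100, 30)
idx = np.arange(0, 100, 10)

fancy = X[idx]
taken = X.take(idx, axis=0)

# Same selected rows either way, and both results are fresh copies
# that do not alias X's data buffer.
assert np.array_equal(fancy, taken)
assert not np.shares_memory(fancy, X)
assert not np.shares_memory(taken, X)
```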


Numpy version '1.12.0', Linux, 4 GB refurbished desktop.

Strange; in the second test in my question I used an int32 index array and still got timings similar to the first test. Even on its slowest run, indexing with int32 was still about 15x faster. That's a good point, though. Now I'm wondering what could be going wrong with take in my specific case as opposed to in general. I'll also try including shape in the timings.