Python: optimizing a multithreaded numpy array function


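Given two large arrays of 3D points (I'll call the first "source" and the second "destination"), I need a function that returns, for each element of "source", the index of the closest element in "destination", with this limitation: I can only use numpy... so no scipy, pandas, numexpr, or cython.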

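To do this, I wrote a function. I iterate over the elements of source, find the closest element in destination, and return its index. For performance reasons, and again because I can only use numpy, I tried multithreading to speed it up. Here are both the threaded and unthreaded functions, and how they compare in speed on an 8-core machine.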

import timeit
import numpy as np
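try:
    from numpy.core.umath_tests import inner1d
except ImportError:
    # umath_tests is a private, deprecated test module that may be absent
    # in newer numpy releases; einsum computes the same row-wise inner product
    def inner1d(a, b):
        return np.einsum('ij,ij->i', a, b)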
from multiprocessing.pool import ThreadPool

def threaded(sources, destinations):
    # Define worker function
    def worker(point):
        dlt = (destinations-point) # delta between destinations and given point
        d = inner1d(dlt,dlt) # get distances
        return np.argmin(d) # return closest index

    # Multithread!
    p = ThreadPool()
    try:
        return p.map(worker, sources)
    finally:
        p.close()  # release the worker threads once the map is done
        p.join()


def unthreaded(sources, destinations):
    results = []
    #for p in sources:
    for i in range(len(sources)):
        dlt = (destinations-sources[i]) # difference between destinations and given point
        d = inner1d(dlt,dlt) # get distances
        results.append(np.argmin(d)) # append closest index

    return results


# Setup the data
n_destinations = 10000 # 10k random destinations
n_sources = 10000      # 10k random sources
destinations= np.random.rand(n_destinations,3) * 100
sources = np.random.rand(n_sources,3) * 100

#Compare!
print('threaded:   %s' % timeit.Timer(lambda: threaded(sources, destinations)).repeat(1, 1)[0])
print('unthreaded: %s' % timeit.Timer(lambda: unthreaded(sources, destinations)).repeat(1, 1)[0])

Results:

threaded:   0.894030461056
unthreaded: 1.97295164054

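Multithreading seems beneficial, but I was hoping for more than a 2x speedup, given that the real datasets I deal with are much larger.

All suggestions for improving performance (within the limitation described above) will be greatly appreciated.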
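One numpy-only variation on the loop above is to process the sources in blocks and let a single matrix product compute all pairwise scores for a block. This is only a sketch, not something taken from the thread: the function name `nearest_chunked` and the `chunk_size` parameter are made up for illustration, and whether it is actually faster than the per-point loop depends on the numpy build and its BLAS, so it should be timed on real data. It relies on the expansion |s - d|^2 = |s|^2 - 2 s.d + |d|^2, where the |s|^2 term is constant for a given source point and can be dropped before taking the argmin.

```python
import numpy as np


def nearest_chunked(sources, destinations, chunk_size=1024):
    """For each source point, return the index of the closest destination."""
    sources = np.asarray(sources, dtype=float)
    destinations = np.asarray(destinations, dtype=float)

    # Squared norm of every destination point, computed once.
    d_sq = np.einsum('ij,ij->i', destinations, destinations)

    out = np.empty(len(sources), dtype=np.intp)
    for start in range(0, len(sources), chunk_size):
        block = sources[start:start + chunk_size]
        # scores[i, j] = |d_j|^2 - 2 * s_i . d_j
        # This equals the squared distance minus |s_i|^2, which is constant
        # along each row and therefore does not change the argmin.
        scores = d_sq - 2.0 * block.dot(destinations.T)
        out[start:start + chunk_size] = np.argmin(scores, axis=1)
    return out
```

Each block allocates a temporary array of shape (chunk_size, number of destinations), so `chunk_size` trades memory for fewer Python-level iterations. Because the expanded formula rounds differently from the direct difference, two destinations at almost exactly the same distance from a source may be resolved differently than in the original loop.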

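OK, I've been reading the Maya documentation on python, and I came to these conclusions/guesses:

They are probably using CPython internally (there are several references to its documentation and not to any other).

They are not fond of threads (lots of non-thread-safe methods).

Given the above, I'd say it's better to avoid threads. Since this is a common problem, there are several ways to work around it:

Try to build a tool. Once that is done, use threads in C/C++. Personally, I would only try to get that working and then move on.

Use multiprocessing. Even if your custom python distribution doesn't include it, you can get a working version, since it is all pure python code. multiprocessing is not affected by the GIL because it spawns separate processes.

One of the above should work for you. If it doesn't, try the other (after some serious praying).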
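The multiprocessing suggestion could look roughly like the sketch below. The names `nearest_chunk` and `nearest_multiprocess` and the `n_procs` parameter are hypothetical, and it is an assumption that the Maya interpreter allows worker processes to be started at all. The sources are split into one chunk per process, and each worker runs the same per-point search as the unthreaded function from the question.

```python
import numpy as np
from multiprocessing import Pool


def nearest_chunk(args):
    """Worker: closest destination index for every point in one chunk."""
    chunk, destinations = args
    out = np.empty(len(chunk), dtype=np.intp)
    for i, point in enumerate(chunk):
        dlt = destinations - point                        # deltas to this point
        out[i] = np.argmin(np.einsum('ij,ij->i', dlt, dlt))  # squared distances
    return out


def nearest_multiprocess(sources, destinations, n_procs=4):
    """Split the sources across n_procs worker processes (hypothetical helper)."""
    chunks = np.array_split(sources, n_procs)
    pool = Pool(n_procs)
    try:
        # The destinations array is pickled and sent along with every chunk.
        parts = pool.map(nearest_chunk, [(c, destinations) for c in chunks])
    finally:
        pool.close()
        pool.join()
    return np.concatenate(parts)
```

On Windows the call to `nearest_multiprocess` has to sit under an `if __name__ == '__main__':` guard, and the worker must be a module-level function so that it can be pickled. Copying the destinations to every process has a cost, so the gain over threads is not guaranteed and is worth measuring.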

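On another note, if you are using external modules, be careful to match Maya's version. This may be the reason why you couldn't build scipy. Of course, scipy has a huge codebase, and the Windows platform is not the most resilient for building things.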
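Before trying to add horsepower through multithreading, I'd start with a better algorithm... currently your brute-force approach is O(N*M), while after preparing a KD-tree of the destination points you could get it down to O(N*log(M)).

Why are you committed to using only numpy?

@ali_m I have to use a custom python build called mayapy, which is Autodesk's implementation of python in their 3D software (Maya). It is compiled against VC2010 64-bit, so prebuilt binaries are incompatible. I was able to build numpy from scratch, but failed to build scipy, numexpr, etc.

@MatteoItalia I did look for pure python KD-tree implementations; in particular, I found and implemented two. Both were much slower (about 10x) than my unthreaded numpy test. So I fear that without access to scipy (as mentioned above), KD-trees will be a dead end.

FYI, on my puny dual-core laptop, building a cKDTree and querying it with your example data takes about 17 ms, whereas the threaded and unthreaded versions take 1.53 s and 1.39 s respectively. I'm definitely convinced that a k-D tree is worth pursuing. I'm surprised you saw such slow query speeds with pykdtree, since it appears to be implemented in C/Cython.

Yes, it's annoying. Since staying purely in numpy doesn't seem to improve performance, I'll probably end up taking the data out and running an external "standard" python instance (with scipy and all its tools) to process it and hand back the results. Not what I wanted, but for now I need to move on.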
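For the external "standard" python route, the cKDTree query mentioned in the comments would look something like the following. This is a minimal sketch that assumes scipy is installed in that external interpreter; the function name `nearest_kdtree` is made up, and how the arrays get in and out of Maya is left open.

```python
import numpy as np
from scipy.spatial import cKDTree


def nearest_kdtree(sources, destinations):
    """Index of the closest destination for each source, via a k-d tree."""
    tree = cKDTree(destinations)           # build the tree once over the destinations
    distances, indices = tree.query(sources)  # k=1 nearest neighbour per source
    return indices


# Same kind of random data as in the question
destinations = np.random.rand(10000, 3) * 100
sources = np.random.rand(10000, 3) * 100
closest = nearest_kdtree(sources, destinations)
```

`query` also returns the distances, which are discarded here because the question only asks for the indices.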