
Python: most efficient calculation of Newtonian forces with numpy/scipy


In an exercise at university, we had to implement exact Newtonian forces in Python. The course is over and our solutions were all good enough, but I wondered whether/how the performance of the force calculation could be improved further.

The bottleneck is the calculation of all the forces (i.e. the accelerations):

a_i = ∑_{j≠i} G·m_j · (r_j − r_i) / |r_j − r_i|³


For a large number of particles N (1000 or more), this has to be evaluated for every pair (i, j).

I doubt numpy actually calculates the distances twice (they are symmetric, after all). It probably does the computation only once and assigns the same value in both places.

A few ideas do come to mind, though:

  • You could write a custom version of cdist following the numpy source code. It may be that a lot of option parsing happens on every call; not much, but it might buy you a few percent.
  • Preallocation. Each run probably reallocates the memory for all the intermediate matrices. Can you make those persistent?

  • I haven't worked through the math, but I wouldn't be surprised if the redundant symmetric computations could somehow be reduced elegantly (see the sketch after this list).
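
A minimal sketch of that last idea (my own illustration, not from the original answers): the pair term G·(r_j − r_i)/|r_j − r_i|³ merely flips sign between a_i and a_j, so each pair needs to be computed only once. Here m is assumed to be an array of particle masses.

    import numpy as np

    def calc_aa_pairs(rr, m, G=1.0):
        """Illustration: visit each pair (i, j) with j > i exactly once."""
        n = rr.shape[0]
        aa = np.zeros_like(rr)
        for i in range(n - 1):
            drr = rr[i+1:] - rr[i]                              # r_j - r_i for all j > i
            f = G * drr / np.linalg.norm(drr, axis=1)[:, None]**3
            aa[i] += m[i+1:] @ f                                # a_i: sum of m_j-weighted pair terms
            aa[i+1:] -= m[i] * f                                # a_j: mirrored term, weighted by m_i
        return aa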

    I'll give it a shot: I implemented a routine that computes a single a_i:

    import numpy as np
    
    GM = .01  # particle mass times the gravitational constant
    
    def calc_a_i(rr, i):
        """ Calculate one a_i """
        drr = rr - rr[i, :] # r_j - r_i
        dr3 = np.linalg.norm(drr, axis=1)**3  # |r_j - r_i|**3
        dr3[i] = 1  # case i==j: drr = [0, 0, 0]
        # this would be more robust (eliminate small denominators):
        #dr3 = np.where(np.abs(dr3) > 1e-12, dr3, 1)
        return np.sum(drr.T/dr3, axis=1)
    
    n = 4000 # number of particles
    rr = np.random.randn(n, 3) # generate some particles
    
    # Calculate each a_i separately:
    aa = np.array([calc_a_i(rr, i) for i in range(n)]) * GM # all a_i
    
    To test it, I ran:

    In [1]: %timeit aa = np.array([calc_a_i(rr, i) for i in range(n)])
    1 loops, best of 3: 2.93 s per loop
    
    The easiest way to speed up this kind of code is to use numexpr for faster evaluation of the array expressions:

    import numexpr as ne
    ne.set_num_threads(1)  # multithreading causes too much overhead
    
    def ne_calc_a_i(i):
        """ Use numexpr - here rr is global for easier parallelization"""
        dr1, dr2, dr3 = (rr - rr[i, :]).T # r_j - r_i
        drrp3 = ne.evaluate("sqrt(dr1**2 + dr2**2 + dr3**2)**3")
        drrp3[i] = 1
        return np.sum(np.vstack([dr1, dr2, dr3])/drrp3, axis=1)
    
    # Calculate each a_i separately:
    aa_ne = np.array([ne_calc_a_i(i) for i in range(n)]) * GM  # all a_i    
    
    This speeds it up by a factor of about 2:

        In [2]: %timeit aa_ne = np.array([ne_calc_a_i(i) for i in range(n)])
        1 loops, best of 3: 1.29 s per loop
    
    To speed up the code even more, let's run it on multiple cores:
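
    The parallel helper itself is not reproduced above. Judging from the names dview and para_calc_a in the benchmark below, the ne_calc_a_i calls were spread over IPython parallel engines; a minimal sketch under that assumption (not the original implementation):

    import numpy as np
    import ipyparallel as ipp

    rc = ipp.Client()   # assumes a running cluster, e.g. started with `ipcluster start -n 4`
    dview = rc[:]       # a DirectView over all engines
    dview.execute('import numpy as np; import numexpr as ne')

    def para_calc_a(dview, rr):
        dview.push({'rr': rr})  # ne_calc_a_i reads rr as a global on each engine
        aa = dview.map_sync(ne_calc_a_i, range(rr.shape[0]))
        return np.array(aa) * GM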

    The speedup is more than a factor of four:

    In [3]: %timeit aa_p = para_calc_a(dview, rr)
    1 loops, best of 3: 612 ms per loop
    
    As @mathdan already pointed out, it is not obvious how best to optimize such a problem: whether the memory bus or the floating-point unit is the limiting factor depends on your CPU architecture, and that calls for different techniques.


    For even more gains you might want to look at a library that can dynamically generate GPU code from Python.

    The following is a more optimized version:

    import numpy as np
    from scipy.spatial.distance import pdist, squareform    
    
    def a6(r, Gm):
        dists = pdist(r)                             # condensed pairwise distances |r_i - r_j|
        dists *= dists*dists                         # cube in place: |r_i - r_j|**3
        dists = squareform(dists)                    # expand to a full (N, N) matrix
        np.fill_diagonal(dists, 1.)                  # dummy value for i == j (sep is zero there)
        sep = r[np.newaxis, :] - r[:, np.newaxis]    # sep[i, j] = r_j - r_i
        return np.einsum('ijk,ij->ik', sep, Gm/dists)  # a_i = sum_j Gm_j * sep / |.|**3
    
    The speed increase comes mainly from the einsum line; using pdist and squareform this way is only marginally faster than the original approach with cdist.
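
    As a quick sanity check (my addition): with equal masses, Gm is just the scalar GM repeated N times, and a6 should reproduce the per-particle result aa from above.

    Gm = np.full(n, GM)          # equal masses, matching the scalar GM used earlier
    aa6 = a6(rr, Gm)
    print(np.allclose(aa6, aa))  # expected: True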

    You can take it one step further, e.g. with threading and Numba (version 0.17.0 or newer is required). The code below is quite ugly and can certainly be improved a lot, but it is very fast.

    import numpy as np
    import math
    from numba import jit
    from threading import Thread
    NUM_THREADS = 2  # choose wisely
    
    def a_numba_par(r, Gm):
        a = np.zeros_like(r)
        N = r.shape[0]
    
        offset = range(0, N+1, N//NUM_THREADS)  # chunk boundaries (assumes N % NUM_THREADS == 0)
        chunks = zip(offset, offset[1:])        # (start, stop) index pairs, one per thread
        threads = [Thread(target=_numba_loop, args=(r,Gm,a)+c) for c in chunks]
    
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    
        return a
    
    @jit(nopython=True, nogil=True)
    def _numba_loop(r, Gm, a, i1, i2):
        N = r.shape[0]
        for i in range(i1, i2):
            _helper(r, Gm, i, 0, i, a[i,:])    # pair terms with j < i
            _helper(r, Gm, i, i+1, N, a[i,:])  # pair terms with j > i (j == i is skipped)
        return a
    
    @jit(nopython=True, nogil=True)
    def _helper(r, Gm, i, j1, j2, a):
        for j in range(j1, j2):
            dx = r[j,0] - r[i,0]
            dy = r[j,1] - r[i,1]
            dz = r[j,2] - r[i,2]
    
            sqeuc = dx*dx + dy*dy + dz*dz
            scale = Gm[j] / (sqeuc * math.sqrt(sqeuc))
    
            a[0] += scale * dx
            a[1] += scale * dy
            a[2] += scale * dz
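
    Hypothetical usage (my addition), reusing rr, n, and aa from above; the result should agree with the numpy versions:

    Gm = np.full(n, GM)
    aa_nb = a_numba_par(rr, Gm)
    print(np.allclose(aa_nb, aa))  # expected: True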
    