Python: multiplying the number of distances in a distance matrix before histogram binning (python, numpy, scipy, histogram)

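I am using scipy.spatial.distance.pdist to compute the distances within an array of coordinates, followed by numpy.histogram to bin the results. Currently it treats each coordinate as if a single object were there, but I have multiple objects at the same coordinate. One option is to change the array so that each coordinate appears once for every object located at it, but this would greatly increase the size of the array and the computation time of pdist, which scales as N^2. That is very expensive, and speed matters a great deal in this application.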

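A second approach is to process the resulting distance matrix so that each distance is repeated ni*nj times, where ni is the number of objects at coordinate i and nj is the number of objects at coordinate j. This would turn the original MxM distance matrix into an NxN distance matrix, where M is the total number of coordinates in the array and N is the total number of objects. But again, this seems unnecessarily costly, because all I really need is to tell the histogramming function to multiply the number of events at distance ij by ni*nj. In other words, is there a way to tell numpy.histogram that there is not just one object at distance ij, but ni*nj objects?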

Other ideas are obviously welcome.

Edit:

Here is an example of the first approach:

import numpy as np
from scipy.spatial import distance
import matplotlib.pyplot as plt

#create array of 5 coordinates in 3D
coords = np.random.random(15).reshape(5,3)
'''array([[ 0.66500534,  0.10145476,  0.92528492],
       [ 0.52677892,  0.07756804,  0.50976737],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.02707651,  0.21878467,  0.55855427],
       [ 0.81564621,  0.82750694,  0.53083443]])'''

#number of objects at each coordinate
objects = np.random.randint(1,10,5)
#array([5, 3, 8, 5, 1])

#create new array with coordinates for each individual object
new_coords = np.zeros((objects.sum(),3))

#there's surely a simpler way to do this
j=0
for coord in range(coords.shape[0]):
    for i in range(objects[coord]):
        new_coords[j] = coords[coord]
        j+=1

'''new_coords
array([[ 0.66500534,  0.10145476,  0.92528492],
       [ 0.66500534,  0.10145476,  0.92528492],
       [ 0.66500534,  0.10145476,  0.92528492],
       [ 0.66500534,  0.10145476,  0.92528492],
       [ 0.66500534,  0.10145476,  0.92528492],
       [ 0.52677892,  0.07756804,  0.50976737],
       [ 0.52677892,  0.07756804,  0.50976737],
       [ 0.52677892,  0.07756804,  0.50976737],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.50030508,  0.37635556,  0.20828815],
       [ 0.02707651,  0.21878467,  0.55855427],
       [ 0.02707651,  0.21878467,  0.55855427],
       [ 0.02707651,  0.21878467,  0.55855427],
       [ 0.02707651,  0.21878467,  0.55855427],
       [ 0.02707651,  0.21878467,  0.55855427],
       [ 0.81564621,  0.82750694,  0.53083443]])''' 

#calculate distance matrix of old and new arrays
distances_old = distance.pdist(coords)
distances_new = distance.pdist(new_coords)

#calculate and plot normalized histograms (typically just use np.histogram without plotting)
#density=True replaces the older normed=True argument
plt.hist(distances_old, range=(0,1), alpha=.5, density=True)
#(array([ 0.,  0.,  0.,  0.,  2.,  1.,  2.,  2.,  2.,  1.]), array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ]), <a list of 10 Patch objects>)

plt.hist(distances_new, range=(0,1), alpha=.5, density=True)
#(array([ 2.20779221,  0.        ,  0.        ,  0.        ,  1.68831169,
#        0.64935065,  2.07792208,  2.81385281,  0.34632035,  0.21645022]), array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ]), <a list of 10 Patch objects>)

plt.show()

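The second approach would process the distance matrix rather than the coordinate array, but I have not worked out the code for it yet.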
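For illustration only, the second approach could be sketched roughly as follows. This is not code from the question: the seeded generator and the names `expanded`, `n_zero` and `expanded_full` are assumptions made for the example. Each distance between coordinates i and j is repeated ni*nj times, and the zero distances between objects that share a coordinate are added separately, since pdist on the unique coordinates never produces them.

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(0)         # seeded only to make the sketch repeatable
coords = rng.random((5, 3))            # 5 coordinates in 3D
objects = rng.integers(1, 10, size=5)  # number of objects at each coordinate

# Condensed distances between distinct coordinates, in pdist order (i < j).
distances = distance.pdist(coords)
i, j = np.triu_indices(len(coords), k=1)

# Repeat each distance n_i * n_j times, once per pair of objects.
expanded = np.repeat(distances, objects[i] * objects[j])

# Objects sharing a coordinate are at distance zero from each other:
# n_i * (n_i - 1) / 2 such pairs per coordinate.
n_zero = int((objects * (objects - 1) // 2).sum())
expanded_full = np.concatenate([np.zeros(n_zero), expanded])
```

The expanded array has as many entries as pdist on the fully repeated coordinates would return, so it saves the distance computation but not the memory.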

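Both approaches seem inefficient to me. I think manipulating the binning step of np.histogram is more likely to be efficient, since it only involves a basic multiplication, but I am not sure how to tell np.histogram to treat each coordinate as having a variable number of objects to count.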

Something like this might work:

import numpy as np
from scipy.spatial import distance

positions = np.random.rand(10, 2)
counts = np.random.randint(1, 5, len(positions))

distances = distance.pdist(positions)
i, j = np.triu_indices(len(positions), 1)

bins = np.linspace(0, 1, 10)
h, b = np.histogram(distances, bins=bins, weights=counts[i]*counts[j])

Except for the zero distances, it checks out against the repeated version:

repeated = np.repeat(positions, counts, 0)
rdistances = distance.pdist(repeated)

hr, br = np.histogram(rdistances, bins=bins)

In [83]: h
Out[83]: array([11, 22, 27, 43, 67, 46, 40,  0, 19,  0])

In [84]: hr
Out[84]: array([36, 22, 27, 43, 67, 46, 40,  0, 19,  0])
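The two results above differ only in the first bin (11 versus 36), which is where the zero distances between objects sharing a position land. One way to reconcile them, sketched here under the assumption that the bins cover zero, is to add the ni*(ni-1)/2 same-position pairs to the bin containing 0. The small fixed `positions` and `counts` arrays and the names `n_zero`, `zero_bin` and `h_fixed` are made up for this example.

```python
import numpy as np
from scipy.spatial import distance

# Small hand-picked example so the result can be checked by eye.
positions = np.array([[0.0, 0.0], [0.35, 0.0], [0.0, 0.45]])
counts = np.array([2, 1, 3])
bins = np.linspace(0, 1, 11)  # 10 bins of width 0.1

# Weighted histogram over distances between distinct positions.
distances = distance.pdist(positions)
i, j = np.triu_indices(len(positions), k=1)
h, _ = np.histogram(distances, bins=bins, weights=counts[i] * counts[j])

# Pairs of objects at the same position: n * (n - 1) / 2 per position.
n_zero = int((counts * (counts - 1) // 2).sum())

# Add them to the bin that contains 0 (assumes bins[0] <= 0 < bins[-1]).
zero_bin = np.searchsorted(bins, 0.0, side="right") - 1
h_fixed = h.copy()
h_fixed[zero_bin] += n_zero

# Reference: histogram of the fully repeated positions.
repeated = np.repeat(positions, counts, axis=0)
hr, _ = np.histogram(distance.pdist(repeated), bins=bins)

print(h_fixed)  # [4 0 0 2 6 3 0 0 0 0]
print(hr)       # [4 0 0 2 6 3 0 0 0 0]
```

Whether the zero distances should be counted at all depends on the application; if they should not, the weighted histogram can be used as is.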

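What is the question? Can you show your existing code as (copy-paste runnable) code rather than describing it in words? How do you know how many objects are at each position? If you have a counts array with the same shape as the positions array, you can easily use it to weight the histogram.

See my edit (you are all fast). Yes, the object-count array has the same shape as the coordinate array.

@MrE, I think the question is clear, and the edit shows a code example.

@askewchan Yes, it is much better now.

Ah yes, weights. So simple. Thanks!

Is there a faster way to do np.triu_indices or np.histogram? I find that both are bottlenecks in my code. My "positions" arrays contain thousands of coordinates, and this is too slow (~10+ seconds) for my needs (tenths of a second). I expected pdist to be the bottleneck, but it is not. For an array of 10000 coordinates, pdist takes only ~0.8 seconds (slow, but borderline acceptable), whereas the indexing takes ~8 seconds and the histogram ~5 seconds, which is far too long. Any ideas?

I used triu_indices because it is the easiest to understand, but there are faster ways to generate it. See some of the discussion on a recent question of mine. I don't think you can speed up the histogram.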
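As one possible way to build the weights without calling triu_indices, the products counts[i]*counts[j] can be assembled row by row in the same i < j order that pdist uses. This is only a sketch, not the method referred to in the comment above: `pair_weights` is a hypothetical helper name, and whether it is actually faster for a given array size would need to be measured.

```python
import numpy as np

def pair_weights(counts):
    """Return counts[i] * counts[j] for all i < j, in pdist's condensed order.

    Hypothetical helper for illustration; it avoids materialising the
    two index arrays that np.triu_indices returns.
    """
    counts = np.asarray(counts)
    n = len(counts)
    if n < 2:
        return np.empty(0, dtype=counts.dtype)
    # Row k of the upper triangle pairs counts[k] with every later element.
    return np.concatenate([counts[k] * counts[k + 1:] for k in range(n - 1)])

counts = np.array([2, 1, 3, 4])
weights = pair_weights(counts)
print(weights)  # [ 2  6  8  3  4 12]
```

The result can be passed directly as the weights argument of np.histogram together with the output of pdist.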