Python: how to efficiently bin a list of 3D points


python python-3.x algorithm numpy


I have a long list of 3D "points", each of the form

[x, y, z, scalar, scalar, ...]

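I want to "bin" the points in 3D (x/y/z), i.e. reorganize them into a new "array of arrays of arrays" with dimensions [n_bins_x, n_bins_y, n_bins_z], where each element is a sub-array of points. I don't just want to "count" them with a histogram; I want to end up with the set of points belonging to each bin, packed into a 3D array.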

My actual use case involves O(1M) points * O(2K) time steps, and the binning has to be done at every time step, so performance matters.

Here is an example:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np
import timeit

# ----- generate pts

size_x, size_y, size_z = 5.0, 3.0, 1.0
n_pts = 100000

x = np.random.uniform(0.0, size_x, n_pts )
y = np.random.uniform(0.0, size_y, n_pts )
z = np.random.uniform(0.0, size_z, n_pts )
u = np.random.uniform(0.0, 1.0,    n_pts )
v = np.random.uniform(0.0, 1.0,    n_pts )
w = np.random.uniform(0.0, 1.0,    n_pts )
data = np.vstack((x,y,z,u,v,w)).T

# ----- define bin bounds

n_bins_x, n_bins_y, n_bins_z = 10, 10, 10
n_bins_total = n_bins_x*n_bins_y*n_bins_z
x_bin_bounds = np.linspace(0.0, size_x, n_bins_x+1, dtype=np.float64)
y_bin_bounds = np.linspace(0.0, size_y, n_bins_y+1, dtype=np.float64)
z_bin_bounds = np.linspace(0.0, size_z, n_bins_z+1, dtype=np.float64)

# ----- np.digitize to get 'bin index' for each particle in each dim
# --> this basically creates 3 new scalars which are integer-valued 'flags'
#      for the sorting process later

bin_inds_x = np.digitize(data[:,0], x_bin_bounds)-1
bin_inds_y = np.digitize(data[:,1], y_bin_bounds)-1
bin_inds_z = np.digitize(data[:,2], z_bin_bounds)-1

# ----- re-organize total list into a 3D array of sub-arrays

data_resorted = np.empty(shape=(n_bins_x, n_bins_y, n_bins_z), dtype=object)

inds_all = np.arange(n_pts, dtype=np.int32)

# ----- method 1

start_time = timeit.default_timer()
counter=1; particle_counter=0
for i in range(n_bins_x):
    for j in range(n_bins_y):
        for k in range(n_bins_z):
            #print('%i/%i'%(counter,n_bins_total)); counter+=1
            a = np.where(bin_inds_x==i) ## if in the current searched-for x-bin
            b = np.where(bin_inds_y==j) ## if in the current searched-for y-bin
            c = np.where(bin_inds_z==k) ## if in the current searched-for z-bin
            d = np.intersect1d(a, b) ## if in x AND y bin being searched for
            e = np.intersect1d(c, d) ## if in x AND y AND z bin being searched for
            inds = inds_all[e] ## take the particles meeting those criteria
            count = len(inds)
            if (count > 0):
                particle_counter += count
                data_resorted[i,j,k] = data[inds,:] ## assign matched points to 're-organized' array

end_time = timeit.default_timer() - start_time
print('method 1 time: %0.2f[s]'%end_time)

if (particle_counter == n_pts):
    print('all pts binned %i %i'%(particle_counter, n_pts))
else:
    print('pts lost! %i %i'%(particle_counter, n_pts))

# ----- method 2

start_time = timeit.default_timer()
counter=1; particle_counter=0
for i in range(n_bins_x):
    inds = np.copy(inds_all)
    inds = inds[np.where(bin_inds_x[inds]==i)] ## index 'mask' for this x-bin
    inds_copy_i = np.copy(inds) ## matches in (x)... copy for re-use in nested (y,z) searches
    for j in range(n_bins_y):                  
        inds = np.copy(inds_copy_i)
        inds = inds[np.where(bin_inds_y[inds]==j)] ## index 'mask' for this y-bin
        inds_copy_j = np.copy(inds) ## matches in (x,y)... copy for re-use in nested (z) searches
        for k in range(n_bins_z):
            #print('%i/%i'%(counter,n_bins_total)); counter+=1
            inds = np.copy(inds_copy_j)
            inds = inds[np.where(bin_inds_z[inds]==k)] ## index 'mask' for this z-bin
            count = len(inds)
            if (count > 0):
                particle_counter += count
                data_resorted[i,j,k] = data[inds,:] ## assign matched points to 're-organized' array
end_time = timeit.default_timer() - start_time
print('method 2 time: %0.2f[s]'%end_time)

if (particle_counter == n_pts):
    print('all pts binned %i %i'%(particle_counter, n_pts))
else:
    print('pts lost! %i %i'%(particle_counter, n_pts))

The output is:

method 1 time: 1.79[s]
all pts binned 100000 100000
method 2 time: 0.04[s]
all pts binned 100000 100000

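"Method 2" is much faster than "method 1" (0.04 s versus 1.79 s in the run above), but I still feel I am not using numpy properly for this task. It can probably be done in a more "vectorized" way in numpy, or with a different numpy function.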

Is there a better, more efficient way to "bin" particles in 3D in python/numpy?
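One possible direction for the "more vectorized" approach asked about above, offered as a minimal sketch and not as an answer from the original thread: collapse the three per-axis bin indices into a single flat bin id, sort the points by that id once, and slice the sorted array per bin. The names `sorted_data`, `offsets` and `points_in_bin` are hypothetical helpers introduced for illustration, and the `np.clip` call is an added assumption about how points lying exactly on the upper bound should be handled.

```python
import numpy as np

rng = np.random.default_rng(0)
size = np.array([5.0, 3.0, 1.0])
n_bins = (10, 10, 10)
n_pts = 100_000

# Same layout as `data` in the question: columns are x, y, z, u, v, w.
data = np.column_stack(
    [rng.uniform(0.0, s, n_pts) for s in size]
    + [rng.uniform(0.0, 1.0, n_pts) for _ in range(3)]
)

# Per-axis bin index, as in the question. The clip (an assumption) puts a
# point lying exactly on the upper bound into the last bin.
edges = [np.linspace(0.0, s, n + 1) for s, n in zip(size, n_bins)]
idx = tuple(
    np.clip(np.digitize(data[:, d], edges[d]) - 1, 0, n_bins[d] - 1)
    for d in range(3)
)

# Collapse (i, j, k) into one flat bin id per point (C order).
flat = np.ravel_multi_index(idx, n_bins)

# Sort the points by flat bin id so that every bin is one contiguous slice.
order = np.argsort(flat, kind="stable")
sorted_data = data[order]

# counts[b] is the number of points in flat bin b; offsets mark slice starts.
counts = np.bincount(flat, minlength=int(np.prod(n_bins)))
offsets = np.concatenate(([0], np.cumsum(counts)))

def points_in_bin(i, j, k):
    """Return the rows of `sorted_data` that fall in bin (i, j, k)."""
    b = np.ravel_multi_index((i, j, k), n_bins)
    return sorted_data[offsets[b]:offsets[b + 1]]

# Optional: the 3D object array of sub-arrays described in the question.
# Unlike the loops above, empty bins hold an empty (0, 6) array, not None.
data_resorted = np.empty(n_bins, dtype=object)
flat_view = data_resorted.reshape(-1)  # view onto the same object array
for b in range(flat_view.size):
    flat_view[b] = sorted_data[offsets[b]:offsets[b + 1]]
```

The per-point work here is one sort plus a `bincount`, so the cost no longer grows with the number of bin combinations being searched. If the object array is not needed, `points_in_bin` alone avoids the final Python loop over bins.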

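Comments:

Your two solutions produce the same result. Is it what you expect? Have you verified the result (data_)?

data has six items in the second dimension, but you only digitize the first three. Is that your intent?

@wwii Yes, I want to "bin" based on the first three attributes, i.e. the x/y/z spatial coordinates (the first three columns of "data").

np.histogramdd... I use it to bin 2D points. I don't know whether it is faster... np.histogramdd(XY, bins=[(0,10,20,30,40),(0,10,20,30,40)]), where bins holds the x and y bin edges. You could add a z-bin. Worth a look, at least.

It would help if the code had comments explaining why/what. For example, in the first solution, what is the series of np.where and np.intersect1d statements for? Again, have you verified that your solution does what you expect?

@wwii Yes, it does what I expect. I added some comments to try to clarify the process. The biggest advantage of method 2 is that the successively reduced "index mask" is copied for reuse at each nesting level, so searching for matches at the inner (y, z) levels takes less work. For example, once I know which particles match in (x, y), I can iterate over that reduced list in (z). Method 1 does a full "search" over all points in the original list for every (x, y, z) combination.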
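To illustrate the np.histogramdd suggestion from the comments, here is a small sketch that extends the 2D call to three axes. The point count, seed and variable names are assumptions chosen for the example. Note that np.histogramdd returns only per-bin counts, which the question says is not sufficient by itself, but the counts can serve as a cross-check on any binning that keeps the points.

```python
import numpy as np

rng = np.random.default_rng(0)
size = (5.0, 3.0, 1.0)
n_bins = (10, 10, 10)

# 1000 points with x, y, z drawn uniformly inside the box.
xyz = np.column_stack([rng.uniform(0.0, s, 1000) for s in size])

# One array of bin edges per axis, as in the question's linspace calls.
edges = [np.linspace(0.0, s, n + 1) for s, n in zip(size, n_bins)]

# counts[i, j, k] is the number of points in bin (i, j, k).
counts, returned_edges = np.histogramdd(xyz, bins=edges)

print(counts.shape)        # (10, 10, 10)
print(int(counts.sum()))   # 1000
```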