Python: get the elements of array 1 that are not in array 2


python performance numpy


Main question

What is a better / more Pythonic way to retrieve the elements of one array that are not found in another array? This is what I have:

idata = [np.column_stack(data[k]) for k in range(len(data)) if data[k] not in final]
idata = np.vstack(idata)

I am interested in performance. My data is an (X, Y, Z) array of size (7000 x 3), and my gdata is an (X, Y) array of size (11000 x 2).

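Preamble

I am working on an octant search to find the n (e.g. 8) points (+) closest to my circular point (o) in each octant. This means that my points (+) are reduced to 64 (8 per octant). Then, for each gdata, I save the elements that are not found in data.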

import tkinter as tk
from tkinter import filedialog
import pandas as pd
import numpy as np
from scipy.spatial.distance import cdist
from collections import defaultdict

root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
data = pd.read_excel(file_path)
data = np.array(data, dtype=float)
nrow, cols = data.shape

file_path1 = filedialog.askopenfilename()
gdata = pd.read_excel(file_path1)
gdata = np.array(gdata, dtype=float)
gnrow, gcols = gdata.shape

N = 8
bins = np.linspace(-np.pi, np.pi, 9)
bins[-1] = np.inf  # handle edge case
octantsort = []

for j in range(gnrow):
    delta = gdata[j, ::] - data[:, :2]
    angles = np.arctan2(delta[:, 1], delta[:, 0])
    octantsort = []
    for i in range(8):
        # the source is cut between "bins[i]" and "0:"; the next two lines are a reconstruction
        data_i = data[(bins[i] <= angles) & (angles < bins[i + 1])]
        if data_i.size > 0:
            dist_order = np.argsort(cdist(data_i[:, :2], gdata[j, ::][np.newaxis]), axis=0)
            # if dist_order.size ... (the listing breaks off here in the source)

Is there an efficient, Pythonic way to improve the performance of the last two lines of the code?
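
If I understand your code correctly, I see the following potential savings:

- Remove the final=… line.
- Don't use arctan; it is expensive. Since you only want octants, compare the coordinates with zero and with each other.
- Don't do a full argsort; use argpartition instead.
- Make your octantsort an "octantargsort", i.e. store the indices into data rather than the data points themselves. This saves you the search in the last line and lets you use np.delete for the removal.
- Don't use append inside a list comprehension. This builds a list of Nones that is immediately discarded. You can use list.extend outside the comprehension instead.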
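
As a rough illustration of the argpartition and np.delete suggestions, here is a minimal sketch on mock data. The names points, center and k, and the random data itself, are assumptions made for the example and do not come from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((1000, 2))      # mock (X, Y) data
center = np.array([0.5, 0.5])       # mock query point
k = 8                               # number of nearest points wanted

# Squared distances are enough for ranking, so no square root is needed.
d2 = ((points - center) ** 2).sum(axis=1)

# Full sort: orders every point although only the first k are used.
nearest_sorted = np.argsort(d2)[:k]

# Partial sort: the k smallest come first, in no particular order.
nearest_part = np.argpartition(d2, k)[:k]

# Because indices (not points) are stored, np.delete can drop the rows directly.
remaining = np.delete(points, nearest_part, axis=0)
```

Both selections contain the same k indices; only the order within the first k may differ, which does not matter when the rows are simply removed.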

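Besides, these list comprehensions look like a convoluted way of converting data[dist_order[:npoint_per_octant]] into a list. Since you ultimately want to vstack, why not simply cast, or even keep it as an array?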
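
The difference can be seen in a small sketch. The mock arrays are assumptions for illustration; in particular dist_order is one-dimensional here, which may differ from the shape produced in the question's code.

```python
import numpy as np

rng = np.random.default_rng(1)
data_i = rng.random((20, 3))               # mock points of one octant
dist_order = np.argsort(rng.random(20))    # mock distance ordering (1-D here)
npoint_per_octant = 8

# Discouraged pattern: a comprehension used only for the side effect of append.
octantsort = []
junk = [octantsort.append(data_i[dist_order[:npoint_per_octant][j]])
        for j in range(npoint_per_octant)]  # junk is a list of None

# Direct fancy indexing returns the same rows as a single array.
selected = data_i[dist_order[:npoint_per_octant]]

# Per-octant arrays can be collected in a plain list and stacked once at the end.
chunks = [selected]
final = np.vstack(chunks)
```

Slicing with [:npoint_per_octant] also copes with octants that hold fewer points, since a slice past the end simply returns what is there.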

Here is some sample code illustrating these ideas:
import numpy as np

def discard_nearest_in_each_octant(eater, eaten, n_eaten_p_eater):
    # eater: a single (x, y) point; eaten: (n, 2) array of (x, y) points
    # build octants
    # start with quadrants ...
    top, left = (eaten < eater).T
    quadrants = [np.where(v & h)[0] for v in (top, ~top) for h in (left, ~left)]
    # squared coordinate differences: used both for the split and for ranking
    dcoord2 = (eaten - eater)**2
    dc2quadrant = [dcoord2[q] for q in quadrants]
    # ... and split them
    oct4158 = [q[:, 0] < q[:, 1] for q in dc2quadrant]
    # main loop
    dc2octants = [[q[o], q[~o]] for q, o in zip(dc2quadrant, oct4158)]
    # per octant: positions of the n nearest points, or None when the octant
    # holds no more than n points (in which case all of them are taken)
    reloap = [[
        np.argpartition(o.sum(-1), n_eaten_p_eater)[:n_eaten_p_eater]
        if o.shape[0] > n_eaten_p_eater else None
        for o in opair] for opair in dc2octants]
    # translate indices
    octantargpartition = [q[so] if oap is None else q[np.where(so)[0][oap]]
                          for q, o, oaps in zip(quadrants, oct4158, reloap)
                          for so, oap in zip([o, ~o], oaps)]
    octantargpartition = np.concatenate(octantargpartition)
    # drop the selected rows from eaten
    return np.delete(eaten, octantargpartition, axis=0)
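
To see the comparison-only octant split in isolation, here is a minimal sketch. The helper octant_labels and its numbering of the octants are hypothetical and chosen only for this example; the function above uses index arrays instead of labels.

```python
import numpy as np

def octant_labels(points, center):
    """Hypothetical helper: label each point with an octant id 0..7 around center."""
    d = points - center
    dx, dy = d[:, 0], d[:, 1]
    # Three boolean tests (left/right, below/above, steep/flat) give 2*2*2 wedges,
    # each spanning 45 degrees, without calling arctan2.
    return (dx < 0) * 4 + (dy < 0) * 2 + (np.abs(dx) < np.abs(dy)) * 1

rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(5000, 2))   # mock points
labels = octant_labels(pts, np.zeros(2))
```

The labels are numbered differently from bins built with np.arctan2 and np.linspace(-np.pi, np.pi, 9), but they should describe the same eight wedges, apart from points lying exactly on a boundary.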

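Sample/mock data and typical/maximum dataset sizes would be more helpful than showing the file-reading code. For better performance, try making final a set. Sets can be searched in O(1) time.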
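
A minimal sketch of the set idea, assuming rows are compared by exact equality. The mock data and the names final_set and keep are made up for the example. Note that for NumPy arrays, row in final is evaluated as (final == row).any(), so it matches on any single equal element rather than on whole rows; converting rows to tuples avoids that.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.random((7000, 3))                              # mock (X, Y, Z) data
final = data[rng.choice(len(data), 64, replace=False)]    # mock selected rows

# Arrays are not hashable, so each row is converted to a tuple first.
final_set = {tuple(row) for row in final}

# Each lookup is O(1) on average instead of a scan through final.
keep = np.array([tuple(row) not in final_set for row in data])
idata = data[keep]
```

If the indices of the selected rows are already known, np.delete on those indices avoids the membership test altogether.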