Python numpy: how to join arrays? (to get the union of multiple ranges)

python arrays numpy indexing


I am using Python with numpy. I have a numpy array of indices, a:

>>> a
array([[5, 7],
       [12, 18],
       [20, 29]])
>>> type(a)
<type 'numpy.ndarray'>

I need to join array a with array b:

>>> b
array([[2, 4],
       [8, 11],
       [33, 35]])
>>> type(b)
<type 'numpy.ndarray'>

[2,4][5,7][8,11][12,18][20,29][33,35]

and have the index array =>

[2,18][20,29][33,35]

(the indices in [2,4][5,7][8,11][12,18] run consecutively: 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18, so they merge into [2,18])

For this example:

>>> out_c
array([[2, 18],
       [20, 29],
       [33, 35]])

Can anyone suggest how I can get out_c?

Update: @Geoff suggested a solution. Is it the fastest and best solution for large data arrays?

Maybe you could try using numpy.concatenate() to join the arrays together and then find the minimum and maximum of each row... then create c as a matrix of each row's minimum and maximum. Alternatively, np.minimum and np.maximum compare two arrays and find the minimum and maximum, so you can find the minimum and maximum of each row and then assign them to the matrix c with numpy.

(New answer)

(Old answer) Using lists and sets

import numpy as np
import itertools

def ranges(s):
    """ Converts a sorted list of integers into start, end pairs """
    # Consecutive integers share the same (value - position) difference.
    for a, b in itertools.groupby(enumerate(s), lambda t: t[1] - t[0]):
        b = list(b)
        yield b[0][1], b[-1][1]

def intersect(*args):
    """ Converts any number of numpy arrays containing start, end pairs
        into a set of indexes """
    # Despite its name, this collects the union of all ranges.
    s = set()
    for start, end in np.vstack(args):
        s.update(range(start, end + 1))
    return s

a = np.array([[5,7],[12, 18],[20,29]])
b = np.array([[2,4],[8,11],[33,35]])

# A set has no guaranteed order, so sort before grouping into runs.
result = np.array(list(ranges(sorted(intersect(a,b)))))

Not pretty, but it works. I don't like the final loop, but can't think of a way to do without it:

ab = np.vstack((a,b))
ab.sort(axis=0)
join_with_next = ab[1:, 0] - ab[:-1, 1] <= 1
endpoints = np.concatenate(([0],
                            np.where(np.diff(join_with_next))[0] + 2,
                            [len(ab)]))
lengths = np.diff(endpoints)
new_lengths = lengths.copy()
if join_with_next[0]:
    new_lengths[::2] = 1
else:
    new_lengths[1::2] = 1
new_endpoints = np.concatenate(([0], np.cumsum(new_lengths)))
print(endpoints, lengths)
print(new_endpoints, new_lengths)
starts = endpoints[:-1]
ends = endpoints[1:]
new_starts = new_endpoints[:-1]
new_ends = new_endpoints[1:]
c = np.empty((new_endpoints[-1], 2), dtype=ab.dtype)

for j, (s,e,ns,ne) in enumerate(zip(starts, ends, new_starts, new_ends)):
    if e-s != ne-ns:
        # Several rows collapse into one: keep the smallest start and largest end.
        c[ns:ne] = np.array([np.min(ab[s:e, 0]), np.max(ab[s:e, 1])])
    else:
        c[ns:ne] = ab[s:e]

>>> c
array([[ 2, 18],
       [20, 29],
       [33, 35]])

What exactly do you mean by "integrating" these two arrays? Maybe I'm just being dense, but I don't see how you get from a and b to out_c.

@stugray I updated the question.

Oh, you are trying to get the union of multiple ranges. Try searching for it.

I assumed it was a coincidence that both arrays a and b have three rows. If I'm wrong, that's a good point.

You inspired me to keep working on it. If you are still interested in this problem, could you check my answer? It seems too simple to be correct.

ab.sort(0) breaks the association between each start and its end. The proper sort would be ab = ab[ab[:, 0].argsort(), :]
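The sorting concern raised in the last comment can be seen on a small example. This is only a sketch: the array values below are made up for illustration (they are not from the question) and were chosen so that the two columns sort into different orders.

```python
import numpy as np

# Hypothetical ranges, chosen so that starts and ends sort differently.
ab = np.array([[5, 30], [1, 2], [10, 12]])

# Sorting each column independently pairs starts with the wrong ends.
columnwise = np.sort(ab, axis=0)

# Reordering whole rows by their start value keeps each pair intact.
rowwise = ab[ab[:, 0].argsort(), :]

print(columnwise.tolist())  # [[1, 2], [5, 12], [10, 30]]
print(rowwise.tolist())     # [[1, 2], [5, 30], [10, 12]]
```

For ranges that do not overlap, such as a and b in the question, both sorts give the same rows, which is presumably why the difference does not show up with the example data.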
@Geoff View the array as a structured array, and then it will sort correctly (I think).

Hmm, I'm a bit confused now. I've posted my new answer. Maybe we can break it. It is definitely a short answer...

@Jaime For a = np.array([[0,1],[5,7]]) and b = np.array([[3,4]]) the result is out_c = [[0 1] [3 4] [5 7]].

Have you found anything that doesn't work in the new solution? I'm poking at it now...

With the "non-overlapping" caveat, I'd say it works fine, doesn't it?

To sort it while keeping each start together with its end, you can do ranges.view(dtype=[('', ranges.dtype),]*2).sort(axis=0). It has a problem, i.e. it doesn't work with input like ranges = np.array([[2,7],[3,12],[4,11]]) max ... Or the OP can guarantee that this kind of overlap never happens.

Are you sure? sort needs to be called as-is.
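Pulling the points from the comments together (sort whole rows, and cope with overlapping input), one possible loop-free approach is a sort-and-merge with a running maximum. The sketch below is not any poster's method; the helper name merge_ranges is hypothetical, and it assumes inclusive integer [start, end] pairs where adjacent ranges (end + 1 == next start) should be joined, as in the question's example. It has not been benchmarked against the answers above.

```python
import numpy as np

def merge_ranges(*arrays):
    """Union of inclusive integer [start, end] ranges (hypothetical helper)."""
    ranges = np.vstack(arrays)
    # Sort whole rows by start so every start stays with its own end.
    ranges = ranges[ranges[:, 0].argsort(kind="stable")]
    starts, ends = ranges[:, 0], ranges[:, 1]
    # Running maximum of the ends handles nested or overlapping ranges.
    reach = np.maximum.accumulate(ends)
    # A new group begins where a start lies more than 1 past everything seen so far.
    is_new = np.concatenate(([True], starts[1:] > reach[:-1] + 1))
    # The last row of a group is the row just before the next group's first row.
    is_last = np.concatenate((is_new[1:], [True]))
    return np.column_stack((starts[is_new], reach[is_last]))

a = np.array([[5, 7], [12, 18], [20, 29]])
b = np.array([[2, 4], [8, 11], [33, 35]])
print(merge_ranges(a, b).tolist())  # [[2, 18], [20, 29], [33, 35]]

# The overlapping input mentioned in the comments collapses to a single range.
overlapping = np.array([[2, 7], [3, 12], [4, 11]])
print(merge_ranges(overlapping).tolist())  # [[2, 12]]
```

All the work happens in a sort plus a few vectorized passes, so no Python-level loop over the rows is needed. Whether that makes it the fastest option for large arrays would have to be measured.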