N-D version of itertools.combinations in numpy

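I would like to implement itertools.combinations for numpy. Starting from an existing recipe, I have a function that works for 1-D input: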

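from itertools import combinations
from numpy import asarray, dtype, fromiter

def combs(a, r):
    """
    Return successive r-length combinations of elements in the array a.
    Should produce the same output as array(list(combinations(a, r))), but 
    faster.
    """
    a = asarray(a)
    # structured dtype with r unnamed fields, so fromiter can consume r-tuples
    dt = dtype([('', a.dtype)]*r)
    b = fromiter(combinations(a, r), dt)
    # reinterpret the records as a plain (n_combinations, r) array
    return b.view(a.dtype).reshape(-1, r)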

and the output makes sense:

In [1]: list(combinations([1,2,3], 2))
Out[1]: [(1, 2), (1, 3), (2, 3)]

In [2]: array(list(combinations([1,2,3], 2)))
Out[2]: 
array([[1, 2],
       [1, 3],
       [2, 3]])

In [3]: combs([1,2,3], 2)
Out[3]: 
array([[1, 2],
       [1, 3],
       [2, 3]])

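However, it would be best if I could extend it to N-D inputs, where the additional dimensions simply let you do multiple calls at once, quickly. So, conceptually, if combs([1, 2, 3], 2) produces [1, 2], [1, 3], [2, 3], and combs([4, 5, 6], 2) produces [4, 5], [4, 6], [5, 6], then combs((1, 2, 3) and (4, 5, 6), 2) should produce [1, 2] & [4, 5], [1, 3] & [4, 6], [2, 3] & [5, 6], where "and" and "&" just represent parallel rows or columns (whichever makes sense). The same goes for additional dimensions.

I'm not sure:

1. How to make the dimensions work in a logical way that is consistent with how other functions work (for example, some numpy functions have an axis= parameter with a default of axis 0. So probably axis 0 should be the one I am combining, and all other axes just represent parallel calculations?)
2. How to get the above code to work with N-D input (right now I get ValueError: setting an array element with a sequence.)
3. Is there a better way to do dt = dtype([('', a.dtype)]*r)?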
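
Not sure how it will work out performance-wise, but you can do the combinations on an index array and then extract the actual array slices with np.take:
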
from itertools import combinations
import numpy as np

def combs_nd(a, r, axis=0):
    a = np.asarray(a)
    if axis < 0:
        axis += a.ndim
    # combine positions along `axis` rather than the values themselves
    indices = np.arange(a.shape[axis])
    dt = np.dtype([('', np.intp)]*r)
    indices = np.fromiter(combinations(indices, r), dt)
    indices = indices.view(np.intp).reshape(-1, r)
    return np.take(a, indices, axis=axis)

>>> combs_nd([1,2,3], 2)
array([[1, 2],
       [1, 3],
       [2, 3]])
>>> combs_nd([[1,2,3],[4,5,6]], 2, axis=1)
array([[[1, 2],
        [1, 3],
        [2, 3]],

       [[4, 5],
        [4, 6],
        [5, 6]]])
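
To make the role of the axis argument concrete, here is a small sketch on a 3-D input. The function is restated in a compact form (building the index array with np.array(list(...)) instead of np.fromiter) only so that the snippet runs on its own; the input array and the choice of axis=-1 are illustrative assumptions.

```python
from itertools import combinations
import numpy as np

def combs_nd(a, r, axis=0):
    """Index-based r-combinations along `axis` (compact restatement of the idea above)."""
    a = np.asarray(a)
    n = a.shape[axis]
    # (C(n, r), r) array of index tuples; np.intp is the integer type used for indexing
    idx = np.array(list(combinations(range(n), r)), dtype=np.intp).reshape(-1, r)
    return np.take(a, idx, axis=axis)

a = np.arange(24).reshape(2, 3, 4)
out = combs_nd(a, 2, axis=-1)   # combine along the last axis (length 4)
print(out.shape)                # (2, 3, 6, 2): C(4, 2) = 6 pairs for each of the 2*3 rows
```

The combined axis is replaced by two axes, one over the combinations and one over the r members of each combination, while all other axes are carried along unchanged.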


You can use itertools.combinations() to create the index array, and then use NumPy's fancy indexing:

import numpy as np
from itertools import combinations, chain
from scipy.special import comb

def comb_index(n, k):
    count = comb(n, k, exact=True)
    index = np.fromiter(chain.from_iterable(combinations(range(n), k)), 
                        int, count=count*k)
    return index.reshape(-1, k)

data = np.array([[1,2,3,4,5],[10,11,12,13,14]])

idx = comb_index(5, 3)
print(data[:, idx])

Output:
[[[ 1  2  3]
  [ 1  2  4]
  [ 1  2  5]
  [ 1  3  4]
  [ 1  3  5]
  [ 1  4  5]
  [ 2  3  4]
  [ 2  3  5]
  [ 2  4  5]
  [ 3  4  5]]

 [[10 11 12]
  [10 11 13]
  [10 11 14]
  [10 12 13]
  [10 12 14]
  [10 13 14]
  [11 12 13]
  [11 12 14]
  [11 13 14]
  [12 13 14]]]
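
As a sketch of the same idea without the SciPy dependency: math.comb (Python 3.8+) is assumed here as a stand-in for scipy.special.comb(..., exact=True), and np.intp replaces int as the index dtype. The snippet also checks that fancy indexing with data[:, idx] matches np.take along axis 1.

```python
from itertools import chain, combinations
from math import comb  # assumption: Python 3.8+, replaces scipy.special.comb(exact=True)
import numpy as np

def comb_index(n, k):
    """Return a (C(n, k), k) array of index combinations."""
    count = comb(n, k)
    # flatten the tuples so fromiter can fill a plain 1-D integer array
    flat = np.fromiter(chain.from_iterable(combinations(range(n), k)),
                       dtype=np.intp, count=count * k)
    return flat.reshape(-1, k)

data = np.array([[1, 2, 3, 4, 5], [10, 11, 12, 13, 14]])
idx = comb_index(5, 3)

# Fancy indexing on axis 1 and np.take(axis=1) give the same (2, 10, 3) result
same = np.array_equal(data[:, idx], np.take(data, idx, axis=1))
print(idx.shape, same)
```

Passing count lets np.fromiter allocate the output once instead of growing it while iterating.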

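
When r = k = 2, you can also use numpy.triu_indices(n, 1), which indexes the upper triangle of a matrix.

idx = comb_index(5, 2)

from the answer above is equivalent to

idx = np.transpose(np.triu_indices(5, 1))

but it is built in, and a few times faster for N above ~20: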
timeit comb_index(1000, 2)
32.3 ms ± 443 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

timeit np.transpose(np.triu_indices(1000, 1))
10.2 ms ± 25.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
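
The equivalence claimed above can be checked directly. This is a minimal sketch for one small n (the value 6 is arbitrary); both approaches list the pairs in the same lexicographic order.

```python
from itertools import combinations
import numpy as np

n = 6
pairs_itertools = np.array(list(combinations(range(n), 2)))
# triu_indices(n, 1) returns the (row, col) positions strictly above the diagonal
pairs_triu = np.transpose(np.triu_indices(n, 1))

print(np.array_equal(pairs_itertools, pairs_triu))
```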

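
Case k = 2: np.triu_indices

I've tested the case k = 2 using many variations of the functions above. The winner is, no doubt, np.triu_indices, and I see now that using the np.dtype([('', np.intp)]*2) data structure can be a huge boost even for exotic data types such as igraph.EdgeList.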

from itertools import combinations, chain
from scipy.special import comb
import numpy as np
import matplotlib.pyplot as plt
import perfplot
import igraph as ig  # graph library built on C
import networkx as nx  # graph library, pure Python

def _combs(n):
    return np.array(list(combinations(range(n),2)))

def _combs_fromiter(n): #@Jaime
    indices = np.arange(n)
    dt = np.dtype([('', np.intp)]*2)
    indices = np.fromiter(combinations(indices, 2), dt)
    indices = indices.view(np.intp).reshape(-1, 2)
    return indices

def _combs_fromiterplus(n):
    dt = np.dtype([('', np.intp)]*2)
    indices = np.fromiter(combinations(range(n), 2), dt)
    indices = indices.view(np.intp).reshape(-1, 2)
    return indices

def _numpy(n): #@endolith
    return np.transpose(np.triu_indices(n,1))

def _igraph(n):
    return np.array(ig.Graph(n).complementer(False).get_edgelist())

def _igraph_fromiter(n):
    dt = np.dtype([('', np.intp)]*2)
    indices = np.fromiter(ig.Graph(n).complementer(False).get_edgelist(), dt)
    indices = indices.view(np.intp).reshape(-1, 2)
    return indices
    
def _nx(n):
    G = nx.Graph()
    G.add_nodes_from(range(n))
    return np.array(list(nx.complement(G).edges))

def _nx_fromiter(n):
    G = nx.Graph()
    G.add_nodes_from(range(n))
    dt = np.dtype([('', np.intp)]*2)
    indices = np.fromiter(nx.complement(G).edges, dt)
    indices = indices.view(np.intp).reshape(-1, 2)
    return indices

def _comb_index(n): #@HYRY
    count = comb(n, 2, exact=True)
    index = np.fromiter(chain.from_iterable(combinations(range(n), 2)), 
                        int, count=count*2)
    return index.reshape(-1, 2)

        
fig = plt.figure(figsize=(15, 10))
plt.grid(True, which="both")
out = perfplot.bench(
        setup = lambda x: x,
        kernels = [_numpy, _combs, _combs_fromiter, _combs_fromiterplus, 
                   _comb_index, _igraph, _igraph_fromiter, _nx, _nx_fromiter],
        n_range = [2 ** k for k in range(12)],
        xlabel = 'combinations(n, 2)',
        title = 'testing combinations',
        show_progress = False,
        equality_check = False)
out.show()


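Wondering why np.triu_indices can't be extended to more dimensions?

Case 2 ≤ k ≤ 4: triu_indices (implemented here) = up to 2x speedup

np.triu_indices could actually be a winner for the case k = 3 and even k = 4 if we use a generalised method instead. The current version of this method is equivalent to: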
def triu_indices(n, k):
    x = np.less.outer(np.arange(n), np.arange(-k+1, n-k+1))
    return np.nonzero(x)

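It builds a boolean matrix for the relation x < y over the sequence 0, 1, …, n-1 and returns the positions of the True cells. Extending the same construction to larger k comes with a large memory overhead, because n^k boolean cells are needed and only C(n, k) of them are True. Memory use and running time therefore grow as n^k, so this algorithm outperforms itertools.combinations only for small values of k. It is best used for the cases k = 2 and k = 3.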

def C(n, k): #huge memory overload...
    if k==0:
        return np.array([])
    if k==1:
        return np.arange(1,n+1)
    elif k==2:
        return np.less.outer(np.arange(n), np.arange(n))
    else:
        x = C(n, k-1)
        X = np.repeat(x[None, :, :], len(x), axis=0)
        Y = np.repeat(x[:, :, None], len(x), axis=2)
        return X&Y

def C_indices(n, k):
    return np.transpose(np.nonzero(C(n,k)))
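
The memory overhead is easy to see on a small case. The sketch below builds the same kind of mask by broadcasting rather than with np.repeat; chain_mask is a hypothetical helper written for this illustration (it assumes k >= 2), not the function from the answer. It compares the number of allocated cells with the number of True cells and checks the resulting indices against itertools.combinations.

```python
from itertools import combinations
from math import comb
import numpy as np

def chain_mask(n, k):
    """Boolean array of shape (n,)*k, True where i0 < i1 < ... < i(k-1). Assumes k >= 2."""
    idx = np.arange(n)
    mask = np.ones((n,) * k, dtype=bool)
    for axis in range(k - 1):
        # index values laid out along `axis` and along `axis + 1`
        lo = idx.reshape([n if d == axis else 1 for d in range(k)])
        hi = idx.reshape([n if d == axis + 1 else 1 for d in range(k)])
        mask &= lo < hi   # broadcasts over the remaining axes
    return mask

n, k = 6, 3
mask = chain_mask(n, k)
indices = np.transpose(np.nonzero(mask))

print(mask.size, int(mask.sum()), comb(n, k))   # 216 cells allocated, only 20 are True
print(np.array_equal(indices, np.array(list(combinations(range(n), k)))))
```

Already at n = 6, k = 3 fewer than 10% of the cells carry information, and the ratio C(n, k) / n^k shrinks further as k grows.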

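Benchmarking with perfplot shows that the best performance boost is for k = 2 (equivalent to np.triu_indices) and for k = 3, where it is almost twice as fast.

Case k > 3: numpy_combinations (implemented here) = up to 2.5x speedup

Next (thanks @Divakar), I managed to find a way to calculate the values of a specific column based on the previous column and Pascal's triangle. It is not yet as optimised as it could be, but the results are really promising. Here we go: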
import numpy as np
from scipy.linalg import pascal

def stretch(a, k):
    l = a.sum()+len(a)*(-k)
    out = np.full(l, -1, dtype=int)
    out[0] = a[0]-1
    idx = (a-k).cumsum()[:-1]
    out[idx] = a[1:]-1-k
    return out.cumsum()

def numpy_combinations(n, k):
    # the benchmark version takes a single tuple instead: n, k = data
    x = np.array([n])
    P = pascal(n).astype(int)
    C = []
    for b in range(k-1,-1,-1):
        x = stretch(x, b)
        r = P[b][x - b]
        C.append(np.repeat(x, r))
    return n - 1 - np.array(C).T
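
The counting idea behind this approach can be illustrated on the first column alone. This is a simplified sketch, not the stretch routine above: it assumes the usual lexicographic order of combinations of range(n), in which the leading value i is followed by C(n-1-i, k-1) completions, and it uses math.comb (Python 3.8+) in place of the Pascal matrix.

```python
from itertools import combinations
from math import comb
import numpy as np

n, k = 6, 3
first = np.arange(n - k + 1)                        # possible leading elements: 0..n-k
# a combination starting with i picks its other k-1 members from the n-1-i larger values
counts = [comb(n - 1 - i, k - 1) for i in first]    # entries of Pascal's triangle
col0 = np.repeat(first, counts)                     # first column, built without iterating tuples

expected = np.array(list(combinations(range(n), k)))[:, 0]
print(counts, np.array_equal(col0, expected))
```

Each further column can be derived in a similar way from the column before it, which is what lets the whole table be assembled from np.repeat and cumulative sums.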

The benchmark script is:
# script is the same as in previous example except this part
def build_args(k):
    return {'setup': lambda x: (k, x),
            'kernels': [comb_index, numpy_combinations],
            'n_range': [x for x in range(1, k)],
            'xlabel': 'N',
            'title': f'test of case C({k}, k)',
            'show_progress': True,
            'equality_check': False}
outs = [perfplot.bench(**build_args(n)) for n in (12, 15, 17, 23, 25, 28)]
fig = plt.figure(figsize=(20, 20))
for i in range(len(outs)):
    ax = fig.add_subplot(2, 3, i + 1)
    ax.grid(True, which="both")
    outs[i].plot()
plt.show()


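Despite that, it still can't compete with itertools.combinations for n < 15, but it is the new winner in the other cases. Last but not least, numpy demonstrates its power when the number of combinations gets really big. It was able to survive processing C(28, 14) combinations, which is approximately 40'000'000 items of size 14.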
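
For a sense of scale, the size of that result can be worked out directly. The estimate below assumes 8-byte integers (a 64-bit default integer dtype), which is an assumption about the platform rather than something stated above.

```python
from math import comb

n_combinations = comb(28, 14)          # 40116600
n_items = n_combinations * 14          # one row of 14 indices per combination
n_bytes = n_items * 8                  # assumption: 8 bytes per integer

print(n_combinations, n_items, round(n_bytes / 1e9, 2))   # about 4.49 GB for the index array
```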
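
The use of chain removes the need for dt = np.dtype… and seems to make this version faster than Jaime's.

I spent a long time trying to find a solution and then realised I had already done it in 2013. I really wish this were all built into numpy! I imagine a compiled C version of the itertools combinations function that outputs numpy indices would be faster than this.

So is np.dtype([('', np.intp)]*r) the "right" way to create a list dtype? I just kind of stabbed at it until it worked.

Very cool! I found its performance (speed and memory) slightly worse than @HYRY's solution, but it is still better than just using itertools.combinations out of the box.

Should be the top answer.

Wow, I was using np.where inside np.triu to find these indices and found the performance lacking. This solved my problem.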