Numpy equivalent of Python's itertools.product


I know that itertools.product iterates over every combination drawn from several lists of keywords. For example, if I have:

categories = [
    [ 'A', 'B', 'C', 'D'],
    [ 'E', 'F', 'G', 'H'],
    [ 'I', 'J', 'K', 'L']
]

and I call itertools.product() on it, I get something like this:

>>> [ x for x in itertools.product(*categories) ]
('A', 'E', 'I'),
('A', 'E', 'J'),
('A', 'E', 'K'),
('A', 'E', 'L'),
('A', 'F', 'I'),
('A', 'F', 'J'),
# and so on...

Is there an equivalent, straightforward way to do the same thing with numpy arrays?

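This question has already been asked several times.

The first link has a working numpy solution that is claimed to be several times faster than itertools, although no benchmark is given. The code was written by a user named pv. If you find his answer useful, please follow the link and upvote it: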

import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the cartesian product of.
    out : ndarray
        Array to place the cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])

    """

    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

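    # m = number of rows sharing one value of arrays[0]; must be an int for slicing
    m = n // arrays[0].size
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        # fill the first block recursively, then copy it into the other blocks
        cartesian(arrays[1:], out=out[0:m,1:])
        for j in range(1, arrays[0].size):
            out[j*m:(j+1)*m,1:] = out[0:m,1:]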
    return out
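As a cross-check of the row order such a function should produce, here is a minimal sketch that builds the same kind of array with np.meshgrid and compares it against itertools.product. This is a different technique from pv's recursive fill, and the helper name cartesian_meshgrid is made up for this illustration. It assumes all inputs are 1-D and share a compatible dtype, since np.stack will upcast mixed types.

```python
import itertools

import numpy as np


def cartesian_meshgrid(arrays):
    """Hypothetical helper: Cartesian product of 1-D inputs as a 2-D array."""
    # indexing="ij" makes the first input vary slowest, like itertools.product
    grids = np.meshgrid(*arrays, indexing="ij")
    # stack the grids along a new last axis, then flatten to one row per combination
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))


categories = [
    ["A", "B", "C", "D"],
    ["E", "F", "G", "H"],
    ["I", "J", "K", "L"],
]

result = cartesian_meshgrid(categories)
expected = list(itertools.product(*categories))

# every row matches the corresponding tuple from itertools.product, in order
assert [tuple(row) for row in result.tolist()] == expected
print(result.shape)  # (64, 3)
```

The result is a single 2-D array with one combination per row, whereas itertools.product yields tuples lazily.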

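Nevertheless, in the same post Alex Martelli (one of the great Python gurus on SO) wrote that itertools is the fastest way to do this task. So here is a quick benchmark that bears out Alex's claim: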

import numpy as np
import time
import itertools


def cartesian(arrays, out=None):
    ...


def test_numpy(arrays):
    for res in cartesian(arrays):
        pass


def test_itertools(arrays):
    for res in itertools.product(*arrays):
        pass


def main():
    arrays = [np.fromiter(range(100), dtype=int), np.fromiter(range(100, 200), dtype=int)]
    # time.clock() was removed in Python 3.8; time.perf_counter() is used instead
    start = time.perf_counter()
    for _ in range(100):
        test_numpy(arrays)
    print(time.perf_counter() - start)
    start = time.perf_counter()
    for _ in range(100):
        test_itertools(arrays)
    print(time.perf_counter() - start)

if __name__ == '__main__':
    main()

Output:

0.421036
0.06742

So you should definitely use itertools.

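Thanks for the extended answer and the accompanying advice. The speed difference arises because you are iterating over the cartesian() result, and iterating over a numpy array is slower than iterating over a Python iterator. If you only want to construct the array, you need to compare cartesian(...) with np.array(list(itertools.product(...))). For iteration, itertools is the right answer, but the question here is about construction.

@Jivan As pv. said, his numpy function builds a numpy array faster because converting the Python iterator produced by itertools.product into a numpy array carries a large overhead: a numpy array of objects (tuples, in this case) cannot be created directly from an iterator. In my tests it was about 5 times faster, but keep in mind that iterating over a numpy array is much slower (more than 5 times slower, according to the test I posted above), so if speed is your main concern you should use the iterator.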
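The construction-only comparison suggested in the comments could be sketched roughly as follows. Because cartesian() is not redefined here, a np.meshgrid-based builder stands in for it. That substitution is an assumption, as are the helper names build_with_itertools and build_with_meshgrid. Timings depend on the machine and library versions, so none are quoted.

```python
import itertools
import timeit

import numpy as np


def build_with_itertools(arrays):
    # materialise the iterator as a list of tuples, then convert to an array
    return np.array(list(itertools.product(*arrays)))


def build_with_meshgrid(arrays):
    # stand-in for a pure-numpy builder such as cartesian()
    grids = np.meshgrid(*arrays, indexing="ij")
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))


# same inputs as the benchmark above: 100 x 100 = 10000 combinations
arrays = [np.arange(100), np.arange(100, 200)]

# both builders must produce the same rows in the same order
assert np.array_equal(build_with_itertools(arrays), build_with_meshgrid(arrays))

# time construction only; the resulting arrays are not iterated over
t_itertools = timeit.timeit(lambda: build_with_itertools(arrays), number=20)
t_meshgrid = timeit.timeit(lambda: build_with_meshgrid(arrays), number=20)

print(f"itertools + np.array: {t_itertools:.4f} s")
print(f"meshgrid:             {t_meshgrid:.4f} s")
```

Unlike the earlier benchmark, this measures only how long it takes to obtain the finished array, which is the case the comments say the question is about.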