Python: converting an itertools array to a numpy array


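I am creating this array:

A = itertools.combinations(range(6), 2)

and I have to manipulate it with numpy, for example:

A.reshape(..

When A is large, the call list(A) is too slow.

How can I "convert" an itertools array into a numpy array?

Update 1: I have tried hpaulj's solution. In this specific case it is slightly slower. Any idea why?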

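import itertools as it
import time

import numpy as np

# Approach 1: build a list of tuples, then convert it to an array
start = time.perf_counter()

A = it.combinations(range(495), 3)
A = np.array(list(A))
print(A)

stop = time.perf_counter()
print(stop - start)

# Approach 2: flatten with chain, read with fromiter, then reshape
start = time.perf_counter()

A = np.fromiter(it.chain(*it.combinations(range(495), 3)), dtype=int).reshape(-1, 3)
print(A)

stop = time.perf_counter()
print(stop - start)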

Result:

[[  0   1   2]
 [  0   1   3]
 [  0   1   4]
 ..., 
 [491 492 494]
 [491 493 494]
 [492 493 494]]
10.323822
[[  0   1   2]
 [  0   1   3]
 [  0   1   4]
 ..., 
 [491 492 494]
 [491 493 494]
 [492 493 494]]
12.289898

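I am reopening this because I dislike the linked answer. The accepted answer suggests using

np.array(list(A))  # producing a (15,2) array

But the OP has already tried list(A) and found it to be slow.

Another answer suggests using np.fromiter. But buried in its comments is a note that fromiter requires a 1d input: the iterable must yield scalars, not tuples.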

In [102]: A=itertools.combinations(range(6),2)
In [103]: np.fromiter(A,dtype=int)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-103-29db40e69c08> in <module>()
----> 1 np.fromiter(A,dtype=int)

ValueError: setting an array element with a sequence.

I think the fastest approach is to use fromiter and flatten the combinations with an idiomatic use of itertools.chain:

In [112]: timeit np.fromiter(itertools.chain(*itertools.combinations(range(6),2)),dtype=int).reshape(-1,2)
100000 loops, best of 3: 12.1 µs per loop

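That is not much of a time saving, at least at this small size. (fromiter also accepts a count argument, which shaves off another µs.) For a larger case, range(60), fromiter takes half the time of np.array.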
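
To illustrate the count argument mentioned above, here is a minimal sketch. The helper name combinations_array is hypothetical (it does not come from the thread), and math.comb assumes Python 3.8 or later. Because the number of combinations is known in advance, fromiter can allocate the output once instead of growing it as it reads.

```python
import itertools
import math

import numpy as np


def combinations_array(n, k):
    """Hypothetical helper: all k-combinations of range(n) as a 2d int array."""
    n_rows = math.comb(n, k)  # number of combinations, known in advance
    flat = itertools.chain.from_iterable(itertools.combinations(range(n), k))
    # count tells fromiter exactly how many scalars to read and allocate for
    return np.fromiter(flat, dtype=int, count=n_rows * k).reshape(n_rows, k)


A = combinations_array(6, 2)
print(A.shape)  # (15, 2)
```

Whether this is measurably faster depends on the size of the problem, so it is worth timing on the real input.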

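A quick search on [numpy] itertools turns up a number of suggestions for pure numpy ways of generating all combinations. itertools is fast at generating pure Python structures, but converting those to arrays is a slow step.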

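A picky point about the question: A is a lazy iterator (generator-like), not an array. list(A) does produce a list of tuples, which can loosely be described as an array. But it is not an np.array, and it has no reshape method.
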
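
The distinction above can be checked directly. This is a small sketch for illustration only; the variable names pairs and arr are not from the thread.

```python
import itertools

import numpy as np

A = itertools.combinations(range(6), 2)
print(hasattr(A, "reshape"))  # False: A is an iterator, not an array

pairs = list(A)   # consumes the iterator
print(pairs[:3])  # [(0, 1), (0, 2), (0, 3)]
print(list(A))    # []: the iterator is exhausted after one pass

arr = np.array(pairs)  # only now is there an array with a reshape method
print(arr.shape)       # (15, 2)
```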
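An alternative way to get every pairwise combination of N elements is to generate the indices of the upper triangle of an (N, N) matrix using np.triu_indices(N, k=1), e.g.: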
np.vstack(np.triu_indices(6, k=1)).T
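
As a quick sanity check, the sketch below compares the triu_indices result with the itertools result for N = 6. The variable names are illustrative only. The two arrays match because the upper-triangle indices come back in row-major order, which is the same lexicographic order that itertools.combinations uses.

```python
import itertools

import numpy as np

N = 6
from_triu = np.vstack(np.triu_indices(N, k=1)).T
from_itertools = np.array(list(itertools.combinations(range(N), 2)))

print(from_triu.shape)                            # (15, 2)
print(np.array_equal(from_triu, from_itertools))  # True
```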

For small arrays, itertools.combinations will win, but for large arrays the triu_indices trick can be substantially faster:
In [1]: %timeit np.fromiter(itertools.chain.from_iterable(itertools.combinations(range(6), 2)), int)
The slowest run took 10.46 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 4.04 µs per loop

In [2]: %timeit np.array(np.triu_indices(6, 1)).T
The slowest run took 10.97 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 22.3 µs per loop

In [3]: %timeit np.fromiter(itertools.chain.from_iterable(itertools.combinations(range(1000), 2)), int)
10 loops, best of 3: 69.7 ms per loop

In [4]: %timeit np.array(np.triu_indices(1000, 1)).T
100 loops, best of 3: 10.6 ms per loop

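Hi, where is your question?

How do I "convert" an itertools array into a numpy array?

Are you sure it isn't "too slow" simply because the number of combinations is too high? If you are trying to create a billion elements or so, that will always take a while. The call to itertools.combinations returns immediately because it does not create any of the combinations up front; it is lazy. You can squeeze out a bit more performance by specifying the size of the final array, which can be computed using scipy.special.binom(6, 2).

@hpaulj I have tried your solution, see the update in the question.

There are pure numpy ways of generating the combinations faster. One of the answers suggests using triu. I believe other methods have been proposed in earlier SO questions. A quick search on [numpy] itertools turns up a number of suggestions for pure numpy ways of generating all combinations.

@hpaulj Would you mind linking some of them, since I can't find any?

I think this solution only generates combinations of two elements.

I mentioned it because your original question was about combinations of two elements. I think it is possible to generalise this method to handle combinations of more than two elements, but that would need some more thought.

I wasn't aware of chain.from_iterable. For the large case it is twice as fast as chain(*...).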
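
The difference between the two chain spellings discussed in the comments can be sketched as follows. The variable names are illustrative and this block makes no timing claim. chain(*iterable) has to unpack every tuple into an argument list before it starts, while chain.from_iterable pulls tuples from the source one at a time; both yield the same flat sequence.

```python
import itertools

import numpy as np

n, k = 60, 2

# chain(*iterable) unpacks all combinations into an argument list first
eager = itertools.chain(*itertools.combinations(range(n), k))
# chain.from_iterable reads the combinations lazily, one tuple at a time
lazy = itertools.chain.from_iterable(itertools.combinations(range(n), k))

a = np.fromiter(eager, dtype=int).reshape(-1, k)
b = np.fromiter(lazy, dtype=int).reshape(-1, k)

print(a.shape)               # (1770, 2)
print(np.array_equal(a, b))  # True
```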