Python 3.x: efficiently "draw combinations" with "maximum diversity"
One can create all possible combinations of n elements from a given array, like:
from itertools import combinations
[*combinations(range(4), 2)]
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
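I'm trying to find a way to adapt this so that I get m of these combinations with "maximum diversity". What I mean is probably best explained with an example: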
diverse_combinations(range(4), n=2, m=3)
# either of these would be what I'm looking for
# [(0, 1), (2, 3), (0, 2)] # or
# [(0, 1), (2, 3), (1, 2)] # or
# [(0, 2), (1, 3), (0, 1)] # ...
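So I basically want the individual elements across my subset of combinations to be as close to uniformly distributed as possible. Hence, this is not what I want: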
def diverse_combinations(arr, n, m):
    # naive attempt: take the first m combinations in lexicographic order
    for idx, comb in enumerate(combinations(arr, n)):
        if idx == m:
            break
        yield comb

[*diverse_combinations(range(4), n=2, m=3)]
# [(0, 1), (0, 2), (0, 3)]
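To make "as close to uniformly distributed as possible" measurable, one option is to count how often each element occurs across the chosen combinations. The helper name `element_counts` below is an assumption introduced for illustration. It is not part of the original question.

```python
from collections import Counter
from itertools import combinations, islice


def element_counts(combs):
    """Count how often each element occurs across a list of combinations."""
    return Counter(x for comb in combs for x in comb)


# The first three combinations in lexicographic order over-use element 0.
naive = list(islice(combinations(range(4), 2), 3))
print(element_counts(naive))  # Counter({0: 3, 1: 1, 2: 1, 3: 1})

# One of the desired results listed earlier is much flatter.
diverse = [(0, 1), (2, 3), (0, 2)]
counts = element_counts(diverse)
print(max(counts.values()) - min(counts.values()))  # 1
```

A small gap between the largest and smallest count is one possible reading of "maximum diversity". Other measures, such as the variance of the counts, would work as well.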
Finally, the case I care about is performance-sensitive, since it boils down to something like:
diverse_combinations(range(100), n=50, m=100)
# a list with 100 tuples of len=50 where each element appears
# ~equally often
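For a sense of scale, the sketch below computes how many combinations exist for that call and how often each element would appear under a perfectly even spread. It assumes Python 3.8 or later for `math.comb`. The variable names are illustrative only.

```python
import math

n_items, size, count = 100, 50, 100  # values from the call above

# Number of distinct combinations: far too many to enumerate and then filter.
total = math.comb(n_items, size)
print(f"{total:.3e}")  # 1.009e+29

# Every combination fills `size` element slots, so a perfectly even spread
# gives each element count * size / n_items appearances.
ideal, remainder = divmod(count * size, n_items)
print(ideal, remainder)  # 50 0
```

With roughly 1e29 candidates, any approach that walks through `itertools.combinations` is ruled out, which is why the combinations have to be constructed directly.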
I'd be glad for any hints.

OK, so I came up with this solution, which works reasonably well. I'm putting it here in case it helps anyone else:
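# python3
import numpy as np
from scipy.special import comb

def diverse_combinations(arr, size, count):
    if count > comb(len(arr), size):
        raise ValueError('Not enough possible combinations')
    # number of disjoint combinations a single draw from arr can supply
    possible_draws = len(arr) // size
    combs = set()
    while len(combs) < count:
        # sampling without replacement makes the rows of one draw disjoint
        new_combs = np.random.choice(
            arr, size=(possible_draws, size), replace=False)
        # sort each row so that equal combinations collapse in the set
        combs.update([tuple(sorted(cc)) for cc in new_combs])
    # the last draw may overshoot, so trim to the requested count
    return [*combs][:count]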
# this case has an exact solution
np.unique(diverse_combinations(range(100), 50, 100), return_counts=True)[1]
# array([50, 50, 50, 50, 50,...
# here 50 elements appear one time more often
np.unique(diverse_combinations(range(100), 50, 101), return_counts=True)[1]
# array([50, 50, 51, 50, 51,...
# if len(arr) is not divisible by 'size' the result is less exact
np.unique(diverse_combinations(range(100), 40, 100), return_counts=True)[1]
# array([44, 45, 40, 38, 43,...
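For the non-divisible case, one possible alternative is a greedy draw that always prefers the elements used least so far. This is a different technique from the solution above, shown only as a minimal sketch. The function name `greedy_diverse_combinations` and the parameters `seed` and `max_tries` are hypothetical. It assumes the elements of `arr` are distinct, hashable and sortable.

```python
import math
import random
from collections import Counter


def greedy_diverse_combinations(arr, size, count, seed=None, max_tries=1000):
    """Hypothetical greedy variant: prefer the least-used elements."""
    items = list(arr)
    if count > math.comb(len(items), size):
        raise ValueError('Not enough possible combinations')
    rng = random.Random(seed)
    usage = Counter()  # how often each element has been used so far
    seen, result = set(), []
    while len(result) < count:
        for attempt in range(max_tries):
            # On the first attempt the jitter is below 1, so it only breaks
            # ties between equally used elements. After a duplicate the
            # jitter grows and the ranking becomes progressively more random.
            ranked = sorted(
                items,
                key=lambda x: usage[x] + rng.random() * (1 + attempt))
            cand = tuple(sorted(ranked[:size]))
            if cand not in seen:
                break
        else:
            raise RuntimeError('Could not find a new combination')
        seen.add(cand)
        result.append(cand)
        usage.update(cand)
    return result


combs = greedy_diverse_combinations(range(100), 40, 100, seed=0)
counts = Counter(x for c in combs for x in c)
print(len(set(combs)))                              # 100
print(max(counts.values()) - min(counts.values()))  # spread of the counts
```

As long as the first attempt of each round yields an unseen combination, the counts never differ by more than one. The price is a sort per round, which is cheap at this problem size but slower than the vectorised draw above.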