Python: iterate over all partitions into k groups?


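Suppose I have a list L. How can I get an iterator over all partitions of L into K groups?

Example: L = [2,3,5,7,11,13], K = 3

List of all possible partitions into 3 groups: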

[ [ 2 ], [ 3, 5], [ 7,11,13] ]
[ [ 2,3,5 ], [ 7, 11], [ 13] ]
[ [ 3, 11 ], [ 5, 7], [ 2, 13] ]
[ [ 3 ], [ 11 ], [ 5, 7, 2, 13] ]
etc...

== UPDATE ==

I am working on a solution that seems to work, so I will copy-paste it here:

import itertools


def l1_sub_l0(l1, l0):
    """Subtract two lists: return [l1 - l0, l0 - l1], respecting multiplicity"""
    copy_l1 = list(l1)
    copy_l0 = list(l0)

    for xx in l0:
        # drop one occurrence of every element the two lists share
        if xx in copy_l1:
            copy_l1.remove(xx)
            copy_l0.remove(xx)

    return [copy_l1, copy_l0]


def gen_group_len(n, k):
    """Generate all possible group sizes (sorted k-tuples summing to n)"""

    # avoid doubles
    stop_list = []
    # range(1, n) rather than range(1, n - 1), so that n == k also works
    for t in itertools.combinations_with_replacement(range(1, n), k - 1):
        # the last group takes whatever is left
        last_n = n - sum(t)

        # valid group size
        if last_n >= 1:
            res = tuple(sorted(t + (last_n,)))
            if res not in stop_list:
                yield res
                stop_list.append(res)


# group_len = (1, 1, 3)

def gen(group_len, my_list):
    """Generate all possible partitions of my_list for the given group sizes"""

    if len(group_len) == 1:
        # a single group takes everything that is left
        yield (tuple(my_list),)

    else:
        # needed when another group has the same size as the first one:
        # remembers the partitions already produced, ignoring group order
        seen = set()

        for t in itertools.combinations(my_list, group_len[0]):
            # elements still available for the remaining groups
            reduced_list = l1_sub_l0(my_list, t)[0]

            for t2 in gen(group_len[1:], reduced_list):
                partition = (t,) + t2
                # compare whole partitions, so three or more groups of the
                # same size are deduplicated too
                key = frozenset(partition)
                if key not in seen:
                    yield partition
                    # avoid doing the same thing twice
                    if group_len[0] in group_len[1:]:
                        seen.add(key)


my_list = [3, 5, 7, 11, 13]
n = len(my_list)
k = 3

group_len_list = list(gen_group_len(n, k))
print("for %i elements, %i configurations of group sizes" % (n, len(group_len_list)))
print(group_len_list)

for group_len in group_len_list:
    print("group sizes", group_len)
    for x in gen(group_len, my_list):
        print(x)
    print("===")

Output:

for 5 elements, 2 configurations of group sizes
[(1, 1, 3), (1, 2, 2)]
group sizes (1, 1, 3)
((3,), (5,), (7, 11, 13))
((3,), (7,), (5, 11, 13))
((3,), (11,), (5, 7, 13))
((3,), (13,), (5, 7, 11))
((5,), (7,), (3, 11, 13))
((5,), (11,), (3, 7, 13))
((5,), (13,), (3, 7, 11))
((7,), (11,), (3, 5, 13))
((7,), (13,), (3, 5, 11))
((11,), (13,), (3, 5, 7))
===
group sizes (1, 2, 2)
((3,), (5, 7), (11, 13))
((3,), (5, 11), (7, 13))
((3,), (5, 13), (7, 11))
((5,), (3, 7), (11, 13))
((5,), (3, 11), (7, 13))
((5,), (3, 13), (7, 11))
((7,), (3, 5), (11, 13))
((7,), (3, 11), (5, 13))
((7,), (3, 13), (5, 11))
((11,), (3, 5), (7, 13))
((11,), (3, 7), (5, 13))
((11,), (3, 13), (5, 7))
((13,), (3, 5), (7, 11))
((13,), (3, 7), (5, 11))
((13,), (3, 11), (5, 7))
===

This works, although it is probably very inelegant (I sort the clusters to avoid counting duplicates).
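The `clusters` generator that the following snippets call is not shown here. As a stand-in, below is a minimal sketch that matches the description (it sorts the clusters to recognise duplicates, and it may return empty clusters). It is an assumption, not the answer's original code; it expects distinct, mutually comparable elements, and its iteration order may differ from the sample output shown further down.

```python
def clusters(l, K):
    """Yield every way to spread the items of l over K unlabeled clusters.

    Hypothetical reconstruction: clusters may be empty, and each
    distinct grouping is yielded exactly once.
    """
    if not l:
        # nothing left to place: K empty clusters
        yield [[] for _ in range(K)]
        return
    seen = set()
    for sub in clusters(l[1:], K):
        canon = sorted(sub)  # cluster order is irrelevant, so normalise it
        for i in range(K):
            # put the first item into the i-th cluster
            cand = canon[:i] + [[l[0]] + canon[i]] + canon[i + 1:]
            key = tuple(sorted(map(tuple, cand)))
            if key not in seen:
                seen.add(key)
                yield cand
```

Remembering every grouping in `seen` trades memory for simplicity; comparing only against the previously sorted grouping would be cheaper but relies on duplicates arriving consecutively.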

It also returns empty clusters, so you may want to wrap it in order to get only the non-empty ones:

def neclusters(l, K):
    for c in clusters(l, K):
        if all(x for x in c): yield c

Counting, just to check:

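def kamongn(n, k):
    """Binomial coefficient: number of ways to choose k among n."""
    res = 1
    for x in range(n - k, n):
        res *= x + 1
    for x in range(k):
        res //= x + 1  # integer division keeps the result exact
    return res

def Stirling(n, k):
    """Stirling number of the second kind, via the explicit alternating sum."""
    res = 0
    for j in range(k + 1):
        res += (-1) ** (k - j) * kamongn(k, j) * j ** n
    for x in range(k):
        res //= x + 1  # divide by k!
    return res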

>>> sum(1 for _ in neclusters([2,3,5,7,11,13], K=3)) == Stirling(len([2,3,5,7,11,13]), k=3)
True
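
For reference, `Stirling(n, k)` above evaluates the standard closed form for Stirling numbers of the second kind, which count the partitions of n distinct elements into k non-empty groups. Worked through for the example (n = 6, k = 3):

```latex
S(n,k) = \frac{1}{k!}\sum_{j=0}^{k} (-1)^{k-j}\binom{k}{j}\, j^{n}
\qquad\Longrightarrow\qquad
S(6,3) = \frac{1}{3!}\left(-\binom{3}{0}0^{6} + \binom{3}{1}1^{6} - \binom{3}{2}2^{6} + \binom{3}{3}3^{6}\right)
       = \frac{3 - 192 + 729}{6} = 90
```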

It works.

Output:

>>> clust = neclusters([2,3,5,7,11,13], K=3)
>>> [next(clust) for _ in range(5)]
[[[2, 3, 5, 7], [11], [13]], [[3, 5, 7], [2, 11], [13]], [[3, 5, 7], [11], [2, 13]], [[2, 3, 11], [5, 7], [13]], [[3, 11], [2, 5, 7], [13]]]

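Edit: As @moose pointed out, the following only finds partitions in which contiguous indices end up in the same cluster. Performing this partitioning over all permutations of the input gives the answer that was asked for.

itertools is very useful for this sort of combinatorial listing. First, we treat your task as the equivalent problem of selecting all sets of k-1 distinct split points in the array. That is solved by itertools.combinations, which returns combinations without replacement of a given size from an iterable, with the values in the same order in which they appear in the original iterable.

Your problem is thus solved by the following: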

import itertools
def neclusters(l, K):
    for splits in itertools.combinations(range(len(l) - 1), K - 1):
        # splits need to be offset by 1, and padded
        splits = [0] + [s + 1 for s in splits] + [None]
        yield [l[s:e] for s, e in zip(splits, splits[1:])]

numpy.split is designed to produce this kind of partition from a list of split offsets, so here is an alternative that generates lists of numpy arrays:

import itertools
import numpy as np

def neclusters(l, K):
    for splits in itertools.combinations(range(len(l) - 1), K - 1):
        # shift by 1 so that every cut falls between two elements
        yield np.split(l, 1 + np.array(splits))
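
As the edit note at the top of this answer says, running the contiguous split over every permutation also recovers the non-contiguous groupings. Here is a rough sketch of that idea, assuming distinct, hashable, sortable elements; the names `contiguous_splits` and `set_partitions_via_permutations` are made up for illustration. It is brute force (n! permutations), so it only suits short lists.

```python
import itertools

def contiguous_splits(seq, k):
    """Yield every split of seq into k non-empty contiguous pieces."""
    for cuts in itertools.combinations(range(1, len(seq)), k - 1):
        bounds = (0,) + cuts + (len(seq),)
        yield [seq[s:e] for s, e in zip(bounds, bounds[1:])]

def set_partitions_via_permutations(items, k):
    """Yield each partition of items into k non-empty groups exactly once."""
    seen = set()
    for perm in itertools.permutations(items):
        for part in contiguous_splits(perm, k):
            # order-insensitive fingerprint of the partition
            key = frozenset(frozenset(group) for group in part)
            if key not in seen:
                seen.add(key)
                yield [sorted(group) for group in part]
```

For l = [0, 1, 2] and K = 2 this does produce the grouping of 0 and 2 together with 1 on its own, which the contiguous-only version cannot reach.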

A simple alternative view of this problem is to assign one of the k cluster labels (three in the example) to each element:

import itertools
def neclusters(l, k):
    for labels in itertools.product(range(k), repeat=len(l)):
        partition = [[] for i in range(k)]
        for i, label in enumerate(labels):
            partition[label].append(l[i])
        yield partition

As with @val's answer, this can be wrapped to remove partitions with empty clusters.
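
One possible shape for such a wrapper is sketched below. It also drops relabelled duplicates, since the same grouping is produced once per ordering of the labels. The names `label_partitions` and `nonempty_unique` are illustrative only, and the sketch assumes distinct, sortable elements.

```python
import itertools

def label_partitions(l, k):
    """Labelling approach: one label in range(k) per element."""
    for labels in itertools.product(range(k), repeat=len(l)):
        partition = [[] for _ in range(k)]
        for item, label in zip(l, labels):
            partition[label].append(item)
        yield partition

def nonempty_unique(partitions):
    """Skip partitions with an empty cluster and relabelled duplicates."""
    seen = set()
    for p in partitions:
        if not all(p):
            continue  # at least one cluster is empty
        key = tuple(sorted(map(tuple, p)))  # ignore cluster order
        if key not in seen:
            seen.add(key)
            yield p
```

Usage: `nonempty_unique(label_partitions([2, 3, 5, 7, 11, 13], 3))` iterates over the groupings into exactly three non-empty clusters.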

Filter partitions of size k using more_itertools.partitions (note the trailing "s"):

Given

import itertools as it

import more_itertools as mit


iterable = [2, 3, 5, 7, 11]
k = 3

Demo

res = [p for perm in it.permutations(iterable) for p in mit.partitions(perm) if len(p) == k]
len(res)
# 720

res
# [[[2], [3], [5, 7, 11]],
#  [[2], [3, 5], [7, 11]],
#  [[2], [3, 5, 7], [11]],
#  [[2, 3], [5], [7, 11]],
#  [[2, 3], [5, 7], [11]],
#  [[2, 3, 5], [7], [11]],
#  ...
#  [[3], [2], [5, 7, 11]],
#  [[3], [2, 5], [7, 11]],
#  [[3], [2, 5, 7], [11]],
#  [[3, 2], [5], [7, 11]],
#  [[3, 2], [5, 7], [11]],
#  [[3, 2, 5], [7], [11]],
#  [[3], [2], [5, 11, 7]],
#  ...
# ]

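This version gives partitions of a permuted input, so the result can include partitions that contain the same groups in a different order, e.g.

[[3,], [5,], [7, 11, 13]] and [[7, 11, 13], [3,], [5,]]

Note: more_itertools is a third-party package. Install via

> pip install more_itertools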

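And what have you tried so far?

Note that the algorithm described in the linked post returns all non-empty subsets. Since the OP did not mention this as a constraint, I don't think the algorithm will serve his purpose.

Do you consider ((3,), (5,), (7, 11, 13)) and ((7, 11, 13), (3,), (5,)) to be the same?

Glad I could help! I think that if your list is sorted, you could probably work around the whole sorted / if tup != prev ... part and only generate what is necessary. But that is left as an exercise for the reader :)

This seems to be wrong. It only finds partitions that do not change the order of the elements. For l = [0,1,2] and K = 2 it does not find [[0,2], [1]].

Actually, his answer is the only correct one. The question asks for a list partition, not a set partition.

The example given by the OP appears to be a set partition, even though the input is a list (e.g. [[3,11], [5,7], [2,13]]).

If I understand the OP's question correctly, this is a nice answer, but it gives the wrong answer to this question, because it restricts the subsets to be contiguous with respect to the original order (e.g. your list does not include any partition containing the subset [3,11], since 3 and 11 are not adjacent in the original list).

That is a fair observation. Yes, this version of partitions returns partitions built from adjacent elements. We can use itertools.permutations to get around this. See the update.

This solution includes duplicate partitions, but if you filter out the empty subsets, sort each partition, and then remove the duplicates, you are left with all partitions for the given number of subsets. For example, with n = 5: set(map((lambda p: tuple(sorted(map(tuple, filter(len, p))))), neclusters(range(n), n)))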